5.12.15

How to check if values within a record are unique in OpenRefine

When working in record mode, it is possible to compare all the values within one record and check whether they are unique. To do that, we will use the function forEach(), which turns the values of the record into an array so that the unique values can be counted.



GREL Expression

First, for this tutorial you need to be in record mode. On the column you want to check, use the following expression:

forEach(row.record.cells[columnName].value,v,v).uniques().length()

where:
  • forEach(row.record.cells['column_name'].value,v,v) creates an array of the column's values for each record. Replace 'column_name' with the name of your column, or use the built-in variable columnName to refer to the current column.
  • uniques() removes all duplicates within the array
  • length() counts how many elements there are in the array
In the result of the expression:
  • 0 means there is no value in the record
  • 1 means only one value has been assigned in the full record (for example, per person): the value is unique.
  • 2 or more means more than one value has been assigned and therefore the value is not unique (the number indicates how many distinct values are present)

Example:

In the mock example below, we want to check whether one person has been assigned different genders (M, F, or U for Unknown). We created a new column named Record Check using the expression:

forEach(row.record.cells['gender'].value,v,v).uniques().length()
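To make the logic concrete, here is a rough Python sketch of what the expression computes for each record. It is not OpenRefine code. The mock rows, the names (Alice, Bob, Carol) and the helper record_check() are hypothetical, and the sketch assumes that blank cells are skipped, which matches the "0 means there is no value in the record" case above.

```python
# Mock rows: (person, gender). As in OpenRefine record mode, a blank key
# cell (None) means the row belongs to the record started above it.
# All names and values here are made up for illustration.
rows = [
    ("Alice", "F"),
    (None, "F"),
    ("Bob", "M"),
    (None, "U"),
    ("Carol", None),
]

def record_check(rows):
    """Return {record key: number of distinct non-blank values}."""
    records = {}
    current = None
    for key, value in rows:
        if key is not None:        # a non-blank key starts a new record
            current = key
            records[current] = []
        if value is not None:      # assumption: blank cells are skipped
            records[current].append(value)
    # set() plays the role of uniques(), len() the role of length()
    return {key: len(set(values)) for key, values in records.items()}

result = record_check(rows)
print(result)  # {'Alice': 1, 'Bob': 2, 'Carol': 0}
```

Here Alice has a unique gender (1), Bob has been assigned two different values (2), and Carol has no value at all (0).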




Note that instead of a new column you can create a custom facet to easily explore the results. 

