5.12.15

How to check if values within a record are unique in OpenRefine

When working in record mode, it is possible to compare all the values within one record and check whether they are unique. To do that, we will use the function forEach(), which lets us turn the record's values into an array, and then count the unique values.



GREL Expression

First, for this tutorial you need to be in record mode. On the column you want to check, use the following expression:

forEach(row.record.cells['column_name'].value,v,v).uniques().length()

where:
  • forEach(row.record.cells['column_name'].value,v,v) creates an array of that column's values for each record.
  • uniques() removes all duplicates within the array.
  • length() counts how many elements there are in the array.
In the result of the expression:
  • 0 means there is no value in the record.
  • 1 means only one distinct value has been assigned in the full record, so the value is unique.
  • 2 or more means more than one distinct value has been assigned, and therefore the values are not unique (the number indicates how many unique values are present).
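
For readers who find it easier to follow the logic in a general-purpose language, the three steps can be mimicked in Python. This is only a sketch of what the expression computes, not how OpenRefine implements it. The function name count_unique_values and the sample lists are made up for illustration, and it assumes that blank cells (shown here as None or an empty string) are skipped when the array is built.

```python
def count_unique_values(record_values):
    """Mimic forEach(...).uniques().length() for the cell values of one record."""
    # Step 1: collect the non-blank values of the record into a list
    # (assumption: blank cells do not end up in the array).
    values = [v for v in record_values if v is not None and v != ""]
    # Step 2: drop duplicates, keeping the first occurrence of each value.
    uniques = list(dict.fromkeys(values))
    # Step 3: count the remaining elements.
    return len(uniques)


print(count_unique_values([None, ""]))       # 0: no value in the record
print(count_unique_values(["M", "M", "M"]))  # 1: the value is unique
print(count_unique_values(["M", "F", "M"]))  # 2: two different values
```

The three calls correspond to the three cases listed above: an empty record, a record with a single distinct value, and a record with conflicting values.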

Example:

In the mock example below, we want to check whether one person has been assigned different genders (M, F, or U for Unknown). We created a new column named Record Check using the expression:

forEach(row.record.cells['gender'].value,v,v).uniques().length()
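
To make the expected Record Check values concrete, here is a small Python sketch of the same check on hypothetical data. The names, the rows, and the helper functions group_into_records and record_check are all invented for illustration and are not part of OpenRefine. It assumes that a record starts at each row with a non-blank key column (here "name"), that rows with a blank key belong to the record above, and that every row of a record receives its record's count.

```python
def group_into_records(rows, key="name"):
    """Group rows into records: a non-blank key cell starts a new record."""
    records = []
    for row in rows:
        if row.get(key) or not records:
            records.append([])
        records[-1].append(row)
    return records


def record_check(rows, column="gender", key="name"):
    """Return, for every row, the number of distinct values in its record."""
    result = []
    for record in group_into_records(rows, key):
        # Blank cells are ignored (assumption), duplicates collapse in the set.
        values = [r[column] for r in record if r.get(column)]
        count = len(set(values))
        result.extend([count] * len(record))
    return result


# Hypothetical mock data: three people, one row per recorded gender.
rows = [
    {"name": "Alice", "gender": "F"},
    {"name": None, "gender": "F"},
    {"name": "Bob", "gender": "M"},
    {"name": None, "gender": "U"},
    {"name": "Carol", "gender": None},
]

print(record_check(rows))  # [1, 1, 2, 2, 0]
```

In this made-up data, Alice has one consistent gender (1), Bob has two conflicting values (2), and Carol has no value at all (0).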




Note that instead of creating a new column, you can create a custom facet to explore the results easily.

