Facet by choice count

Google Refine lets you facet by name or by choice count. This is useful for focusing an analysis or transformation on values that have, for example, more than twenty records.

Watch the video tutorial:

A graphical interface for setting up your selection is available from the Text facet:

From there you can sort the facet either by name or by count. Sorting by count puts the values with the highest counts at the top. At the bottom of the list of values, Google Refine offers the option to Facet by choice count. This opens the graphical interface where you can set your selection. In this example we display all last names having between 15 and 26 records.

However, if the range of counts is too large, the graphical interface does not allow a precise selection. In the screenshot below, the smallest selection we can make is between 0 and 5.

The way around this limitation is to use the GREL function facetCount. Create a new custom facet and set the expression to:
  • facetCount(value, "value", "last_name") > 3 for all the records whose value has a choice count greater than 3 (use <= 3 instead for choice counts between 0 and 3)
  • facetCount(value, "value", "last_name") == 2 for all the records whose value has a choice count of exactly 2
Of course, don't forget to replace last_name with your own column name. In both cases Google Refine will return a true / false answer.
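
To make the true / false result more concrete, here is a minimal Python sketch of what such a count-based facet computes for each row. It runs outside Google Refine and only imitates the idea. The helper name facet_count_flags and the sample last names are invented for illustration.

```python
from collections import Counter


def facet_count_flags(column, predicate):
    """Return one True/False per row, imitating a custom facet built on a choice count.

    Hypothetical helper: `column` is a list of cell values and `predicate`
    is the test applied to each value's choice count.
    """
    counts = Counter(column)  # how many rows share each distinct value
    return [predicate(counts[value]) for value in column]


# Invented sample data: Smith appears 4 times, Jones twice, Lee once.
last_names = ["Smith", "Smith", "Smith", "Smith", "Jones", "Jones", "Lee"]

# Same tests as the two expressions above: count > 3 and count == 2.
more_than_3 = facet_count_flags(last_names, lambda n: n > 3)
exactly_2 = facet_count_flags(last_names, lambda n: n == 2)

print(more_than_3)  # only the four Smith rows are True
print(exactly_2)    # only the two Jones rows are True
```

Selecting "true" in the resulting facet then keeps only the rows that passed the test, which is the same as picking an exact range in the graphical interface.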