27.10.15

Using facets to cluster a subset of your data when you have a large number of rows in OpenRefine


When working in OpenRefine, you sometimes have to deal with a large data set in which many values refer to the same entity (a person, city, book, or any other entry) but use different spellings. With a large data set, the clustering function can become unresponsive because of the amount of computation needed to run the different algorithms.


In this tutorial, we will see how you can use facets to create subsets of records to cluster, which helps you manage the compute load on your machine. If you want to cluster your full data set, you just need to run the cluster function on each of the subsets created with your facet.
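
To make the idea concrete outside of OpenRefine, here is a minimal Python sketch of clustering one subset at a time. It is only an illustration, not OpenRefine's own code: `fingerprint` is a rough approximation of a fingerprint-style key-collision method, and the names `cluster_by_facet`, `facet_key` and `value_key`, as well as the sample rows, are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: lowercase, no accents or punctuation,
    with tokens de-duplicated and sorted (an approximation only)."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop the combining marks left over after decomposing accented letters.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation, then split on whitespace.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster_by_facet(rows, facet_key, value_key):
    """Group rows by a facet column, then cluster values inside each group."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[facet_key]].append(row[value_key])

    clusters = {}
    for facet, values in subsets.items():
        groups = defaultdict(set)
        for value in values:
            groups[fingerprint(value)].add(value)
        # Keep only keys shared by more than one distinct spelling.
        clusters[facet] = [sorted(g) for g in groups.values() if len(g) > 1]
    return clusters


# Hypothetical sample data: "country" plays the role of the facet.
rows = [
    {"city": "New York", "country": "US"},
    {"city": "new york.", "country": "US"},
    {"city": "York New", "country": "US"},
    {"city": "Montréal", "country": "CA"},
    {"city": "Montreal", "country": "CA"},
    {"city": "Toronto", "country": "CA"},
]

for facet, found in cluster_by_facet(rows, "country", "city").items():
    print(facet, found)
```

Because each subset is processed on its own, only the values inside one facet are compared at a time. The trade-off is that two spellings of the same entity that land in different subsets will not be clustered together, so pick a facet that is unlikely to split an entity.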


6.10.15

Online OpenRefine Foundation Course Now Available


Learn the basics of data science with the new OpenRefine Foundation course available on the tranzform course platform.

The OpenRefine Foundation course is a progressive program that provides structure and direction for students new to data cleaning and preparation. Each lesson comes with a comprehensive overview of its goals and content, five video tutorials, and a lab to put your learning into practice. The course ends with a quiz to test your knowledge.