This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.

Showing posts with label duplicate. Show all posts

27.10.15

Using facets to cluster a subset of your data when you have a large number of rows in OpenRefine


When working in OpenRefine, you sometimes have to deal with a large data set in which many values refer to the same entity (a person, city, book, or any other entry) but use different spellings. With a large data set, the clustering function can become unresponsive because of the amount of computation needed to run the different algorithms.


In this tutorial we will see how to use facets to create subsets of records to cluster, so you can better manage the compute load on your machine. If you want to cluster your full data set, you just need to run the cluster function on each of the subsets created with your facet.
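The idea of clustering one facet subset at a time can be sketched outside OpenRefine. The Python below is only an illustration, not the tutorial's procedure: the `fingerprint` function is a simplified approximation of a key-collision fingerprint, and the function names, the column names (`country`, `city`), and the sample rows are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified key-collision key: lowercase, drop accents and punctuation,
    then sort the unique tokens. An approximation for illustration only."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def cluster_by_facet(rows, facet_column, cluster_column):
    """Split rows by facet value, then cluster each subset independently."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[facet_column]].append(row[cluster_column])

    clusters = {}
    for facet_value, values in subsets.items():
        groups = defaultdict(set)
        for value in values:
            groups[fingerprint(value)].add(value)
        # Keep only keys that gather more than one distinct spelling.
        clusters[facet_value] = [sorted(g) for g in groups.values() if len(g) > 1]
    return clusters


# Hypothetical sample data: "country" plays the role of the facet.
rows = [
    {"country": "FR", "city": "Paris"},
    {"country": "FR", "city": "paris "},
    {"country": "FR", "city": "Lyon"},
    {"country": "US", "city": "New York"},
    {"country": "US", "city": "York, New"},
]
print(cluster_by_facet(rows, "country", "city"))
```

Because each subset is processed on its own, the amount of work per pass depends on the size of the subset rather than on the full data set, which is the point of selecting a facet value before clustering.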


6.11.12

Chit Chat with New Datasets – Facets in OpenRefine (Was /Google Refine/) (from around the web)

A good review of the faceting capabilities, including text, numeric, timeline, customized, and scatterplot facets.

Read the full article 

26.4.12

Data exploration tutorial with Google Refine

Recently, Hugh Stimson published a great article: Data Mining My Old Radio Playlists. His post mixes tutorials on PHP scripting, data cleaning with Google Refine, and data analysis with PostgreSQL.

This response post demonstrates that the data analysis is fully doable in Google Refine using only basic functions (I use a GREL function only once, for the long-tail analysis). It is also a good illustration of my previous post on data exploration using Google Refine.

15.8.11

Remove duplicate rows

This is a quick tutorial on removing duplicate rows or records based on one field. This tutorial is adapted (with added screenshots) from David Huynh's answer on the Google Refine mailing list.
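For readers who want to see the underlying logic, deduplicating on one field can be sketched in a few lines of Python. This is not the OpenRefine procedure from the tutorial, which is not reproduced on this page; the function name, the `email` column, and the sample records are hypothetical, and the sketch assumes the first row for each value is the one to keep.

```python
def remove_duplicate_rows(rows, key_column):
    """Keep the first row seen for each value of key_column, drop later repeats."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = row[key_column]
        if key in seen:
            continue  # a row with this key was already kept
        seen.add(key)
        unique_rows.append(row)
    return unique_rows


# Hypothetical sample data, deduplicated on the "email" field.
records = [
    {"email": "ann@example.org", "name": "Ann"},
    {"email": "bob@example.org", "name": "Bob"},
    {"email": "ann@example.org", "name": "Ann B."},
]
deduplicated = remove_duplicate_rows(records, "email")
print(deduplicated)
```

Only the chosen field decides whether two rows count as duplicates, so rows that differ in other columns are still collapsed into the first one encountered.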

18.7.11

Compare values from two columns

To compare strings from two different columns and present the result in a third one, use the following expression: