This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.

28.4.12

Field format accidentally changed to Number, and how to add leading zeros

It is easy to inadvertently convert a field that stores numbers as text into a number format. This mainly happens during project creation (import) or when creating a new column. The conversion can lose data such as leading zeros. Here is how to get them back and prevent it from happening again.
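As a rough illustration of the idea (outside OpenRefine itself), the Python sketch below pads values back to a fixed width with leading zeros. The 5-character width, the sample codes, and the helper name `restore_leading_zeros` are assumptions made for this example; they do not come from the original article.

```python
def restore_leading_zeros(value, width):
    """Return `value` as text, left-padded with zeros to `width` characters."""
    # Whole numbers stored as floats (e.g. 2138.0) are reduced to integers first,
    # so the trailing ".0" does not end up in the text.
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value).zfill(width)


# Hypothetical sample: 5-digit codes that lost their leading zeros on import.
codes = [2138, 501, 90210]
restored = [restore_leading_zeros(code, 5) for code in codes]
print(restored)  # ['02138', '00501', '90210']
```

Padding only works when the expected width is known in advance. Keeping the column as text at import time avoids the problem altogether.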


26.4.12

Data exploration tutorial with Google Refine

Recently, Hugh Stimson published a great article: Data Mining My Old Radio Playlists. His post mixes tutorials on PHP scripting, data cleaning with Google Refine, and data analysis with PostgreSQL.

This response post shows that the data analysis is fully doable in Google Refine using only basic functions (I use a GREL function just once, for the long-tail analysis). It is also a good illustration of my previous post on data exploration using Google Refine.

25.4.12

Data-Mining My Old Radio Playlists (from around the web)




Posted: 24 Apr 2012 07:00 AM PDT
An example of web scraping and data analysis using Google Refine. In this tutorial, the cluster function is used to clean up the data set. The analysis part could also have been done in Google Refine using facets.
You might also be interested in the Data Exploration Tutorial with OpenRefine, which shows, based on the same database, how to use OpenRefine to analyse the data (and not only clean it).

11.4.12

How to enhance your data set with Freebase and Google Refine. The Lawrence Collection example.


The National Library of Ireland used Google Refine to improve access to the Lawrence Collection (a photography collection), using the Freebase reconciliation service to map where the pictures were taken.

Using Google Refine to clean mortgage data (from around the web)



Posted: 10 Apr 2012 07:30 AM PDT
A nice tutorial explaining how to clean and facet data. The example is based on bank mortgage data.

10.4.12

Fusion Tables: mapping multiple items with the same location


When you want to map multiple items with the same location in Fusion Tables, only one item is displayed and all the others are ignored. There are several workarounds for this major limitation. The most common is to change the coordinates (longitude / latitude) slightly so the points appear close to each other on the map (a tip from the Google Fusion team itself).

When working with a large data set, identifying and manually correcting all the records that share the same location can become time consuming. So I looked at how to deal with this in Google Refine and ended up with this straightforward process.
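The general idea of nudging duplicate coordinates can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Google Refine process from the article: the function name `spread_duplicates`, the choice of placing duplicates evenly on a small circle, and the default radius of 0.0001 degrees (about ten metres of latitude) are all choices made for this example.

```python
import math
from collections import defaultdict


def spread_duplicates(points, radius=0.0001):
    """Move points that share identical (lat, lon) onto a small circle around it."""
    # Group record indexes by their exact coordinate pair.
    groups = defaultdict(list)
    for index, (lat, lon) in enumerate(points):
        groups[(lat, lon)].append(index)

    result = list(points)
    for (lat, lon), indexes in groups.items():
        if len(indexes) < 2:
            continue  # unique location: leave it untouched
        for k, index in enumerate(indexes):
            # Spread the duplicates evenly around the original location.
            angle = 2 * math.pi * k / len(indexes)
            result[index] = (lat + radius * math.sin(angle),
                             lon + radius * math.cos(angle))
    return result


# Hypothetical sample: the first two records share a location.
points = [(45.5, -73.6), (45.5, -73.6), (48.85, 2.35)]
for lat, lon in spread_duplicates(points):
    print(f"{lat:.5f}, {lon:.5f}")
```

A deterministic offset like this gives the same result on every run, which a random jitter would not. The radius should be tuned to the zoom level at which the map is meant to be read.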

8.4.12

Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grab




A complete tutorial, including cleaning the data with Google Refine and visualizing it with Gephi.