This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.


Limitation when splitting and joining multi-valued cells

The Split multi-valued cells function transposes data stored in one cell into multiple rows, while keeping the relationship with the other columns in the data set. In this article we will look at some of the limitations of the function when splitting a data set and joining it back, and how you can work around them.
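To make the split and join round trip concrete, here is a minimal Python sketch of the idea. It is not OpenRefine's implementation: the helper names `split_multi_valued` and `join_multi_valued`, the `|` separator, and the sample rows are assumptions made for illustration. It assumes that after a split, only the first row of each record keeps the values of the other columns.

```python
def split_multi_valued(rows, column, separator):
    """Split one multi-valued column into several rows (hypothetical helper).

    Only the first row of each record keeps the other columns; the rows
    that follow leave them blank (None), in a record-style layout.
    """
    out = []
    for row in rows:
        for i, value in enumerate(row[column].split(separator)):
            # First value keeps the full row, later values get blank columns.
            new_row = dict(row) if i == 0 else {key: None for key in row}
            new_row[column] = value
            out.append(new_row)
    return out


def join_multi_valued(rows, column, key_column, separator):
    """Join rows back: a row with a blank key column belongs to the row above."""
    out = []
    for row in rows:
        if row[key_column] is None and out:
            out[-1][column] += separator + row[column]
        else:
            out.append(dict(row))
    return out


# Sample data invented for this sketch.
rows = [
    {"title": "Book A", "authors": "Ann|Bob"},
    {"title": "Book B", "authors": "Cy"},
]
split_rows = split_multi_valued(rows, "authors", "|")
joined_rows = join_multi_valued(split_rows, "authors", "title", "|")
```

In this sketch the join relies on the blank key column to know which rows belong together, so the round trip only restores the original rows when that column is left untouched between the two steps.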


Using facet to cluster a subset of your data when you have a large number of rows in OpenRefine?

When working in OpenRefine, you sometimes have to deal with a large data set in which several values refer to the same entity (a person, city, book, or any other entry) but use different spellings. On a large data set the clustering function can become unresponsive because of the amount of computation needed to run the different algorithms.

In this tutorial we will see how you can use facets to create subsets of records to cluster, so you can better manage the compute load on your machine. If you want to cluster your full data set, you just need to run the cluster function on each of the subsets created with your facet.
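The approach described above can be sketched in a few lines of Python. This is a simplified illustration, not OpenRefine's code: `fingerprint` is a rough approximation of a fingerprint-style key collision method, and `cluster_by_facet`, the column names, and the sample rows are assumptions. The point is that each subset is clustered on its own, so only one subset's values are compared at a time.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key: trim, lowercase, strip accents and
    punctuation, then sort and de-duplicate the tokens."""
    text = value.strip().lower()
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster_by_facet(rows, facet_column, value_column):
    """Group rows by a facet value, then cluster each subset separately."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[facet_column]].append(row[value_column])

    clusters = {}
    for facet_value, values in subsets.items():
        groups = defaultdict(set)
        for value in values:
            groups[fingerprint(value)].add(value)
        # Keep only keys shared by more than one distinct spelling.
        clusters[facet_value] = [sorted(g) for g in groups.values() if len(g) > 1]
    return clusters


# Sample data invented for this sketch.
rows = [
    {"country": "CA", "city": "Toronto"},
    {"country": "CA", "city": "toronto "},
    {"country": "CA", "city": "Montréal"},
    {"country": "CA", "city": "Montreal"},
    {"country": "US", "city": "Boston"},
]
clusters = cluster_by_facet(rows, "country", "city")
```

One trade-off of this design is that values falling in different subsets are never compared with each other, so the facet should be chosen on a column that is unlikely to separate spellings of the same entity.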


Online OpenRefine Foundation Course Now Available

Learn the basics of data science with the new OpenRefine Foundation course, available on Big Data University. The OpenRefine Foundation course is a progressive program that provides structure and direction for students new to data cleaning and preparation. Each lesson comes with a comprehensive overview of its goals and content, five video tutorials, and a lab to put your learning into practice. The course ends with a quiz to test your knowledge.


Exploring Toronto 311 calls Part1: Why people are calling?

The Toronto 311 Service Request - Customer Initiated data set contains information on customer-initiated service requests received by the City of Toronto for Solid Waste Management, Transportation Services, Toronto Water, Municipal Licensing & Standards, and Urban Forestry.

In the April session of the Toronto OpenRefine Meetup we took an hour to explore the data set and prepare it for mapping with Google Fusion Tables. For this example we will use only the 2015 calls available to date, from January 1 to March 31. Download it from here:

The 311 Service Request - Customer Initiated data set has three fields:
In this first article we will explore the relationship between the type of request and its creation date.


[Video] Introduction to GREL

Last November, in partnership with the OKFN School of Data, we recorded an hour-long Skillshare on the Refine GREL language. Have you ever wondered how to get started with the General Refine Expression Language (GREL)?

Watch this one-hour tutorial and learn the basics of the language along with some simple expressions. Further tutorials will be published to go through each of the functions presented in this video.


Parsing and extracting HTML tag and links in Refine

I recently helped someone on Stack Overflow parse and extract information from an HTML page. Refine with GREL offers multiple ways to select specific elements and content. This article reviews the main functions, with specific use cases to illustrate when to use each of them.
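For readers who want to see the idea outside Refine, here is a minimal Python sketch that extracts links and their text from an HTML fragment using only the standard library. It is an analogue rather than GREL itself: the `LinkExtractor` class and the sample markup are assumptions made for illustration, and it does roughly what selecting `a` elements and reading their `href` attribute and text would do.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Sample markup invented for this sketch.
page = (
    '<p>See the <a href="https://example.org/wiki">wiki</a> '
    'and <a href="/faq">the <b>FAQ</b></a>.</p>'
)
extractor = LinkExtractor()
extractor.feed(page)
links = extractor.links
```

Text nested in child tags, such as the bold word in the second link, is collected as part of the link text, while anchors without an `href` attribute are skipped.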


Increase Set Facet Choice Count

The default maximum number of values displayed in a facet is 2,000. This limit has been set to prevent Refine from slowing down when working with large data sets.

You can override this limit by changing its value. Click on Set Choice Count Limit and define the new maximum (Refine automatically suggests a value matching the number of values in your current facet).