This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles in the right-side menu.

23.10.11

Fetch City and Province / State based on the postal code


In the US, Canada and the UK, postal codes are a good key for retrieving information about a location. In this tutorial we will use the Yahoo PlaceFinder API to add geographical content to a data set based on the postal code. The same approach can easily be reversed to run a query based on a latitude and longitude (see the end of this post).

19.10.11

Reconcile against the OpenCorporates database

Here is a great video tutorial on reconciliation. It also introduces OpenCorporates, a reconciliation source that contains more than 26 million companies across 31 jurisdictions.

18.10.11

Parse markup languages (JSON, HTML, XML ...)


In this tutorial we will see how to parse markup languages like JSON, HTML or XML. These languages are easy to parse because there is often an easily identifiable marker right before or after the content you want to extract. We will use a JSON document and extract the relevant information by following a six-step process.
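Outside OpenRefine, the same idea can be sketched in a few lines of Python. This is only an illustration, not the six-step process from the tutorial: the sample record, its field names and the extract_field helper are all invented for the example.

```python
import json

# Hypothetical sample record; the field names are invented for illustration.
raw = '{"place": {"city": "Montreal", "state": "Quebec", "postal": "H2X 1Y4"}}'


def extract_field(document, path):
    """Parse a JSON string and walk it along a list of keys.

    Returns None as soon as a key is missing instead of raising an error.
    """
    node = json.loads(document)
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


city = extract_field(raw, ["place", "city"])
missing = extract_field(raw, ["place", "country"])
```

Returning None for a missing key keeps a batch run going when some rows lack the field, which is usually what you want when cleaning a whole column.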


Starts or ends with a number

This is a quick and dirty tip to facet cells starting or ending with a number. A regex would be much cleaner for this, but unfortunately the GREL functions startsWith and endsWith do not support regex :-(
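For comparison, here is what the regex version of this test could look like in Python. It is a minimal sketch and not a GREL expression; the function name is an assumption made for the example.

```python
import re

# Matches a digit at the very start or at the very end of the string.
STARTS_OR_ENDS_WITH_DIGIT = re.compile(r"\A\d|\d\Z")


def starts_or_ends_with_number(cell):
    """Return True when the cell value starts or ends with a digit."""
    return bool(STARTS_OR_ENDS_WITH_DIGIT.search(cell))
```

For instance, starts_or_ends_with_number("12 Main St") and starts_or_ends_with_number("Suite 4") are both true, while a value such as "Main St" is not matched.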

13.10.11

Update phone number format

This post is a quick adaptation, for phone numbers, of the method presented in the "add a space to postal code" post (the splitByLengths and merge functions).
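The split-then-merge idea can be sketched in Python as follows. This assumes a 10-digit North American number with no existing punctuation; the helper names and the 3-3-4 grouping are assumptions for the example, not the exact steps of the post.

```python
def split_by_lengths(text, *lengths):
    """Cut text into consecutive pieces of the given lengths."""
    pieces = []
    start = 0
    for length in lengths:
        pieces.append(text[start:start + length])
        start += length
    return pieces


def format_phone(digits, separator="-"):
    """Format a 10-digit string as 3-3-4 groups joined by the separator."""
    return separator.join(split_by_lengths(digits, 3, 3, 4))


formatted = format_phone("5145550123")
```

Changing the separator argument (for example to a space or a dot) gives the other common formats without touching the split step.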

Extract a number from a string

To extract a run of digits with a particular length (for example, a string of 3 digits) from a cell, we will use the match expression and a regex. This is an easy four-step process.
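As a rough equivalent outside OpenRefine, the Python sketch below pulls the first run of exactly n digits out of a string. The function name is hypothetical, and the lookarounds are one possible way to reject longer runs of digits, not necessarily the pattern used in the post.

```python
import re


def extract_digit_run(cell, length=3):
    """Return the first run of exactly `length` digits, or None if there is none."""
    # The lookarounds make sure the run is not part of a longer number.
    pattern = r"(?<!\d)\d{%d}(?!\d)" % length
    found = re.search(pattern, cell)
    return found.group(0) if found else None


room = extract_digit_run("Room 204, floor 12")
too_long = extract_digit_run("id 12345")
```

Without the lookarounds, a plain \d{3} would also match the first three digits of "12345", which is rarely what is wanted when the length matters.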

5.10.11

Extract hashtags and references from Twitter


This case was brought to me by Cosmin, who wanted to extract hashtags from tweets for analysis and data visualization. The data was gathered using ScraperWiki and its ability to scrape Twitter data into a single document (see the video tutorial).
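To give an idea of the extraction itself, here is a minimal Python sketch. It assumes that "references" means @mentions and that a tag or user name is a run of letters, digits and underscores; the sample tweet and function names are invented.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract_hashtags(tweet):
    """Return the hashtags found in a tweet, without the leading '#'."""
    return HASHTAG.findall(tweet)


def extract_mentions(tweet):
    """Return the user names mentioned in a tweet, without the leading '@'."""
    return MENTION.findall(tweet)


# Hypothetical tweet used only to show the two helpers.
tweet = "Cleaning data with #OpenRefine and #GREL, thanks @RefinePro!"
hashtags = extract_hashtags(tweet)
mentions = extract_mentions(tweet)
```

Keeping hashtags and mentions in two separate lists makes it easy to facet or count each of them on its own afterwards.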

4.10.11

Video tutorial to clean up your dataset (by Free Your Metadata)

A great video tutorial from Free Your Metadata which shows you how to: