This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles in the right-hand menu.

24.2.12

Tutorial: From PDF to a searchable, sortable table

Selecting a string within a cell using smartSplit

The smartSplit function is a variation on the split function that allows you to split cell content on any string of characters and then select the leg you want to work on. It is very useful for extracting or removing strings within cells without creating multiple columns and then merging them back.
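
As a rough illustration outside Refine, the Python sketch below mimics the idea: split a cell on a separator while keeping double-quoted text together, then pick one leg by its position. In GREL this step is typically written as an index on the returned array, for example value.smartSplit(",")[1]. The helper name select_leg is hypothetical, and Python's csv module is an assumption standing in for Refine's quote handling; unlike smartSplit as described above, it only accepts a single-character separator.

```python
import csv


def select_leg(cell: str, sep: str, index: int) -> str:
    """Split `cell` on the single-character `sep` and return one leg.

    Hypothetical helper approximating GREL's value.smartSplit(sep)[index]:
    double-quoted sections stay together. In this Python version a
    negative index counts from the end of the list.
    """
    # csv.reader expects an iterable of lines; the cell is passed as one line.
    fields = next(csv.reader([cell], delimiter=sep))
    return fields[index]


# Keep only the second leg; the quoted comma does not trigger a split.
print(select_leg('Smith,"Jones, Jr.",42', ",", 1))  # Jones, Jr.
print(select_leg("2012-02-24", "-", -1))  # 24
```

Selecting the leg directly is what avoids the split-into-columns-then-merge round trip mentioned above.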

20.2.12

Google Refine for Investigative Journalism

http://dannguyen.github.com/NICAR-Google-Refine/

A good introduction to using Google Refine to navigate and clean data.

16.2.12

Count how often a character occurs in a cell

Did you know that Refine can count how often a string or character appears in a cell?

To achieve this, I recommend first storing the count in a separate column so that you do not overwrite your original content. Select your reference column (the one whose cells you want to count in) and create a new column based on it. Another option is to display the result in a custom text facet.

We will use the GREL expression value.split(" ").length(), which splits each cell on the character (here, a space) and counts the resulting pieces.

However, if a cell does not contain the character, Refine still returns 1, because splitting it yields a single piece: the whole cell. I found two ways to work around this issue.
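
The arithmetic behind this is easy to check outside Refine. The Python sketch below is not necessarily either of the two workarounds mentioned above; it simply shows why a split-based count returns 1 for a cell without the character, plus two possible corrections: subtracting one from the number of pieces, or comparing the cell length before and after removing the character. The function names are hypothetical. Python's str.split keeps empty pieces between consecutive separators, so check how GREL's split treats them in your version before porting the same logic.

```python
def split_pieces(cell: str, char: str) -> int:
    # Mirrors the idea of value.split(char).length(): counts pieces, not matches.
    return len(cell.split(char))


def count_by_split(cell: str, char: str) -> int:
    # n non-overlapping occurrences produce n + 1 pieces, so subtract one.
    return len(cell.split(char)) - 1


def count_by_length(cell: str, char: str) -> int:
    # Remove every occurrence and measure how much shorter the text became.
    # Dividing by len(char) also handles multi-character strings.
    return (len(cell) - len(cell.replace(char, ""))) // len(char)


for cell in ["no-separator", "a b c", "one  two"]:
    print(repr(cell), split_pieces(cell, " "),
          count_by_split(cell, " "), count_by_length(cell, " "))
```

The length-based version never counts pieces at all, so it cannot produce the spurious 1 for cells that lack the character.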

15.2.12

Google Refine tips

Google Refine is currently the best free software tool for cleaning up messy data. It's perfect for correcting unescaped HTML strings, catching an odd typo or fetching additional data about entities from Freebase. We use it extensively at Zemanta to clean up and reconcile customers' datasets before importing ...

A video tutorial to parse JSON string

This tutorial explains how to populate species pages in the BDRS using Google Refine. A JSON string is generated from a source, parsed and cleaned in Google Refine, and exported back in JSON format.
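
The parse, clean and export round trip can be sketched in Python with the standard json module; inside Refine the parsing step would typically use GREL's parseJson(). The field names below (scientificName, notes, count) are hypothetical stand-ins, not the BDRS schema, and the cleaning rule (trim strings, drop fields left empty) is an assumption chosen for illustration.

```python
import json


def clean_json_record(raw: str) -> str:
    """Parse one JSON object, tidy its string values and serialise it again."""
    record = json.loads(raw)
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()  # remove stray leading/trailing whitespace
            if not value:
                continue  # drop fields that are empty once trimmed
        cleaned[key] = value
    # Compact separators keep the exported string on one line.
    return json.dumps(cleaned, separators=(",", ":"), ensure_ascii=False)


raw = '{"scientificName": "  Acacia dealbata ", "notes": "   ", "count": 3}'
print(clean_json_record(raw))  # {"scientificName":"Acacia dealbata","count":3}
```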

13.2.12

How to: convert easting/northing into lat/long for an interactive map

Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn't too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed ...

7.2.12

Data Clustering With The Google.

A nice introduction, starting at slide 12.
Presentation by Bob Lannon, Senior NLP Analyst, Verilogue.

6.2.12

Create a project based on a url (xml)

This video tutorial shows how to create a project in Google Refine 2.5 from an online XML file. The full tutorial is available here.

4.2.12

Visualisation on Top 100 Chemical Companies with Google Refine and Google Fusion Table.

On 29th November, the Plant Life team (3 developer hackers and 4 journalist hacks) managed to implement a visualisation of the top 100 chemical companies in less than 7 hours and won the runner-up prize at the first RBI Hacks and Hackers Day event. The result looks like those on ...


1.2.12

Free Your Metadata : a Concrete Action Plan

A 1h13 tutorial by Free Your Metadata at Columbia University.

Visualizing French Tax Data using grefine and tableau

A nice tutorial mixing methodology and concrete steps to gather, clean, harmonize, merge and visualize data (using Tableau software).