This is the RefinePro knowledge base about OpenRefine. We build it over the years, and keep adding to it. From great tutorials and how-to, to handy GREL expressions and links to external resources, you will find here one of the most comprehensive list of resources to learn OpenRefine.

For a comprehensive documentation you should refer to the official OpenRefine wiki.

Don't where to get started? Search for a specific function below, or read our most popular article from the right side menu.

1.12.11

add extra rows / records in google refine

Google refine does not offer a native way to add rows in a project. As said in the discussion list, the purpose of google refine is not to be a spreadsheet editor and should not be consider for records creation but more analysis and mass edit.

29.11.11

facet by facet count


Google refine offers the possibilities to facet by name or choice count. This can be useful to focus an analysis or transformation only on value having more than twenty records for example.

Sort facet by name using toTitlecase(value) expression


When using the text facet option, google refine, sort all available value either by choice count or by name. When sorting by name, all values will be sorted by number first and then by alphabetical order (capital first and then lower case) into something like this:

28.11.11

Google refine 2.5 RC 2 overview

The 2.5 release candidate is available since Nov 18 and give some interesting insight of what's coming next for google refine. Main change have been done on the welcome interface with three screens to create, open or import an project and a project creation process deeply reworked.

December 12 update
the final release of google refine 2.5 is out. Have fun!

3.11.11

Replace month name by month number

An other quick JSON code to cut and past in your project and gain some time. This one replace all English month by their number so January becomes 01 ; February 02 and so on. It is based on the function replace and will avoid you to repeat the same action 12 times. Here a simple replace in bloc note (to update your column name) and apply and the trick is done.

23.10.11

Fetch City and Province / State based on the postal code


In the US, Canada and UK postal code are pretty good code to retrieve information on a location. In this tutorial we will use the yahoo place finder API to add geographical content to a data set based on the postal code. This tutorial can be easily turned around and used to run a query based on a  latitude and longitude (see the end of this post).

19.10.11

Reconcile against open corporates database

Here is a great video tutorial on reconciliation. It also introduce Open Corporates, an reconciliation source that contains more than 26 millions companies across 31 jurisdiction.

18.10.11

Parse mark up language (JSON, html, xml ...)


In this tutorial we will see how to parse mark up language like JSON, html or xml. Those language are great to parse because there is often an easily identifiable markup right before or after the content you want to extract.  In this tutorial we will use a JSON language and extract relevant information by following a six steps process.

On a similar topic:



Starts or ends with a number

This is a quick and dirty tips to facet cells starting or ending with a number. Regex will be much cleaner for this but unfortunately the grel expression startsWith and endsWith does not support regex :-(

13.10.11

Update phone number format

This post is a quick adaptation to phone number based on the method presented in the add a space to postal code (splitByLength and Merge function).

Extract number from a string

To extract a string of number with a particular length (for example a string of 3 numbers) from a cells we will use the expression match and regex language. This is an easy four steps process

5.10.11

Extract from twitter hastag and reference


This case has been brought to me by cosmin who wanted to extract hastag from tweets for some analysis and data visualization. Data have been gather using ScraperWiki and their ability to scrap twitter data into one single document (see the video tutorial).

4.10.11

Video tutorial to clean up your dataset (by free your metadata)

A great video tutorial from free your metadata which show you how to:

26.9.11

Transpose columns across rows (grefine 2.5)


This article was previously name JSON code to transpose important number of columns. With Google Refine 2.5, the transpose function have been remodel with nice usability improvement. If you are using google refine 2.1 or an earlier version please refer to this article.

JSON code to transpose important number of columns (grefine 2.1)

The code below transpose Column 1 to 19 into one single column. One can quickly edit and adapt this code to your project, open it in NotePade, and call the replace function (keyboard shortcut: ctrl+H). Then replace the term "column" by your column name.

Action describes in this is post are valid only for google refine 2.1 and previous version. If you are using google refine 2.5, please refer to the transpose column across rows article.

19.9.11

Use google refine to navigate data solely (facet, filter, flag)

Google refine can also be used to solely navigate and explore data without editing them. I find it very useful to explore large data-set thanks to its good user interface, this avoid to develop a specific one to visualize data. In this post I will present the three mains option to interact with the data: facet, text filer and flags.

18.9.11

Google Refine 2.0 Training video

In this video you will learn to:

8.9.11

countif in google refine with facetCount

Countif is an expression in Excel that count every time a value appears in a determine zone of your spreadsheet. Google refine support the same function to count every time a value appears in a column.

1.9.11

vlookup in google refine

vlookup is a very useful formula in Excel and it has is equivalent in google refine under the name cell.cross. Using this expression you can import data from a google refine project to an other based on a key value.

19.8.11

Merge records spread on multiple rows

This article will go through steps to gather on the same rows content spread on two rows. This method can be extend when a single records is hosted on multiple rows.

15.8.11

Remove duplicate rows

This is a quick tutorial to remove duplicate rows or records based on one field. This turial is adapated (add screenshot) from David Huynh answer on the google refine mailing list.

28.7.11

remove " (quotation) mark

Hard time removing the " (quote sign) from your expression. Instead of quoting your quote mark with double quote, do it with simple like this :

22.7.11

Remove or replace a specific character in a column

You want to remove a space or a specific character from your column like the sign # before some number.

21.7.11

merge 2 columns that have both blank cells

The basic function to merge or concatenate data from two different column is the following:
cells["col1"].value + " " + cells["col2"].value




19.7.11

Add a space to postal code (splitByLength and Merge function)

This short tips explains how to convert postal code store on 6 characters to 7 by adding a space after 3 digits. We will use splitByLength (see related video) and merge multiple column into one functions.

18.7.11

Compare values from two columns

To compare strings from 2 differents column and present the results in a third one, use the following expression:

5.7.11

Move data from a column to an other

Some data are messy and column can be a mix of different type of data. For example if you work on an address database, you might have a ZIP code column and a State column. And some ZIP code can be stored in the State Column.

2.7.11

Faceting with Freebase Gridworks

Freebase is the name of google refine before google took control of the solution. The two following videos present how to facet (filter) in google refine. The interface and options did not evolve too much over the time and version making those videos still up to date.

29.6.11

Split cell content into multiple column, non fixed field length

I recently get a file to work on generated by crystal report and I had to deal with this format as no other were available. In my case, data were supposed to be split into 11 columns, in the original file there were all in 1, data were separated by a variable number of space. This post will present a process to split cell content when you have no markup. JSON code is provided for reference below.

28.6.11

How to merge records from different columns

This tutorial have been merge with the How to merge cells that have blank cells.

You can also refer to the following video tutorials:

25.6.11

Using "splitByLengths" in Google Refine

Learn how to use the "splitByLengths" function in Google Refine to split a single column into multiple columns based.

24.6.11