This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right side menu.


Increase Set Facet Choice Count

The default maximum number of values displayed in a facet is 2,000. This limit prevents Refine from slowing down when working with large datasets.

You can override this limit. Click on Set Choice Count Limit and define the new maximum (Refine automatically suggests a value matching the number of values in your current facet).


Parsing Apache log using OpenRefine

Recently I was looking for a quick way to explore some Apache log files. I didn't want to set up any software, and I wanted to analyze the exact path taken by a specific user, or what happens after a specific error. So I thought about OpenRefine and its parsing capabilities.

The recipe doesn't replace an analytics tool for understanding your traffic, but it helps you go behind the curtain and drill down into a specific IP address or user, a type of error code, or a pattern.
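
To make the kind of drill-down described above concrete, here is a minimal Python sketch. It is not the OpenRefine recipe from the article. It assumes the log uses Apache's Common Log Format, and the names `LOG_PATTERN` and `parse_line` are made up for this illustration.

```python
import re

# Assumption: lines follow Apache's Common Log Format:
# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_line(sample)

# Drill down the same way a facet would: keep only one user's requests.
entries = [entry]
frank_requests = [e for e in entries if e["user"] == "frank"]
```

Once each line is split into named fields, filtering by IP address, user or status code works much like faceting on the corresponding column in OpenRefine.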


Announcing RefinePro

Less than two weeks ago we announced the launch of RefinePro as a new participant in the OpenRefine ecosystem. This post provides background on where we come from and how we see the position of RefinePro within the OpenRefine community.

Why a company? 

I have been actively involved with the OpenRefine community since summer 2011, when I opened this blog to document tricks and tutorials about OpenRefine usage. I am also active on the mailing list to provide support, and I maintain the Twitter account to share project news and answer questions.

Over the last year I have witnessed how the community has grown. Today OpenRefine is downloaded over 1,000 times per week! MOOCs (Massive Open Online Courses) are being dedicated to OpenRefine. The OpenRefine Twitter feed is buzzing daily with new tutorials, workshops and blog articles.

Data wrangling and cleaning tools are increasingly in demand, and we see a bright future for OpenRefine. We want to take Refine to the next level and commit a full-time dedicated team to close existing issues, develop new functionality and improve the level of support.

What next?

We first plan to provide hosted access to OpenRefine in order to:

  • Make it easier for people to start with Refine by removing the installation process,
  • Offer access to your project from multiple devices,
  • Power up your project with more compute power.

We will be testing our business model and architecture through a beta program.

Reserve your spot for RefinePro beta today

If you haven't already, you can help us spread the word about our initiative by emailing friends and coworkers and sharing the news on Twitter, Facebook, LinkedIn or Google+.

The content from this blog will migrate to RefinePro in order to keep branding consistent. We will notify our readers when this happens.

If you are just curious and want to stay in the loop, you can subscribe to our mailing list (beta users are already registered) or follow us on Twitter or Google+.


OpenRefine Usage survey (2014)

Earlier this year the community had a discussion about its sustainability, and drafts of a bounty model and a governance model were proposed, with little feedback from the community.


Prepare SQL update where query in OpenRefine

Following the article on how to prepare SQL SELECT, INSERT INTO and DELETE queries using OpenRefine, we will now see how the template function can be used to prepare UPDATE WHERE statements.

UPDATE WHERE statements are slightly different from SELECT or INSERT statements, since we want to define a different WHERE clause for each record. Thus we cannot wrap all the rows we want to update into a single query. We need to write a separate statement for each record in our OpenRefine project and define its SET and WHERE conditions.
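
The one-statement-per-record idea can be sketched outside OpenRefine as well. The following Python example is only an illustration of the logic, not the template from the article. The table name `customers`, the columns `id` and `city`, and the helpers `sql_literal` and `build_updates` are all hypothetical.

```python
def sql_literal(value):
    """Render a Python value as a simple SQL literal (sketch only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quote so the string stays a valid SQL literal.
    return "'" + str(value).replace("'", "''") + "'"


def build_updates(rows, table, key_column):
    """Build one UPDATE ... WHERE statement per row."""
    statements = []
    for row in rows:
        # Every column except the key goes into the SET clause.
        assignments = ", ".join(
            f"{column} = {sql_literal(value)}"
            for column, value in row.items()
            if column != key_column
        )
        # The key column identifies the record in the WHERE clause.
        where = f"{key_column} = {sql_literal(row[key_column])}"
        statements.append(f"UPDATE {table} SET {assignments} WHERE {where};")
    return statements


# Hypothetical rows, standing in for the rows of an OpenRefine project.
rows = [
    {"id": 1, "city": "Montreal"},
    {"id": 2, "city": "St. John's"},
]
statements = build_updates(rows, "customers", "id")
```

Each row yields its own statement, with the key column in the WHERE clause and the remaining columns in the SET clause. For real data, prefer your database driver's parameterized queries over hand-built literals.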


Prepare SQL SELECT, INSERT INTO, DELETE query using OpenRefine

I know that many of us export data from MySQL or other SQL databases to clean it in OpenRefine before loading it back into the original database. I used to export my project to CSV and load the CSV with MySQL command-line utilities. It worked, but it was a painful process with a lot of details to pay attention to (encoding, field separator, ...). That was before I found a way to use the template option of OpenRefine to prepare large SELECT, UPDATE, INSERT or DELETE SQL statements.

So instead of exporting to CSV and importing through another interface or tool such as phpMyAdmin, you can use the template function of OpenRefine to prepare a statement that iterates through all the rows of your project.


Padding left and right

Padding is the action of adding zeros to the left or the right of a text value until it reaches a certain string length.

Padding can be useful in OpenRefine if you lost leading zeros while importing or transforming your data.

Of course, you can adapt the examples below to add letters or any other type of character.

Padding left up to four digits:
"0000"[0,4-value.length()] + value

Padding right up to four digits:
value + "0000"[0,4-value.length()]
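
For readers who want to check the logic outside OpenRefine, here is a small Python sketch of the same idea. It is written in Python rather than GREL, and the function names `pad_left` and `pad_right` are made up for this example. It also guards the case where the value is already longer than the target width, in which case the value is returned unchanged.

```python
def pad_left(value, width=4, fill="0"):
    """Add fill characters to the left until the value is `width` long."""
    missing = max(0, width - len(value))  # never negative for long values
    return fill * missing + value


def pad_right(value, width=4, fill="0"):
    """Add fill characters to the right until the value is `width` long."""
    missing = max(0, width - len(value))
    return value + fill * missing


left = pad_left("42")     # "0042"
right = pad_right("42")   # "4200"
```

Python's built-in `str.rjust` and `str.ljust` do the same job, for example `"42".rjust(4, "0")`.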