This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources to learn OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.

Subscribe to receive our monthly OpenRefine roundups with new tutorials, release updates and community announcements.

4.5.20

OpenRefine April 2020 review


April was a busy month for the OpenRefine community with new reconciliation services and plugin updates! Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox. 

30.3.20

OpenRefine March 2020 review

The latest edition of the OpenRefine review is ready. Throughout March, the community published a LOT of new video tutorials in six languages!

Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox. 

29.3.20

Concatenate Column in OpenRefine 3.0 and 3.3


We all know the pain of merging different columns in OpenRefine when you have null values. Before version 3.0, it required writing a complex GREL expression or managing multiple filters to ensure we are not losing any data. 

Those shortcomings have been addressed in the latest version! 

Starting with OpenRefine 3.0, we have the coalesce() function, which natively handles null values correctly.

But even more importantly, OpenRefine 3.3 introduced a user interface that offers tons of flexibility, including defining how you want to concatenate one or multiple columns together.
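To make the null handling concrete, here is a minimal Python sketch of the idea behind coalesce(): return the first non-null value, then use it to join columns without losing data. This is not GREL and not OpenRefine's implementation; the join_columns helper, the column names, and the sample row are hypothetical and only illustrate the behavior.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def join_columns(row, columns, separator=" "):
    """Concatenate the given columns of a row (hypothetical helper).

    Null cells are replaced with an empty string via coalesce(), then
    dropped so they do not leave stray separators in the result.
    """
    parts = [coalesce(row.get(name), "") for name in columns]
    return separator.join(part for part in parts if part != "")


# Hypothetical sample row: the "middle" cell is null.
row = {"first": "Ada", "middle": None, "last": "Lovelace"}
full_name = join_columns(row, ["first", "middle", "last"])
```

Without the coalesce() step, the null cell would either raise an error or wipe out the whole concatenated value, which is the data loss older workarounds had to guard against.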

I recorded a quick video demonstration: 

27.3.20

Solving Google’s reCAPTCHA v2 with ParseHub Agent


ParseHub is a great point-and-click web scraping tool. While projects run on ParseHub servers, you can connect them with third-party proxies like Luminati or captcha-solving services like 2Captcha.

In this tutorial, we will show you how to bypass the Google reCAPTCHA v2 test page with ParseHub Agent and the 2Captcha service. You will need to create an account with 2Captcha and have an API key to complete this tutorial.

Don't hesitate to contact us if you have questions or need help implementing web scraping projects.


28.2.20

OpenRefine February 2020 Review

The February edition of our OpenRefine news roundup is ready. This month, we did some digging on YouTube for the best OpenRefine video tutorials in your language.

Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox. 

4.2.20

OpenRefine January 2020 Review

With the new year, we decided to start a monthly review of what happened in the OpenRefine community. Below is a summary of what happened in December 2019 and January 2020. Let us know if we missed something.

Do not forget to subscribe to our newsletter on the right to never miss an update. 

20.1.20

How to use Columnize by Key/Value


In the WordPress export file (XML format) shared by Adam K on the user mailing list, the columns item - wp:postmeta - wp:meta_key and item - wp:postmeta - wp:meta_value store data in a key:value format. The meta_key column indicates the value type, and the meta_value column stores the field value.
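The reshaping that a key/value layout calls for can be sketched in a few lines of Python. This is a simplified illustration of the idea, not OpenRefine's actual Columnize by key/value operation: the post_id grouping column, the shortened field names, and the sample values are all hypothetical, and repeated keys within one record are simply overwritten here.

```python
def columnize_by_key_value(records, id_field, key_field, value_field):
    """Pivot key/value rows into one row per record (simplified sketch).

    Each distinct key becomes a column, and the matching value
    becomes the cell content for that record.
    """
    rows = {}
    for record in records:
        record_id = record[id_field]
        # Start a new output row the first time we see this record id.
        row = rows.setdefault(record_id, {id_field: record_id})
        row[record[key_field]] = record[value_field]
    return list(rows.values())


# Hypothetical data: one input row per (post, key, value) triple.
records = [
    {"post_id": 1, "meta_key": "color", "meta_value": "red"},
    {"post_id": 1, "meta_key": "size", "meta_value": "L"},
    {"post_id": 2, "meta_key": "color", "meta_value": "blue"},
]
wide = columnize_by_key_value(records, "post_id", "meta_key", "meta_value")
```

Here each post ends up on a single row, with one column per distinct meta_key; a post that lacks a given key simply has no entry for it.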