This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles in the menu on the right.


Subscribe to receive our monthly OpenRefine roundups with new tutorials, release updates and community announcements.

27.3.20

Solving Google’s reCAPTCHA v2 with ParseHub Agent


ParseHub is a great point-and-click web scraping tool. Although projects run on ParseHub's servers, you can connect them to third-party proxies such as Luminati or to CAPTCHA-solving services such as 2Captcha.

In this tutorial, we show you how to bypass the Google reCAPTCHA v2 test page with ParseHub Agent and the 2Captcha service. To complete this tutorial, you need a 2Captcha account and an API key.
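For readers curious about what a CAPTCHA-solving service does behind the scenes, here is a minimal Python sketch of the two HTTP calls a 2Captcha integration typically makes: submit the page's reCAPTCHA site key and URL to in.php, then poll res.php until the solved token is ready. It assumes 2Captcha's classic in.php/res.php API; the API key, site key, and page URL are placeholders, the helper names are hypothetical, and this is not how ParseHub Agent implements its integration. No request is actually sent.

```python
import json
from urllib.parse import urlencode

# Base URL of 2Captcha's classic (in.php / res.php) HTTP API.
TWOCAPTCHA_BASE = "https://2captcha.com"


def submit_url(api_key: str, site_key: str, page_url: str) -> str:
    """Build the request that submits a reCAPTCHA v2 task to 2Captcha."""
    params = {
        "key": api_key,             # your 2Captcha API key
        "method": "userrecaptcha",  # task type for reCAPTCHA v2
        "googlekey": site_key,      # data-sitekey value found on the target page
        "pageurl": page_url,        # full URL of the page showing the CAPTCHA
        "json": 1,                  # ask for a JSON response
    }
    return f"{TWOCAPTCHA_BASE}/in.php?{urlencode(params)}"


def result_url(api_key: str, captcha_id: str) -> str:
    """Build the polling request that fetches the solved token."""
    params = {"key": api_key, "action": "get", "id": captcha_id, "json": 1}
    return f"{TWOCAPTCHA_BASE}/res.php?{urlencode(params)}"


def parse_response(body: str):
    """Return the 'request' field on success, None while the task is pending."""
    data = json.loads(body)
    if data["status"] == 1:
        # Task id (after submitting) or the g-recaptcha-response token (after polling).
        return data["request"]
    if data["request"] == "CAPCHA_NOT_READY":  # 2Captcha's own spelling
        return None
    raise RuntimeError(f"2Captcha error: {data['request']}")


# Hypothetical values; this only builds URLs and parses sample JSON.
print(submit_url("YOUR_API_KEY", "SITE_KEY", "https://example.com/form"))
print(parse_response('{"status": 0, "request": "CAPCHA_NOT_READY"}'))
```

Once polling returns a token, it is what gets placed in the page's g-recaptcha-response field before the form is submitted.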


28.2.20

February 2020 Review

The February edition of our OpenRefine news roundup is ready. This month, we did some digging on YouTube for the best OpenRefine video tutorials in your language.

Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox. 

4.2.20

January 2020 Review

With the new year, we decided to start a monthly review of what happens in the OpenRefine community. Below is a summary of what happened in December 2019 and January 2020. Let us know if we missed something.

Do not forget to subscribe to our newsletter on the right to never miss an update. 

20.1.20

How to use Columnize by Key/Value


In the WordPress export file (XML format) shared by Adam K on the user mailing list, the columns "item - wp:postmeta - wp:meta_key" and "item - wp:postmeta - wp:meta_value" store data in key/value format: the meta_key column indicates what each value represents, and the meta_value column stores the value itself.
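To make the idea concrete, here is a plain-Python sketch of the reshaping that Columnize by Key/Value performs: each distinct value in the key column becomes a new column name, and the matching cells from the value column fill it, giving one output row per post. It illustrates the pivot only and is not OpenRefine's implementation; the row structure, the "post" grouping column, the sample meta keys, and the function name are hypothetical. How OpenRefine decides which rows belong together, and how it handles a key that repeats within one group, may differ from this sketch.

```python
def columnize_by_key_value(rows, group_col, key_col, value_col):
    """Pivot key/value rows into one row per group and one column per distinct key."""
    columns = []  # distinct keys, in order of first appearance
    groups = {}   # group value -> {key: value}
    for row in rows:
        key = row[key_col]
        if key not in columns:
            columns.append(key)
        # If a key repeats within a group, this sketch simply keeps the last value.
        groups.setdefault(row[group_col], {})[key] = row[value_col]
    # Keys missing from a group become None, like blank cells in the resulting table.
    table = [
        {group_col: group, **{col: values.get(col) for col in columns}}
        for group, values in groups.items()
    ]
    return columns, table


# Hypothetical rows shaped like WordPress post metadata.
rows = [
    {"post": "Hello world", "meta_key": "_edit_last", "meta_value": "1"},
    {"post": "Hello world", "meta_key": "_thumbnail_id", "meta_value": "42"},
    {"post": "About", "meta_key": "_edit_last", "meta_value": "2"},
]
columns, table = columnize_by_key_value(rows, "post", "meta_key", "meta_value")
for record in table:
    print(record)
```

The three input rows collapse into two output rows, one per post, with "_edit_last" and "_thumbnail_id" as new columns and an empty cell where the "About" post has no thumbnail.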

17.9.18

(Part 2) Update Records in NationBuilder API using OpenRefine



This article is part of a series of tutorials on using the NationBuilder API, which we wrote in collaboration with Campaign Gears. In the previous article, we explained how to retrieve people's information via the NationBuilder API. This article explains how to use OpenRefine to bulk update information via the NationBuilder API.
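As a rough illustration of a single update call, the Python sketch below builds the method, URL, headers, and JSON body for updating one person. It assumes NationBuilder's v1 REST API, where a person is updated with PUT /api/v1/people/:id, the changed fields are wrapped in a "person" object, and the token is passed as the access_token query parameter; check the current API documentation before relying on these details. The nation slug, token, person id, field values, and the person_update_request helper are all hypothetical, and no request is sent.

```python
import json
from urllib.parse import urlencode


def person_update_request(nation_slug: str, token: str, person_id: int, fields: dict):
    """Build the method, URL, headers, and JSON body for updating one person (assumed v1 API)."""
    url = (
        f"https://{nation_slug}.nationbuilder.com/api/v1/people/{person_id}?"
        + urlencode({"access_token": token})
    )
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    body = json.dumps({"person": fields})  # only the fields being changed
    return "PUT", url, headers, body


# Hypothetical nation slug, token, person id, and field values.
method, url, headers, body = person_update_request(
    "mynation", "TOKEN", 12, {"email": "jane@example.com", "first_name": "Jane"}
)
print(method, url)
print(body)
```

Keeping the request construction separate from sending makes it easy to inspect each call before running a bulk update.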

(Part 1) Collecting data from NationBuilder API with OpenRefine


This article is part of a series of tutorials on using the NationBuilder API, which we wrote in collaboration with Campaign Gears. NationBuilder is online software that helps organizations coordinate their community. It supports membership management, web pages, finances, and online communications. NationBuilder has an API that gives access to its core features.

The NationBuilder API has extensive documentation and options to explore all available features. Using the API Explorer, you can try the different API endpoints and review all the data available through the NationBuilder API. You can learn more about the API in the Developer Blog and the API Documentation.


The NationBuilder API documentation provides examples of how to access the API with Ruby, PHP, and Python. In this article, we explain how non-developers can use the API thanks to OpenRefine. We describe two techniques: one using cURL and OpenRefine, and one using only OpenRefine.
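The article itself works with cURL and OpenRefine; as a language-neutral illustration of the requests involved, here is a minimal Python sketch that only builds the URLs for paging through the people endpoint. It assumes NationBuilder's v1 API, where GET /api/v1/people accepts limit and access_token parameters and each page of results includes a relative "next" link that is null on the last page. The nation slug, token, sample responses, and helper names are hypothetical, and no request is sent.

```python
import json
from urllib.parse import urlencode


def first_page_url(nation_slug: str, token: str, limit: int = 100) -> str:
    """Build the URL for the first page of the people index (assumed v1 endpoint)."""
    query = urlencode({"limit": limit, "access_token": token})
    return f"https://{nation_slug}.nationbuilder.com/api/v1/people?{query}"


def next_page_url(nation_slug: str, token: str, response_body: str):
    """Return the URL of the following page, or None when there is no next page."""
    next_path = json.loads(response_body).get("next")
    if not next_path:
        return None
    # Assumption: 'next' is a relative path that already carries its own query string,
    # so the access token is appended with '&'.
    return f"https://{nation_slug}.nationbuilder.com{next_path}&" + urlencode({"access_token": token})


# Hypothetical slug, token, and response bodies.
print(first_page_url("mynation", "TOKEN"))
print(next_page_url("mynation", "TOKEN", '{"results": [], "next": null}'))
```

In OpenRefine, URLs of this shape can be generated in a column with a GREL expression and then retrieved with "Add column by fetching URLs".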

6.6.18

How to call Content Grabber API

Content Grabber is a powerful, easy-to-use web scraping tool developed by Sequentum. Its point-and-click interface lets you build a scraper and quickly retrieve data from any website.

In this tutorial, we describe how to call the Content Grabber API to trigger an agent and pass input parameters. With the Content Grabber API, you can embed the scraper in a more complex workflow and configure it on demand. We first discuss the Content Grabber API, then build a simple example that shows step by step how it works.