This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right side menu.


How to use Columnize by Key/Value

Based on the WordPress export file in XML format shared by Adam K on the user mailing list, the columns item - wp:postmeta - wp:meta_key and item - wp:postmeta - wp:meta_value store data in a key:value format. The column meta_key indicates the value type, and the column meta_value stores the field value.
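To make the key:value layout concrete, here is a minimal Python sketch of the same pivot outside OpenRefine. It is an illustration of the idea only, not OpenRefine's implementation: the function name `columnize`, the `post_id` field, and the sample keys and values are all hypothetical.

```python
from collections import defaultdict


def columnize(rows, record_field, key_field, value_field):
    """Pivot (record, key, value) rows into one dict of columns per record."""
    records = defaultdict(dict)
    for row in rows:
        # Each distinct key becomes a column; its value fills that column.
        records[row[record_field]][row[key_field]] = row[value_field]
    return dict(records)


# Hypothetical sample data shaped like the meta_key / meta_value columns.
sample_rows = [
    {"post_id": 1, "meta_key": "author", "meta_value": "Alice"},
    {"post_id": 1, "meta_key": "views", "meta_value": "42"},
    {"post_id": 2, "meta_key": "author", "meta_value": "Bob"},
]

columns_by_post = columnize(sample_rows, "post_id", "meta_key", "meta_value")
```

After the pivot, each post has one entry whose keys are the former meta_key values, which is the shape Columnize by Key/Value produces as columns.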


(Part 2) Update Records in NationBuilder API using OpenRefine

This article is part of a series of tutorials on using the NationBuilder API. We wrote them in collaboration with Campaign Gears. In the previous article, we explained how to retrieve people information via the NationBuilder API. This article explains how to use OpenRefine to bulk update information via the NationBuilder API.

(Part 1) Collecting data from NationBuilder API with OpenRefine

This article is part of a series of tutorials on using the NationBuilder API. We wrote them in collaboration with Campaign Gears. NationBuilder is online software that helps organizations coordinate their communities. It supports member management, web pages, finances, and online communications. NationBuilder has an API that gives access to NationBuilder's core features.

The NationBuilder API has extensive documentation and options to explore all available features. Using the API Explorer, you can explore the different API endpoints. These endpoints allow you to review all data available in the NationBuilder API. You can learn more about the API in the Developer Blog and API Documentation.

The NationBuilder API documentation provides examples of how to access the API with Ruby, PHP, and Python. In this article, we explain how non-developers can use the API thanks to OpenRefine. We describe two techniques: one using cURL and OpenRefine, and one using only OpenRefine.


How to call Content Grabber API

Content Grabber is powerful, easy-to-use web scraping software developed by Sequentum. Its point-and-click interface allows you to quickly develop a scraper and retrieve data from any website.

In this tutorial, we will describe how to call the Content Grabber API to trigger an agent and pass input parameters. Thanks to the Content Grabber API, you can embed the scraper in a more complex workflow and configure it on demand. We will first discuss the Content Grabber API, then build a simple example to show step by step how it works.


How to check if values within a record are unique in OpenRefine

When working in record mode, it is possible to compare all the values within one record and check whether they are unique. To do that, we will introduce the function forEach(), which lets you turn the record into an array and count the unique values.
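The same check can be sketched in Python to show the logic, independent of GREL. This is not the GREL expression from the article: the helper names and sample columns are hypothetical, and the sketch assumes that a row with a blank key column belongs to the record started by the nearest row above it with a non-blank key.

```python
def records_from_rows(rows, key_column):
    """Group rows into records: a non-blank key starts a new record."""
    records = []
    for row in rows:
        if row.get(key_column) or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return records


def values_are_unique(record, column):
    """Return True when no non-blank value repeats within the record."""
    values = [row[column] for row in record if row.get(column)]
    return len(values) == len(set(values))


# Hypothetical sample: two records, the second has a repeated email.
sample_rows = [
    {"name": "Alice", "email": "a@example.com"},
    {"name": "", "email": "alice@example.com"},
    {"name": "Bob", "email": "b@example.com"},
    {"name": "", "email": "b@example.com"},
]

uniqueness = [
    values_are_unique(record, "email")
    for record in records_from_rows(sample_rows, "name")
]
```

Comparing the number of values with the number of distinct values is the same idea as turning the record into an array and counting its unique entries.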


Limitations when splitting and joining multi-valued cells

The Split multi-valued cells function helps transpose data stored in one cell into multiple rows while keeping the relationship with the other columns in the data set. In this article we will see some of the limitations of the function when splitting and joining back a data set, and how you can work around them.
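As a rough picture of what splitting and joining back means, the Python sketch below splits one column into extra rows and then joins those rows again. It only illustrates the general idea under simple assumptions (a single-character separator, blank cells in the other columns of the extra rows); the function names are hypothetical, and it does not reproduce OpenRefine's behavior or the limitations the article discusses.

```python
def split_multi_valued(rows, column, separator):
    """Put each separated value on its own row; other columns stay blank."""
    result = []
    for row in rows:
        for index, part in enumerate(row[column].split(separator)):
            if index == 0:
                result.append({**row, column: part})
            else:
                result.append({k: (part if k == column else "") for k in row})
    return result


def join_multi_valued(rows, key_column, column, separator):
    """Merge rows with a blank key back into the record above them."""
    result = []
    for row in rows:
        if row[key_column] or not result:
            result.append(dict(row))
        else:
            result[-1][column] += separator + row[column]
    return result


# Hypothetical sample data with a pipe-separated "tags" column.
original = [{"id": "1", "tags": "a|b|c"}, {"id": "2", "tags": "d"}]
split_rows = split_multi_valued(original, "tags", "|")
joined_rows = join_multi_valued(split_rows, "id", "tags", "|")
```

In this simple case the round trip returns the original rows; real data sets can behave differently, which is what the article examines.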


Using facets to cluster a subset of your data when you have a large number of rows in OpenRefine

When working in OpenRefine, you sometimes have to deal with a large data set in which many values refer to the same entity (person, city, book, or any other entry) but use different spellings. With a large data set, the clustering function can become unresponsive because of the amount of computation needed to run the different algorithms.

In this tutorial we will see how you can use facets to create subsets of records to cluster, so you can better manage the compute load on your machine. If you want to cluster your full data set, you just need to run the cluster function on each of the subsets created with your facet.
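The idea of clustering one subset at a time can be sketched in Python as follows. This is an illustration under stated assumptions, not OpenRefine's code: `fingerprint` is a simplified stand-in for a key-collision method (lowercase, strip punctuation, sort unique tokens), and the function names, the `country` facet column, and the sample values are hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Simplified key: lowercase, drop punctuation, sort unique tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups of 2 or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def cluster_by_facet(rows, facet_column, value_column):
    """Cluster each facet subset separately to keep every run small."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[facet_column]].append(row[value_column])
    return {facet: cluster(values) for facet, values in subsets.items()}


# Hypothetical sample data, faceted on "country".
sample_rows = [
    {"country": "CA", "city": "Montreal"},
    {"country": "CA", "city": "montreal "},
    {"country": "CA", "city": "Toronto"},
    {"country": "US", "city": "New York"},
    {"country": "US", "city": "York, New"},
]

clusters = cluster_by_facet(sample_rows, "country", "city")
```

Each subset is clustered on its own, so no single run has to compare values across the whole data set. The trade-off is that variants falling into different facet values are never compared with each other.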