This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources to learn OpenRefine.

For comprehensive documentation, you should refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.

22.12.20

OpenRefine December 2020 Review

Here is a summary of all the interesting tutorials and videos published about OpenRefine through December. Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox.


30.11.20

OpenRefine November 2020 Review


Here is a summary of all the interesting tutorials and videos published about OpenRefine through November. Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox.

30.10.20

OpenRefine October 2020 Review

Through September and October, the BD Guidance team in Colombia ran numerous workshops on data quality and preparation with OpenRefine (all in Spanish). Fortunately, they recorded everything if you want to catch up. 

Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox.



2.10.20

OpenRefine September 2020 Review

Here is a summary of all the interesting tutorials and videos published about OpenRefine through September. Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox.


17.9.20

Data Operations for CRM and Marketing - OpenRefine Demo - Webinar Recording


Watch the webinar recording


In case you missed our webinar with Macro on September 10, 2020, you can find the recording and the slides here.

Data management is a core concern facing many companies today. Good data can have a tremendous positive impact on organizational efficiency, productivity, and revenue. Messy data, on the other hand, can have the opposite effect. It can lead to financial losses and disorganization. 


What Is the Business Impact of Data Operations?

  • Better Data-Driven Decisions

  • Improved Lead Management & Qualification

  • Less Frustration


In this webinar, Dan and Martin will discuss the business impact of good data and will touch upon how companies can better manage their data and enhance their data operations strategy.

Martin will also be presenting a demo of OpenRefine, a free, open-source data cleanup tool.

Dan is the president and founder of Macro. His professional background includes B2B demand generation consulting for Microsoft Dynamics CRM and various international marketing roles in Europe and North America.


Martin is the founder and CEO of RefinePro, a Canadian company focused on data processing and normalization. He created RefinePro to bring data within reach of small and medium-sized businesses and of departments in larger corporations. Passionate about open source, Martin is a core contributor to the OpenRefine project.





2.9.20

The Best Online Courses for OpenRefine 3.3 - 2020 Version

September often means back to school. If OpenRefine is on your list of new software to learn, we have listed below (in no particular order) the best courses released in 2020 for OpenRefine 3.3. They all provide a complete overview of OpenRefine, from installation to more advanced features like GREL and reconciliation. All courses combine detailed walkthroughs in text and video with hands-on tutorials so you can put your new skills into practice. Courses are available in 🇬🇧 English, 🇪🇸 Spanish and 🇳🇱 Dutch.


Of course, the Library, Social Science, Ecology Carpentries and Programming Historian lessons are still relevant and being kept up to date. You can also consult our OpenRefine Foundation class (using OpenRefine 2.6).


We listed their curricula below to ease comparison. Let us know in the comments if we missed something! Happy learning.

Map & Data Library, University of Toronto


https://mdl.library.utoronto.ca/tools/openrefine

Course Content: 

  1. (Optional) OpenRefine Installation Instructions
  2. OpenRefine Tutorial 1. Survey of Household Spending Activity
  3. OpenRefine Tutorial 2. Citizen Science Activity
  4. OpenRefine Tutorial 3. Regular Expressions (Regex) Activity
  5. OpenRefine Tutorial 4. 311 Calls Activity
  6. OpenRefine Augmenting Activity 1: Preparing the data
  7. OpenRefine Augmenting Activity 2: Using Reconciliation Services
  8. OpenRefine Augmenting Activity 3: Using Add Column by Fetching URLs
  9. OpenRefine Augmenting Activity 4: Using Python

Griffith Library

With direct access to their videos:


🇪🇸 GBIF | Global Biodiversity Information Facility: Guía de Uso Básico de OpenRefine para la limpieza de datos sobre biodiversidad (Basic guide to using OpenRefine for cleaning biodiversity data)


Work-in-progress documentation in Spanish: https://doi.org/10.15468/doc-gzjg-af18

Course Content: 
  • 1. Loading data and creating a project
  • 2. Data cleaning
    • 2.1. Basic column handling
    • 2.2. Using facets
    • 2.3. Using filters
    • 2.4. Using clustering
    • 2.5. Undoing and redoing changes
    • 2.6. Marking records: flags and stars
  • 3. Saving and exporting data and projects
    • 3.1. Saving data and projects
    • 3.2. Exporting data and projects
  • 4. Queries to external services
    • 4.1. External queries via URLs
  • Epilogue
    • Acknowledgments
  • Appendix 1: Installing OpenRefine

🇳🇱 Platform ZelfDoen: online cursus OpenRefine (online OpenRefine course)


A Dutch translation of the Library Carpentries course: https://www.zelfdoeninzh.nl/info-tips/online-cursus-openrefine/

Course Content: 

  • Chapter 1: OpenRefine: a tool for cleaning collection data
  • Chapter 2: Preparation: download the program and a sample dataset
  • Chapter 3: Importing data into OpenRefine
  • Chapter 4: The layout of OpenRefine (rows and records)
  • Chapter 5: About facets and filters
  • Chapter 6: About clustering data
  • Chapter 7: Working with columns and sorting
  • Chapter 8: Transformations: introduction
  • Chapter 9: Transformations: redo and undo
  • Chapter 10: Transformations: text, numbers, dates and booleans
  • Chapter 11: Transformations: arrays
  • Chapter 12: Exporting your dataset

28.8.20

OpenRefine August 2020 Review

Here is a summary of all the interesting tutorials and videos published about OpenRefine through August. Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox.


29.7.20

OpenRefine July 2020 Review

Here is a summary of all the interesting tutorials and videos published about OpenRefine through July. Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox.


30.6.20

OpenRefine June 2020 Review

Here is a summary of all the interesting tutorials and videos published about OpenRefine through June. Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox.

1.6.20

OpenRefine May 2020 Review

The May edition of our OpenRefine news roundup is ready.

Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox. 

4.5.20

OpenRefine April 2020 Review


April was a busy month for the OpenRefine community with new reconciliation services and plugin updates! Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox. 

30.3.20

OpenRefine March 2020 Review

The latest edition of the OpenRefine review is ready. Through March the community published a LOT of new video tutorials in six languages! 

Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox. 

29.3.20

Concatenate Columns in OpenRefine 3.0 and 3.3


We all know the pain of merging different columns in OpenRefine when you have null values. Before version 3.0, it required writing a complex GREL expression or managing multiple filters to ensure we are not losing any data. 

Those shortcomings have been addressed in the latest version! 

Starting with OpenRefine 3.0, we have the coalesce() function, which returns the first non-null value among its arguments and therefore handles null values natively.
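To make the behaviour concrete, here is a minimal Python sketch of the same idea: return the first argument that is not null. It is an illustration only, not OpenRefine's implementation. In GREL you would write something like coalesce(cells["A"].value, cells["B"].value), where "A" and "B" are placeholder column names; the Python function name and sample row below are likewise hypothetical.

```python
from typing import Any, Optional


def coalesce(*values: Any) -> Optional[Any]:
    """Return the first value that is not None, or None if all are None.

    Mirrors the idea behind GREL's coalesce(). In this sketch an empty
    string counts as a real value, like any other non-None value.
    """
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical row where the first column is null.
row = {"A": None, "B": "Montreal", "C": "Quebec"}
print(coalesce(row["A"], row["B"], row["C"]))  # prints Montreal
print(coalesce(None, None))                    # prints None
```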

But even more importantly, OpenRefine 3.3 introduced a user interface that offers tons of flexibility, including defining how you want to concatenate one or multiple columns together.
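The core problem the concatenation dialog solves is joining several cells with a separator without null or blank cells leaving dangling separators behind. The Python sketch below shows that logic under two assumptions: a configurable separator and an on/off choice to skip blank cells. The function name join_columns and its parameters are hypothetical; check the dialog itself for the exact options it offers.

```python
from typing import Any, Iterable, Optional


def join_columns(values: Iterable[Optional[Any]], separator: str = ", ",
                 skip_blanks: bool = True) -> str:
    """Concatenate cell values with a separator.

    When skip_blanks is True, None and empty strings are dropped so they
    do not produce empty segments. When False, they are kept as "".
    """
    parts = []
    for value in values:
        if value is None or value == "":
            if skip_blanks:
                continue
            value = ""
        parts.append(str(value))
    return separator.join(parts)


# Hypothetical address row with a null in the middle column.
row = ["123 Main St", None, "Springfield"]
print(join_columns(row))                     # 123 Main St, Springfield
print(join_columns(row, skip_blanks=False))  # 123 Main St, , Springfield
```

Skipping blanks by default matches the pain point described above: with a naive concatenation, a single null either breaks the whole expression or leaves a stray separator in the result.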

I recorded a quick video demonstration: 

27.3.20

Solving Google’s reCAPTCHA v2 with ParseHub Agent


ParseHub is a great point-and-click web scraping tool. While projects run on ParseHub servers, you can connect them to third-party proxies such as BrightData or to CAPTCHA-solving services such as 2Captcha.

In this tutorial, we will show you how to bypass the Google reCAPTCHA v2 test page with ParseHub Agent and the 2Captcha service. You will need a 2Captcha account and an API key to complete this tutorial.

Don't hesitate to contact us if you want access to the ParseHub project, have questions, or need help implementing web scraping projects.


28.2.20

OpenRefine February 2020 Review

The February edition of our OpenRefine news roundup is ready. This month, we did some digging on YouTube for the best OpenRefine video tutorials in your language.

Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox. 

4.2.20

OpenRefine January 2020 Review

With the new year, we decided to start a monthly review of what happens in the OpenRefine community. Below is a summary of what happened in December 2019 and January 2020. Let us know if we missed something.

Do not forget to subscribe to our newsletter on the right to never miss an update. 

20.1.20

How to use Columnize by Key/Value


Take the WordPress export file in XML format shared by Adam K on the user mailing list. The columns item - wp:postmeta - wp:meta_key and item - wp:postmeta - wp:meta_value store data in a key/value format: the meta_key column indicates which field (type of value) each row holds, and the meta_value column stores the field value.
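Conceptually, Columnize by key/value pivots those rows: each distinct meta_key becomes a new column, filled with the matching meta_value for each post. The Python sketch below illustrates that pivot under simplifying assumptions. Each input row is a hypothetical (post_id, meta_key, meta_value) tuple, the keys _edit_last and _thumbnail_id are made-up examples, and the function name is ours. It is not OpenRefine's implementation and does not model how OpenRefine groups records or treats repeated keys.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # (post_id, meta_key, meta_value), hypothetical layout


def columnize_by_key_value(rows: List[Row]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Pivot key/value rows into one row per post with one column per key.

    Returns the column names (post_id first, then keys in first-seen order)
    and the pivoted rows. A key missing for a post is filled with "".
    If a key repeats for the same post, this sketch keeps the last value.
    """
    keys: List[str] = []
    records: Dict[str, Dict[str, str]] = {}  # insertion-ordered (Python 3.7+)
    for post_id, key, value in rows:
        if key not in keys:
            keys.append(key)
        record = records.setdefault(post_id, {"post_id": post_id})
        record[key] = value
    columns = ["post_id"] + keys
    table = [{col: rec.get(col, "") for col in columns} for rec in records.values()]
    return columns, table


rows = [
    ("1", "_edit_last", "2"),
    ("1", "_thumbnail_id", "45"),
    ("2", "_edit_last", "1"),
]
columns, table = columnize_by_key_value(rows)
print(columns)   # ['post_id', '_edit_last', '_thumbnail_id']
print(table[1])  # {'post_id': '2', '_edit_last': '1', '_thumbnail_id': ''}
```

Filling missing keys with an empty string keeps every output row the same width, which is what you want before exporting the result as a flat table.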