This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.

27.6.12

Google Refine Reconciliation Service support for Apache Stanbol (from around the web)

Add support for the Reconciliation Service API to the Apache Stanbol Entityhub RESTful API (see documentation). The Google Refine ReconciliationServiceApi allows you to reconcile String values with Entities. The Entityhub is very well suited to implementing this service, as it can execute those queries very efficiently based on the SolrYard implementation.
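To make the idea of reconciling string values with entities more concrete, here is a minimal Python sketch of how a client might build a batch of reconciliation queries and read the candidates back. It assumes the commonly documented shape of the Reconciliation Service API (a form parameter named `queries` holding a JSON object keyed `q0`, `q1`, and so on, and a response with a `result` list per key). The function names, the `urn:example:` identifiers and the sample response are all hypothetical and hand-written; nothing here is taken from Stanbol itself, and no request is sent.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_reconcile_payload(names, limit=3):
    """Build the form-encoded `queries` parameter for a batch of names."""
    queries = {
        f"q{i}": {"query": name, "limit": limit}
        for i, name in enumerate(names)
    }
    return urlencode({"queries": json.dumps(queries)})


def best_matches(response_text):
    """Map each query key to the name of its highest-scoring candidate."""
    response = json.loads(response_text)
    best = {}
    for key, body in response.items():
        candidates = body.get("result", [])
        top = max(candidates, key=lambda c: c["score"], default=None)
        best[key] = top["name"] if top else None
    return best


payload = build_reconcile_payload(["Paris", "Berlin"])

# Hand-written sample response (an assumption, not real service output).
sample_response = json.dumps({
    "q0": {"result": [
        {"id": "urn:example:paris", "name": "Paris",
         "type": [], "score": 0.9, "match": True},
        {"id": "urn:example:paris-tx", "name": "Paris, Texas",
         "type": [], "score": 0.4, "match": False},
    ]},
    "q1": {"result": []},
})

matches = best_matches(sample_response)
```

The payload would be posted to whatever reconciliation endpoint the service exposes; the endpoint URL is deliberately left out because it depends on the installation.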

Capturing Interactive Data Transformation Operations using Provenance Workflows (from around the web)

Capturing Interactive Data Transformation Operations using Provenance Workflows


Abstract:


The ready availability of data is leading to the increased opportunity of their re-use for new applications and for analyses. Most of these data are not necessarily in the format users want, are usually heterogeneous, and highly dynamic, and this necessitates data transformation efforts to repurpose them. Interactive data transformation (IDT) tools are becoming easily available to lower these barriers to data transformation efforts. This paper describes a principled way to capture data lineage of interactive data transformation processes. We provide a formal model of IDT, its mapping to a provenance representation, and its implementation and validation on Google Refine. Provision of the data transformation process sequences allows assessment of data quality and ensures portability between IDT and other data transformation platforms. The proposed model showed a high level of coverage against a set of requirements used for evaluating systems that provide provenance management solutions.


25.6.12

Google Refine, JSON and my notepad, or how to write scripts in Google Refine

One of the nice things about Google Refine is that every action you perform generates JSON code. To make a comparison with Excel, the generated JSON is similar to a recorded macro. The sweet spot of Google Refine is that you don't need to click a record button: it keeps track of all your actions automatically, and the resulting history can easily be exported for backup or editing.
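As an illustration of editing that exported JSON outside the tool, the sketch below loads a small operation history and points a transform at a different column. The sample history is hand-written to resemble what an export typically looks like (a list of objects with fields such as `op`, `columnName` and `expression`); the exact fields may differ between versions, and the `retarget` helper and the column names are hypothetical.

```python
import json

# Hand-written sample resembling an exported operation history (assumption).
exported_history = """
[
  {
    "op": "core/text-transform",
    "description": "Text transform on cells in column title using expression value.trim()",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "columnName": "title",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10
  }
]
"""


def retarget(history_json, old_column, new_column):
    """Return the history with every operation on old_column moved to new_column."""
    operations = json.loads(history_json)
    for op in operations:
        if op.get("columnName") == old_column:
            op["columnName"] = new_column
            # Keep the human-readable description in step with the change.
            op["description"] = op["description"].replace(
                f"column {old_column}", f"column {new_column}")
    return json.dumps(operations, indent=2)


edited = retarget(exported_history, "title", "show_name")
print(edited)
```

The edited JSON can then be pasted back into another project, which is what makes the history behave like a reusable script.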

5.6.12

Creating row and record index

Google Refine provides the row index as information in the third column. Unfortunately, a GREL expression cannot read the value in this column; you need to use one of the following expressions to generate the value.
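To show what a row index and a record index represent, here is a small Python model rather than GREL itself. It assumes the usual record convention, where a non-blank cell in the first column starts a new record and blank cells continue the previous one, and it counts from 0. As far as I know, GREL exposes the equivalent values through `row.index` and `row.record.index`, but treat that as an assumption to check against the GREL documentation. The `index_rows` helper and the sample data are hypothetical.

```python
def index_rows(rows, key_column=0):
    """Attach a 0-based row index and record index to each row.

    A non-blank value in key_column starts a new record; blank values
    belong to the record above them.
    """
    indexed = []
    record_index = -1
    for row_index, row in enumerate(rows):
        key = row[key_column]
        # The very first row always opens a record, even if its key is blank.
        if key not in (None, "") or record_index < 0:
            record_index += 1
        indexed.append((row_index, record_index, row))
    return indexed


rows = [
    ["Show A", "Episode 1"],
    ["", "Episode 2"],
    ["Show B", "Episode 1"],
    ["", "Episode 2"],
    ["", "Episode 3"],
]

indexed = index_rows(rows)
row_indexes = [r for r, _, _ in indexed]
record_indexes = [rec for _, rec, _ in indexed]
```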


4.6.12

Sort by multiple criteria

Google Refine's sort function allows you to combine several columns, for example to sort by field A and then by field B.


In my case, I used this method as a workaround to extract the most recent title posted from each record in a list of radio shows (using a timestamp field). As I am not aware of a way to select a specific row within a record, I used the sort function to bring the row I wanted to extract to the top of its record group.
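The same workaround can be sketched outside Google Refine. The Python example below sorts by show and then by timestamp (newest first), and takes the first row of each group. The field names and sample data are invented for illustration, and the timestamps are assumed to be ISO 8601 strings so that text order matches chronological order.

```python
from itertools import groupby
from operator import itemgetter

# Invented sample data: one row per posted title.
posts = [
    {"show": "Morning Jazz", "title": "Episode 1", "timestamp": "2012-05-01T08:00:00"},
    {"show": "Evening News", "title": "Bulletin 7", "timestamp": "2012-06-02T18:00:00"},
    {"show": "Morning Jazz", "title": "Episode 2", "timestamp": "2012-06-01T08:00:00"},
    {"show": "Evening News", "title": "Bulletin 6", "timestamp": "2012-06-01T18:00:00"},
]

# Python's sort is stable, so sorting by the secondary key first
# (timestamp, newest first) and then by the primary key (show)
# gives "show ascending, timestamp descending".
by_time = sorted(posts, key=itemgetter("timestamp"), reverse=True)
by_show_then_time = sorted(by_time, key=itemgetter("show"))

# The first row of each group is now the most recent title for that show.
latest = {
    show: next(group)["title"]
    for show, group in groupby(by_show_then_time, key=itemgetter("show"))
}
```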

3.6.12

Google Refine + Perl (from around the web)




Make Google Refine and Perl transform one-liners work together using fetch by URL (RESTful API)

2.6.12

Create records in Google Refine

This short tutorial describes how to create records in Google Refine. Google Refine can present a data set in row or record mode (see the difference between the two).
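One common way to end up with records is to sort on a key column and then blank out repeated keys, so that each non-blank key opens a record and the blank cells below it belong to the same record. The Python sketch below models that "blank down" step only; it is an assumption about the general technique and not necessarily the method the tutorial follows. The `blank_down` helper and the sample rows are hypothetical.

```python
def blank_down(rows, key_column=0):
    """Blank a key cell when it repeats the key of the row directly above."""
    result = []
    previous = object()  # sentinel that never equals a real cell value
    for row in rows:
        new_row = list(row)  # copy so the input rows are left untouched
        if row[key_column] == previous:
            new_row[key_column] = ""
        else:
            previous = row[key_column]
        result.append(new_row)
    return result


# Rows already sorted on the key column (the show name).
sorted_rows = [
    ["Show A", "Episode 1"],
    ["Show A", "Episode 2"],
    ["Show B", "Episode 1"],
    ["Show B", "Episode 2"],
    ["Show B", "Episode 3"],
]

records = blank_down(sorted_rows)
```

Because only consecutive duplicates are blanked, the rows need to be sorted on the key column first; otherwise the same key would open several separate records.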