This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.

21.12.12

The Named Entity Extractor extension by Free Your Metadata (from around the web)

The Free Your Metadata Named Entity Extractor extension helps you enrich your data in OpenRefine using AlchemyAPI, DBpedia Lookup and Zemanta. The extension works on plain-text fields and any unstructured (meta)data.

15.11.12

Mining and OpenRefine(ing) JISCMail: (from around the web)

A look at OER-DISCUSS [Listserv] JISC CETIS MASHe: a complete tutorial on scraping data from a mailing list and analysing participants and contributions.

Read the full article.

Finding (Nearly) Duplicate Items in a Data Column (from around the web)

Another great article by Tony Hirst. This tutorial shows how to use the clustering functions (n-gram and fingerprint) directly in a facet. Really handy.

Read the full article.
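To give a feel for what fingerprint clustering does, here is a minimal Python sketch. It is a simplified approximation written for this page, not the code used by OpenRefine or by the linked article: the function names `fingerprint` and `cluster` are hypothetical, and the real fingerprint method applies more normalization (for example on accented characters).

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key: trim, lowercase, drop ASCII punctuation,
    then sort the unique whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing the same key; keep groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]
```

Values such as "Tony Hirst" and "Hirst, Tony" end up with the same key, so they are proposed as one cluster.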

7.11.12

From Excel file to RDF with links to DBpedia and Europeana (from around the web)

DERI Galway, the author of the RDF extension (download and documentation here), shows step by step how to use the RDF extension to reconcile your data against DBpedia and Europeana. This tutorial also goes through the steps to create an RDF schema.

6.11.12

Chit Chat with New Datasets – Facets in OpenRefine (Was /Google Refine/) (from around the web)

A good review of the faceting capabilities, including text, numeric, timeline, customized and scatterplot facets.

Read the full article 

31.10.12

Cleaning Date with Google Refine (from around the web)

A basic tutorial on cleaning up dates using OpenRefine. A great example of well-structured GREL syntax used to build complex transformations.

Read the full article on Hermanes Barbara's blog

26.10.12

Refine your EventBrite guestlist (from around the web)

This recipe shows you how to use Google Refine to fetch details from your EventBrite account, and to explore your guest lists in detail. This tutorial shows how to use the 'Create Project via Web Addresses (URLs)' option and how to retrieve guest information using the EventBrite API.

The full article: http://www.opendatacookbook.net/wiki/recipe/a_refined_guestlist

22.10.12

A framework for the OpenRefine community

Following the results from the Google Refine Usage Survey, I would like to share a more personal vision of the birth of the OpenRefine community. The code and all issues have recently been moved to GitHub, the wiki will close soon, and the project will then have left the Google Code environment.

However, while a clear consensus was found to go with GitHub (GitHub got 35 votes out of 43 responses, see results here) to host the code and issue tracker, I am not sure that GitHub is the right place for the documentation. In this post I'll try to explain why. Please note that I am open to comments and suggestions regarding the analysis and propositions I make in this post. OpenRefine is now in the community's hands and everyone's voice counts.

19.10.12

Google Refine Administrativa Survey Results

Following the Usage Survey (see first results), we opened a survey to understand the community's preferences regarding tools to administer OpenRefine. Thanks to the 43 participants.

Below I provide a first flat analysis of the results, with the decisions that have already been made based on this survey. You can access the detailed answers here and the survey form here.

18.10.12

Google Refine Recipe (from around the web)

Keith Maguire provides a list of short and sweet recipes for capitalising the first letter, isolating values, faceted browsing or comparing two columns with Refine. Enjoy!

via Delicious http://www.keithmaguire.net/blog/categories/refine/

Google Refine Usage Survey Results

Following the survey on Google Refine usage we distributed last week, I would first like to thank the 99 people who participated. Thanks to your answers, we now have a better understanding of who uses Google Refine, how, and what the community's expectations are.

Thanks again for spreading the word and providing detailed and insightful answers. Here is a first flat analysis of the answers collected. You can access the detailed answers here (email addresses have been removed) and the survey questions here.

10.10.12

Google Refine project administrivia survey

The survey is now closed. For archival purposes, please find a copy of the questions below. Thanks to all the participants.

5.10.12

Open Refine Survey

The survey is now closed. For archival purposes, please find a copy of the questions below. Thanks to all the participants.

3.10.12

From Freebase Gridworks to Google Refine and now OpenRefine

Yesterday David Huynh announced that Google will soon stop its active support of Google Refine and is counting on the community to get more involved in growing Refine.

Refine is already a mature data cleaning tool, but this change in leadership will be a major challenge for the tool's continuity. But first I'd like to clarify what I read on Twitter last night: Google Refine has always been an open source tool, and anyone can commit changes, develop an extension or update the wiki.

Through this post I'd like to give my insight into the reasons for this decision and what its short-term consequences will be.


Grabbing Twitter Search Results into Google Refine And Exporting Conversations into Gephi (from around the web)

This neat tutorial explains how to import data directly from the Twitter API as JSON at the project creation stage. The second part of the tutorial explains how to prepare the data for import into Gephi for data visualization.

via Delicious 

29.9.12

Use Google Refine to clean your data for Fulcrum (from around the web)

Fulcrum allows you to create location-based data collection apps and deploy them to your mobile device. This tutorial shows how to use Google Refine to take advantage of the data you have collected using Fulcrum.

via Delicious http://docs.fulcrumapp.com/guides/cleaning-up-data-with-google-refine/

27.9.12

THATCamp Paris 2012 PiratePad (from around the web)

A PiratePad in French, written during THATCamp Paris 2012, presenting a reflection on how to use Google Refine or other tools to clean and work on data from a researcher's perspective.

via Delicious http://piratepad.net/google-refine

10.9.12

Error: smartSplit error: Un-terminated quoted field at end of CSV line

I am a big fan of the smartSplit function. It is really easy to understand and helps to quickly extract part of a string based on any character. However, if a cell contains a double quote (") when you use the smartSplit function, Google Refine will return the following error message:
Error: smartSplit error: Un-terminated quoted field at end of CSV line

Here is my workaround.

5.9.12

Google Refine Workshop (from around the web)

This tutorial/exercise will walk you through all of Google Refine's main functionality. It is built around exercises, so you can get hands-on quickly!

Data Journalism Workshop - New York (from around the web)

Google Hangout of HHNew York presenting Google Refine

11.8.12

Google Refine Uploader and Stats Extension



The Google Refine Uploader Extension allows you to export datasets from Google Refine and post them as JSON to web servers! Intended for use with CouchDB. Please note that this extension is a work in progress. Feel free to join and help 


This extension is based on the Chicago Tribune Stats extension. A tutorial is available on their blog. Please note that the extension does not work with Google Refine 2.5. It should be tested with the 2.0 version available here.

If you have installed and tried either of these two extensions, I'll be pleased to hear from you!

Data Shaping in Google Refine – Generating New Rows from Multiple Values in a Single Column


A great tutorial on reshaping a data set using the transpose and fill down functions. This article also introduces the split multi-valued cells function to split and transpose in one shot.
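As a rough illustration of the reshaping described above, the following Python sketch turns one row with several values in a cell into several rows. It is not taken from the linked article: the function name `split_multi_valued`, the dictionary-based rows and the comma separator are assumptions made for this example. Unlike Refine, which leaves the other cells blank until you fill down, the sketch copies the other columns directly into each new row.

```python
def split_multi_valued(rows, column, separator=","):
    """Return one output row per value found in `column`, copying the other columns."""
    out = []
    for row in rows:
        parts = [p.strip() for p in str(row[column]).split(separator)]
        for part in parts:
            new_row = dict(row)      # copy so the input rows stay untouched
            new_row[column] = part
            out.append(new_row)
    return out


rows = [{"id": 1, "tags": "a, b"}, {"id": 2, "tags": "c"}]
reshaped = split_multi_valued(rows, "tags")
```

Here `reshaped` contains three rows: two for id 1 (one per tag) and one for id 2.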

19.7.12

Count with google refine

Can you count and perform basic operations in Google Refine? Yes, that's possible, and we will see how. This article is a translation/adaptation of rechnen mig google refine, published by cosmin on databeast.org. All images are from the original article.



27.6.12

Google Refine Reconciliation Service support for Apache Stanbol (from around the web)

Add support for the Reconciliation Service API to the Apache Stanbol Entityhub RESTful API (see documentation). The Google Refine ReconciliationServiceApi allows reconciling string values with entities. The Entityhub is very well suited for implementing this service, as it can execute those queries very efficiently based on the SolrYard implementation.

Clean data is the best weapon against the monkey insurrection (from around the web)

An entry-level and fun tutorial for data journalists or open data people, covering all the main aspects of cleaning a data set.

Capturing Interactive Data Transformation Operations using Provenance Workflows (from around the web)

Abstract:


The ready availability of data is leading to the increased opportunity of their re-use for new applications and for analyses. Most of these data are not necessarily in the format users want, are usually heterogeneous, and highly dynamic, and this necessitates data transformation efforts to repurpose them. Interactive data transformation (IDT) tools are becoming easily available to lower these barriers to data transformation efforts. This paper describes a principled way to capture data lineage of interactive data transformation processes. We provide a formal model of IDT, its mapping to a provenance representation, and its implementation and validation on Google Refine. Provision of the data transformation process sequences allows assessment of data quality and ensures portability between IDT and other data transformation platforms. The proposed model showed a high level of coverage against a set of requirements used for evaluating systems that provide provenance management solutions.


26.6.12

Transforming spreadsheets into SKOS with Google Refine (from around the web)

This article goes through the steps to transform an Excel document to Simple Knowledge Organization System Reference (SKOS) using the RDF extension.

25.6.12

Google refine ; JSON and my notepad or how to write script in google refine

One of the nice things about Google Refine is that every action you perform generates JSON code. To compare with Excel, the generated JSON is like a recorded macro. The sweet spot of Google Refine is that you don't need to click a record button: it keeps track of all your actions automatically, and the result can easily be exported for backup or editing purposes.
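Because the exported history is plain JSON, it can be read by any scripting language. The Python sketch below lists the operations contained in such an export. The sample history is illustrative only: it assumes each entry carries an "op" identifier and a human-readable "description", and the column names and the helper `summarize_operations` are made up for this example.

```python
import json

# Illustrative sample only; a real export from your project will differ.
SAMPLE_HISTORY = """
[
  {"op": "core/text-transform",
   "description": "Trim whitespace in column title",
   "columnName": "title",
   "expression": "value.trim()"},
  {"op": "core/column-removal",
   "description": "Remove column notes",
   "columnName": "notes"}
]
"""


def summarize_operations(history_json):
    """Return (operation id, description) pairs from an exported operation history."""
    operations = json.loads(history_json)
    return [(op.get("op"), op.get("description")) for op in operations]


summary = summarize_operations(SAMPLE_HISTORY)
```

Keeping the export in a text file (your "notepad") lets you review, edit and reuse the steps later.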

5.6.12

Creating row and record index

Google Refine displays the row index in the third column. Unfortunately, GREL expressions cannot call the value in this column; you need to use one of the following expressions to generate the value.


4.6.12

Sort by multiple criteria

Google Refine's sort function allows you to combine several columns, to sort by field A and then by field B.


In my case, I used this method as a workaround to extract the most recently posted title from each record in a list of radio shows (using a timestamp field). As I am not aware of a way to select a specific row within a record, I used the sort function to bring the row I wanted to extract to the top of the record group.
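The same idea (sort so that the wanted row comes first in each group, then keep the first row) can be sketched in Python. This is an analogy, not the Refine procedure itself; the function name `latest_per_record` and the field names `show`, `ts` and `title` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_record(rows, record_key, timestamp_key):
    """Keep, for each record, the row with the most recent timestamp."""
    # Sort by timestamp, newest first ...
    ordered = sorted(rows, key=itemgetter(timestamp_key), reverse=True)
    # ... then by record; the sort is stable, so the newest row stays first in each group.
    ordered.sort(key=itemgetter(record_key))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(record_key))]


playlist = [
    {"show": "A", "ts": "2012-01-01", "title": "old"},
    {"show": "B", "ts": "2012-03-01", "title": "only"},
    {"show": "A", "ts": "2012-02-01", "title": "new"},
]
latest = latest_per_record(playlist, "show", "ts")
```

The timestamps here are ISO-formatted strings, which sort correctly as text; other date formats would need parsing first.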

3.6.12

Google Refine + Perl (from around the web)




Make Google Refine and Perl transform one-liners work together using fetch by URL (RESTful API).

2.6.12

Create records in Google Refine

This short tutorial describes how to create records in Google Refine. Google Refine can present a data set in row or record mode (see the difference between the two).


17.5.12

Institutional locations as Linked Data through Google Refine (from around the web)



Posted: 02 May 2012 07:26 AM PDT
A complete example on how to use the RDF extension in google refine.


16.5.12

University buildings as Linked Data with ScraperWiki (from around the web)



Example of an unlock JSON call made via a Google Refine column transform. This post also explains how to use the value.parseJson() function.

28.4.12

Field format change accidentally to Number and how to add leading 0

It is easy to inadvertently convert a field containing numbers stored as text to the number format. This mainly happens during project creation (import) or when creating a new column. This conversion to number can lead to a loss of data, such as leading zeros. Here is how to get them back and avoid this happening again.
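The repair itself is simple once you know how many digits the field should have. The Python sketch below shows the principle; it is not the GREL expression from the article, and the function name `restore_leading_zeros` and the fixed `width` parameter are assumptions for this example.

```python
def restore_leading_zeros(value, width):
    """Convert a value back to text and left-pad it with zeros to `width` characters."""
    # Whole-number floats (for example 2116.0) are reduced to integers before padding.
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value).strip().zfill(width)


zip_code = restore_leading_zeros(2116, 5)
```

This only works when every value in the field has the same expected length, such as a five-digit postal code.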


26.4.12

Data exploration tutorial with google refine

Recently, Hugh Stimson published a great article: Data Mining My Old Radio Playlists. His post mixes tutorials on PHP scripting, data cleaning with Google Refine and data analysis with PostgreSQL.

This response post demonstrates that the data analysis is fully doable in Google Refine using really basic functions (I'll be using a GREL function only once, for the long tail analysis). I think this post is also a good illustration of my previous post on data exploration using Google Refine.

25.4.12

Data-Mining My Old Radio Playlists (from around the web)




Posted: 24 Apr 2012 07:00 AM PDT
An example of web scraping and data analysis using Google Refine. In this tutorial, the cluster function is used to clean up the data set. The analysis part could also have been done in Google Refine using facets.
You might also be interested in the Data Exploration Tutorial with OpenRefine, which shows, based on the same database, how to use OpenRefine to analyse the data (and not only clean it).

11.4.12

How to enhance your data set with freebase and google refine. The Lawrence Collection example.


The National Library of Ireland used google refine to improve the access to the Lawrence Collection (photography collection) by using freebase reconciliation service to map where pictures have been taken!

Using Google Refine to clean mortgage data (from around the web)


Posted: 10 Apr 2012 07:30 AM PDT
A nice tutorial explaining how to clean and facet data. This example is based on bank mortgage data.

10.4.12

Fusion Table, map multiple items with the same location


When you want to map multiple items with the same location in Fusion Tables, only one item is displayed and all the others are ignored. There are several workarounds to this major limitation, and the most common is to slightly change your coordinates (longitude/latitude) so that your points appear close to each other on the map (a tip from the Google Fusion team itself).

When working with a large data set, identifying and manually correcting all records sharing the same location can become time-consuming. So I looked at how to deal with this in Google Refine and ended up with this straightforward process.
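To illustrate the coordinate-shifting idea outside of Refine, here is a small Python sketch. It is not the process described in the article: the function name `spread_duplicates` and the `step` size of 0.0001 degrees are assumptions chosen for the example.

```python
from collections import defaultdict


def spread_duplicates(points, step=0.0001):
    """Shift each repeated (lat, lon) pair by a small, growing offset so no two points overlap."""
    seen = defaultdict(int)          # how many times each exact location has occurred so far
    result = []
    for lat, lon in points:
        n = seen[(lat, lon)]
        seen[(lat, lon)] += 1
        # The first occurrence keeps its position (n == 0); later ones move diagonally.
        result.append((lat + n * step, lon + n * step))
    return result


shifted = spread_duplicates([(45.5, -73.6), (45.5, -73.6), (48.8, 2.3)])
```

The offset should stay small enough that the points still look co-located at the zoom level you plan to use.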

8.4.12

Social Interest Positioning – Visualising Facebook Friends’ Likes With Data Grab



A complete tutorial, including cleaning the data with grefine and visualization with Gephi.

31.3.12

Free (and rebuild) the tweets! Export TwapperKeeper archives using Google Refine.

So here's a way you can make a copy of a Twapper Keeper archive and rebuild the data using Google Refine.

30.3.12

Working with Organisation XML files in Google Refine : IATI Support

An example of opening an XML file with Google Refine 2.5.

25.3.12

Looking up Images Trademarked By Companies Using OpenCorporates and Google Refine

Listening to Chris Taggart talking about OpenCorporates at netzwerk recherche conf – data, research, stories, I figured I really should start to have a play… Looking through the example data available from an opencorporates company ID via the API, I spotted that registered trademark data was ...

20.3.12

Rejex: the JavaScript regular expression editor

Google Refine supports regex. This online regular expression editor is quite handy for testing an expression before using it in grefine.

15.3.12

LOD2 extension for Grefine · GitHub

LOD2 Google Refine is a version of Google Refine, which includes some extensions
to help you deal with Linked Open Data. With these extensions you can:
- reconcile your data with DBpedia or RDF file or SPARQL endpoint
- extend your reconciled data with data from DBpedia
- export data into RDF
- extract entities from full text descriptions in your data
- and more...

13.3.12

Using Google Refine and taxonomic databases (EOL, NCBI, uBio, WORMS)

A tutorial on using grefine to reconcile against 4 taxonomic databases.

11.3.12

Using Google Refine to add administrative geography

I've recently been pulling a list of the 92 top-level football grounds together - as I'm interested to play around with linking this with various aspects of administrative geography and census-type data. It's a niche! So - I compiled a list and the grounds and their addresses via.. Wikipedia. Took ...

9.3.12

Difference between a record and a row

Google Refine makes a clear distinction between a row and a record. We will see what the difference between the two is, and the advantages of working in records mode.

Fill down the right and secure way

The fill down function consists of taking the content of a cell and copying it down into the following blank cells. This is done based on row order. When you perform this action using the fill down function, Google Refine does not take into account whether rows belong to different records: if the following row is blank, it will be filled with the previous row's content. If you do not use this function with extra care, you can easily corrupt the integrity of your data set. Here is why, and how to avoid that.
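The risk is easier to see with a small model. The Python sketch below fills blanks down but stops at record boundaries, so a value never leaks from one record into the next. It illustrates the principle only and is not the method from the article: the function name `fill_down_within_records` is hypothetical, and it assumes that a non-blank value in the `key` column marks the first row of a record.

```python
def fill_down_within_records(rows, key, column):
    """Fill blank cells in `column` from the row above, without crossing record boundaries."""
    filled = []
    last = None
    for row in rows:
        new_row = dict(row)
        if new_row.get(key):          # a non-blank key starts a new record
            last = None
        if new_row.get(column):
            last = new_row[column]    # remember the latest value seen in this record
        elif last is not None:
            new_row[column] = last    # blank cell: reuse the value from the same record
        filled.append(new_row)
    return filled


rows = [
    {"id": "1", "city": "Paris"},
    {"id": "", "city": ""},
    {"id": "2", "city": ""},      # blank, but belongs to a new record
    {"id": "", "city": "Rome"},
]
safe = fill_down_within_records(rows, "id", "city")
```

A naive fill down would copy "Paris" into the first row of record 2; here that cell stays blank.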

6.3.12

Google refine extension for linkedgov.org


LinkedGov is a community project to collaboratively clean and make usable data from local authorities and other public bodies.
See the documentation and the code on GitHub

24.2.12

Tutorial: From pdf to searchable, sortable table

Selecting a string within a cell using smartSplit

The smartSplit function is a variation on the split function that allows you to split the cell content based on any string of characters and then select the segment you want to work on. This function is very useful for extracting or removing a string within cells without creating multiple columns and then merging them back.

20.2.12

Google Refine for Investigative Journalism

http://dannguyen.github.com/NICAR-Google-Refine/

Good introduction to grefine to navigate and clean data.

16.2.12

Count how often a character occurs in a cell

Did you know that Refine can count how often a string or character appears in a cell?

To achieve this, I first recommend that you store the count result in a separate column (so you do not overwrite your initial content). Select your reference column (where you want to do the count per cell) and create a new column based on this column. Another option is to store the result in a custom text facet.

We will use the GREL expression value.split(" ").length().

However, if the cell does not contain the value, Refine will still return '1'. I found two ways to work around this issue.
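The reason for the '1' is that splitting a string always yields at least one piece. The Python sketch below reproduces the same behaviour and shows one way to correct for it; it is an analogy in another language, not one of the two GREL workarounds from the article, and both function names are hypothetical.

```python
def count_by_split(text, needle):
    """Number of pieces after splitting: never less than 1, even when `needle` is absent."""
    return len(text.split(needle))


def count_occurrences(text, needle):
    """Occurrences of `needle`: one less than the number of pieces."""
    return len(text.split(needle)) - 1


pieces = count_by_split("abc", " ")          # no space in the text, still one piece
spaces = count_occurrences("a b c", " ")     # three pieces, so two spaces
```

Note that in Python an empty `needle` raises a ValueError, so the separator must contain at least one character.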

15.2.12

Google Refine tips

Google Refine is currently the best free software tool for cleaning up messy data. It's perfect to correct unescaped HTML strings, catch an odd typo or fetch additional data about entities from Freebase. We use it extensively at Zemanta to clean up and reconcile customer's datasets before importing ...

A video tutorial to parse JSON string

This tutorial explains how to populate species pages in the BDRS using Google Refine. A JSON string is generated from a source, parsed and cleaned in Google Refine, and exported back in JSON format.

13.2.12

How to: convert easting/northing into lat/long for an interactive map

Google Fusion Tables is great for creating interactive maps from a spreadsheet – but it isn't too keen on easting and northing. That can be a problem as many government and local authority datasets use easting and northing to describe the geographical position of things – for example, speed ...

7.2.12

Data Clustering With The Google.

A nice introduction starts at slide 12.
Presentation by Bob Lannon, Senior NLP Analyst, Verilogue.

6.2.12

Create a project based on a url (xml)

This video tutorial shows how to create a project in Google Refine 2.5 based on an online XML file. The full tutorial is available here.

4.2.12

Visualisation on Top 100 Chemical Companies with Google Refine and Google Fusion Table.

On 29th November, Plant Life team (3 Developers Hackers and 4 Journalists Hacks) managed to implement a visualisation task on top 100 Chemical Companies in less than 7 hours and won the runner up prize for the first RBI Hacks and Hacker Day event. The result look like those on ...


1.2.12

Free Your Metadata : a Concrete Action Plan

A 1h13 tutorial by Free Your Metadata at Columbia University.

Visualizing French Tax Data using grefine and tableau

A nice tutorial mixing methodology and concrete actions to gather, clean, harmonize, merge and visualize data (using Tableau software).

28.1.12

Merging Datasets with Common Columns in Google Refine


It's an often encountered situation, but one that can be a pain to address – merging data from two sources around a common column. Here's a way of doing it in Google Refine… Here are a couple of example datasets to import into separate Google Refine projects if you want to play along, both courtesy ...

27.1.12

Fragments: Glueing Different Data Sources Together With Google Refine

I'm working on a new pattern using Google Refine as the hub for a data fusion experiment pulling together data from different sources. I'm not sure how it'll play out in the end, but here are some fragments… Grab Data into Google Refine as CSV from a URL (Proxied Google Spreadsheet Query via Yahoo ...

24.1.12

Chapter 1. Using Google Refine to Clean Messy Data


Google Refine (the program formerly known as Freebase Gridworks) is described by its creators as a "power tool for working with messy data" but could very well be advertised as "remedy for eye fatigue, migraines, depression, and other symptoms of prolonged data-cleaning." Even journalists with ...