This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.

Showing posts with label cluster. Show all posts

27.10.15

Using facets to cluster a subset of your data when you have a large number of rows in OpenRefine


When working in OpenRefine, you sometimes have to deal with a large data set in which several values refer to the same entity (a person, city, book or any other entry) but use different spellings. With a large data set, the clustering function can become unresponsive because of the amount of computation needed to run the different algorithms.


In this tutorial we will see how to use facets to create subsets of records to cluster, which makes the compute load on your machine easier to manage. If you want to cluster your full data set, you just need to run the cluster function on each of the subsets created with your facet.
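
The idea of clustering one facet subset at a time can be sketched outside OpenRefine. The Python below is only an illustration under stated assumptions, not OpenRefine's implementation: `cluster_by_subset`, `simple_key`, the sample city names and the first-letter facet are all hypothetical. Each subset is small, so the grouping work stays cheap even when the full column is large.

```python
from collections import defaultdict


def simple_key(value):
    # Hypothetical normalization: lowercase and collapse whitespace.
    return " ".join(value.lower().split())


def cluster_by_subset(rows, facet_of, key_of=simple_key):
    """Group rows by facet value, then cluster inside each subset."""
    # Step 1: split the column into subsets, one per facet value.
    subsets = defaultdict(list)
    for row in rows:
        subsets[facet_of(row)].append(row)

    # Step 2: cluster each subset independently by key collision.
    clusters = {}
    for facet_value, subset in subsets.items():
        groups = defaultdict(set)
        for row in subset:
            groups[key_of(row)].add(row)
        # Keep only groups with more than one distinct spelling.
        clusters[facet_value] = sorted(
            sorted(g) for g in groups.values() if len(g) > 1
        )
    return clusters


# Sample data (made up for illustration), faceted on the first letter.
cities = ["Montreal", "montreal ", "MONTREAL", "Toronto", "toronto", "Ottawa"]
result = cluster_by_subset(cities, facet_of=lambda v: v.strip()[0].upper())
```

Here `result["T"]` holds one cluster with the two spellings of Toronto, while `result["O"]` is empty because Ottawa has a single spelling.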


15.11.12

Mining and OpenRefine(ing) JISCMail: (from around the web)

A look at OER-DISCUSS [Listserv] JISC CETIS MASHe: a complete tutorial on scraping data from a mailing list and analysing participants and contributions.

Read the full article.

Finding (Nearly) Duplicate Items in a Data Column (from around the web)

Another great article by Tony Hirst. This tutorial shows you how to use the clustering functions (ngram and fingerprint) directly in your facet. Really handy.

Read the full article.
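
To give a feel for what fingerprint and ngram keys do, here is a simplified Python sketch. It is an approximation written for illustration, not OpenRefine's code: the function names are hypothetical, and details such as accent normalization are left out. Values that produce the same key are treated as near duplicates.

```python
import re
import string
from collections import defaultdict

# Matches any ASCII punctuation character.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def fingerprint(value):
    # Trim, lowercase, drop punctuation, then sort the unique tokens.
    cleaned = _PUNCT.sub("", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def ngram_fingerprint(value, n=2):
    # Lowercase, drop punctuation and whitespace, then sort unique n-grams.
    cleaned = re.sub(r"\s+", "", _PUNCT.sub("", value.lower()))
    grams = {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}
    return "".join(sorted(grams))


def duplicates(values, key):
    # Group values sharing a key; keep groups with several spellings.
    groups = defaultdict(set)
    for v in values:
        groups[key(v)].add(v)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


names = ["Tony Hirst", "Hirst, Tony", "T. Hirst"]
found = duplicates(names, fingerprint)
```

With this sample, `found` contains a single group holding "Hirst, Tony" and "Tony Hirst", since both reduce to the key `hirst tony`.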

5.9.12

Google Refine Workshop (from around the web)

This tutorial / exercise will walk you through all of Google Refine's main functionality. It is built around exercises so you can get hands-on quickly!

4.10.11

Video tutorial to clean up your dataset (by Free Your Metadata)

A great video tutorial from Free Your Metadata which shows you how to:

8.9.11

COUNTIF in Google Refine with facetCount

COUNTIF is an Excel function that counts how many times a value appears in a given range of your spreadsheet. Google Refine supports a similar function, facetCount, to count how many times a value appears in a column.
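
In GREL this is usually written along the lines of `facetCount(value, "value", "Column name")`, where the column name is a placeholder for your own. The counting itself can be sketched in Python as follows; `facet_count` and the sample `gifts` column are hypothetical names used only for illustration, and the code mimics the per-row result, not Refine's internals.

```python
from collections import Counter


def facet_count(column):
    # Count every value once, then report the count for each row.
    counts = Counter(column)
    return [counts[v] for v in column]


# Sample column (made up for illustration).
gifts = ["book", "pen", "book", "card", "book"]
per_row = facet_count(gifts)
```

Each entry of `per_row` is the number of rows sharing that row's value, which is what you would store in a new column to sort or filter on.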

2.7.11

Faceting with Freebase Gridworks

Freebase Gridworks was the name of Google Refine before Google took over the project. The two following videos show how to facet (filter) in Google Refine. The interface and options have not changed much across versions, so these videos are still up to date.