5.12.15

How to check if values within a record are unique in OpenRefine

When working in record mode, you can compare all the values within a record and check whether they are unique. To do that, we will introduce the forEach() function, which lets you turn the record into an array and count the unique values.
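The idea can be sketched outside OpenRefine as follows. This is a plain Python analogue, not the GREL expression used in the article: the function name `values_are_unique` and the sample record are hypothetical, and skipping blank cells is an assumption made here.

```python
def values_are_unique(record_values):
    """Return True when no non-blank value repeats within the record."""
    # Assumption: blank cells (None or whitespace-only) are skipped
    # rather than counted as duplicates of each other.
    cleaned = [v.strip() for v in record_values if v and v.strip()]
    # A set keeps one copy of each value, so equal lengths mean no repeats.
    return len(cleaned) == len(set(cleaned))


# Hypothetical record: all the values of one column within a single record.
record = ["Toronto", "Montreal", "Toronto"]
print(values_are_unique(record))
```

Collecting the values into an array first and then comparing the total count with the count of distinct values follows the same two steps the article describes.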

22.11.15

Limitations when splitting and joining multi-valued cells

The Split multi-valued cells function helps transpose data stored in a single cell into multiple rows, while keeping the relationship with the other columns in the data set. In this article we will look at some limitations of the function when splitting a data set and joining it back, and how you can work around them.

27.10.15

Using facets to cluster a subset of your data when you have a large number of rows in OpenRefine


When working in OpenRefine, you sometimes have to deal with a large data set in which entries refer to the same entity (a person, city, book, or any other entry) but use different spellings. With a large data set, the clustering function can become unresponsive because of the amount of computation needed to run the different algorithms.


In this tutorial we will see how you can use facets to create subsets of records to cluster, to better manage the compute load on your machine. If you want to cluster your full data set, you just need to run the cluster function on each of the subsets created with your facet.
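As a rough illustration of the approach, the Python sketch below splits rows into subsets by a facet column and clusters each subset separately. It is not OpenRefine's implementation: the `fingerprint` function is a simplified key-collision method written for this example, and the column names and sample rows are made up.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Simplified key: lowercase, strip punctuation, sort unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.strip().lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values sharing a key; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def cluster_by_facet(rows, facet_column, target_column):
    """Cluster target_column separately within each facet value."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[facet_column]].append(row[target_column])
    # Each subset is clustered on its own, so no single pass has to
    # compare every value in the full data set.
    return {facet: cluster(values) for facet, values in subsets.items()}


# Made-up rows; "country" plays the role of the facet.
rows = [
    {"country": "CA", "city": "Toronto"},
    {"country": "CA", "city": "toronto "},
    {"country": "CA", "city": "Montreal"},
    {"country": "US", "city": "New York"},
    {"country": "US", "city": "York, New"},
]
result = cluster_by_facet(rows, "country", "city")
print(result)
```

One trade-off of this design is that variants of the same entity that fall under different facet values are never compared with each other, so the facet should be chosen so that duplicates are likely to land in the same subset.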


6.10.15

Online OpenRefine Foundation Course Now Available


Learn the basics of data science with the new OpenRefine Foundation course available on the tranzform course platform.

The OpenRefine Foundation course is a progressive program that provides structure and direction for students new to data cleaning and preparation. Each lesson comes with a comprehensive overview of its goals and content, five video tutorials, and a lab to put your learning into practice. The course ends with a quiz to test your knowledge.


23.4.15

Exploring Toronto 311 calls Part 1: Why are people calling?


The Toronto 311 Service Request - Customer Initiated data set contains information on customer-initiated service requests received by the City of Toronto for Solid Waste Management, Transportation Services, Toronto Water, Municipal Licensing & Standards, and Urban Forestry.

In the April session of the Toronto OpenRefine Meetup we took an hour to explore the data set and prepare it for mapping with Google Fusion Tables. For this example we will use only the 2015 calls available to date, from January 1 to March 31. Download it from here: http://opendata.toronto.ca/311/service.request/SR2015.zip


The 311 Service Request - Customer Initiated data set has three fields:
  • CREATION DATE 
  • SERVICE REQUEST LOCATION 
  • SERVICE REQUEST TYPE
In this first article we will explore the relationship between the type of request and its creation date.
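For readers who want to try the same question outside OpenRefine, here is a minimal Python sketch that counts request types per month. The three column names come from the data set description above; the sample rows, the request type labels, and the `YYYY-MM-DD hh:mm:ss` date format are assumptions made for illustration and may not match the real file.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the real file may use a different date format.
SAMPLE = """CREATION DATE,SERVICE REQUEST LOCATION,SERVICE REQUEST TYPE
2015-01-02 08:15:00,Example St,Missed Pickup
2015-01-05 09:30:00,Sample Ave,Pothole
2015-02-01 10:00:00,Example St,Missed Pickup
"""


def count_types_by_month(csv_text):
    """Count service requests per (month, request type) pair."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the date starts with YYYY-MM, so the first
        # seven characters identify the month.
        month = row["CREATION DATE"][:7]
        counts[(month, row["SERVICE REQUEST TYPE"])] += 1
    return counts


counts = count_types_by_month(SAMPLE)
for (month, request_type), total in sorted(counts.items()):
    print(month, request_type, total)
```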

1.3.15

[Video] Introduction to GREL


In partnership with the OKFN School of Data, we recorded an hour-long Skillshare on the Refine GREL language last November. Ever wondered how to get started with the Generic Refine Expression Language (GREL)?

Watch this one-hour tutorial and learn the basics of the language along with some simple expressions. Further tutorials will be published to go through each of the functions presented in this video.


17.2.15

Parsing and extracting HTML tags and links in Refine

I recently helped someone on Stack Overflow parse and extract information from an HTML page. Refine with GREL offers multiple ways to select specific elements and content. This article will review the main functions and specific use cases to illustrate when to use them.
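To give a feel for the task, here is a small Python sketch that pulls the links and their text out of an HTML fragment using only the standard library. It is an analogue of the selection step, not the GREL functions the article covers; the `LinkExtractor` class, the `extract_links` helper, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return the list of (href, link text) pairs found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Made-up fragment for illustration.
sample = '<p>See <a href="http://example.com">the <b>docs</b></a> and <a href="/about">About</a></p>'
print(extract_links(sample))
```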