When working in record mode, you can compare all the values within one record and check whether they are unique. To do that, we will introduce the function forEach(), which lets you turn the record into an array and count its unique values.
Built over time, the RefinePro knowledge base lists tutorials, how-tos, and tips for OpenRefine (formerly Google Refine).
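The idea can be sketched outside OpenRefine. The Python snippet below is only an illustration, not the GREL expression from the tutorial: it assumes the cells of one record are already available as a plain list (the function names and the `record_values` parameter are hypothetical) and counts how many distinct non-blank values the record holds.

```python
def non_blank(record_values):
    """Return the record's values as trimmed strings, skipping blank cells."""
    cleaned = []
    for value in record_values:
        if value is None:
            continue
        text = str(value).strip()
        if text:
            cleaned.append(text)
    return cleaned


def count_unique(record_values):
    """Count the distinct non-blank values in one record."""
    return len(set(non_blank(record_values)))


def all_unique(record_values):
    """True when no non-blank value appears more than once in the record."""
    values = non_blank(record_values)
    return len(set(values)) == len(values)
```

Comparing the number of distinct values with the total number of values is what tells you whether a record contains duplicates.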
22.11.15
Limitation when splitting and joining multi-valued cells
27.10.15
Using facets to cluster a subset of your data when you have a large number of rows in OpenRefine
When working in OpenRefine, you sometimes have to deal with a large data set in which many values refer to the same entity (a person, city, book, or any other entry) but use different spellings. With a large data set, the clustering function can become unresponsive because of the amount of computation needed to run the different algorithms.
In this tutorial we will see how you can use facets to create subsets of records to cluster, so that you can better manage the compute load on your machine. If you want to cluster your full data set, you just need to run the cluster function on each of the subsets created with your facet.
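The subset-by-subset approach can be sketched in Python. This is an assumption-laden illustration rather than OpenRefine's own implementation: `fingerprint` is a simplified key-collision function written for this example, and `cluster_by_facet`, `facet_key`, and `value_key` are hypothetical names standing in for the facet and the column being clustered.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Simplified key: lowercase, drop punctuation, sort the unique tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster_subset(values):
    """Group spellings that share a key; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def cluster_by_facet(rows, facet_key, value_key):
    """Split rows by a facet value, then cluster each subset on its own."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[facet_key]].append(row[value_key])
    return {facet: cluster_subset(values) for facet, values in subsets.items()}
```

Each subset is processed independently, so the work per run stays small. The trade-off is that two spellings that fall into different subsets are never compared with each other.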
6.10.15
Online OpenRefine Foundation Course Now Available
The OpenRefine Foundation course is a progressive program that provides structure and direction for students new to data cleaning and preparation. Each lesson comes with a comprehensive overview of its goals and content, five video tutorials, and a lab to put your learning into practice. The course ends with a quiz to test your knowledge.
23.4.15
Exploring Toronto 311 calls Part 1: Why are people calling?
The Toronto 311 Service Request - Customer Initiated data set contains information on customer-initiated service requests received by the City of Toronto for Solid Waste Management, Transportation Services, Toronto Water, Municipal Licensing & Standards, and Urban Forestry.
In the April session of the Toronto OpenRefine Meetup we took an hour to explore the data set and prepare it for mapping with Google Fusion Tables. For this example we will use only the 2015 calls available to date: from January 1 to March 31. Download it from here: http://opendata.toronto.ca/311/service.request/SR2015.zip
The 311 Service Request - Customer Initiated data set has three fields:
- CREATION DATE
- SERVICE REQUEST LOCATION
- SERVICE REQUEST TYPE
1.3.15
[Video] Introduction to GREL
In partnership with the OKFN School of Data, we recorded a one-hour Skillshare on the Refine GREL language last November. Ever wondered how to get started with the General Refine Expression Language (GREL)?
Watch this one-hour tutorial and learn the basics of the language along with some simple expressions. Further tutorials will be published to go through each of the functions presented in this video.
17.2.15
Parsing and extracting HTML tags and links in Refine
I recently helped someone on Stack Overflow parse and extract information from an HTML page. Refine with GREL offers multiple ways to select specific elements and content. This article will review the main functions and specific use cases to illustrate when to use them.
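To make the task concrete, here is a minimal Python sketch of the same kind of extraction, using only the standard library's `html.parser`. It is not the GREL approach covered in the article; the `LinkExtractor` class and `extract_links` helper are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text while inside a link that has an href
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Text inside nested tags (for example a `<b>` inside the link) is kept as part of the link text, and anchors without an `href` attribute are skipped.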