This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From great tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles in the right-hand menu.

Showing posts with label fill down.

25.5.21

Using columnName to bulk fill down columns

Thanks to this StackOverflow question, I finally found a great use case for introducing the columnName variable in OpenRefine.

The columnName variable has been poorly documented (see previous discussions on Stack Overflow and on the OpenRefine mailing list). The feature became really interesting with the All > Transform option, available in OpenRefine 2.7 back in 2017 (yes, this blog post is long overdue!).
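The post body is not reproduced here, so what follows is only a rough Python sketch of the idea, not OpenRefine's implementation: one expression is applied to every column, and the column name is passed in the way columnName is exposed to an All > Transform expression. It assumes OpenRefine's records-mode convention that a non-blank cell in the key (first) column starts a new record. All function names and the sample data are hypothetical; record_fill_down stands in for the GREL row.record.cells[columnName].value[0], except that it keeps cells that already have a value.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]
# An expression receives (value, column_name, record) and returns the new value.
Expression = Callable[[Optional[str], str, List[Row]], Optional[str]]


def group_records(rows: List[Row], key_column: str) -> List[List[Row]]:
    """Group rows into records: a non-blank key cell starts a new record."""
    records: List[List[Row]] = []
    for row in rows:
        if row.get(key_column) or not records:
            records.append([])
        records[-1].append(row)
    return records


def transform_all_columns(rows: List[Row], key_column: str, expression: Expression) -> List[Row]:
    """Apply one expression to every cell of every column, passing the
    column name so the same expression can look up its own column."""
    columns = list(rows[0]) if rows else []
    result: List[Row] = []
    for record in group_records(rows, key_column):
        for row in record:
            # Evaluated against the original record, so cells filled earlier
            # never influence later ones.
            result.append({name: expression(row.get(name), name, record) for name in columns})
    return result


def record_fill_down(value: Optional[str], column_name: str, record: List[Row]) -> Optional[str]:
    """Hypothetical expression: keep non-blank cells; fill blank ones with
    the first non-blank value of the same column in the same record."""
    if value:
        return value
    for row in record:
        if row.get(column_name):
            return row[column_name]
    return value


rows = [
    {"id": "1", "name": "Alice", "tag": "a"},
    {"id": None, "name": None, "tag": "b"},
    {"id": "2", "name": "Bob", "tag": None},
    {"id": None, "name": None, "tag": None},
]
for row in transform_all_columns(rows, "id", record_fill_down):
    print(row)
```

Every column is filled in a single pass, and record 2's empty tag column stays empty instead of inheriting "b" from record 1.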


20.1.20

How to use Columnize by Key/Value


In the WordPress export file (XML) shared by Adam K on the user mailing list, the columns item - wp:postmeta - wp:meta_key and item - wp:postmeta - wp:meta_value store data in a key/value format: meta_key indicates the type of value, and meta_value stores the field value.
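As a rough illustration, and not OpenRefine's own algorithm, the pivot behind Columnize by Key/Value can be sketched in Python. The sketch assumes each meta row carries a hypothetical post_id column identifying the post it belongs to, and uses meta_key and meta_value as short stand-ins for the longer XML column names. The sample keys and values are made up, and if a post repeats a key, the last value wins (a simplification).

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def columnize_by_key_value(rows: List[Row], id_column: str, key_column: str, value_column: str) -> List[Row]:
    """Pivot key/value rows into one row per id, with one column per key."""
    pivoted: Dict[Optional[str], Dict[str, Optional[str]]] = {}
    keys: List[str] = []  # new column names, in first-seen order
    for row in rows:
        key = row[key_column]
        if key not in keys:
            keys.append(key)
        # Dicts keep insertion order (Python 3.7+), so ids keep their order too.
        pivoted.setdefault(row[id_column], {})[key] = row[value_column]
    # Keys an id never had become blank (None) cells.
    return [{id_column: entity, **{k: values.get(k) for k in keys}} for entity, values in pivoted.items()]


rows = [
    {"post_id": "10", "meta_key": "author", "meta_value": "Adam"},
    {"post_id": "10", "meta_key": "rating", "meta_value": "5"},
    {"post_id": "11", "meta_key": "author", "meta_value": "Eve"},
]
print(columnize_by_key_value(rows, "post_id", "meta_key", "meta_value"))
```

Each distinct meta_key becomes its own column, and post 11 gets a blank rating cell because it never had that key.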

11.8.12

Data Shaping in Google Refine – Generating New Rows from Multiple Values in a Single Column

A great tutorial on reshaping a data set with the transpose and fill down functions. The article also introduces the split multi-valued cells function, which splits and transposes in one shot.
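To show what split multi-valued cells does to the table, here is a minimal Python sketch (not Google Refine's code). It assumes a plain-string separator and does not trim whitespace around the split values; the function name and sample data are hypothetical. The first value stays on the original row, and each extra value goes on a new row whose other cells are left blank.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def split_multi_valued_cells(rows: List[Row], column: str, separator: str) -> List[Row]:
    """Split `column` on `separator`, one value per row; the other cells
    of the new rows are blank (None)."""
    out: List[Row] = []
    for row in rows:
        value = row.get(column)
        parts = value.split(separator) if value else [value]
        first = dict(row)
        first[column] = parts[0]
        out.append(first)
        for part in parts[1:]:
            extra: Row = {name: None for name in row}
            extra[column] = part
            out.append(extra)
    return out


rows = [
    {"name": "Alice", "langs": "en,fr"},
    {"name": "Bob", "langs": "de"},
]
for row in split_multi_valued_cells(rows, "langs", ","):
    print(row)
```

The blank name cell on the new "fr" row is exactly the kind of gap that fill down is later used to complete.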

9.3.12

Fill down the right and secure way

The fill down function takes the content of a cell and copies it into the blank cells below it. It works on row order alone: Google Refine does not check whether rows belong to different records, so if the following rows are blank, it fills them with the content of the previous non-blank row.

If you do not use this function with care, you can easily corrupt the integrity of your data set. In a nutshell, use row.record.cells[columnName].value[0] to fill down data within the same record.

Here is why, and how to avoid that.
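To make the difference concrete, here is a small Python sketch (not Google Refine's code) comparing a row-based fill down with one that resets at every record boundary. It assumes the records-mode convention that a non-blank key column starts a new record. Carrying the last value within a record is a close analogue of the GREL expression above, which takes the first value of the record; the two agree in the usual case where the value sits on the record's first row. Function names and sample data are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_down_by_row(rows: List[Row], column: str) -> List[Row]:
    """Row-based fill down: copy the last non-blank value into blank cells
    below it, ignoring record boundaries (the risky behaviour)."""
    out: List[Row] = []
    last: Optional[str] = None
    for row in rows:
        new = dict(row)
        if new[column]:
            last = new[column]
        elif last is not None:
            new[column] = last
        out.append(new)
    return out


def fill_down_within_records(rows: List[Row], column: str, key_column: str) -> List[Row]:
    """Record-scoped fill down: same as above, but the carried value is
    reset whenever a non-blank key cell starts a new record."""
    out: List[Row] = []
    last: Optional[str] = None
    for row in rows:
        new = dict(row)
        if new[key_column]:
            last = None  # new record: never reuse the previous record's value
        if new[column]:
            last = new[column]
        elif last is not None:
            new[column] = last
        out.append(new)
    return out


rows = [
    {"id": "1", "author": "Alice"},
    {"id": None, "author": None},
    {"id": "2", "author": None},  # record 2 has no author at all
    {"id": None, "author": None},
]
print([r["author"] for r in fill_down_by_row(rows, "author")])
print([r["author"] for r in fill_down_within_records(rows, "author", "id")])
```

The row-based version wrongly credits "Alice" as the author of record 2, while the record-scoped version leaves record 2 blank.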