Count with google refine

Count and perform basic operation in google refine? Yes, that's possible and we will see how. This article is a translation / adaptation of rechnen mig google refine published by cosmin on databeast.org. All images are from the original article.

Before we start you should consider to untick the option parse cell text into numbers, dates ... as the multiplication and the division operation may lead to undesirable outcomes if google refine do not identify the cell content properly. To avoid this, I leave the option untick. We will arrange this manually later on.

We will use this basic data set as an example for this article, three columns and seven rows of numbers:

First, we must tell google refine that every column content is a number. For the column A select the drop down menu, edit cells then common transform, and finally to number. Repeat this step for all columns that contain numeric values.

Now the transformation done for all columns, we can start to count using google refine. First open the drop down menu for the column C and select Edit column, and then Add column based on this column.

A new windows will open. First define the name of your new column - like Ergebnis Divisions, which mean division result in german ;-) -  and then key the following GREL expression:

cells["A"].value / cells["C"].value

This will divided the content of the column A cells by the corresponding content of the column C cell and store it in a new column. Click OK to apply the transformation.

of course, you can also use the following compute operations and mix them (please note that Google Refine respects the standard order of operations):

cells["A"].value + cells["C"].value
cells["A"].value - cells['C'].value
cells["A"].value / cells["C"].value
cells["A"].value * cells["C"].value

Division in google refine returning 0 or wrong result

Google Refine might not returns the expected results when you perform a multiplication or division. See the screenshot below for an example where line 2 and should return 1.3333 and line 4 and 5:  0.2 

This comes from the way google refine handles integer arithmetic. As Tom Morris explained in the discussion list: 
"Integer arithmetic is used if all the operands are integers.
 If you divide by 10.0 instead of 10, you'll get the result that you expect."

So in our case, multiplying by 1.0 one value make all operands integers. One can use the following expression to fix the issue:

(cells["B"].value *1.0) / cells["A"].value

This post was


Post a Comment