Fill down the right and secure way

The fill down function takes the content of a cell and copies it into the blank cells that follow. It works on row numbers only. When you use it, Google Refine does not check whether rows belong to different records: if the next row is blank, it is filled with the content of the previous row. Used without extra care, this function can easily corrupt the integrity of your data set. Here is why, and how to avoid it.

In the following example we have a set of languages attached to some cities and data. We are working in records mode, where different cities share the same language. Please note that some languages do not have data attached (i.e. Azerbaijani, Bengali and Croatian).

In our case we want every row to carry the language and the data attached to the city. As a first try we might want to use the fill down function:

As explained at the top of this post, the fill down function fills every blank cell until the next cell with content and does not take the notion of records into account. Our data gets all mixed up:
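To see the mix-up outside of Google Refine, here is a minimal Python sketch of a row-based fill down. It is not Refine's own code: the function name `naive_fill_down`, the column names and the sample values (cities, `"A-1"`) are all made up for illustration.

```python
def naive_fill_down(rows, column):
    """Copy the last non-blank value of `column` into every blank cell below it.

    Works on row order only and knows nothing about records.
    """
    filled = []
    last = None
    for row in rows:
        new_row = dict(row)  # do not modify the caller's rows
        if new_row[column] in (None, ""):
            new_row[column] = last
        else:
            last = new_row[column]
        filled.append(new_row)
    return filled


# Hypothetical sample: the first column (language) identifies the record.
rows = [
    {"language": "Arabic", "city": "Cairo", "data": "A-1"},
    {"language": None, "city": "Tunis", "data": None},
    {"language": "Azerbaijani", "city": "Baku", "data": None},  # no data attached
    {"language": "Bengali", "city": "Dhaka", "data": None},     # no data attached
]

naive_result = naive_fill_down(rows, "data")
for row in naive_result:
    print(row["language"], row["city"], row["data"])
```

In this sketch the Azerbaijani and Bengali rows end up carrying the Arabic data, which is exactly the corruption described above.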

Roll back your project using the history. Now that we have seen the danger of using fill down, we will use the row.record.cells function:

The Error: java.lang.ArrayIndexOutOfBoundsException: 0 indicates that there is no value to fill down for these cells. In this case these are the records matching Azerbaijani and Bengali, which is expected.

This function recognizes the different records based on the first column and fills down values only within a record. This gives us a clean fill down, without Arabic data going down to the Azerbaijani and Bengali rows! When using the row.record.cells function, make sure that:
  1. You have clearly identified your records using the first column;
  2. You fill down the first column only at the end, once you have taken care of all the others.
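The record-aware behaviour and the two rules above can be sketched in Python. This is an illustration under assumptions, not Refine's implementation: the helpers `split_records` and `fill_down_within_records`, the column names and the sample values are hypothetical, and the sketch fills only blank cells, using the first non-blank value found in the same record.

```python
def split_records(rows, key_column):
    """Group rows into records: a non-blank key cell starts a new record."""
    records = []
    for row in rows:
        if row[key_column] not in (None, "") or not records:
            records.append([])
        records[-1].append(row)
    return records


def fill_down_within_records(rows, key_column, column):
    """Fill blank cells of `column` with the first non-blank value of the
    same record; cells stay blank when the record has no value at all."""
    filled = []
    for record in split_records(rows, key_column):
        values = [r[column] for r in record if r[column] not in (None, "")]
        first = values[0] if values else None
        for r in record:
            new_row = dict(r)  # do not modify the caller's rows
            if new_row[column] in (None, ""):
                new_row[column] = first
            filled.append(new_row)
    return filled


# Hypothetical sample: the first column (language) identifies the record.
rows = [
    {"language": "Arabic", "city": "Cairo", "data": "A-1"},
    {"language": None, "city": "Tunis", "data": None},
    {"language": "Azerbaijani", "city": "Baku", "data": None},  # no data attached
    {"language": "Bengali", "city": "Dhaka", "data": None},     # no data attached
]

# Rule 2: fill the other columns first, and the key column last.
result = fill_down_within_records(rows, "language", "data")
result = fill_down_within_records(result, "language", "language")
for row in result:
    print(row["language"], row["city"], row["data"])
```

The order matters in this sketch for the same reason as in Refine: once the first column is filled, every row starts its own record, so there is nothing left within a record to fill from.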

Based on Mekk's comment on the merge records article.
