Fill down the right and secure way

The fill down function copies the content of a cell into the blank cells below it. It works purely on row numbers. When you use it, Google Refine does not take into account whether rows belong to different records: if the following rows are blank, it fills them with the content of the previous row.
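To make the behaviour concrete, here is a minimal Python sketch of a row-based fill down. It is not Google Refine's actual implementation, only an illustration of the logic described above; the function name fill_down and the sample values are assumptions made for this example.

```python
def fill_down(values):
    """Row-based fill down: copy each non-blank value into the blank cells below it.

    Illustrative only. Blank cells are represented here as None or "".
    """
    filled = []
    last_seen = None  # most recent non-blank value, regardless of record boundaries
    for value in values:
        if value is not None and value != "":
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical "data" column for four rows: Arabic (2 rows), Azerbaijani, Bengali.
# Only the first Arabic row has a value.
data_column = ["arabic-data", None, None, None]
print(fill_down(data_column))
```

Every row ends up with "arabic-data", including the Azerbaijani and Bengali rows, because nothing in this logic knows where one record ends and the next begins.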

If you do not use this function with extra care, you can easily corrupt the integrity of your data set. In a nutshell, use row.record.cells[columnName].value[0] to fill down data within the same record.

Here is why, and how to avoid that.

In the following example, we have a set of languages attached to some cities and data. We are working in records mode, where different cities share the same languages. Please note that some languages do not have data attached (i.e. Azerbaijani, Bengali and Croatian).

In our case, we want every row to carry the language and the data attached to the city. At first, we might want to use the fill down function:

As explained at the top of this post, the fill down function fills all blank cells until the next cell with content and does not take the notion of records into account. Our data gets all mixed up:

Roll back your project using the history tab. Now that we have seen the danger of using fill down, we will use the following expression instead: row.record.cells[columnName].value[0]

The error java.lang.ArrayIndexOutOfBoundsException: 0 indicates that there is no value to fill down for this cell. In this case, it appears for the records matching Azerbaijani and Bengali, which is expected.

This function recognizes different records based on the first column and fills down values only within a record. This gives us a clean fill down, without Arabic data going down to the Azerbaijani and Bengali rows! When using the row.record.cells function, make sure that:
  1. You have clearly identified your records using the first column;
  2. You fill down the first column only at the end, once you have taken care of all the others.
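The record-aware logic can be sketched in Python as follows. This is a simulation under stated assumptions, not how Google Refine works internally: a record is assumed to start at every row whose first (key) column is non-blank, and the helper names, column names and sample rows are all hypothetical.

```python
def is_blank(value):
    return value is None or value == ""


def split_into_records(rows, key):
    """Group rows into records: a new record starts when the key column is non-blank."""
    records = []
    for row in rows:
        if not is_blank(row[key]) or not records:
            records.append([])
        records[-1].append(row)
    return records


def fill_down_within_records(rows, key, column):
    """Fill blank cells of `column` with the first non-blank value of the same record.

    Records that have no value at all in `column` are left blank, which mirrors
    the cells where the expression in the post reports an error.
    """
    result = []
    for record in split_into_records(rows, key):
        values = [r[column] for r in record if not is_blank(r[column])]
        first = values[0] if values else None
        for r in record:
            new_row = dict(r)  # copy, so the input rows are not modified
            if is_blank(new_row[column]):
                new_row[column] = first
            result.append(new_row)
    return result


# Hypothetical rows: "language" is the first column and identifies the records.
rows = [
    {"language": "Arabic", "city": "city-1", "data": "arabic-data"},
    {"language": None, "city": "city-2", "data": None},
    {"language": "Azerbaijani", "city": "city-3", "data": None},
    {"language": "Bengali", "city": "city-4", "data": None},
]
for row in fill_down_within_records(rows, "language", "data"):
    print(row)
```

Here only the second Arabic row receives "arabic-data"; the Azerbaijani and Bengali rows stay blank. Note that the key column itself is left untouched, which is why the first column should be filled down last: once it is filled, every row becomes its own record.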

Based on Mekk's comment on the merge records article.
