Your subscription could not be saved. Please try again.
Your subscription has been successful.

Subscribe to receive our monthly OpenRefine roundups with new tutorials, release updates and community announcements.

20.1.20

How to use Columnize by Key/Value


Based on the Wordpress export file in XML format shared by Adam K on the user mailing list the column item - wp:postmeta - wp:meta_key and item - wp:postmeta - wp:meta_value store data in a key:value format. The column meta_key indicates the value type and the column meta_value store the field value.



This is an advanced tutorial that assumes you understand the difference between a row and a record.

For each post, we want to create one new column for each meta_key with their relative meta_value. To do so we will use the Columnize by Key/Value Columns feature of OpenRefine.

For the purpose of this tutorial I imported the XML and remove columns not relevant along with any row where meta_value is blank.  We end up with the following dataset.



I will use the meta_key = wpcf-resource-type as a control for my tutorial. In the screenshot below, I see that I have seven distinct values with their count.




I can call the Columnize by Key/Value Columns feature from any column and select the Key Column and Value Column.






Though, if I do the operation on my dataset as-is I notice that OpenRefine lost most of my meta_value  




To avoid this, we need to directly indicate for each row which record (or blogpost) they belong to.

The trick is to fill down all column before the field we want to columnize. We use the  row.record.cells[columnName].value[0]  function to ensure we fill down only within a record.  In our case, we fill down on all column before meta_key



Now we invoke again the  Columnize by Key/Value Columns feature and create a facet on the resource-type column to ensure we did not lose any data.

 


As a final control, I recommend creating a duplicate facet on your first column to ensure that OpenRefine did not leave any duplicate records.

In our case, we have 42 records that did not transpose properly with data mixed on two lines. This is due to missing value when we previously fill down all our columns. Go back to that steps and ensure there is no blank or empty cell in all your columns before the key:value pair.



If you want to manually fix it, please refer to the tutorial to merge records on multiple lines.

References




This post was

0 comments:

Post a Comment