Showing posts with label split.
22.11.15
Limitation when splitting and joining multi-valued cells
1:35 PM
index, join, multi-valued cells, record, split
10.10.14
Parsing Apache log using OpenRefine
Recently I was looking for a quick way to explore an Apache log file. I didn't want to set up any software, and I wanted to analyze a very precise path for a specific user, or what happens after a specific error. So I thought about OpenRefine and its parsing capabilities.
This recipe doesn't replace an analytics tool for understanding your traffic, but it helps you go behind the curtain and drill down into a specific IP address or user, a type of error code, or a pattern.
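The recipe itself works inside OpenRefine. To show the same drill-down idea outside Refine, here is a minimal Python sketch (not the method from this post). It assumes the log uses Apache's Common Log Format, or the Combined format, which begins with the same fields; the field names `ip`, `user`, `time`, `request`, `status` and `size` are my own labels.

```python
import re
from collections import Counter
from typing import Optional

# Assumed layout (Common Log Format, also the start of the Combined format):
# host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def parse_log(lines):
    """Parse every line, skipping lines that do not fit the assumed pattern."""
    parsed = (parse_line(line) for line in lines)
    return [record for record in parsed if record is not None]


sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    "not a log line",
]
records = parse_log(sample)

# Drill down: every request made by one user, and how often each status code occurs.
frank_requests = [r for r in records if r["user"] == "frank"]
status_counts = Counter(r["status"] for r in records)
```

Filtering on `ip` or `status` instead of `user` gives the other drill-downs mentioned above.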
31.10.12
Cleaning Date with Google Refine (from around the web)
12:55 PM
around the web, if, project creation, regex, split, toDate
A basic tutorial on cleaning up dates using OpenRefine. It is a great example of well-structured GREL syntax for building complex transformations.
Read the full article on Hermanes Barbara's blog.
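The linked article's steps are not reproduced here. As an illustration of the same ingredients (regex, if, and conversion to a date), here is a minimal Python sketch. It assumes two hypothetical input formats, day/month/year with slashes and year-month-day with dashes, and returns an ISO date string, or None when a value fits neither pattern or is not a real date.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical input formats; adjust them to what your column actually contains.
DMY = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")  # e.g. 31/10/2012
YMD = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")  # e.g. 2012-10-31


def normalize_date(text: str) -> Optional[str]:
    """Convert a date in one of the assumed formats to YYYY-MM-DD."""
    text = text.strip()
    try:
        if m := DMY.fullmatch(text):
            day, month, year = map(int, m.groups())
        elif m := YMD.fullmatch(text):
            year, month, day = map(int, m.groups())
        else:
            return None  # unknown format: leave it for manual review
        return date(year, month, day).isoformat()
    except ValueError:  # e.g. 31/02/2012 matches the pattern but is not a real date
        return None
```

Returning None rather than guessing keeps unparseable values easy to find afterwards, much like faceting on blank cells in Refine.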
10.9.12
Error: smartSplit error: Un-terminated quoted field at end of CSV line
8:20 AM
smartSplit, split
I am a big fan of the smartSplit function. It is easy to understand and helps you quickly extract part of a string based on any character. However, if a cell contains a double quote (") when you use smartSplit, Google Refine returns the following error message:
Error: smartSplit error: Un-terminated quoted field at end of CSV line
Here is my workaround.
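The error message suggests that smartSplit parses the cell like a line of CSV: a double quote opens a quoted field, and an unmatched one leaves that field open at the end of the line. The sketch below reproduces that behaviour with a simplified, hypothetical splitter (`csv_style_split`, not OpenRefine's code, and limited to a single-character separator), plus one possible workaround, removing the double quotes before splitting. This is not necessarily the workaround from the original post.

```python
def csv_style_split(text: str, sep: str) -> list:
    """Simplified CSV-style split: separators inside double quotes are ignored
    and the quotes themselves are dropped. `sep` must be a single character."""
    fields, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # each quote opens or closes a quoted section
        elif ch == sep and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        # An odd number of quotes leaves a section open at the end of the line.
        raise ValueError("Un-terminated quoted field at end of CSV line")
    fields.append("".join(current))
    return fields


def split_without_quotes(text: str, sep: str) -> list:
    """Workaround sketch: strip double quotes first so none can be left open."""
    return csv_style_split(text.replace('"', ""), sep)
```

For example, `csv_style_split('12" screen', " ")` raises the error because the inch mark is never closed, while `split_without_quotes('12" screen', " ")` returns the two pieces. The trade-off is that stripping quotes also loses any legitimate quoting, so separators inside a quoted phrase are split like any other.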
5.9.12
Google Refine Workshop (from around the web)
This tutorial and exercise walks you through all of Google Refine's main functionality, with exercises so you can get hands-on quickly!
24.2.12
Selecting a string within a cell using smartSplit
6:04 PM
extract, grel, remove, smartSplit, split
The smartSplit function is a variation on the split function that lets you split the cell content on any string of characters and then select the piece you want to work on. It is very useful for extracting or removing a string within a cell without creating multiple columns and then merging them back.
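In GREL, selecting a piece is done by indexing the array that the split returns, for example value.smartSplit(" - ")[1]. As a language-neutral illustration, here is a minimal Python sketch of both uses, extracting one piece and removing one piece. The helpers `select_piece` and `remove_piece` are hypothetical names, and Python's `str.split` does not handle quotes the way smartSplit does.

```python
def select_piece(value: str, sep: str, index: int) -> str:
    """Keep one piece of value.split(sep); negative indexes count from the end."""
    return value.split(sep)[index]


def remove_piece(value: str, sep: str, index: int) -> str:
    """Drop one piece and glue the remaining pieces back with the same separator."""
    pieces = value.split(sep)
    del pieces[index]
    return sep.join(pieces)


cell = "Smith, John - Manager - Montreal"
job_title = select_piece(cell, " - ", 1)
without_title = remove_piece(cell, " - ", 1)
```

Both operations stay within one cell, which is the point of the post: no temporary columns to create and merge back afterwards.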
16.2.12
Count how often a character occurs in a cell
Did you know that Refine can count how often a string or character appears in a cell?
To do this, I first recommend storing the count in a separate column (so you do not overwrite your original content). Select your reference column (the one whose cells you want to count in) and create a new column based on it. Another option is to compute the result in a custom text facet.
We will use the GREL expression value.split(" ").length().
However, if a cell does not contain the value, Refine will still return 1, because splitting a cell that has no separator yields a single piece. I found two ways to work around this issue.
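The two workarounds from the post are not shown here. One straightforward fix, sketched in Python below, is to subtract one from the number of pieces: a string that contains the separator k times splits into k + 1 pieces, so a cell without the separator gives 1 piece and a count of 0. Python's `str.split` keeps empty pieces between adjacent separators, so in Python the result always equals `value.count(sep)`; I have not checked that GREL's split behaves the same way on adjacent separators.

```python
def piece_count(value: str, sep: str) -> int:
    """Python counterpart of value.split(sep).length(): the number of pieces."""
    return len(value.split(sep))


def occurrence_count(value: str, sep: str) -> int:
    """Pieces minus one: how many times sep occurs in value."""
    return len(value.split(sep)) - 1


print(piece_count("OpenRefine", " "))       # 1, although there is no space at all
print(occurrence_count("OpenRefine", " "))  # 0
print(occurrence_count("a b c", " "))       # 2
```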
18.10.11
Parse markup languages (JSON, HTML, XML...)
In this tutorial we will see how to parse markup languages like JSON, HTML or XML. These languages are convenient to parse because there is often an easily identifiable marker right before or after the content you want to extract. Here we will use JSON and extract the relevant information by following a six-step process.
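The six steps are not reproduced here. The core idea, cutting the text at the marker before the content and again at the marker after it, can be sketched in Python with a hypothetical helper `extract_between`; the sample JSON string is made up. For well-formed JSON, a real parser such as Python's `json` module is more robust than marker matching, but markers work on any text.

```python
def extract_between(text: str, before: str, after: str) -> str:
    """Return the text between the first `before` marker and the next `after` marker.
    Returns "" when `before` is missing; if `after` is missing, everything
    following `before` is returned."""
    _, found, rest = text.partition(before)
    if not found:
        return ""
    return rest.partition(after)[0]


record = '{"name": "Refine", "version": "2.5"}'
name = extract_between(record, '"name": "', '"')
version = extract_between(record, '"version": "', '"')
```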
On a similar topic:
13.10.11
Update phone number format
This post is a quick adaptation, for phone numbers, of the method presented in "Add a space to postal code (splitByLength and Merge function)".
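A minimal Python sketch of the same idea. It assumes ten-digit North American numbers and a hypothetical 514-555-1234 output style (the post does not say which format it targets): keep only the digits, cut them into pieces of 3, 3 and 4 characters, as splitByLengths would, then join the pieces.

```python
import re


def format_phone(raw: str) -> str:
    """Reformat a ten-digit number as AAA-EEE-LLLL (assumed target style)."""
    digits = re.sub(r"\D", "", raw)  # keep only the characters 0-9
    if len(digits) != 10:
        return raw  # leave unexpected values untouched for manual review
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    return f"{area}-{exchange}-{line}"
```

Returning the original value when the length is wrong mirrors a common Refine habit: transform what matches, then facet on what is left.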
5.10.11
Extract hashtags and references from Twitter
This case was brought to me by Cosmin, who wanted to extract hashtags from tweets for analysis and data visualization. The data were gathered using ScraperWiki and its ability to scrape Twitter data into a single document (see the video tutorial).
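Assuming "references" means @mentions (my interpretation, not stated in the post), here is a minimal Python sketch using two regular expressions. `\w+` stops at punctuation, so `@cosmin:` yields `cosmin`; note that the hashtag pattern would also match a `#` inside a URL fragment.

```python
import re

HASHTAG = re.compile(r"#(\w+)")  # '#' followed by letters, digits or underscores
MENTION = re.compile(r"@(\w+)")  # assumed meaning of "reference": an @mention


def extract_hashtags(tweet: str) -> list:
    return HASHTAG.findall(tweet)


def extract_mentions(tweet: str) -> list:
    return MENTION.findall(tweet)


tweet = "RT @cosmin: exploring #OpenRefine and #dataviz"
tags = extract_hashtags(tweet)
mentions = extract_mentions(tweet)
```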
18.9.11
19.7.11
Add a space to postal code (splitByLength and Merge function)
This short tip explains how to convert postal codes stored as 6 characters into 7 characters by adding a space after the third character. We will use splitByLengths (see the related video) and the function that merges multiple columns into one.
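A minimal Python sketch of the same two steps: cut the code into pieces of 3 and 3 characters (the role splitByLengths plays in Refine), then merge the pieces with a space. `split_by_lengths` is a hypothetical helper named after the GREL function, not a port of it, and the sample code is made up.

```python
def split_by_lengths(value: str, *lengths: int) -> list:
    """Cut value into consecutive pieces of the given lengths."""
    pieces, start = [], 0
    for n in lengths:
        pieces.append(value[start:start + n])
        start += n
    return pieces


def format_postal_code(code: str) -> str:
    """Turn a 6-character code into the 7-character 'XXX XXX' form."""
    return " ".join(split_by_lengths(code.strip(), 3, 3))
```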
29.6.11
Split cell content into multiple columns, non-fixed field length
I recently got a file generated by Crystal Reports, and I had to deal with its format because no other was available. In my case, the data were supposed to be split into 11 columns, but in the original file they were all in one, separated by a variable number of spaces. This post presents a process to split cell content when you have no markup. JSON code is provided for reference below.
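The post's own process and its JSON are not reproduced here. As a minimal Python sketch of the underlying idea: calling `str.split()` with no argument splits on runs of whitespace of any length and ignores leading and trailing spaces. The helper `split_on_spaces` and the sample row are hypothetical, and the result is padded so every row gets the same number of columns.

```python
def split_on_spaces(cell: str, n_columns: int) -> list:
    """Split on runs of spaces and pad with empty strings up to n_columns."""
    parts = cell.split()  # no argument: any run of whitespace is one separator
    if len(parts) > n_columns:
        raise ValueError(f"expected at most {n_columns} fields, got {len(parts)}")
    return parts + [""] * (n_columns - len(parts))


row = "1001   Widget      12    4.50"
columns = split_on_spaces(row, 4)
```

One caveat: a field that itself contains a single space (such as "Blue Widget") would be cut in two. If the report always separates fields with at least two spaces, `re.split(r"\s{2,}", cell.strip())` is an alternative.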
25.6.11
Using "splitByLengths" in Google Refine
1:30 PM
around the web, grel, split, video
Learn how to use the "splitByLengths" function in Google Refine to split a single column into multiple columns based on character lengths.
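For fixed-width data, the column boundaries can be computed once and reused for every row. A minimal Python sketch, assuming a hypothetical layout of 4, 3, 2 and 6 characters; `column_slices` and `split_fixed_width` are illustrative names, not Refine functions.

```python
from itertools import accumulate


def column_slices(lengths):
    """Turn field widths into slice objects, e.g. [4, 3] -> [slice(0, 4), slice(4, 7)]."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [slice(start, end) for start, end in zip(starts, ends)]


def split_fixed_width(rows, lengths):
    """Cut every row at the same boundaries and trim padding spaces."""
    slices = column_slices(lengths)
    return [[row[s].strip() for s in slices] for row in rows]


rows = ["2011Jun25Refine", "2012Oct31GREL  "]
table = split_fixed_width(rows, [4, 3, 2, 6])
columns = list(zip(*table))  # one tuple per new column
```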