22.10.12

A framework for the OpenRefine community

Following the results from the Google Refine Usage Survey, I would like to share a more personal vision of the birth of the OpenRefine community. The code and all issues have recently been moved to GitHub, the wiki will close soon, and the project will then have left the Google Code environment.

However, while a clear consensus was reached to go with GitHub (GitHub was chosen in 35 out of 43 responses, see results here) to host the code and issue tracker, I am not sure that GitHub is the right place for the documentation. In this post I'll try to explain why. Please note that I am open to comments and suggestions regarding the analysis and the proposals I make in this post. OpenRefine is now in the community's hands and everyone's voice counts.


The variety of usage calls for strong centralized documentation, easy to update by anyone.


With no full-time resources supporting the project, I think we need to take the time to design and build the right framework to empower the majority of people interested in supporting OpenRefine. Every hand will be welcome to update and maintain the code, but also to promote and document OpenRefine functionality.

The variety of OpenRefine usage ...


In the Google Refine Usage Survey, question 3 shows that OpenRefine is not an everyday tool for most of its users (52% use it a few times a month). I suppose that the effort and time spent on learning follow the same pattern, and thus the learning curve is longer than for an essential tool (like email, a calendar or a word processor).

Even though 67% of the survey respondents have been using OpenRefine for more than a year, on average they rate their skills 2.82 out of 5 (see Q8 of the usage survey).



Moreover, the variety of users in terms of origin, usage and coding skills (see Q1, Q7, Q11 & Q12 of the usage survey) creates a challenge in supporting the community in terms of level of documentation and knowledge transfer, as expectations differ with user background.

... calls for strong centralized documentation ...


This variety of usage calls for strong user documentation where it is easy to find recipes, tricks and tutorials. However, this information is today scattered across multiple platforms (the current wiki, help embedded in the interface, this blog and a myriad of other specific tutorials), making it hard to find the right information when one needs it.

This can be explained by two reasons:
  1. Authors want to develop their personal brand, so they publish on their own platform.
  2. The Google branding may have prevented some of them from participating directly in the wiki or the code (see my explanation near the end of this article).
We should find ways to encourage those two categories to contribute directly to a centralized platform where any new OpenRefine user can make the most of the tool as quickly as possible. This way we will be able to keep the current user base growing by shortening the learning curve.

... easy to update by anyone.


Due to the variety of the OpenRefine user base, knowledge of the tool's capability to solve different business issues is spread across more than one head. New practices and recipes are developed by multiple people, and we should encourage them to collaborate in the open and document their work publicly so anyone can benefit from it.

When we look closer at the programming skills of the people willing to actively participate in the community, we can note that those ready to invest time are below the average of the current user base. The "Count me in!" and "maybe" answers together (total of 75) score 2.97, while "Count me in!" alone (total of 24 persons) scores 3.44, and the global average is 3.73 out of 7.
Those results clearly call for a documentation tool with a low technical and process entry barrier.

Do you agree with those results? If not, why? What are your thoughts on developing strong OpenRefine documentation?