This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right-side menu.

31.5.22

OpenRefine May 2022 Review

We wish you a happy June! This month, we decided to share the most relevant and helpful OpenRefine May 2022 releases, tutorials, videos, and academic publications with you. Read on to discover them!


Don't forget to subscribe to our newsletter to get our monthly updates right in your inbox!

28.5.22

OpenRefine Community Metrics

Every year, I need to compile metrics about the OpenRefine community, either for a presentation or a grant submission. Every year, I have to go back and check what I did and how I got those numbers. This year will be the last. Here are the metrics I track and how I compile them.


Slide from a presentation from August 2020


2.5.22

OpenRefine April 2022 Review

Happy May! For this month, we decided to share the most relevant and helpful OpenRefine April 2022 releases, tutorials, videos, and academic publications with you. Read on to discover them!


Don't forget to subscribe to our newsletter to get our monthly updates right in your inbox!

17.4.22

Hosting OpenRefine in 2022

Hosting OpenRefine is a long-requested feature from the community. Natively, OpenRefine is designed to run on the user's local machine. Therefore, the software does not include user management, permissions, or a way to share compute resources (CPU, RAM) among users.

Hosting OpenRefine gives multiple users access to the tool. Use cases include:

  • Ease of access to the tool for users with limited permissions on their computer, or during events (training sessions or hackathons, for example).
  • Working on larger datasets using a more powerful online server, although OpenRefine 4.x with Spark is addressing this issue.
  • Collaboration, with multiple users working on the same projects.
  • Hosting in a secured environment to process sensitive data (i.e., the data never reaches the user's machine).

Hosted Instance with htpasswd 

This approach relies on htpasswd or any other type of access control to the machine, such as RDP or a shared user account. In that case, OpenRefine is installed as regular software on a machine that is accessed remotely.

Back in 2014, we at RefinePro built a service to manage users and OpenRefine instances hosted on AWS EC2. In essence, our platform handled:

  • logging users in
  • starting their EC2 instance
  • loading their project workspace on the EC2 instance
  • shutting down the EC2 instance once the user finished their work. The goal was to avoid incurring unnecessary hosting costs.
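
The four steps above can be sketched as a small session lifecycle. This is only an illustration, not the code we ran: `FakeCloud`, `run_session`, and the `workspaces/<user>` path are hypothetical names, the cloud is an in-memory stand-in that makes no AWS calls, and the plain-text password check is there only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeCloud:
    """In-memory stand-in for a cloud API (hypothetical, no real AWS calls)."""
    running: Dict[str, str] = field(default_factory=dict)  # user -> loaded workspace
    log: List[str] = field(default_factory=list)

    def start_instance(self, user: str) -> None:
        self.running[user] = ""
        self.log.append(f"start:{user}")

    def load_workspace(self, user: str, workspace: str) -> None:
        self.running[user] = workspace
        self.log.append(f"load:{user}:{workspace}")

    def stop_instance(self, user: str) -> None:
        self.running.pop(user, None)
        self.log.append(f"stop:{user}")


def run_session(cloud: FakeCloud, users: Dict[str, str], user: str,
                password: str, work: Callable[[], str]) -> str:
    """Log in, start an instance, load the workspace, and always shut down."""
    if users.get(user) != password:           # step 1: log the user in
        raise PermissionError(f"login failed for {user}")
    cloud.start_instance(user)                # step 2: start their instance
    try:
        cloud.load_workspace(user, f"workspaces/{user}")  # step 3: load workspace
        return work()                         # the user's OpenRefine session
    finally:
        cloud.stop_instance(user)             # step 4: stop to avoid idle cost
```

The `try`/`finally` is the part that matters for cost: the instance is stopped even when the session ends with an error.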

We used htpasswd to keep the instance from being publicly accessible. At the time, we found a way to hide the extra login step from our users, but the setup remained hacky and unsuitable for the long run. Therefore, we decided to stop the service in 2017.

Hosting with JupyterHub

In the last two years, we have seen an increasing number of hosted OpenRefine deployments based on Jupyter and JupyterHub (with Kubernetes). In those deployments, OpenRefine is one of the applications hosted via JupyterHub as part of a larger data science workbench. OpenRefine benefits from JupyterHub's user and environment management.
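
As a rough illustration of how OpenRefine can sit behind Jupyter, here is a minimal sketch that assumes the jupyter-server-proxy package is used to launch it. That choice is an assumption on our side, since individual deployments may be wired differently. The helper name `openrefine_proxy_entry`, the `refine` path, and the data directory are hypothetical and need adapting to your image.

```python
def openrefine_proxy_entry(refine_path="refine",
                           data_dir="/home/jovyan/openrefine"):
    """Build a jupyter-server-proxy server entry for OpenRefine.

    refine_path and data_dir are illustrative defaults (assumptions).
    """
    return {
        # "{port}" is filled in by jupyter-server-proxy at launch time.
        # -i sets the interface, -p the port, -d the workspace directory.
        "command": [refine_path, "-i", "127.0.0.1", "-p", "{port}", "-d", data_dir],
        # OpenRefine can be slow to start, so allow a generous startup timeout.
        "timeout": 120,
        "launcher_entry": {"title": "OpenRefine"},
    }


# In a Jupyter server config file, where `c` is provided by Jupyter:
# c.ServerProxy.servers = {"openrefine": openrefine_proxy_entry()}
```

With an entry like this, each user's OpenRefine process runs inside their own single-user server, so authentication and resource limits are left to JupyterHub.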

As of April 2022, there are several publicly advertised JupyterHub deployments, including:

In collaboration with FAIRplus, RefinePro also released one extension and a customizable Docker configuration to help with hosted instances.

Looking ahead 

Following up on the idea from Felix Lohmeier and Tony Hirst on the OpenRefine wiki, I think it would be interesting to develop an official OpenRefine package for JupyterHub. Such a package will include Docker configuration specific to JupyterHub and dedicated OpenRefine extensions (like the local file extension), along with best practices.

The creation and maintenance of the package will provide the community with an official way to host OpenRefine. It will also offer a point of contact for potential contributors to improve the package or OpenRefine itself. 

I am interested in your thoughts and potential interest in building this package. 

Introducing OpenRefine Authenticator and File Extensions

The RefinePro team is thrilled to release two new extensions for the OpenRefine ecosystem under the Apache License 2.0. The extensions were funded and developed in partnership with Novartis, with technical help from Aridhia Informatics. They are released under the FAIRplus program.


Thank you to Jiangbo Dang, Andrea Splendiani, and Rodrigo Barnes for your help. 


Feel free to reach out if you have questions. You can also open an issue in the respective repository.


30.3.22

OpenRefine March 2022 Review

Happy April! For this month, we decided to share the most relevant and helpful OpenRefine March 2022 releases, tutorials, videos, and academic publications with you. Read on to discover them!

Don't forget to subscribe to our newsletter to get our monthly updates right in your inbox!

4.3.22

OpenRefine February 2022 Review

Happy March! For this month, we decided to share the most relevant and helpful OpenRefine February 2022 releases, tutorials, videos, and academic publications with you. Read on to discover them!

Don't forget to subscribe to our newsletter to get our monthly updates right in your inbox!

2.2.22

Visual Web Ripper (VWR) End of Life

 


Visual Web Ripper, developed by Sequentum and released over ten years ago, was one of the first point-and-click web scraping tools. On June 30, 2022, Sequentum will deprecate the license server for VWR. After that date, Visual Web Ripper will stop working, and no projects will be executed.



In this post, we highlight several key dates in the VWR end-of-life timeline. Sequentum provides a migration path from VWR to their latest technology.

31.1.22

OpenRefine January 2022 Review

Happy February! For this month, we decided to share the most relevant and helpful OpenRefine January 2022 releases, tutorials, videos, and academic publications with you. Read on to discover them!

Don't forget to subscribe to our newsletter to get our monthly updates right in your inbox!

4.1.22

OpenRefine December 2021 Review

Happy January! We wish you a very Happy New Year! For the first month of the year, we decided to share the most relevant and helpful OpenRefine December 2021 community announcements, releases, tutorials, videos, and academic publications with you.


Don't forget to subscribe to our newsletter to get our monthly updates right in your inbox!