This is the RefinePro knowledge base about OpenRefine. We have built it over the years and keep adding to it. From tutorials and how-tos to handy GREL expressions and links to external resources, you will find here one of the most comprehensive lists of resources for learning OpenRefine.

For comprehensive documentation, refer to the official OpenRefine wiki.

Don't know where to get started? Search for a specific function below, or read our most popular articles from the right side menu.

Showing posts with label web scraping. Show all posts

2.2.22

Visual Web Ripper (VWR) End of Life

 


Visual Web Ripper, developed by Sequentum and released over ten years ago, was one of the first point-and-click web scraping tools. Since June 30, 2022, Sequentum has deprecated the license server for VWR. As a result, all Visual Web Ripper licenses are inactive, and users can no longer run their projects.


In this post, we highlight several key dates in the VWR end of life. Sequentum provides a migration path from VWR to its latest technology.

27.3.20

Solving Google’s reCAPTCHA v2 with ParseHub Agent


ParseHub is a great point-and-click web scraping tool. While projects run on ParseHub servers, you can connect them to third-party proxies like BrightData or to captcha resolution services like 2Captcha.

In this tutorial, we will show you how to bypass the Google reCAPTCHA v2 test page with a ParseHub agent and the 2Captcha service. You will need to create an account with 2Captcha and have an API key to complete this tutorial.

Don't hesitate to contact us if you want to access the ParseHub project, have questions, or need help implementing web scraping projects.


6.6.18

How to call the Content Grabber API

Content Grabber is a powerful and easy-to-use web scraping tool developed by Sequentum. Its point-and-click interface lets you develop a scraper and retrieve data from a website quickly.

In this tutorial, we will describe how to call the Content Grabber API to trigger an agent and pass input parameters. Thanks to the Content Grabber API, you can embed the scraper in a more complex workflow and configure it on demand. We will first discuss the Content Grabber API, then build a simple example to show step by step how it works.

Don't hesitate to contact us if you have questions or need help implementing web scraping projects.

23.10.11

Fetch City and Province / State based on the postal code


In the US, Canada, and the UK, postal codes are a good key for retrieving information about a location. In this tutorial, we will use the Yahoo PlaceFinder API to add geographical content to a data set based on the postal code. The same approach can easily be reversed to run a query based on a latitude and longitude (see the end of this post).

18.10.11

Parse markup languages (JSON, HTML, XML ...)


In this tutorial, we will see how to parse markup languages like JSON, HTML, or XML. These languages are convenient to parse because there is often an easily identifiable marker right before or after the content you want to extract. We will use JSON as the example and extract the relevant information by following a six-step process.
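
To give a rough idea of what extracting content from JSON looks like outside OpenRefine, here is a minimal Python sketch. It is not the six-step process from the tutorial. The sample document, its field names (results, city, state, postal), and the helper extract_locations are all hypothetical and chosen only for illustration. Inside OpenRefine, the comparable starting point would be GREL's parseJson() function.

```python
import json

# Hypothetical sample document; the structure and field names are
# assumptions made for illustration, not a real API response.
SAMPLE = """
{
  "results": [
    {"city": "Montreal", "state": "Quebec", "postal": "H2X 1Y4"},
    {"city": "Boston", "state": "Massachusetts", "postal": "02108"}
  ]
}
"""


def extract_locations(raw):
    """Parse a JSON string and return (city, state) pairs from its 'results' list.

    Missing keys yield None rather than raising, so partial records survive.
    """
    data = json.loads(raw)
    return [(item.get("city"), item.get("state")) for item in data.get("results", [])]


locations = extract_locations(SAMPLE)
for city, state in locations:
    print(city, state)
```

Using a real parser rather than matching the text around the value keeps the extraction stable if the key order or whitespace in the document changes.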

On a similar topic: