4.5.20
30.3.20
OpenRefine March 2020 review
12:14 PM
around the web
The latest edition of the OpenRefine review is ready. Through March the community published a LOT of new video tutorials in six languages!
Do not forget to subscribe to our newsletter to get our monthly update right in your mailbox.
29.3.20
Concatenate Column in OpenRefine 3.0 and 3.3
3:04 PM
concatenate, join, merge
We all know the pain of merging different columns in OpenRefine when you have null values. Before version 3.0, it required writing a complex GREL expression or managing multiple filters to ensure we are not losing any data.
Those shortcomings have been addressed in the latest version!
Starting with OpenRefine 3.0, we have the coalesce() function, which handles null values natively.
But even more importantly, OpenRefine 3.3 introduced a user interface that offers a lot of flexibility, including defining how you want to concatenate one or multiple columns together.
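To make the null-handling idea concrete, here is a minimal Python sketch of the same logic outside OpenRefine. It is only an illustration: `coalesce` mirrors the behaviour of the GREL function (return the first non-null argument), while `join_columns`, its parameters and the sample row are hypothetical names made up for this example, not part of OpenRefine.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def join_columns(row, columns, separator=", ", null_substitute=None):
    """Concatenate the given columns of a row (hypothetical helper).

    A null cell is replaced by null_substitute when one is given,
    otherwise it is skipped so that no stray separator is produced.
    """
    parts = []
    for name in columns:
        value = coalesce(row.get(name), null_substitute)
        if value is not None:
            parts.append(str(value))
    return separator.join(parts)


# Made-up sample row with a null cell in the middle column.
row = {"first": "Ada", "middle": None, "last": "Lovelace"}

# Skip nulls entirely.
print(join_columns(row, ["first", "middle", "last"], separator=" "))

# Or replace nulls with a placeholder.
print(join_columns(row, ["first", "middle", "last"], separator=" ", null_substitute="?"))
```

The first call skips the null cell and the second one substitutes a placeholder, which are the two usual ways to avoid losing data when joining columns that contain nulls.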
I recorded a quick video demonstration:
27.3.20
Solving Google’s reCAPTCHA v2 with ParseHub Agent
1:03 PM
captcha, parsehub, web scraping
ParseHub is a great point-and-click web scraping tool. While projects run on ParseHub servers, you can connect them to third-party proxies like BrightData or to captcha resolution services like 2Captcha.
In this tutorial, we will show you how to bypass the Google reCAPTCHA v2 test page with ParseHub Agent and the 2Captcha service. You will need to create an account with 2Captcha and have an API key to complete this tutorial.
Don't hesitate to contact us if you want to access the ParseHub project, have questions, or need help implementing web scraping projects.
28.2.20
4.2.20
OpenRefine January 2020 Review
10:36 AM
around the web
With the new year, we decided to start a monthly review of what happened in the OpenRefine community. Below is a summary of what happened in December 2019 and January 2020. Let us know if we missed something.
Do not forget to subscribe to our newsletter on the right to never miss an update.
20.1.20
How to use Columnize by Key/Value
In the WordPress export file in XML format shared by Adam K on the user mailing list, the columns item - wp:postmeta - wp:meta_key and item - wp:postmeta - wp:meta_value store data in a key:value format. The column meta_key indicates the value type and the column meta_value stores the field value.
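The reshaping that a key/value layout calls for can be sketched in a few lines of Python. This is an illustration of the general idea, not OpenRefine's implementation: the function name `columnize`, the `item` identifier column and the sample keys and values are all made up for the example, and the column names are shortened to meta_key and meta_value.

```python
def columnize(rows, id_field, key_field, value_field):
    """Turn key/value rows into one record per item, with one column per key.

    Assumption of this sketch: if the same key appears twice for an item,
    the last value wins.
    """
    items = {}
    for row in rows:
        # Create the record for this item the first time we see it.
        record = items.setdefault(row[id_field], {id_field: row[id_field]})
        record[row[key_field]] = row[value_field]
    return list(items.values())


# Made-up sample data in the key:value layout described above.
rows = [
    {"item": "post-1", "meta_key": "views", "meta_value": "120"},
    {"item": "post-1", "meta_key": "layout", "meta_value": "wide"},
    {"item": "post-2", "meta_key": "views", "meta_value": "45"},
]

for record in columnize(rows, "item", "meta_key", "meta_value"):
    print(record)
```

Each distinct meta_key becomes its own column, and items that lack a given key simply have no value for that column.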