Screen scraping with Colly in Go

Programming Snapshot – Colly

Lead Image © Hannu Viitanen, 123RF.com

Article from Issue 223/2019
Author(s): Mike Schilli

The Colly scraper helps developers who work with the Go programming language collect data from the web. Mike Schilli illustrates the capabilities of this powerful tool with a few practical examples.

As long as websites present data to the browsing masses, some consumers will want that data in a different format and will write scraper scripts to extract it automatically to fit their needs.

Many sites do not like the idea of users scraping their data. Check the website's terms of service before you start, and be aware of the copyright laws in your jurisdiction. In general, as long as scrapers do not republish or commercially exploit the data, and do not bombard the website with too many requests, nobody is likely to get too upset about it.

Different languages offer different tools for scraping. Perl aficionados will probably appreciate the qualities of WWW::Mechanize, while Python fans might prefer the selenium package [1]. In Go, several scraping projects compete for developers' attention.

[...]
