Screen scraping with Colly in Go

Programming Snapshot – Colly

Lead Image © Hannu Viitanen, 123RF.com

Article from Issue 223/2019

Author(s): Mike Schilli

The Colly scraper helps developers who work with the Go programming language to collect data off the web. Mike Schilli illustrates the capabilities of this powerful tool with a few practical examples.

As long as websites serve data to the masses of browser users, some of those users will want the data in a different format and will write scraper scripts to extract it automatically.

Many sites do not like the idea of users scraping their data. Check the website's terms of service for more information, and be aware of the copyright laws for your jurisdiction. In general, as long as the scrapers do not republish or commercially exploit the data, or bombard the website too overtly with their requests, nobody is likely to get too upset about it.

Different languages offer different tools for this. Perl aficionados will probably appreciate the qualities of WWW::Mechanize as a scraping tool, while Python fans might prefer the selenium package [1]. In Go, there are several projects dedicated to scraping that attempt to woo developers.

[...]
