Multilingual programming for retrieving web pages

Tower of Babylon

© Lead Image © Maksym Shevchenko,


Article from Issue 201/2017

We show you how to whip up a script that pulls an HTTP document off the web and how to find out which language offers the easiest approach.

Few programming tasks illuminate the differences between commonly used languages as clearly as that of retrieving a web document. When it comes to shell scripts, admins often turn to the curl utility, which transfers the data behind a URL without much ado and sends them to the standard output.

But what if the URL points to a black hole? Or the server denies access? And what if the server returns a redirect? For example, instead of the expected HTML page with a search form, curl returns just a note that the desired page may be available at another address. Armed with the -L option, however, curl follows the redirect and then returns the data from the source it finds there.
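To make the redirect question concrete, here is a minimal sketch in Python using only the standard library's urllib. The names `NoRedirect` and `fetch` are illustrative and not taken from the article. The sketch assumes that refusing a redirect should surface the 3xx status and the `Location` header, roughly what curl shows without -L.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: refuse to follow redirects, like curl without -L."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the 3xx
        # response is then raised as an HTTPError by the default handler.
        return None


def fetch(url, follow_redirects=True, timeout=10):
    """Return (status, location, body) for url.

    With follow_redirects=True, location is the final URL after redirects.
    With follow_redirects=False, a 3xx answer is reported as is, and
    location is the value of its Location header.
    """
    handlers = [] if follow_redirects else [NoRedirect()]
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.geturl(), resp.read()
    except urllib.error.HTTPError as err:
        # Covers denied access (403), missing pages (404), server errors,
        # and unfollowed redirects (301, 302, ...).
        return err.code, err.headers.get("Location", url), err.read()
```

Calling `fetch(url, follow_redirects=False)` returns the redirect status and target, while the default call follows the chain to the final document. A URL that points to a black hole (unresolvable host, refused connection) is not handled here; urllib reports it as a `urllib.error.URLError`, which the caller would have to catch separately.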

What happens with a huge file like a 4K movie containing many gigabytes of data? Will the process exhaust your RAM because it attempts to swallow everything in a single gulp? Does encryption work automatically for an HTTPS URL using the SSL/TLS protocol, and does the utility check the server's certificate correctly so that it does not fall victim to a man-in-the-middle attack? Similar to good old curl, popular programming languages offer all of this, although often only as an add-on package and often requiring quirky approaches.
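The memory and certificate questions can be illustrated with another short Python sketch, again limited to the standard library. The function name `download`, the 64KB chunk size, and the timeout are assumptions chosen for illustration, not values from the article. The idea is to read the response in fixed-size pieces instead of in a single gulp and to build the HTTPS handler from `ssl.create_default_context()`, which enables certificate and hostname verification.

```python
import ssl
import urllib.request


def download(url, dest_path, chunk_size=64 * 1024, timeout=30):
    """Stream url to dest_path chunk by chunk; return the bytes written."""
    # The default context verifies the server certificate against the
    # system's trusted CAs and checks that the hostname matches.
    context = ssl.create_default_context()
    https = urllib.request.HTTPSHandler(context=context)
    opener = urllib.request.build_opener(https)

    total = 0
    with opener.open(url, timeout=timeout) as resp, open(dest_path, "wb") as out:
        while True:
            # Only chunk_size bytes are held in memory at any time.
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

With this approach, memory use stays bounded by the chunk size regardless of how large the file is. If the certificate check fails, urllib raises a `urllib.error.URLError` wrapping the underlying SSL error instead of silently continuing.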



