Multilingual programming for retrieving web pages

Tower of Babylon

© Lead Image © Maksym Shevchenko, 123RF.com

© Lead Image © Maksym Shevchenko, 123RF.com

Article from Issue 201/2017
Author(s):

We show you how to whip up a script that pulls an HTTP document off the web and how to find out which language offers the easiest approach.

Few programming tasks illuminate the differences between commonly used languages as clearly as that of retrieving a web document. When it comes to shell scripts, admins often turn to the curl utility, which transfers the data behind a URL without much ado and sends them to the standard output.

But, what if the URL points to a black hole? Or the server denies access? And what if the server returns a redirect? For example, curl http://google.com does not return the expected HTML page with the search form but just a note that the desired page may be available on www.google.com. Armed with the -L option, however, curl follows the reference and then returns the data from the source it finds there.

What happens with a huge file like a 4K movie containing many gigabytes of data? Will the process exhaust your RAM because it attempts to swallow everything in a single gulp? Does encryption work automatically for an HTTPS URL using the SSL protocol, and does the utility check the server's certificate correctly so that it does not fall victim to a man-in-the-middle attack? Similar to good old curl, popular programming languages offer all of this, although often only as an add-on package and often requiring quirky approaches.

[...]

Use Express-Checkout link below to read the full article (PDF).

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Ruby Web Spiders

    Ruby is a very elegant language,and it’s harmonious – the parts work together effectively. Ruby also significantly reduces a developer’s burden. We’ll show you how to use Ruby to build a quick and simple web spider application.

  • Perl: Barcodes

    Barcodes efficiently speed us through supermarket checkout lines, but the technology is also useful for totally different applications. An inexpensive barcode scanner can help you organize your private library, CD, or DVD collection.

  • Elixir 1.0

    Developers will appreciate Elixir's ability to build distributed, fault-tolerant, and scalable applications.

  • Building an IRC Bot

    Chat rooms aren't just for people. We'll show you how to access an IRC channel using an automated bot.

  • Perl: Network Monitoring

    To discover possibly undesirable arrivals and departures on their networks, a Perl daemon periodically stores the data from Nmap scans and passes them on to Nagios via a built-in web interface.

comments powered by Disqus

Direct Download

Read full article as PDF:

News

njobs Europe
What:
Where:
Country:
Njobs Netherlands Njobs Deutschland Njobs United Kingdom Njobs Italia Njobs France Njobs Espana Njobs Poland
Njobs Austria Njobs Denmark Njobs Belgium Njobs Czech Republic Njobs Mexico Njobs India Njobs Colombia