Multilingual programming for retrieving web pages

Functional Node.js

If you want to retrieve a URL in a snippet of JavaScript in the browser, or do something similar in Node.js code on the server side (for example, in an AWS Lambda function [2]), you need to switch your brain to functional programming mode. After all, event-based systems do not follow the paradigm of "Do this, wait until it is finished, then do that." Instead, they want to receive their instructions in the form of "Do this, then this, then this… and go."

The reason for this is the event loop, which can only run short callbacks and then wants control back. It steps in again later, when the data slowly trickles in from external interfaces. This structure makes code harder to read, and it takes considerable experience to design software components so that they interact well and remain easy to maintain.

The dreaded pyramid of doom [3], composed of nested callbacks, can be flattened with several helper constructs. As of version 7.6, Node even supports the async and await keywords, which force asynchronous code into a synchronous straitjacket to make things look tidier [4].
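
To illustrate the difference, the following minimal sketch runs two asynchronous steps, first with nested callbacks and then with async/await. The step() helper and the names runNested() and runFlat() are made up for this example; step() merely simulates slow I/O with a timer. Note that util.promisify() assumes Node 8 or later.

```javascript
const { promisify } = require("util");

// Hypothetical helper: simulates an asynchronous operation with a timer
// and reports its result through a Node-style (error-first) callback.
function step(name, callback) {
  setTimeout(() => callback(null, name + " done"), 10);
}

// Callback style: every further step nests one level deeper.
function runNested(done) {
  step("first", (err, a) => {
    if (err) return done(err);
    step("second", (err, b) => {
      if (err) return done(err);
      done(null, [a, b]);
    });
  });
}

// Wrap the callback API in a promise so that await can be used.
const stepAsync = promisify(step);

// async/await style: the same two steps, written top to bottom.
async function runFlat() {
  const a = await stepAsync("first");
  const b = await stepAsync("second");
  return [a, b];
}
```

Both variants still run asynchronously on the event loop; await only changes how the code reads, not how it is scheduled.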

Listing 4 shows a call to the get function of the http module in Node.js. In addition to the URL for the web document, it expects a function. Node calls this function later with a response object; the function defines a closure with a variable (content) and three callbacks for the events data, error, and end.

Listing 4: http-get.js

The data event is triggered whenever a chunk of data arrives from the server. Its callback collects the chunks one by one and reassembles them in the content variable. The error callback is invoked in case of an error and writes the reason to the log in line 11. When the server signals the end of the transmission, the event loop jumps to the end callback, which in line 15 outputs the value of content, where all the body data of the HTTP response is now located. Note that the Node.js http library does not follow redirects automatically; the code would have to check for a 3xx status code itself and issue a new request to the URL in the Location header.
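
As a rough illustration of the pattern just described, here is a minimal sketch. It is not the original http-get.js, so its line numbers do not match the ones mentioned in the text. The fetchBody() name and its error-first callback are assumptions made for this example; the sketch also listens for errors on the request object, because that is where connection failures are reported.

```javascript
const http = require("http");

// fetchBody is a hypothetical helper: it fetches url and hands the
// complete response body (or an error) to an error-first callback.
function fetchBody(url, callback) {
  // Make sure the callback fires only once, whatever happens.
  let finished = false;
  const finish = (err, body) => {
    if (finished) return;
    finished = true;
    callback(err, body);
  };

  const req = http.get(url, (res) => {
    let content = "";            // closure variable collecting the body
    res.setEncoding("utf8");     // deliver chunks as strings, not Buffers

    // Called once for every chunk of body data that arrives.
    res.on("data", (chunk) => { content += chunk; });
    // Called if the response stream fails midway.
    res.on("error", (err) => finish(err));
    // Called when the server has sent everything.
    res.on("end", () => finish(null, content));
  });

  // Connection-level problems (DNS failure, refused connection)
  // are emitted on the request object, not on the response.
  req.on("error", (err) => finish(err));
}
```

A call such as fetchBody("http://example.com/", (err, body) => console.log(err || body)) would then print either the error or the page body once the transfer has finished.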

Good Old Perl

Good Old Perl traditionally retrieves web documents with the CPAN LWP::UserAgent module. SSL support is not automatic but gets magically added if the admin retroactively installs the CPAN LWP::Protocol::https module, which depends on the availability of an OpenSSL installation and a list of root certificates.

Listing 5 shows a peculiarity as well as correct error handling: Like some of the other libraries presented here, LWP::UserAgent automatically follows redirects. It identifies the encoding of google.de as ISO-8859-1, but decoded_content() (as opposed to content()) returns a decoded string in Perl's internal UTF-8 format. That is a good thing, because program code that processes the data often relies on UTF-8 and would otherwise produce ugly, mangled text.

Listing 5: http-get.pl

To output a UTF-8 string without modification using print, the script first needs to switch stdout to UTF-8 mode with the help of binmode. This rather elaborate procedure exists for compatibility reasons and at least ensures that old scripts from the early days of Perl's UTF-8 support don't freak out when they meet newer versions of Perl.

Yeah, old age is not a piece of cake, when all of your joints are aching and the young folks are turning somersaults!

Infos

  1. Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/magazine/201
  2. "Equipping Alexa with Self-Programmed Skills" by Michael Schilli, Linux Magazine, issue 199, June 2017: http://www.linux-magazine.com/Issues/2017/199/Programming-Snapshot-Alexa
  3. "Pyramid of Doom" by Mike Schilli, Linux Magazine, issue 170, January 2015: http://www.linux-magazine.com/Issues/2015/170/Perl-Asynchronous-Code
  4. "Node 7.6 Brings Default Async/Await Support" by Sergio De Simone: https://www.infoq.com/news/2017/02/node-76-async-await

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area of California. In his column, launched back in 1997, he focuses on short projects in Perl and various other languages. You can contact Mike at mschilli@perlmeister.com.
