Multilingual programming for retrieving web pages
Functional Node.js
If you want to retrieve a URL in a snippet of JavaScript in the browser, or do something similar in Node.js code on the server side, for example, on an Amazon Lambda server [2], you need to toggle your brain to functional programming mode. After all, event-based systems do not follow the paradigm of "Do this, wait until it is finished, then do that." Instead, they want to receive their instructions in the form of "Do this, then this, then this… and go."
The reason for this is the event loop, which can only run short callbacks and then wants control back. It picks up again when data trickles in from external interfaces. This structure makes code harder to read and demands considerable experience in designing software components so that they interact well and remain easy to maintain.
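A minimal sketch of this behavior, assuming nothing beyond Node.js itself: a callback registered with setTimeout only runs after the code that registered it has returned control to the event loop. The variable name order is made up for illustration.

```javascript
// Records the order in which the two pieces of code run.
const order = [];

// Register a callback; the event loop runs it only after the current code returns.
setTimeout(() => {
  order.push('callback');
  console.log(order.join(' -> '));
}, 0);

// This line runs first, even though the timer above was set with a delay of 0.
order.push('main code');
```

Even with a delay of zero, the callback has to wait its turn, which is why instructions have to be handed over up front instead of being executed step by step.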
The dreaded pyramid of doom [3], composed of nested callbacks, can be resolved by several helper constructs. Node 7.6 now even comes with support for the async and await keywords, which force asynchronous code into a synchronous straitjacket to make things look tidier [4].
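The difference can be sketched roughly as follows. The function fetchLater is a hypothetical stand-in for any slow, callback-based interface, and the names nested, fetchLaterAsync, and flat are likewise made up for this illustration; none of them come from the article's listings.

```javascript
// Hypothetical slow interface: calls back with a result after a short delay.
function fetchLater(value, callback) {
  setTimeout(() => callback(null, value.toUpperCase()), 10);
}

// Callback style: every further step nests one level deeper.
function nested(done) {
  fetchLater('a', (err, a) => {
    if (err) return done(err);
    fetchLater('b', (err, b) => {
      if (err) return done(err);
      done(null, a + b);
    });
  });
}

// Wrap the callback interface in a promise so that it can be awaited.
function fetchLaterAsync(value) {
  return new Promise((resolve, reject) => {
    fetchLater(value, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// async/await style: reads top to bottom, but still yields to the
// event loop at every await.
async function flat() {
  const a = await fetchLaterAsync('a');
  const b = await fetchLaterAsync('b');
  return a + b;
}
```

Both nested and flat produce the same result; the second version merely keeps the steps on one indentation level.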
Listing 4 shows a get call of the HTTP module in Node.js. In addition to the URL for the web document, it expects a function. This function is called later with a response object and defines a closure with a variable (content) and three callbacks for the events data, error, and end.
Listing 4
http-get.js
The data event gets triggered whenever a chunk of data arrives from the server. Its callback collects the data chunks one by one and reassembles them in the content variable. The error callback gets involved in case of an error and writes the reason to the log in line 11. When the server signals the end of the transmission, the event loop jumps to the end callback, which in line 15 outputs the content of content, where all the body data in the HTTP response is now located. Note that the core Node.js http module does not follow redirects on its own; code that needs this has to handle 3xx responses itself or use a wrapper library.
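The listing itself is not reproduced here, so the following is only a sketch of the pattern described above, not the original http-get.js; its line numbers do not match the ones mentioned in the text. The helper name fetchBody and its done callback are assumptions made for this illustration.

```javascript
const http = require('http');

// Hypothetical helper: fetches a URL and hands the complete body to done().
function fetchBody(url, done) {
  http.get(url, (res) => {
    // The closure variable that accumulates the body.
    let content = '';
    res.setEncoding('utf8');

    // Called for every chunk that arrives from the server.
    res.on('data', (chunk) => { content += chunk; });

    // Called if reading the response fails.
    res.on('error', (err) => done(err));

    // Called once the server has finished sending.
    res.on('end', () => done(null, content));
  }).on('error', (err) => done(err)); // connection-level errors
}

// Possible usage (URL chosen only as an example):
// fetchBody('http://example.com/', (err, body) => {
//   if (err) return console.error(err.message);
//   console.log(body);
// });
```

Connection problems such as an unresolvable host name are reported on the request object rather than on the response, which is why the sketch registers an error handler in both places.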
Good Old Perl
Good Old Perl traditionally retrieves web documents with the CPAN LWP::UserAgent module. SSL support is not automatic but gets magically added if the admin retroactively installs the CPAN LWP::Protocol::https module, which depends on the availability of an OpenSSL installation and a list of root certificates.
Listing 5 shows a peculiarity as well as correct error handling: Like some other libraries presented here, the module automatically follows redirects and identifies the encoding of google.de as ISO-8859-1, but it returns a UTF-8 string from decoded_content() (as opposed to content()). That is a good thing, because processing the data in the program code often relies on UTF-8, and anything else leads to ugly, mangled text.
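Why the decoding step matters can be illustrated independently of Perl. The following sketch uses Node.js rather than LWP and a made-up five-byte sample; it only shows what happens when ISO-8859-1 bytes are decoded with the right and with the wrong character set.

```javascript
// "Grüße" encoded as ISO-8859-1: ü is the single byte 0xFC, ß is 0xDF.
const raw = Buffer.from([0x47, 0x72, 0xfc, 0xdf, 0x65]);

// Decoding with the charset the server declared yields the intended text.
const decoded = raw.toString('latin1');

// Treating the same bytes as UTF-8 yields replacement characters instead.
const mangled = raw.toString('utf8');

// Re-encoding the decoded string as UTF-8 takes two bytes each for ü and ß.
const utf8Bytes = Buffer.from(decoded, 'utf8');

console.log(decoded);
console.log(mangled);
console.log(utf8Bytes.length);
```

The raw bytes correspond to what content() would hand back in the Perl case, and the decoded string to the result of decoded_content().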
Listing 5
http-get.pl
To output a UTF-8 string as such without modification using print, the script first needs to switch stdout to UTF-8 mode with the help of binmode. This rather elaborate procedure exists for compatibility reasons and at least ensures that old scripts from the early days of Perl's UTF-8 support don't freak out when they meet new versions of Perl.
Yeah, old age is not a piece of cake, when all of your joints are aching and the young folks are turning somersaults!
Infos
[1] Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/magazine/201
[2] "Equipping Alexa with Self-Programmed Skills" by Michael Schilli, Linux Magazine, issue 199, June 2017: http://www.linux-magazine.com/Issues/2017/199/Programming-Snapshot-Alexa
[3] "Pyramid of Doom" by Mike Schilli, Linux Magazine, issue 170, January 2015: http://www.linux-magazine.com/Issues/2015/170/Perl-Asynchronous-Code
[4] "Node 7.6 Brings Default Async/Await Support" by Sergio De Simone: https://www.infoq.com/news/2017/02/node-76-async-await