Improving performance of Linux on ARM
Assembly Line
![](/var/linux_magazin/storage/images/issues/2014/166/doghouse-performance-improvements/maddog_1.png/619822-1-eng-US/Maddog_1.png_medium.png)
"maddog" looks at some of Linaro's efforts to improve GNU/Linux performance on ARM architectures.
For the past several months, I have been working with Linaro [1], an association of companies that want to see GNU/Linux working well on ARM architectures. Although ARM Holdings designs the ARM architecture chips, various other companies manufacture the CPUs, GPUs, and SoCs (Systems on a Chip) from ARM's licensed designs. Some of these companies use the manufactured units in their own products, and some sell them to other companies and to the general public. For the past couple of years, ARM has been working on a 64-bit chip, and its licensees are getting close to having ARM 64-bit hardware ready.
One of the ARM engineers determined that 1,400 different source code modules in either Ubuntu or Fedora (or both) have assembly language in the code. This is not to say that the assembly language (or lack of it) will stop a module from working on an ARM64 system, because there may be higher level fallback code (e.g., code written in C) that is compiled in place of the missing ARM64 assembly language. However, the modules have not been tested and verified, either on actual hardware or on the ARM64 emulators that currently exist. Thus, Linaro decided to enlist the community in porting some of these modules and has created a contest with prizes for those who help out [2].
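The fallback pattern described above can be sketched in a few lines of C. This is only an illustration, not code taken from any of the 1,400 modules: the function names `swap32`, `swap32_c`, and `swap32_selfcheck` are hypothetical, and the sketch assumes a GCC-compatible compiler for the inline assembly paths. An older module might contain only the x86 branch and the C fallback; a port would add the AArch64 branch and then verify it against the portable version.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: plain C that any compiler can build for any target. */
static uint32_t swap32_c(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical module entry point: use hand-written assembly where it exists. */
static uint32_t swap32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* The "old" path: x86-only inline assembly. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#elif defined(__GNUC__) && defined(__aarch64__)
    /* A ported path: the AArch64 REV instruction on a 32-bit register. */
    uint32_t out;
    __asm__("rev %w0, %w1" : "=r"(out) : "r"(x));
    return out;
#else
    /* No assembly for this architecture: the C code takes over. */
    return swap32_c(x);
#endif
}

/* Check whichever path was compiled in against the portable version. */
static void swap32_selfcheck(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x11223344u, 0xdeadbeefu, 0xffffffffu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(swap32(samples[i]) == swap32_c(samples[i]));
}
```

Calling `swap32_selfcheck()` on real hardware or in an emulator is the kind of verification the paragraph above says is still missing for many modules. With a modern optimizing compiler, the plain C version may well compile to the same single instruction, which is one argument for dropping the assembly altogether.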
The engineers also noticed that much of the code containing assembly language was fairly old. It was written in an age when systems had a single, much slower, single-core CPU; memory was measured in megabytes, not gigabytes; Ethernet ran at 10Mbps, not 1,000Mbps; and the GNU compilers were not as good at optimization as commercial compilers. Therefore, people wrote assembly language for the tightest, fastest parts of the system.
If those programs were written today, however, they might have a lot less assembly language, and the code would be more portable. Thus, the contest was expanded to include improving the performance of these modules and (perhaps) eliminating some of the old assembly language where it made sense.
Embedded systems exemplify how our perspective of "performance" has changed over time, in that the size of the memory footprint is often a measure of performance, with a small footprint providing savings in the manufacturing process. Extended battery life, achieved by allowing parts of the system to be turned off after the application is finished, also represents an improvement in performance. In large server farms, performance is often measured in electricity savings, savings on cooling, or in reduction of equipment purchases and floor space.
In the early years of my programming career, my job was not to write new functionality but to get other people's programs to work "better." My manager told me that if I could not get an application to run in half the time, I should not bother with it. In almost every case, I could make an application run not only in half the time, but often five to 10 times faster. It was a very satisfying job, so it has been interesting to investigate the techniques for profiling code and finding bottlenecks, and the performance improvements and efficiencies, that have become possible since I did this work 30 years ago.
At the same time, I am working with some very small systems that have some really interesting features. GPUs used for computation, digital signal processing chips, and field-programmable gate arrays (FPGAs) all existed as concepts years ago, but they were prohibitive in cost and space. They have now become not only feasible but even competitive in price/performance with other, more "mainstream" types of circuitry.
A board from a company called Adapteva not only has a SoC with a two-core ARM processor, FPGA, and digital signal processing chips, it also has a 16- or 64-core CPU. All of this, plus some system memory and USB ports, comes on a board in the US$ 100-150 price range [3]. The opportunity to learn about these architectures has now become practical.
Recently some people attracted a lot of attention by building a "supercomputer" out of Raspberry Pi boards. The Raspberry Pi is a single-core system, so it does not invite the type of programming that occurs in a real HPC system. In an HPC system, each board can have several CPUs or several cores in a single CPU, and programs use OpenMP in conjunction with MPI and other heterogeneous computing environments. Substituting multicore computers such as the Banana Pi [4] or ODROID-U3 [5] would create a higher performing "supercomputer" at a reasonable increase in price and would afford a more realistic mix of programming styles.
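To make the point about programming styles concrete, here is a minimal sketch of the OpenMP half of such a mix: a dot product whose loop is shared among the cores of one board. The function name `dot_product` is only an illustration. In a cluster, each board would typically run this on its own slice of the data, and MPI (omitted here) would combine the partial results across boards.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two vectors of length n, split across the cores of one node. */
double dot_product(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));

    double sum = 0.0;

    /* Each thread accumulates a private partial sum; OpenMP adds them up
       when the loop ends. Without OpenMP support the pragma is ignored
       and the loop simply runs on one core. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

With GCC, building with `-fopenmp` enables the pragma. On a single-core board the loop has nowhere to spread out, which is why a multicore board gives a more realistic taste of this style.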
I encourage readers to sign up for Linaro's contest and help GNU/Linux be the best that it can be.
The author
Jon "maddog" Hall is an author, educator, computer scientist, and free software pioneer who has been a passionate advocate for Linux since 1994 when he first met Linus Torvalds and facilitated the port of Linux to a 64-bit system. He serves as president of Linux International®.
Infos
- [1] Linaro: http://linaro.org
- [2] Linaro Performance Challenge: http://performance.linaro.org
- [3] Parallella: http://www.adapteva.com/parallella/
- [4] Banana Pi: http://www.banana-pi.org
- [5] ODROID-U3: http://hardkernel.com/main/products/prdt_info.php