The ARM architecture – yesterday, today, and tomorrow


Because neither mobile nor desktop devices need their full computing power most of the time, ARM systems rely on power-saving mechanisms, some of which x86 CPUs also use in a simpler form. The CPU clock speed can be lowered to improve energy efficiency, and individual cores can be switched off when they are idle. Because energy consumption matters even more for small mobile devices than for PCs and laptops, ARM and the SoC manufacturers offer additional options.
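On Linux, both knobs are exposed through sysfs. The following minimal Python sketch assumes a kernel with CPU hotplug and cpufreq enabled. Writing 0 or 1 to `cpuN/online` takes a core offline or brings it back, and writing a value in kHz to `cpuN/cpufreq/scaling_max_freq` caps its clock. The helper names and the `root` parameter are illustrative assumptions. The `root` parameter lets you point the functions at a scratch directory instead of the real `/sys`. On real hardware the writes require root privileges, and some cores (often cpu0) cannot be taken offline.

```python
from pathlib import Path

# Standard location of the per-CPU sysfs tree on Linux.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def set_core_online(cpu: int, online: bool, root: Path = SYSFS_CPU) -> None:
    """Switch a core off (0) or back on (1) via CPU hotplug; needs root on real hardware."""
    (root / f"cpu{cpu}" / "online").write_text("1" if online else "0")


def cap_core_frequency(cpu: int, max_khz: int, root: Path = SYSFS_CPU) -> None:
    """Cap a core's clock through cpufreq; sysfs expresses frequencies in kHz."""
    (root / f"cpu{cpu}" / "cpufreq" / "scaling_max_freq").write_text(str(max_khz))
```

For example, `set_core_online(3, False)` parks the fourth core, and `cap_core_frequency(0, 800_000)` limits cpu0 to 800MHz.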

The Big-Little solution developed by ARM exploits the fact that ARM CPU cores come in different performance classes that all support the same instruction set. A Big-Little CPU combines a block of cores with high processing power but low energy efficiency with a block of cores that deliver less performance but greater energy efficiency (Figure 3). The model assumes that most mobile devices rarely demand maximum performance from the processor. Running temporarily and exclusively on the slower but more energy-efficient cores therefore yields significant savings and extends battery life.

Figure 3: The curve schematically shows the energy consumption of Big-Little in relation to performance.
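The savings follow from simple arithmetic. Energy is power multiplied by time, and a slower core needs more time but draws far less power. The Python sketch below makes this concrete with invented figures: a big core twice as fast as a little core but drawing five times the power. The numbers, the `CoreType` class, and the deadline-based `pick_core` policy are assumptions for illustration. The model also ignores idle and leakage power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    name: str
    perf: float     # work units completed per second (hypothetical figure)
    power_w: float  # power drawn while busy, in watts (hypothetical figure)


# Invented figures for illustration: the big core is twice as fast
# but draws five times the power of the little core.
BIG = CoreType("big", perf=2.0, power_w=1.5)
LITTLE = CoreType("little", perf=1.0, power_w=0.3)


def energy_joules(core: CoreType, work: float) -> float:
    """Energy = power x time, with time = work / performance (idle power ignored)."""
    return core.power_w * (work / core.perf)


def pick_core(work: float, deadline_s: float) -> CoreType:
    """Choose the core type that meets the deadline with the least energy."""
    feasible = [c for c in (LITTLE, BIG) if work / c.perf <= deadline_s]
    if not feasible:
        return BIG  # no core type meets the deadline; fall back to the fastest one
    return min(feasible, key=lambda c: energy_joules(c, work))
```

With these figures, one unit of work costs 0.3J on the little core versus 0.75J on the big core. As a result, `pick_core(1.0, deadline_s=2.0)` chooses the little core, while a tight deadline of 0.6s forces the big one.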

Energy Efficiency

Exploiting this energy-efficient design requires software support, and there are basically three approaches. The first approach uses the virtualization features of the system-on-chip: A hypervisor migrates all the computations from one block to the other as the load changes and switches off the unused block completely.
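The hypervisor's decision can be pictured as a small state machine. All work runs on one block at a time, and the system switches blocks when the load crosses a threshold. This Python sketch assumes two hypothetical thresholds with a gap between them (hysteresis), so the system does not flip back and forth on every load sample. The threshold values and function names are illustrative and are not taken from any real hypervisor.

```python
from enum import Enum
from typing import Iterable, List


class Cluster(Enum):
    LITTLE = "little"
    BIG = "big"


# Hypothetical thresholds, expressed as a fraction of the active block's capacity.
# The gap between them provides hysteresis, so the system does not bounce between
# blocks on every load sample.
UP_THRESHOLD = 0.85
DOWN_THRESHOLD = 0.30


def next_cluster(active: Cluster, load: float) -> Cluster:
    """Return the block that should run *all* work after this load sample."""
    if active is Cluster.LITTLE and load > UP_THRESHOLD:
        return Cluster.BIG
    if active is Cluster.BIG and load < DOWN_THRESHOLD:
        return Cluster.LITTLE
    return active


def simulate(samples: Iterable[float], start: Cluster = Cluster.LITTLE) -> List[Cluster]:
    """Replay load samples; the returned block is powered, the other one is switched off."""
    active = start
    history = []
    for load in samples:
        active = next_cluster(active, load)
        history.append(active)
    return history
```

A medium load of 0.5 keeps the system on whichever block is already active. This is the point of the hysteresis band.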

This approach works flawlessly on Android systems. But it does not always make sense to migrate all running programs from the smaller to the larger cores, and vice versa. In many cases, processes generate only a small load, or they are non-time-critical background processes that do not justify the higher energy consumption of the larger cores.

The second approach therefore migrates applications only between individual core pairs, each consisting of one small and one large core. Linux builds on its existing frequency scaling technology (cpufreq) by combining the frequency ranges of the small and the large core into a single virtual frequency range. Low virtual frequencies map to the small core, and higher ones to the large core. The operating system then automatically migrates an application to the core whose range contains the selected frequency.
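The virtual frequency mapping amounts to a lookup table. This Python sketch assumes one small/large core pair with invented operating points. These are chosen so the small core's range lies entirely below the large core's, and for simplicity each virtual frequency equals the real clock of the core it maps to. The kernel's actual driver differs in detail. A requested frequency is rounded up to the next table entry, and the entry's core tells the system whether the task has to move.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical operating points (kHz) for one small/large core pair, chosen so the
# small core's range lies entirely below the large core's range.
LITTLE_KHZ: List[int] = [200_000, 400_000, 600_000]
BIG_KHZ: List[int] = [800_000, 1_200_000, 1_600_000]

# One ascending table of virtual frequencies presented to the cpufreq governor,
# each entry tagged with the physical core that delivers it.
VIRTUAL_TABLE: List[Tuple[int, str]] = (
    [(f, "little") for f in LITTLE_KHZ] + [(f, "big") for f in BIG_KHZ]
)
VIRTUAL_KHZ: List[int] = [f for f, _ in VIRTUAL_TABLE]


def resolve(target_khz: int) -> Tuple[str, int]:
    """Map a requested virtual frequency to (physical core, real frequency).

    The request is rounded up to the next table entry; anything above the
    table is clamped to the large core's top frequency.
    """
    i = min(bisect_left(VIRTUAL_KHZ, target_khz), len(VIRTUAL_TABLE) - 1)
    freq, core = VIRTUAL_TABLE[i]
    return core, freq


def needs_migration(current_core: str, target_khz: int) -> bool:
    """True if honoring the request means moving the task to the other core of the pair."""
    return resolve(target_khz)[0] != current_core
```

For example, a request for 500MHz resolves to the small core at 600MHz, whereas 700MHz already lands on the large core at 800MHz. A task currently on the small core would therefore migrate.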

This approach also has disadvantages. On the one hand, the system does not use all the physically available cores in high-load situations; on the other hand, like the first approach, it only makes sense with an identical number of small and large cores.

In fact, processors do not always have an identical number of small and large cores; ARM's Big-Little development prototype, for instance, has three Cortex A7 cores and two Cortex A15 cores (Figure 4). In this case, a third approach is more useful: Make all the cores available at the same time, so that the Big-Little SoC simply works as a multicore processor.

Figure 4: Architecture of the Big-Little test prototype with three Cortex A7 cores and two Cortex A15 cores.

At first glance, this approach seems much simpler than the other two, but it raises a problem that is far harder to solve: When an operating system distributes its applications across all the cores, it assumes, wrongly in this case, that all cores have the same computing power. As a result, the system might assign an unimportant background application to a fast core and an important foreground application to a slow one. Supporting such asymmetric processor architectures in an operating system is not a trivial problem, which is why there is currently no market-ready solution for Linux [5].
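The scheduling problem can be illustrated with a toy model of the prototype in Figure 4. In this Python sketch, the relative capacities (A15 = 2.0, A7 = 1.0), the task demands, and the greedy policy are all assumptions. The sketch contrasts two placements. One is a capacity-blind round-robin placement. The other sends foreground work to the fastest core that fits and background work to the slowest core that can still absorb it. It is not how any Linux scheduler is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Core:
    name: str
    capacity: float                     # relative compute capacity (assumed values)
    load: float = 0.0                   # sum of the demands of tasks placed here
    tasks: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Task:
    name: str
    demand: float                       # expected demand, same units as capacity
    foreground: bool = False


def prototype_cores() -> List[Core]:
    """Three Cortex A7 and two Cortex A15 cores, as in the test prototype.
    The 1.0 / 2.0 capacity ratio is an assumption for illustration."""
    return [Core(f"A7-{i}", 1.0) for i in range(3)] + [Core(f"A15-{i}", 2.0) for i in range(2)]


def place_naive(tasks: List[Task], cores: List[Core]) -> Dict[str, str]:
    """Capacity-blind round-robin: every core is treated as equally fast."""
    return {t.name: cores[i % len(cores)].name for i, t in enumerate(tasks)}


def choose_core(task: Task, cores: List[Core]) -> Core:
    fitting = [c for c in cores if c.capacity - c.load >= task.demand]
    if not fitting:
        # Nothing has room: use the core with the most spare capacity.
        return max(cores, key=lambda c: c.capacity - c.load)
    if task.foreground:
        # Foreground work: fastest core that fits; ties go to the less loaded one.
        return max(fitting, key=lambda c: (c.capacity, -c.load))
    # Background work: slowest (most efficient) core that can still absorb it.
    return min(fitting, key=lambda c: (c.capacity, c.load))


def place_capacity_aware(tasks: List[Task], cores: List[Core]) -> Dict[str, str]:
    """Greedy placement; foreground and heavier tasks are placed first."""
    placement = {}
    for task in sorted(tasks, key=lambda t: (not t.foreground, -t.demand)):
        core = choose_core(task, cores)
        core.load += task.demand
        core.tasks.append(task.name)
        placement[task.name] = core.name
    return placement
```

Consider a background sync task, a logging task, and a foreground UI task submitted in that order. The round-robin version puts the UI task on a Cortex A7 purely because of its position in the queue. The capacity-aware version places it on a Cortex A15 and keeps both background tasks on A7 cores.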

Plans for Big-Little

Various SoC manufacturers have announced implementations of the Big-Little architecture, but currently only Renesas Mobile and Samsung have concrete plans. Renesas Mobile intends to bring an SoC with two Cortex A7 and two Cortex A15 cores to market this year, and Samsung's Exynos 5 Octa chipset powers the new Galaxy S4 smartphone in some regions of the world. The Exynos 5 Octa is an eight-core processor with four Cortex A7 and four Cortex A15 cores; however, until the latter part of 2013, it could only use four cores at a time. Now, with "heterogeneous multiprocessing," the Octa can use all eight cores at once.

NVidia is taking a different approach with its Tegra 3 SoC, which features four Cortex A9 cores, and its Tegra 4, which has four Cortex A15 cores. In addition to these four main cores, each SoC has one additional Cortex A9 or Cortex A15 core, respectively. This core, which NVidia calls the Companion Core, is built from a different type of transistor that requires significantly less energy per cycle.

At the same time, the Companion Core is limited to a maximum clock speed far below that of the other cores. In operation, much as in the Big-Little approach, applications are migrated to the Companion Core only if just one main core is active and the load falls below a predefined threshold.
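The Companion Core rule reduces to a simple predicate. In this Python sketch, the threshold value and the representation of load as a fraction of one main core's capacity are assumptions. The text does not describe NVidia's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuState:
    active_main_cores: int  # main (fast) cores currently active
    load: float             # load as a fraction of one main core's capacity (assumed unit)


# Hypothetical value; the text only says the threshold is predefined.
COMPANION_THRESHOLD = 0.25


def use_companion_core(state: CpuState, threshold: float = COMPANION_THRESHOLD) -> bool:
    """Migrate to the Companion Core only if a single main core is active
    and the load is below the threshold."""
    return state.active_main_cores == 1 and state.load < threshold
```

Work stays on the main cores if two of them are active, or if a single core runs above the threshold. Because the Companion Core's clock is capped, a practical implementation would also need to move work back once the load rises again.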

It is hard to say which power-saving mechanism is best, because this depends on factors such as the application scope, the architecture of the rest of the system, and efficient support by the operating system; the latter in particular is still in its infancy. All told, these mechanisms, combined with an energy-efficient architecture, make modern ARM SoCs more energy efficient than most x86 processors of comparable performance.
