Zack's Kernel News

Article from Issue 178/2015

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

Parallel RAM Initialization

Mel Gorman recently coded up some patches to initialize RAM in parallel, rather than sequentially, during bootup. This feature makes virtually no difference to regular desktop systems, as these tend to boot very quickly in any case, and as Mel pointed out in his announcement, an earlier attempt at a similar patch set was nixed because it slowed things down for regular users. His new patches, however, migrated the feature to a different execution path that might be more acceptable. Instead of doing memory initialization in the page allocator code, Mel's new patches did it in the kswapd code. Because kswapd runs one thread per memory node, each node would have only a single thread dedicated to memory initialization, thus minimizing the cost to smaller systems.

Waiman Long replied, saying he had access to a system with 12TB of RAM, and had tested Mel's patches. He observed that boot time went from 404 seconds to 298, a reduction of about 26%. Apparently a success! However, Mel and Peter Zijlstra both wanted to know whether those 106 seconds of savings actually mattered to Waiman. Maybe it didn't really matter whether a big system took nearly seven minutes to boot, or only five.

Waiman replied, "Booting 100s faster is certainly something that is nice to have. Right now, more time is spent in the firmware POST portion of the bootup process than in the OS boot. So I would say this patch isn't really critical right now as machines with that much memory are relatively rare. However, if we look forward to the near future, some new memory technology like persistent memory is coming and machines with large amount of memory (whether persistent or not) will become more common. This patch will certainly be useful if we look forward into the future."

Mel replied that "100 seconds off kernel init time is a starting point. I can try pushing it on that basis but I really would like to see SGI and Intel people also chime in on how it affects their really large machines."

Scott J. Norton from HP said, "Yes, 100 seconds really does matter and is a big deal. When a business has one of these large machines go down their business is stopped (unless they have a fast failover solution in place). Every minute and second the machine is down is crucial to these businesses. The fact that POST times can be so long make it even more important that we make the kernel boot as fast as possible."

Andrew Morton came into the discussion at this point, suggesting possible alternative implementations that would make Mel's code smaller. Mel, however, thought that Andrew's alternatives either would end up being just as complex, though in different ways, or else would do things such as forcing the kernel to rely on userspace operations, which made Mel wince with revulsion.

These are the sorts of patches that may hit stumbling blocks along the way to acceptance into the mainstream kernel. As valuable as the patches may be for large systems, those systems are still rare, and so any penalties for regular users or for kernel maintainability can end up trumping the usefulness of the patch to its targeted set of users. Ultimately, though, "regular" systems will start to have more and more RAM, and something along the lines of Mel's patches will probably stop being a specialty item and become a real necessity.

Coding Memory Protection Features for Future CPUs

Dave Hansen posted some patches from deep within Intel's development team – he said it wouldn't run for anyone outside Intel, and even the development team could only run it using software simulations of future Intel chips. He wanted some advice from the Linux folks, because his team's efforts had reached the stage where they might have an effect on user interfaces. He wanted to make sure they got it right.

Specifically, his team was working on Memory Protection Keys for Userspace, which, he said, "provides a mechanism for enforcing page-based protections, but without requiring modification of the page tables when an application changes protection domains. It works by dedicating 4 previously ignored bits in each page table entry to a 'protection key,' giving 16 possible keys."

Dave added, "There is also a new user-accessible register (PKRU) with two separate bits (Access Disable and Write Disable) for each key. As a CPU register, PKRU is inherently thread-local, potentially giving each thread a different set of protections from every other thread." He concluded, "There are two new instructions (RDPKRU/WRPKRU) for reading and writing to the new register. The feature is only available in 64-bit mode, even though there is theoretically space in the PAE PTEs. These permissions are enforced on data access only and have no effect on instruction fetches."
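To make the arithmetic in Dave's description concrete, here is a minimal userspace model of the scheme: 4 key bits per page table entry give 16 keys, and a register holds an Access Disable and a Write Disable bit for each key. This is only an illustrative sketch; it does not touch the real PKRU register, the helper names (pkru_set, pkru_allows) are hypothetical, and the placement of the two bits within each pair is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only: nothing here reads or writes the real PKRU. */
#define PKEY_BITS   4u                  /* key bits per page table entry */
#define PKEY_COUNT  (1u << PKEY_BITS)   /* 2^4 = 16 possible keys */
#define PKRU_AD     0x1u                /* Access Disable (bit position assumed) */
#define PKRU_WD     0x2u                /* Write Disable (bit position assumed) */

/* Return a copy of pkru with the two bits for `key` replaced by `rights`. */
static uint32_t pkru_set(uint32_t pkru, unsigned key, uint32_t rights)
{
    unsigned shift = 2u * (key % PKEY_COUNT);

    pkru &= ~(0x3u << shift);
    return pkru | ((rights & 0x3u) << shift);
}

/* Would a data access to a page tagged with `key` be permitted? */
static bool pkru_allows(uint32_t pkru, unsigned key, bool is_write)
{
    uint32_t bits = (pkru >> (2u * (key % PKEY_COUNT))) & 0x3u;

    if (bits & PKRU_AD)
        return false;               /* all data access disabled */
    if (is_write && (bits & PKRU_WD))
        return false;               /* reads allowed, writes disabled */
    return true;
}
```

In this model, changing what a thread may do to every page tagged with a given key is a single register update, with no page table changes, which is the property Dave highlights. Sixteen keys at two bits each fit exactly in a 32-bit value.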

Ingo Molnár asked Dave to give some possible use cases for these features, and Dave replied:

"There are lots of things that folks would _like_ to mprotect(), but end up not being feasible because of the overhead of going and mucking with thousands of PTEs and shooting down remote TLBs every time you want to change protections.

"Data structures like logs or journals that are only written to in very limited code paths, but that you want to protect from 'stray' writes.

"Maybe even a database where a query operation will never need to write to memory, but an insert would. You could keep the data R/O during the entire operation except when an insert is actually in progress. It narrows the window where data might be corrupted. This becomes even more valuable if a stray write to memory is guaranteed to hit storage … like with persistent memory.

"Someone mentioned to me that valgrind does lots of mprotect()s and might benefit from this.

"We could keep heap metadata as R/O and only make it R/W inside of malloc() itself to catch corruption more quickly."
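The journal use case can be sketched with plain mprotect(), which is how it would have to be done without protection keys. The struct and function names below (journal, journal_open, journal_append) are hypothetical and the code is only a rough illustration: the buffer stays read-only except inside the one function that is allowed to write to it.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical one-page journal that is read-only outside journal_append(). */
struct journal {
    char  *buf;
    size_t len;
    size_t used;
};

static int journal_open(struct journal *j)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    if (page <= 0)
        return -1;
    p = mmap(NULL, (size_t)page, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    j->buf = p;
    j->len = (size_t)page;
    j->used = 0;
    return 0;
}

static int journal_append(struct journal *j, const char *msg)
{
    size_t n = strlen(msg);

    if (n > j->len - j->used)
        return -1;
    /* Open the write window: a system call that changes the page tables. */
    if (mprotect(j->buf, j->len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(j->buf + j->used, msg, n);
    j->used += n;
    /* Close the window again so stray writes fault. */
    return mprotect(j->buf, j->len, PROT_READ);
}
```

Every append costs two mprotect() calls, and the change applies to the whole process. That overhead, multiplied across thousands of pages and threads, is what Dave describes as making this pattern impractical today.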

Alan Cox added: "You can also use it for certain types of emulator trickery, and I suspect even for things like interpreters and controlling access to 'tainted' values. Other obvious uses are making it a shade harder for SSL or ssh type errors to leak things like key data by reducing the damage done by out of bound accesses."

Ingo wasn't 100% convinced by these use cases, though he acknowledged that "The Valgrind usecase looks somewhat legit, albeit not necessarily for multithreaded apps: there you generally really want protection changes to be globally visible, such as publishing the effects of free() or malloc()."

Ingo also asked, "will apps/libraries bother if it's not a standard API and if it only runs on very fresh CPUs?" Dave replied: "It's always a problem with new CPU features. I've thought a bit about trying to 'emulate' the feature on older CPUs using good ol' mprotect() so that we could have an API that folks can use _today_, but that would get magically fast on future CPUs. But, the problem with that is the thread-local aspect. mprotect() is fundamentally process-wide and protection keys are fundamentally thread-local. Those things are going to be hard to reconcile unless we do something slightly extreme like having per-thread page tables."

At this point, the discussion grew more technical, focusing on more specific use cases, and the true security value of the feature. For me, it's still amazing to see Intel working so closely with kernel developers at this stage of hardware design. It's not just that Linux has been legitimized to the extent that vendors recognize the need to take it seriously; it's that vendors are working more publicly on products that haven't been released yet. It's a spirit of open source development that may seem ubiquitous these days, but that certainly took a very long time to achieve.

Support for Signed Firmware

Linux already supports cryptographic module signatures so that only trusted modules can be loaded, but Luis R. Rodriguez posted some patches to reuse a lot of that code to support firmware signing as well. Luis also expected that, in the future, a similar set of patches would also support signing user data.

There was no real discussion or controversy about Luis's patches. Apparently, they are a natural extension of features that are already supported, although he did say that there would be certain differences between the new firmware signing feature and the existing module signing code. For example, firmware signatures would require a separate file to contain the signature. He explained that this simply made it easier to handle licensing issues.

Luis also said that, like module signing, his code implemented a config option and boot parameter to set how permissive the kernel would be toward unsigned firmware. But, he said, "Contrary to module signing we do not taint the kernel in the permissive fw signing mode due to restrictions on the firmware_class API, extensions to enable this are expected however in the future."
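The permissive-versus-enforcing choice can be pictured as a small decision function. The sketch below is not taken from Luis's patches: the enum and function names are hypothetical, and the rule that a present but invalid signature is always rejected is an assumption made for illustration, since the discussion only describes the handling of unsigned firmware.

```c
#include <stdbool.h>

/* Hypothetical policy levels, as might be set by a config option or boot parameter. */
enum fw_sig_policy {
    FW_SIG_PERMISSIVE,   /* unsigned firmware may still be loaded */
    FW_SIG_ENFORCING     /* unsigned firmware is refused */
};

/* Result of looking for and checking the separate signature file. */
enum fw_sig_state {
    FW_SIG_MISSING,
    FW_SIG_VALID,
    FW_SIG_INVALID
};

static bool fw_load_allowed(enum fw_sig_policy policy, enum fw_sig_state sig)
{
    if (sig == FW_SIG_VALID)
        return true;
    if (sig == FW_SIG_INVALID)
        return false;    /* assumption: a bad signature is never accepted */
    /* Unsigned firmware: allowed only when the policy is permissive. */
    return policy == FW_SIG_PERMISSIVE;
}
```

Per Luis's note, the permissive path here would not taint the kernel, unlike the equivalent path in module signing.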

The biggest criticism of Luis's code came from David Howells, who suggested some alternative wording for the kconfig text, which Luis adopted word for word. Aside from that, there seemed to be no objection whatsoever. Presumably the future patch set, providing similar support for signing user data, would likewise sail through to the official tree.
