Zack's Kernel News

Article from Issue 260/2022

Zack Brown reports on taking a header and the next Spectre vulnerability.

Taking a Header

In the winter of 2020, Ingo Molnar decided that something simply had to be done to make everyone's life better. He reached into his ultimate sack of horrible things and pulled out the Linux kernel header hierarchy. This oozing nightmare consisted of all the header files in the kernel source tree, one depending upon the other in an endless glutinous web that could be neither untangled nor untied and that all kernel sub-projects simply glom onto, forming endless sticky layers upon which the fate of humanity truly does depend.

So Ingo untangled and untied it using determination and strange gifts. Then recently he submitted a patch, consisting of over 25 sub-trees, with over 2,200 individual commits, changing more than half of all source files in the entire kernel tree. He said, "As most kernel developers know, there's around ~10,000 main .h headers in the Linux kernel, in the include/ and arch/*/include/ hierarchies. Over the last 30+ years they have grown into a complicated & painful set of cross-dependencies we are affectionately calling 'Dependency Hell'."

He offered his patch to the world, calling it the Fast Kernel Headers project. According to his tests, it sped up kernel builds by as much as 78 percent on his reference configuration. Incremental builds, where files compiled earlier don't need to be recompiled, improved even more drastically. The oozing web had become a delicate lace – or at least less hellish.

Ingo explained:

"When I started this project, late 2020, I expected there to be maybe 50-100 patches. I did a few crude measurements that suggested that about 20% build speed improvement could be gained by reducing header dependencies, without having a substantial runtime effect on the kernel. Seemed substantial enough to justify 50-100 commits.

"But as the number of patches increased, I saw only limited performance increases. By mid-2021 I got to over 500 commits in this tree and had to throw away my second attempt (!); the first two approaches simply didn't scale, weren't maintainable and barely offered a 4% build speedup, not worth the churn of 500 patches and not worth even announcing.

"With the third attempt I introduced the per_task() machinery which brought the necessary flexibility to reduce dependencies drastically, and it was a type-clean approach that improved maintainability. But even at 1,000 commits I barely got to a 10% build speed improvement. Again this was not something I felt comfortable pushing upstream, or even announcing. :-/

"But the numbers were pretty clear: 20% performance gains were very much possible. So I kept developing this tree, and most of the speedups started arriving after over 1,500 commits, in the fall of 2021. I was very surprised when it went beyond 20% speedup and more, then arrived at the current 78% with my reference config. There's a clear super-linear improvement property of kernel build overhead, once the number of dependencies is reduced to the bare minimum."

He went on, "the size of the 'default' headers (which with the fast-headers tree will mostly include type definitions), has been reduced by 1-2 orders of magnitude. Much of the build speed improvement is due to these reductions."
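
To make the idea of a small "types" header concrete, here is a minimal sketch of the general C technique of splitting type declarations from heavier definitions. It is not taken from the fast-headers tree, and every name in it (struct widget, widget_types.h, widget_api.h, widget_ref_take) is hypothetical. Code that only stores or passes a pointer needs nothing more than a forward declaration, so it can skip the heavier header entirely. The three "files" are shown in one block, separated by comments.

```c
/* Hypothetical names throughout; not code from the fast-headers tree. */
#include <stddef.h>

/* --- widget_types.h: the small header that most code would include --- */
struct widget;                      /* forward declaration, incomplete type */

struct widget_ref {
    struct widget *w;               /* a pointer needs only the declaration */
    size_t         uses;
};

/* --- widget_api.h: the heavier header, included only where needed --- */
struct widget {
    int  id;
    char name[32];
};

static inline int widget_id(const struct widget *w)
{
    return w->id;                   /* member access needs the full definition */
}

/* --- consumer.c: only stores the pointer, so widget_types.h is enough --- */
static inline void widget_ref_take(struct widget_ref *ref, struct widget *w)
{
    ref->w = w;
    ref->uses++;
}
```

In a real tree, a file containing only widget_ref_take() would no longer be recompiled when the layout of struct widget changes, which is the kind of dependency reduction the quote describes.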

And finally, Ingo said, "so this is probably the largest single feature announcement in LKML's history. Not by choice! :-/ For this reason this tree is an RFC announcement, and I'd like to gather feedback from fellow maintainers about the structure of tree(s) before pushing for an upstream merge."

Greg Kroah-Hartman was highly impressed and offered a few technical suggestions. He said, "I took a glance at the tree, and overall it looks like a lot of nice cleanups. Most of these can probably go through the various subsystem trees, after you split them out, for the 'major' .h cleanups."

Ingo replied:

"I absolutely plan on doing that too:

- About ~70% of the commits can be split up & parallelized through maintainer trees.

- With the exception of the untangling of sched.h, per_task and the "Optimize Headers" series, where a lot of patches are dependent on each other. These are actually needed to get any measurable benefits from this tree (!). We can do these through the scheduler tree, or through the dedicated headers tree I posted.

"The latter monolithic series is pretty much unavoidable; it's the result of 30 years of coupling a lot of kernel subsystems to task_struct via embedded structs & other complex types, that needed quite a bit of effort to untangle, and that untangling needed to happen in-order."

And Greg affirmed, "Yes, taking the majority through the maintainer trees and then doing the remaining bits in a single tree seems sane; that one tree will be easier to review as well."

Nathan Chancellor was also gobsmacked by Ingo's work. He ran some tests and saw an 18-35 percent speed improvement on his 80-core ARM64 server. Ingo replied, "Note that on ARM64 the elapsed time improvement is 'only' 18-35%, because the triple-linking of vmlinux serializes much of a build & ARM64 doesn't have the kallsyms-objtool feature yet." But he felt there was a lot of room for improvement on ARM architectures. Ingo added, "In the end I think the improvement could probably [be] moved into the broad 60-70% range that I see on x86."

Nathan offered a bunch of code suggestions and patches, which Ingo accepted gratefully, and the two of them had a technical discussion about remaining issues. Ingo remarked, "Your testing & patch sending efforts are much appreciated!! You'd help me most by continuing on the same path with new fast-headers releases as well, whenever you find the time. :-)"

Willy Tarreau also replied to Ingo's initial announcement, saying, "great work! I'm particularly interested in this work because I went through a similar process about 6 months ago in haproxy and saved 40-45% build time, and thought how well the same principles could apply to the kernel if anyone had felt brave enough to engage into that. I do appreciate how tedious a work it can be and do really sympathize with you on this!"

Nick Desaulniers also had some technical comments on Ingo's code, adding:

"This is a really cool series Ingo. I'm sure Arnd has seen it by now, but Arnd has been thinking about this area a lot, too. I haven't but I have played with running 'include what you use' on the kernel sources; Kconfig being the biggest impediment to that approach.

"To me, I'm most nervous about 'backsliding;' let's say this work lands, at some point probably years in the future, I assume without any form of automation that we might find ourselves at a similar point of header dependencies getting all tangled again.

"What are your thoughts on where/how/what we could automate to try to help developers in the future keep their header dependencies simpler? (Sorry if this was already answered in the cover letter.)

"It would be really useful if you were planning a talk at something like plumbers [Linux Plumbers Conference] how you go about making these changes. I really hope once others understand your workflow that we might help with some form of automation. Nice work!"

And Arnd Bergmann, nearby, also said to Ingo, "I've done some work in this area in the past, didn't quite take it enough of the way to get this far. The best I saw was 30% improvement with clang, which tends to be more sensitive than gcc towards header file bloat, as it does more detailed syntax checking before eliminating dead code."

The whole conversation grew into a large implementation discussion, with everyone chiming in. A few days later, Ingo announced version two of the patch, which had grown from "over 2,200 commits" to "over 2,300 commits" since version 1 and offered an even more impressive speed improvement over the official kernel. The implementation discussion continued, mostly between Ingo and Arnd.

This kind of lunacy happens from time to time – someone decides to tackle one of the ancient kernel nightmares, like the big kernel lock, or fixing all build-time warnings, or cleaning up the header hierarchy, and suddenly, in this sphere at least, the world is a brighter, happier place.

The Next Spectre Vulnerability

Recently a new Spectre-like security vulnerability was uncovered in a variety of CPU architectures. It's a double annoyance because, firstly, vulnerabilities must be patched, and secondly, the workaround likely involves some amount of runtime performance hit.

This time around, all of a sudden there was a gigantic blizzard of Spectre patches coming into the kernel. Among them, dear to my heart, were a bunch of documentation patches. One of these, from Tim Chen and Andi Kleen, explained the nature and risks associated with Spectre, the ways to mitigate the security problems, and how to use the sysfs files relevant to dealing with Spectre.

In the document, Tim and Andi explain, "Spectre is a class of side channel attacks that exploit branch prediction and speculative execution on modern CPUs to read memory, possibly bypassing access controls. Speculative execution side channel exploits do not modify memory but attempt to infer privileged data in the memory."

They went on to say:

"Speculative execution side channel methods affect a wide range of modern high performance processors, since most modern high speed processors use branch prediction and speculative execution.

"The following CPUs are vulnerable:

- Intel Core, Atom, Pentium, and Xeon processors

- AMD Phenom, EPYC, and Zen processors

- IBM POWER and zSeries processors

- Higher end ARM processors

- Apple CPUs

- Higher end MIPS CPUs

- Likely most other high performance CPUs. Contact your CPU vendor for details.

"Whether a processor is affected or not can be read out from the Spectre vulnerability files in sysfs."
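
As a rough illustration of reading those files, here is a minimal C sketch. It assumes the status files live under /sys/devices/system/cpu/vulnerabilities, with names such as spectre_v1 and spectre_v2. The helper read_vuln_status() is a hypothetical name invented for this example and is not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/* Assumed location of the per-vulnerability status files on Linux. */
#define VULN_DIR "/sys/devices/system/cpu/vulnerabilities"

/*
 * Hypothetical helper: read the one-line status of a vulnerability file
 * such as "spectre_v1" into buf. Returns 0 on success, or -1 if the file
 * is missing or unreadable (for example on an older kernel or a system
 * that is not running Linux).
 */
int read_vuln_status(const char *name, char *buf, size_t len)
{
    char path[256];
    FILE *f;
    int n;

    if (len == 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/%s", VULN_DIR, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* name too long for the buffer */

    f = fopen(path, "r");
    if (!f)
        return -1;

    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';     /* strip the trailing newline */
    return 0;
}
```

Calling read_vuln_status("spectre_v2", buf, sizeof(buf)) fills buf with the kernel's status line. The kernel documentation describes values such as "Not affected", "Vulnerable", or a string beginning with "Mitigation:".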

The document went on to say:

"CPUs use speculative operations to improve performance. That may leave traces of memory accesses or computations in the processor's caches, buffers, and branch predictors. Malicious software may be able to influence the speculative execution paths, and then use the side effects of the speculative execution in the CPUs' caches and buffers to infer privileged data touched during the speculative execution.

"Spectre variant 1 attacks take advantage of speculative execution of conditional branches, while Spectre variant 2 attacks use speculative execution of indirect branches to leak privileged memory."

They gave an example of a hostile user process attacking the kernel itself:

"The attacker passes a parameter to the kernel via a register or via a known address in memory during a syscall. Such parameter may be used later by the kernel as an index to an array or to derive a pointer for a Spectre variant 1 attack. The index or pointer is invalid, but bound checks are bypassed in the code branch taken for speculative execution. This could cause privileged memory to be accessed and leaked.

"For kernel code that has been identified where data pointers could potentially be influenced for Spectre attacks, new 'nospec' accessor macros are used to prevent speculative loading of data.
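
The idea behind such accessors can be sketched in user-space C. The example below is loosely modeled on the branch-free index masking that the kernel's array_index_nospec() performs, but it is only an illustration. The names index_mask_nospec() and table_lookup() are hypothetical, the code assumes that right-shifting a negative long is an arithmetic shift (true for GCC and Clang, implementation-defined in ISO C), and it does nothing to stop an optimizing compiler from removing the mask, which a real implementation must also handle.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Sketch only: returns all ones when index < size and zero otherwise,
 * computed without a conditional branch. Assumes size <= LONG_MAX and
 * an arithmetic right shift for negative values.
 */
static inline unsigned long index_mask_nospec(unsigned long index,
                                              unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index))
                           >> (sizeof(long) * CHAR_BIT - 1));
}

/* Hypothetical lookup taking an index that a caller controls. */
int table_lookup(const int *table, unsigned long size, unsigned long index)
{
    if (index >= size)
        return -1;                  /* the ordinary bounds check */

    /*
     * If the CPU speculates past the check with an out-of-range index,
     * the mask is zero and the load below reads table[0] instead of an
     * attacker-chosen address.
     */
    index &= index_mask_nospec(index, size);
    return table[index];
}
```

The point of the design is that the clamp is a data dependency rather than a branch, so there is nothing for the branch predictor to get wrong.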

"Spectre variant 2 attacker can poison the branch target buffer (BTB) before issuing syscall to launch an attack. After entering the kernel, the kernel could use the poisoned branch target buffer on indirect jump and jump to gadget code in speculative execution.

"If an attacker tries to control the memory addresses leaked during speculative execution, he would also need to pass a parameter to the gadget, either through a register or a known address in memory. After the gadget has executed, he can measure the side effect.

"The kernel can protect itself against consuming poisoned branch target buffer entries by using return trampolines (also known as 'retpoline') for all indirect branches. Return trampolines trap speculative execution paths to prevent jumping to gadget code during speculative execution. x86 CPUs with Enhanced Indirect Branch Restricted Speculation (Enhanced IBRS) available in hardware should use the feature to mitigate Spectre variant 2 instead of retpoline. Enhanced IBRS is more efficient than retpoline."

The document also included detailed examples of a hostile user process attacking another user process, a virtualized system attacking the underlying kernel, and a virtualized system attacking another virtualized system.

There were not only documentation patches – most were actual code changes that fixed things. A few days after the patch storm began, Linus Torvalds announced Linux 5.17-rc8, saying:

"So last weekend, I thought I'd be releasing the final 5.17 today.

"That was then, this is now. Last week was somewhat messy, mostly because of embargoed patches we had pending with another variation of spectre attacks. And while the patches were mostly fine, we had the usual 'because it was hidden, all our normal testing automation didn't see it either'.

"And once the automation sees things, it tests all the insane combinations that people don't tend to actually use or test in any normal case, and so there was a (small) flurry of fixes for the fixes.

"None of this was really surprising, but I naïvely thought I'd be able to do the final release this weekend anyway.

"And honestly, I considered it. I don't think we really have any pending issues that would hold up a release, but on the other hand we also really don't have any reason _not_ to give it another week with all the proper automated testing. So that's what I'm doing, and as a result we have an -rc8 release today instead of doing a final 5.17.

"There's a number of non-spectre things in here too, of course. Among other things, people finally chased down a couple of mislaid patches that had been on the regression list, so hopefully we have those all nailed down now too.

"And obviously there's all the usual random fixes in here too. But because of the spectre thing, about half of the -rc8 patch is architecture updates.

"That said, it's still a fairly _small_ half of the patch. It was not one of the 'big disaster' hw speculation things; it was mostly extending existing mitigations and reporting.

"Anyway, let's not keep the testing _just_ to automation – the more the merrier, and real-life loads are always more interesting than what the automation farms do. So please do give this last rc a quick try."

Several people replied about the non-Spectre patches, but there was no Spectre discussion in that particular thread.
