Zack's Kernel News

Article from Issue 192/2016

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

Intelligent Networking Policies

Kan Liang wanted to simplify Linux networking configuration to get better performance on a variety of workloads. The problem, he said, was that the default configuration options didn't really work well, and tweaking them properly took a lot of skill.

His approach was to create a set of policies that would give the kernel hints, which it would then use to tune the more fine-grained networking controls automatically. Kan's idea involved per-socket, per-task, and per-device policies that the kernel would interpret to deliver near-optimal performance.

Kan specified that his approach would not be an attempt to optimize networking performance fully, but that it would hopefully bring a system to 90 percent networking efficiency.

One question he anticipated was whether any concept of networking policies belonged in the kernel or in user space. He felt the kernel was the right place for his code, because it would need to handle requests from multiple users and could do the job more simply and efficiently from within the kernel.

He also pointed out that, as much as possible, the net policy code would rely on existing kernel infrastructure, rather than coding the entire thing separately; for example, it would interact with networking hardware using existing interfaces.

Stephen Hemminger replied that he agreed on all points except the need to do this in kernel space. His argument was that whatever could go outside the kernel should go outside the kernel. Alexei Starovoitov also felt, for the same reason, that Kan's code didn't belong in the kernel.

Kan stuck to his point that the kernel implementation would be much simpler and easier, but there was no further discussion on the mailing list.

The problem as I see it is that there are lots of things that would be easier to do in the kernel. Kernel code is just inherently less encumbered than user code. If everything that was easier to do in the kernel were actually implemented in the kernel, Linux would be far more monstrously huge than it is now. It's only by insisting that absolutely everything go into user space, if at all possible, that Linux is able to remain only as massively huge as it is now, and no huger.

Of course, microkernel people argue that a lot more could be stripped out of the kernel and put into user space, and that's true. The Linux philosophy, however, isn't simply to exclude everything from the kernel that it possibly can. There's also the question of speed. If Kan's code ran significantly faster in kernel space than in user space, kernel developers might be willing to accept it. But if it's just a question of maintainer convenience, that's not enough to justify accepting code into the tree.

Securing Memory Locations

As a security measure, William C. Roberts wanted to randomize the addresses at which the kernel places memory mappings, so they couldn't be predicted by hostile code. He submitted some code to do this, but Nick Kralevich objected. As he understood it, William's code "adds a random gap between various mmap() mappings, with the goal of ensuring that both the mmap base address and gaps between pages are randomized."

He pointed out that Android systems had experienced problems with that kind of memory fragmentation in the past. He said, "After a program runs for a long time, the ability to find large contiguous blocks of memory becomes impossible, and mmap()s fail due to lack of a large enough address space."

Nick gave links to the various patches that had previously been needed to undo the fragmentation that he felt William's code would now reintroduce. He said, "If this behavior was re-introduced, it's likely to cause hard-to-reproduce problems, and I suspect Android based distributions would tend to disable this feature either globally, or for applications which make a large number of mmap() calls." Jason Cooper agreed that Nick had identified the key problem with this type of feature and urged William to address the fragmentation concerns before putting much more work into the code.

William agreed that fragmentation was definitely a problem and that one of the goals of his code would be to implement the memory randomization without the added burden of memory fragmentation.

As a stopgap, Dave Hansen suggested simply disabling William's randomization feature on all 32-bit systems. As he put it, "All of the Android problems seemed to originate with having a constrained 32-bit address space." Pavel Machek agreed that 32-bit systems would be hard hit by William's code. Meanwhile Nick also said, "I like Dave Hansen's suggestion that this functionality be limited to 64 bits, where concerns about running out of address space are essentially nil. I'd be supportive of this change if it was limited to 64 bits." Jason also agreed that a 64-bit-only approach would resolve all of his objections, and William said he liked this idea for his code as well.
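
The address-space argument can be illustrated with a toy Python model. It is a sketch under stated assumptions, not William's patch and not the kernel's real mmap placement logic: 1,000 mappings of 1MiB each are placed top-down, each preceded by a random gap of up to 256 pages, and the model then checks whether one 2GiB mapping still fits below them. The 3GiB figure for a 32-bit user address space (the common x86 3G/1G split), the 128TiB figure for 64-bit, and every mapping and gap size are illustrative assumptions.

```python
import random

PAGE = 4096
MiB = 1 << 20
GiB = 1 << 30


def free_below_mappings(space, map_size, count, max_gap_pages, seed=0):
    """Place `count` mappings of `map_size` bytes top-down from the end of an
    address space of `space` bytes, putting a random gap of 0..max_gap_pages
    pages before each one. Return the bytes still free below the lowest
    mapping, or 0 if the mappings do not all fit."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    top = space
    for _ in range(count):
        gap = rng.randint(0, max_gap_pages) * PAGE  # always 0 when randomization is off
        if top < gap + map_size:
            return 0  # ran out of address space
        top -= gap + map_size
    return top


def big_mmap_fits(space, request, max_gap_pages):
    """Can one `request`-byte mapping still be placed below 1,000 x 1MiB
    mappings? All sizes are illustrative, not taken from the patch."""
    return free_below_mappings(space, 1 * MiB, 1000, max_gap_pages) >= request


if __name__ == "__main__":
    for label, space in (("32-bit, 3GiB", 3 * GiB), ("64-bit, 128TiB", 1 << 47)):
        for max_gap in (0, 256):
            print(f"{label}, max gap {max_gap} pages:",
                  big_mmap_fits(space, 2 * GiB, max_gap))
```

In the 3GiB space, the 2GiB request fits without gaps (2,072MiB remains free) but not with them, because on average about 500MiB is lost to gaps. The 128TiB space has room either way, which is the intuition behind limiting the feature to 64-bit systems.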

It's very rare for a security debate to identify a problem and a solution in the same breath. This seems to be one of those instances.

Early Access to Firmware

Luis R. Rodriguez posted some patches to continue the ongoing effort to end support for making device firmware available early, at initialization time, partly motivated by a desire to simplify the incredibly variable bootup procedure in Linux. Once these changes go through, firmware will only be accessible from a mounted filesystem (i.e., at a late stage of the bootup procedure). According to Luis, at the time of his post only the Dell RBU driver still offered early access to firmware.

As Luis put it, "Thou shalt not make firmware calls early on init or probe."

For subsystems or devices that need access to firmware as early as possible, Luis suggested either including the firmware directly in the kernel binary or within the initramfs image used to boot the system.
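
To make the lookup order concrete, here is a small user-space Python sketch (not kernel code) of how a firmware request might be resolved once early loading is gone. A blob built into the kernel image is checked first, then a fixed list of directories on whatever root filesystem is mounted, which may be the initramfs. The directory list follows the search paths the kernel's firmware loader used around this time, minus its optional custom-path parameter; `request_firmware_sketch()` and the `builtin` dictionary are hypothetical stand-ins, and the user mode helper fallback is left out.

```python
from pathlib import Path


def fw_search_dirs(root, release):
    """Directories searched in order; `release` plays the role of the
    running kernel's version string (as printed by uname -r)."""
    fw = Path(root) / "lib" / "firmware"
    return [
        fw / "updates" / release,
        fw / "updates",
        fw / release,
        fw,
    ]


def request_firmware_sketch(name, root, release, builtin=None):
    """Return the bytes of firmware `name`, or None if no copy is reachable.
    `builtin` (hypothetical) stands in for blobs linked into the kernel image."""
    if builtin and name in builtin:
        return builtin[name]  # compiled into the kernel: available at any time
    for directory in fw_search_dirs(root, release):
        candidate = directory / name
        if candidate.is_file():
            return candidate.read_bytes()  # found on the mounted root (or initramfs)
    return None  # filesystem not mounted yet, or the blob is simply absent
```

Pointing `root` at an empty directory returns None, which is the situation an early probe faces before the real root filesystem is mounted; a blob under lib/firmware/updates takes precedence over one in lib/firmware.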

Julia Lawall and Josh Boyer both pointed out areas of the kernel beyond the Dell RBU driver that still needed early access to firmware, and Luis replied that this would make things a bit more difficult. He said, "it seems we may want to allow for these type of calls within probe in the end but in order to vet for drivers that fw is available through the direct filesystem lookup we may need help from userspace."

Luis added that some of those instances made it all the more urgent to deprecate and remove the user mode helper code that allowed early access to firmware.

At the same time, Daniel Vetter pointed out that all GPU drivers that depend on firmware would still need early access, because, as he put it, "people are generally pissed when they boot their machine and the screen goes black." He tentatively suggested "loading the different subsystems of the driver in parallel (we already do that largely), and then if one firmware blob isn't there yet, simply stall that async worker until it shows up" – although he did acknowledge that some kernel folks had already told him not to do that.
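
Daniel's idea of stalling only the worker whose blob is missing can be sketched in user space with Python threads and a condition variable. `FirmwareArrival`, `probe_subsystems()`, and the blob names are hypothetical illustrations of the pattern; a real driver would use the kernel's own async and firmware facilities rather than anything shown here.

```python
import threading


class FirmwareArrival:
    """Lets probe workers block until a named blob 'shows up'."""

    def __init__(self):
        self._cond = threading.Condition()
        self._blobs = {}

    def publish(self, name, data):
        # Called when the blob becomes available (e.g. the root fs is mounted).
        with self._cond:
            self._blobs[name] = data
            self._cond.notify_all()  # wake all waiters; each re-checks its own blob

    def wait_for(self, name, timeout=None):
        # Stall the calling worker until `name` is published or `timeout` expires.
        with self._cond:
            if not self._cond.wait_for(lambda: name in self._blobs, timeout):
                return None  # gave up: the blob never arrived
            return self._blobs[name]


def probe_subsystems(arrivals, needs, timeout=5.0):
    """Start one worker per driver subsystem. Each waits only for its own
    firmware, so subsystems whose blobs are present finish immediately."""
    results = {}
    lock = threading.Lock()

    def worker(subsystem, blob):
        ok = arrivals.wait_for(blob, timeout) is not None
        with lock:
            results[subsystem] = ok

    threads = [threading.Thread(target=worker, args=item) for item in needs.items()]
    for t in threads:
        t.start()
    return threads, results  # caller joins the threads, then reads results


if __name__ == "__main__":
    fw = FirmwareArrival()
    fw.publish("gpu/display.bin", b"...")  # already available at probe time
    threads, results = probe_subsystems(
        fw, {"display": "gpu/display.bin", "media": "gpu/media.bin"})
    fw.publish("gpu/media.bin", b"...")  # shows up later
    for t in threads:
        t.join()
    print(sorted(results.items()))
```

The timeout matters: without one, a worker waiting for a blob that never arrives would hang forever, which is one reason kernel developers are wary of this pattern.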

Luis replied that he wanted to know more about Daniel's needs in this area, since one of the goals of his cleanup was to keep the kernel actually functioning and not break anyone's code too badly.

The discussion grew more technical, but it became clear that there were various use cases requiring early access to firmware. In particular, embedded developers did not want to have to include an initramfs image, because every byte would add to the footprint of their products.

The mailing list discussion reached no final solution. At various points, several people suggested talking it over in person at the Kernel Summit or the Linux Plumbers Conference.

This may be one of those things, like the Big Kernel Lock, that everyone wants to get rid of but that many parts of the kernel depend on. The reliance on early availability of firmware may have to be pruned out gradually over the course of years, rather than all at once by a single patch.