Zack's Kernel News

Article from Issue 167/2014

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

Simplifying the Config System

Jean Delvare from SUSE noticed that kernel config files were reaching upward of 6,000 lines and that the number of drivers and configurable options was continuing to expand explosively. He also noticed that the configuration system had some bugs, setting certain drivers to be compiled into the kernel, even if the hardware they supported was not present on the system. Recently, he suggested:

I would like to call for proper hardware dependencies to become a general trend: Every new hardware-specific driver which is added to the kernel should depend on ($hardware || COMPILE_TEST), so as to make it clear right away, which type of hardware is expected to need the driver in question.

$hardware can be the top-level architecture (e.g., ARM), but can also go down to sub-architecture/platform (e.g., ARCH_AT91 or PLATFORM_AT32AP) or even machine (e.g., PICOXCELL_PC3X3). The list can always be extended later if needed. Ideally we should restrict as much as possible as long as the result is easy to maintain, not too complex, and not likely to break in a near future.

By splitting the dependency between particular hardware and COMPILE_TEST, Jean said, the code would all still be included in build tests. Which, as a side effect, would give the user a quick-and-easy way to build a driver if the hardware dependency turned out to be too strict. They could build it with COMPILE_TEST and report the overly strict dependency to the kernel folks so it could be fixed.
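As a sketch, a new driver following Jean's proposed pattern might carry a Kconfig entry like the following (the driver and dependency symbols here are illustrative, not taken from any real patch):

```kconfig
# Hypothetical driver entry following Jean's proposed pattern:
# the option is only offered on the hardware it targets, but
# COMPILE_TEST keeps the code reachable for build-coverage testing.
config SENSORS_FOO
	tristate "Foo hardware monitoring driver"
	depends on ARCH_AT91 || COMPILE_TEST
	help
	  Support for the hypothetical Foo sensor found on AT91 boards.
```

A distribution maintainer running "make oldconfig" on a non-AT91 configuration would then never be asked about this driver at all.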

Initially, said Jean, the new hardware dependency requirements would apply to new drivers only. He hoped that existing drivers would add their own hardware dependencies, although he acknowledged that it might take some time to identify all of those dependencies retroactively.

Josh Boyer, from the Fedora project, said he also had a tough time dealing with the explosive growth of config options in the kernel. He said, "I've gotten to the point where I can somewhat guess based on the driver name which arch it's for (lately the majority are for ARM), but that isn't really a great way to handle things." He loved Jean's suggestion and was glad the issue had been raised.

Greg Kroah-Hartman had a couple of suggestions. One was that for Josh's work on Fedora, he could simply select "m" for each option and build everything as a module. That was the recommended approach for distribution kernels anyway. Josh didn't like that idea, however, because, "without on-going vetting of configs you wind up building stuff that doesn't have any practical chance of being on various platforms."

Josh also pointed out that for distributions wanting to avoid building all those extra binaries, they would have to "spend a lot of time looking at Kconfig files to figure out what drivers make sense where. That in turn leads to builds failing when we turn something on that isn't in any of the defconfigs built in linux-next on a particular arch. That tends to lead to, at best, patch creation, and more usually build break reports that are dealt with in a wide variety of ways. Creating patches for build breaks is great for stats and stuff, but it would probably be nice not have to deal with that hassle if it has no practical benefit."

Jean likewise did not like the suggestion to build everything as a module. He said:

Please, no. "Just say m if you don't know" was fine in the late 90s, when Linux was mostly x86. You could afford including 3% of useless drivers, and people working on other architectures said "no" by default and only included the few drivers they needed.

But today things have changed. We have a lot of very active, mature architectures, with a bunch of existing drivers and a batch of new drivers coming with every new kernel version. Saying "m" to everything increases build times beyond reason. You also hit build failures you shouldn't have to care about – depmod takes forever, updates are slow as hell. This is the problem I am trying to solve.

He went on:

"Just say m to everything" is just so wrong today that at SUSE we are very close to switching our policy to "just say no to everything and wait for people to complain about missing drivers." This may not sound too appealing but this is the only way to keep the kernel package at a reasonable size (and build time), as long as upstream doesn't help us make smarter decisions. Useless modules aren't free, they aren't even cheap.

Ideally I would like to be able to run "make oldconfig" on a new kernel version and only be asked about things I should really care about. And I'm sure every distro kernel package maintainer and most kernel developers and advanced users feel the same.

Greg replied that building a test kernel with 3,000 modules took about 20 minutes on his laptop and about five minutes on a build server in the cloud. He pointed out that "Cutting out 100 modules might speed it up a bit, but really, is that a big deal overall?"

Greg agreed with Jean and Josh's overall point that fixing dependencies would be a good thing, but he said it was hard to accomplish. He said, "Yes, it's a pain for distros, and, yes, it would be nice if people wrote better Kconfig help text. Pushing back on that is the key here. If you see new drivers show up that you don't know where they work on, ask the developers and make up patches."

And, in the meantime, Greg said, "I'd suggest getting a faster build box to start with."

Elsewhere in the thread Josh had said, "Maybe I'm overly grumpy. Still, it's frustrating to see Kconfig entries that clearly say 'blahblah found on foo ARM chip' in the help with no 'depends on ARM' (not meaning to pick on ARM)."

Greg said that would make him grumpy, too. He replied, "perhaps a few developers could be auditing the new Kconfig items of every kernel around -rc3 timeframe to ensure that they don't do stuff like this."

But, Jean said, "It's the reviewer's job to refuse new drivers with bad Kconfig descriptions in the first place. This must happen as early as possible in the chain. By -rc3 it's way too late, all kernel developers and distributions have already moved to the new kernel so they have already answered the n/m/y question for all new entries."

Also, in response to Greg's 20-minute build times, Jean pointed out, "We have 34 kernel flavors for openSUSE 13.1 for example. And every commit potentially triggers a rebuild of the whole set, to catch regressions as fast as possible. So, every module we build for no good reason, gets built a hundred times each day."

Jean also pointed out that of the 3,000 modules in the kernel, he expected to be able to eliminate a lot more than the 100 Greg had estimated. He said it was "more like 1000. Not that I wouldn't want to clean things up even if that was only 100 useless modules, but please don't minimize the importance of the problem. The exact number is very difficult to evaluate, as evaluating it would basically require almost the same effort as actually fixing all the driver dependencies."

Jean added, "Electricity isn't free, hardware isn't free, rack slot count is finite and server room space is limited over here. I don't quite see why we should invest in new hardware to shorten the build times, if the same can be achieved with our current hardware simply with better configuration files. [...] And really, I don't see why I should have to wait for 10 minutes for my build to complete if half of that is spent building drivers that will never be used. The fact that 10 minutes is 'reasonable' is irrelevant."

Josh also added, "It takes my desktop machine about 30-45 min to build an x86_64 kernel RPM with the current configs. Now granted, that's a bit more than just building a kernel in a local git tree, but it's nowhere near 5 min. Our official build servers show similar timings for x86_64. For ARM kernels, it takes about 3.5-4 hours. That's due to policy decisions on not allowing cross-builds in the distro (sigh), so all of the kernels are built on native ARM machines." To that last point, Greg replied, "That's really crazy to do that, there is this wonderful tool called qemu … ."

Pavel Machek also sided with Jean and Josh, saying, "Well, cutting 100 config questions would be worth it." He added that, "I am configuring kernels for N900. After kernel learns it is n900, it would be possible to ask very little questions: It has no ISA, no PCI and set of devices on i2c/platform bus is well known and well defined … ."

There was no further discussion, and it's not clear that Greg agrees the problem has a workable solution. However, there is clearly a strong desire among kernel developers to make the config system smarter, and the sheer size of the config system suggests that desire will only grow over time.

Linux on Small Systems

There is some irony in the fact that Linux developers are now struggling to make Linux run on "small" systems that are still much bigger than the hardware Linux was originally written for.

Andi Kleen recently posted about this, saying that hardware such as Intel's Quark system-on-a-chip might have only 2 or 4MB of RAM. He said, "One problem on these small systems is the size of the network stack. Currently enabling IPv4 costs about 400k in text, which is prohibitive on a 2MB system, and very expensive with 4MB."

He remarked that Adam Dunkels's LwIP (Lightweight IP) used only about 100KB of RAM per application and that some folks had been suggesting using that on small Linux systems. Andi didn't like that idea, however. He thought it wasn't necessary to replace the Linux networking code en masse with something else. He said, "I maintain that the Linux network stack is actually not that bloated, it just has a lot of features :-) The goal of this project was to subset it in a sensible way so that the native kernel stack becomes competitive with LWIP."

So, he modularized a few features that were only needed on server systems and reimplemented a few others more simply. The end result, he said, offered three choices: the full networking stack, with all features enabled; a non-server stack, still fully featured enough to work with modern Linux distributions; or the "minimal subset for deeply embedded systems that can use special userland. Remove rtnetlink (ioctl only), remove ethtool, raw sockets."
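The three tiers Andi described might be exposed roughly as a Kconfig choice block like this (the symbol names are hypothetical, not the ones from his actual patch set):

```kconfig
# Hypothetical sketch of the three networking tiers described above;
# the symbol names are illustrative only.
choice
	prompt "Network stack feature level"
	default NET_FULL

config NET_FULL
	bool "Full stack (all features, including server workloads)"

config NET_CLIENT
	bool "Non-server stack (enough for a modern distribution)"

config NET_MIN
	bool "Minimal stack (ioctl only; no rtnetlink, ethtool, or raw sockets)"

endchoice
```

The point of structuring it as a single choice, rather than dozens of independent options, is that it avoids adding to the config-question explosion Alexei complains about below.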

Richard Weinberger pointed out that it was hard to imagine a user space that would run on "kernels without procfs and netlink, so not even ps would work." Tom Zanussi of Intel replied, "The microYocto 'distro' I have running with these net-diet patches doesn't use a full procfs, but a pared-down version (CONFIG_PROCFS_MIN). Keeping ps working is of course essential, and it does that (along with a couple other things like /proc/filesystems and /proc/mounts I needed to boot)". Tom added, "It's very much a work-in-progress with a lot of rough edges, but it is a fully functional system on real hardware (Galileo board/Quark processor) with a usable shell (ps too!) and web server running on a kernel with native networking and ~750k text size."

Alexei Starovoitov was skeptical of the whole project. With all of Andi's "hacks," he said, the kernel was only half functional and kind of a mess. He suggested that instead of Andi's approach, which added to the size of the already immense config system, "Kernel function profiling can potentially achieve the same thing. Profile the kernel with the set of apps and then prune all cold functions out of kernel. Config explosion and LTO [link-time optimizations] is unnecessary. Just some linker hacks. Obviously such a kernel will also be half functional, but you'll get a big reduction in .text, that it seems is the goal of this project."

He went on, "I'm saying: no extra optimizations, no GCC changes. Compile kernel as-is. Most functions have a stub for mcount() already. Use it to track whether kernel function was called or not. Collect this data in userspace (as perf already does), add few more functions that had 'notrace' attribute on them, and feed this into special linker that unpacks existing vmlinux, throws away cold functions, relocates the rest and here you have tiny vmlinux without recompilation."
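The core of Alexei's idea, stripped of the linker machinery, is a set difference: take every traceable kernel function, subtract the ones observed during a profiling run, and what remains is cold code that could be pruned. The sketch below demonstrates that step on small sample lists; the file names and function names are illustrative stand-ins (on a real system, the full list would come from ftrace's available_filter_functions and the hot list from tracing data).

```shell
# all_funcs.txt stands in for the full list of traceable functions
# (cf. /sys/kernel/debug/tracing/available_filter_functions).
printf 'tcp_sendmsg\ntcp_recvmsg\nsctp_sendmsg\ndccp_sendmsg\n' | sort > all_funcs.txt

# hot_funcs.txt stands in for the functions actually observed while
# profiling the target workload.
printf 'tcp_sendmsg\ntcp_recvmsg\n' | sort > hot_funcs.txt

# Cold functions = traceable minus observed; these are the candidates
# a special linker could drop from vmlinux without recompiling.
comm -23 all_funcs.txt hot_funcs.txt > cold_funcs.txt
cat cold_funcs.txt
# prints the two functions never observed: dccp_sendmsg, sctp_sendmsg
```

Andi's objection applies directly to this step: a function absent from the hot list may still be a rarely taken but essential path, such as a TCP error handler.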

Andi said he thought that might be a good idea for certain kernel subsystems; but, he said, "That's very difficult for networking code. How would you know you exercised all the corner cases in the TCP stack? And you wouldn't want a remotely exploitable system because some important error handler is missing."

Alexei replied that whether they used his linker approach or Andi's modularization approach, it would still result in the same number of bugs. The difference, he said, was that his linker approach didn't require a lot of patches to be reviewed and maintained. However, he said the best solution was just to add more RAM to the system.

The discussion petered out there, but it seems as though Alexei's point is not that his approach should be used – it's more that both approaches are equally bad and should just be abandoned. That doesn't seem like an option, however, for the cresting wave of wearables and other tiny systems that will soon be everywhere. My sense is that the Linux kernel will want to support all of these small systems natively, as full-fledged members of the panoply of supported hardware. And that will necessarily mean code and configuration options that sacrifice functionality – perhaps a lot of functionality – in favor of size.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
