Zack's Kernel News

Article from Issue 182/2016

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

Compiling the Kernel as a Library

Octavian Purdila came up with a way to compile the kernel as a static library, called LKL (Linux Kernel Library), making all of its interfaces available to software running on other operating systems. The goal, Octavian said, was "to allow reusing the Linux kernel code as extensively as possible with minimal effort and reduced maintenance overhead."

Octavian distinguished LKL from UML (User Mode Linux), pointing out that UML offered a full operating system environment, with corresponding infrastructure requirements like filesystems and processes, whereas LKL is a programming library with a set of function APIs that any program could link to and use.

Richard Weinberger said that this librarification "eliminates UML's most problematic areas, system call handling via ptrace() and virtual memory management via SIGSEGV."

Richard asked whether LKL was currently restricted to single threading only, and Octavian replied, "at this point yes. SMP support is on my todo list though."

Several folks jumped into the discussion, mostly regarding compatibility with similar projects such as libOS and libguestfs. These aren't necessarily the sort of projects that require acceptance by Linus Torvalds or any of the other top kernel contributors. A project gains a certain amount of exposure to users when it has a dedicated build target within the kernel or its own driver or filesystem, but for projects like these, which aim mostly at specialized use cases, it's often sufficient to remain standalone.

Turning Off Portions of a Device to Save Power

Irina Tirdea posted some patches to allow the kernel to suspend a piece of hardware attached to the system (e.g., turning off video in response to closing the lid of a laptop or the screen of a phone). Currently, the sysfs power/control attribute of a device accepts only two values: on and auto. Irina wanted to add a third: off. As she described it, "the device will be force suspended by calling its runtime suspend callback and disabling runtime power management so that further accesses to the device will not change the actual state. The device can be resumed by setting the attribute to on or auto."
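As a rough illustration of the existing interface, the sketch below writes one of the two currently accepted values to a device's power/control attribute. The helper name set_runtime_pm_control and the device directory passed to it are hypothetical, and the code deliberately rejects off, because that value was only proposed in Irina's patches.

```python
from pathlib import Path

# Values the runtime PM "control" attribute accepts today.
# The proposed "off" is intentionally absent: it was only a patch proposal.
VALID_CONTROL_VALUES = ("on", "auto")


def set_runtime_pm_control(device_dir, value):
    """Write `value` to <device_dir>/power/control and return the path written.

    `device_dir` is a hypothetical sysfs device directory chosen by the caller.
    """
    if value not in VALID_CONTROL_VALUES:
        raise ValueError(f"unsupported power/control value: {value!r}")
    control = Path(device_dir) / "power" / "control"
    control.write_text(value + "\n")
    return control
```

Writing to a real device directory under /sys normally requires root privileges; pointing the helper at a scratch directory is enough to try it out.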

Rafael J. Wysocki dismissed the whole idea, saying, "Had we thought this had been a good idea, we'd have added that thing to the interface from the start."

The reason, he said, was that userspace generally had no way of knowing when it was safe to suspend a device. Putting that kind of control into user software, he said, could break things.

Irina pointed out that in the scenarios she'd mentioned, it wasn't software but the user who initiated the suspend, by closing the lid of the device or interacting with it in some other physical way. She said that in the current code, drivers for touchscreens and other hardware each had to have their own mechanism for suspending when not needed. She said, "This adds more complexity to every driver by adding one more logical power state. It would be good to have a common interface instead of doing this in every driver."

Oliver Neukum also had doubts about Irina's approach. It seemed to him that software could accomplish the same thing simply by no longer using a piece of hardware, thus letting that hardware idle and use less power. Why bother suspending at all? Irina's approach, he said, raised various complex issues, including the need to monitor all the software lock counts to make sure the hardware was truly free to suspend.

Octavian Purdila offered some clarification of Irina's work. He said:

The very specific problem we want to solve is handling touchscreens on a phone/tablet. When the screen is turned off, it is ideal to suspend the touchscreen for two reasons: to lower the power consumption as much as possible and to prevent interrupts to wake-up the CPU when the user touches the device, and thus save even more power as we allow the CPU to stay in deep idle states for longer periods.

Note that when the screen is turned on again, we want to resume the touchscreen so that it can send events again.

This is different than the lid closes examples, as in that case the user can not generate new events and thus the usual autosuspend feature is probably good enough (if the suspend power and autosuspend power consumption is similar).

Rafael said he and Alan Stern had discussed something a few months back that might address Octavian's example better. As he described it, it should be possible "to add a third value to /sys/devices/ /power/wakeup (in addition to 'disabled' and 'enabled') so userspace can indicate that remote wakeup should not be enabled for runtime suspend for the device (since there's no way to indicate that today)."

Alan, however, added a caveat for this idea, saying that "it was never implemented. For that reason, it was never completely fleshed out."
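For reference, the two values the wakeup attribute accepts today can be read back as in the sketch below. The function name read_wakeup_setting and the device directory argument are hypothetical, and treating a missing attribute as "cannot wake the system" is an assumption of this example. The third value Rafael described is not modeled, since it was never implemented.

```python
from pathlib import Path

# The two values the wakeup attribute accepts today.
WAKEUP_VALUES = ("enabled", "disabled")


def read_wakeup_setting(device_dir):
    """Return "enabled", "disabled", or None if no usable value is found.

    `device_dir` is a hypothetical sysfs device directory chosen by the caller.
    """
    wakeup = Path(device_dir) / "power" / "wakeup"
    if not wakeup.exists():
        # Assumption: no attribute means the device cannot wake the system.
        return None
    value = wakeup.read_text().strip()
    return value if value in WAKEUP_VALUES else None
```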

Several folks debated the issue, but the discussion seemed focused more on implementation than on whether Irina's feature would be good or not. Irina's code seemed to be generally disapproved of, but there was no consensus on what would be better. It was difficult even to identify the specific use cases that any proposal might address. For example, at one point in the discussion Dmitry Torokhov said:

In ChromeOS, we have a custom 'inhibit' control that:

1. Tells input core to ignore all events from a given device

2. Allows driver to put device in low power mode if driver desires to do so. The driver can do it via runtime PM or on its own. Usually on its own since when using runtime PM userspace may disable it, which may not be desirable.

I would love to have something generic instead of input-specific.

But, he added, "I was hesitant bringing it upstream as I believe it is not necessarily input device specific, and I would love to have it implemented at device core level."

Ultimately, Rafael and Alan were most active in trying to come up with an appropriate approach, but no solid design emerged from the discussion.

Mounting Filesystems Under Emulation

Seth Forshee and Eric Biederman were working on some patches to support mounting ext4 and FUSE filesystems from within user namespaces, in other words by a user who is root only inside an isolated namespace rather than on the system as a whole. Seth posted an initial set of patches for consideration. As he explained in a follow-up email, "This is supporting mounting filesystems like ext4 by unprivileged users and not trusting the labels they set in the same way as we trust labels on filesystems mounted by privileged users."
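To make "root in a user namespace" a little more concrete, the sketch below parses the three-column ID mapping that the kernel exposes in /proc/<pid>/uid_map (first ID inside the namespace, first ID outside it, and length of the range) and reports whether root inside the namespace corresponds to an ordinary user outside it. The function names are hypothetical and the code is not taken from Seth's patches.

```python
def parse_id_map(text):
    """Parse uid_map-style text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            entries.append((inside, outside, count))
    return entries


def root_is_unprivileged_outside(entries):
    """True if ID 0 inside the namespace maps to a nonzero ID outside it."""
    for inside, outside, count in entries:
        if inside <= 0 < inside + count:
            return outside - inside != 0
    return False  # ID 0 is not mapped at all


if __name__ == "__main__":
    with open("/proc/self/uid_map") as f:
        print(root_is_unprivileged_outside(parse_id_map(f.read())))
```

In the initial namespace, the map covers the whole ID range starting at 0 on both sides, so the check returns False there.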

Note that "labels" in this context does not refer to filesystem labels that can be used to determine target mountpoints for a given filesystem. Instead, "labels" here refers to extended attribute (xattr) data that Linux Security Modules (LSMs) use to constrain access to files on a given filesystem.
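A small sketch of what such labels look like from userspace: LSM labels are stored as extended attributes in the security namespace, for example security.selinux or security.SMACK64. The helper names below are hypothetical, and the code only inspects labels; it does not show how any LSM enforces them.

```python
import os

# LSMs such as SELinux ("security.selinux") and Smack ("security.SMACK64")
# keep their labels in xattrs whose names start with this prefix.
SECURITY_PREFIX = "security."


def security_xattr_names(names):
    """Return only the xattr names that live in the security namespace."""
    return [name for name in names if name.startswith(SECURITY_PREFIX)]


def read_security_labels(path):
    """Map each security.* xattr on `path` to its raw bytes value (Linux only)."""
    labels = {}
    for name in security_xattr_names(os.listxattr(path)):
        labels[name] = os.getxattr(path, name)
    return labels
```

On a system with no LSM labels on the file, or on a filesystem without xattr support, read_security_labels may return an empty dictionary or raise OSError.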

Casey Schaufler pointed out a potential conflict with another bit of coding being done by Lukasz Pawelczyk, to support LSMs in user namespaces. He said that Seth and Eric's work "gives an unprivileged user the ability to ignore the Smack labels that are on files and to create files with labels that do not match the rules laid down by the security module."

Casey thought that ignoring the Smack labels would leave security holes, allowing untrusted users to access files that would otherwise be protected. He said, "you can't pick and choose when you are going to pay attention to the security attributes on a filesystem. It's possible that it will work out the way you want it, but it probably won't. Smack doesn't allow you to choose if you're using xattrs. SELinux does, but certainly doesn't expect you to be flipping it on and off. I'm not convinced that it's safe to do for capability sets, either."

Andy Lutomirski felt that Casey's concerns might not apply in the current situation. He suggested that, "If I mount an unprivileged filesystem, then either the contents were put there *by me*, in which case letting me access them are fine, or (with Seth's patches and then some) I control the backing store, in which case I can do whatever I want regardless of what LSM thinks."

Casey replied, "If you have a security module that uses attributes on the filesystem you can't ignore them just because it's 'your data'. Mandatory access control schemes, including Smack and SELinux don't give a fig about who you are. It's the label on the data and the process that matter. If 'you' get to muck the labels up, you've broken the mandatory access control."

Eric replied:

There are two fundamental issues mounting filesystems without privilege, by which I actually mean mounting filesystems as the root user in a user namespace.

- Are the semantics safe?

- Is the extra attack surface a problem?

Figuring out how to make semantics safe is what we are talking about.

Once we sort out the semantics we can look at the handful of filesystems like fuse where the extra attack surface is not a concern.

With that said, desktop environments have for a long time been automatically mounting whichever filesystem you place in your computer, so in practice what this is really about is trying to align the kernel with how people use filesystems.

I haven't looked closely, but I think docker is just about as bad as those desktop environments when it comes to mounting filesystems.

Eric also added:

There are filesystems like fat and minix that can not store a label. Since it is not possible to store labels securely in filesystems mounted by unprivileged users (at least in the normal sense), the intent would be to treat a filesystem mounted without the privileges of the global root user as a filesystem that does not support xattrs.

Treating such a filesystem as a filesystem that does not support xattrs is the only possible way [to] support such a filesystem securely, because as you have said someone who can muck up the labels breaks mandatory access control.

Given how non-trivial it is to grasp the nuances of different lsms mandatory access control semantics, I am asking Seth for the first [pass] to simply forbid mounting of filesystems with just user namespace permissions when there is an lsm active.

Once we get that far, smack may never need to support such systems.

Meanwhile, Lukasz also took a look at Seth and Eric's work and thought that the conflict Casey had mentioned earlier was not a real problem. He said, "If your approach here is to treat user ns mounted filesystems as if they didn't support xattrs at all, then my patches don't conflict here any more than Smack itself already does." Seth said he'd make sure to check out Lukasz's patches in any case and make sure there were no issues.

At this point, the conversation descended into a consideration of specific user scenarios that might or might not expose private data to an untrusted user. Various folks joined in, trying to identify exactly where in the code security would break down and what needed to happen at those points to firm it up. At one point, while working through one of these scenarios, Casey remarked, "My position is that there's a workaround but that the design is still fundamentally flawed."

At a different point in the conversation, Seth said, "Right now, it looks to me like the only safe thing to do with mounts from unprivileged users is to ignore the security labels, so that's what I'm trying to do with these changes. If there's some better thing to do, or some better way to do it, I'm more than happy to receive that feedback." Casey replied, "Personally, I don't believe that the goal of supporting unprivileged mounts is especially sane. I am willing to be educated, but I don't see a rational solution."

There was ultimately no resolution to the various disagreements. The issues are very thorny to work through, as are all kernel features related to security. Sometimes the only solution is to support a subset of features that appears arbitrary to the end user but is absolutely required for security.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
