Zack's Kernel News

Article from Issue 196/2017

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

When to Use a Filesystem Capability

Michael Kerrisk wanted to address the need for developers to know which filesystem capability to associate with new features that they want to add to the kernel. This topic has traditionally been a subject of much confusion. Documentation has been sparse, and the POSIX standards bodies never nailed down the details, so practice has grown up without clear rules.

According to the Linux man page on capabilities, "traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).

"Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled."

Michael posted some documentation that, after some back-and-forth with Casey Schaufler, read:

When adding a new kernel feature that should be governed by a capability, consider the following points.

  • The goal of capabilities is to divide the power of superuser into pieces, such that if a program that has one or more capabilities is compromised, its power to do damage to the system is less than that of the same program running with root privilege.
  • You have the choice of either creating a new capability for your new feature, or associating the feature with one of the existing capabilities. In order to keep the set of capabilities to a manageable size, associating a feature with an existing capability is preferable, unless there are compelling reasons to create a new one. (You also face a technical limit: the size of capability sets is currently limited to 64 bits.)
  • To determine which existing capability might best be associated with your new feature, review the list of capabilities above in order to find a "silo" into which the new feature best fits. One approach is to determine if there are other features requiring capabilities that will always be used along with the new feature. If the new feature is useless without these other features, you should use the same capability as the other features.
  • Don't choose CAP_SYS_ADMIN if you can possibly avoid it! A vast proportion of existing capability checks are associated with this capability, to the point where it can plausibly be called "the new root." Don't make the problem worse. The only new features that should be associated with CAP_SYS_ADMIN are ones that closely match existing uses in that silo.
  • If you have determined that it really is necessary to create a new capability for your feature, don't make or name it as a "single-use" capability. Thus, for example, the addition of the highly specific CAP_SYS_PACCT was probably a mistake. Instead, try to identify and name your new capability as a broader silo into which other related future use cases might fit.
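The 64-bit limit mentioned in the list above follows from how capability sets are represented: each capability is one bit in a mask. As a minimal sketch, assuming the bit numbers from the kernel's include/uapi/linux/capability.h header, the following Python decodes such a mask. The helper names decode_caps and effective_caps are hypothetical, and the table is deliberately incomplete.

```python
# Hypothetical helper for illustration; not part of any kernel or libcap API.
# The bit numbers are a small subset of include/uapi/linux/capability.h.
CAP_BITS = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    12: "CAP_NET_ADMIN",
    21: "CAP_SYS_ADMIN",
    25: "CAP_SYS_TIME",
}

CAP_SET_WIDTH = 64  # capability sets are currently limited to 64 bits


def decode_caps(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    if not 0 <= mask < (1 << CAP_SET_WIDTH):
        raise ValueError("capability sets are limited to 64 bits")
    names = []
    for bit in range(CAP_SET_WIDTH):
        if mask & (1 << bit):
            # Bits missing from the partial table are reported by number.
            names.append(CAP_BITS.get(bit, f"bit {bit} (not in this table)"))
    return names


def effective_caps(status_text):
    """Extract the CapEff mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The field is a hexadecimal number without a 0x prefix.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")
```

On a Linux system, passing the contents of /proc/self/status to effective_caps and the result to decode_caps lists the effective capabilities of the current process. Because every new capability consumes one of the 64 bits, the advice to reuse an existing silo is also a way of conserving a finite resource.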

Casey disagreed with Michael's admonition not to use CAP_SYS_ADMIN unless absolutely necessary. Casey felt that anything to do with system administration belonged with that capability. But Michael replied, "To me, the CAP_SYS_ADMIN situation is a terrible mess. Around a third of all of the capability checks in the kernel are for that capability. Or, to put it another way, it is so broad, that if a process has to have that capability, it may as well be root. And because it is so broad, the number of binaries that might need that file capability is large."
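A figure like Michael's "around a third" can be roughly sanity-checked by counting capability checks in kernel source text. The sketch below is only an approximation and is not how Michael arrived at his number: it assumes checks appear as direct calls to functions whose names contain "capable" (such as capable() or ns_capable()), so it misses checks hidden behind macros or wrappers. The names count_checks, share, and SAMPLE are hypothetical.

```python
import re
from collections import Counter

# Matches calls such as capable(CAP_X) or ns_capable(ns, CAP_X).
# This is a plain text match, not a parse of the C source.
CHECK_RE = re.compile(r"\b\w*capable\w*\s*\([^()]*?\b(CAP_[A-Z0-9_]+)\b")


def count_checks(source_text):
    """Count capability checks in source_text, keyed by capability name."""
    return Counter(CHECK_RE.findall(source_text))


def share(counts, cap_name):
    """Return the fraction of all counted checks that use cap_name."""
    total = sum(counts.values())
    return counts[cap_name] / total if total else 0.0


# Made-up snippet standing in for real kernel source.
SAMPLE = """
if (!capable(CAP_SYS_ADMIN)) return -EPERM;
if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM;
if (!capable(CAP_SYS_ADMIN)) return -EACCES;
"""

counts = count_checks(SAMPLE)
print(counts.most_common())
print(share(counts, "CAP_SYS_ADMIN"))
```

To apply this to a real tree, the contents of each .c file would be fed to count_checks and the resulting counters added together.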

Michael also offered an incomplete list of all the abilities currently associated with CAP_SYS_ADMIN and said that Casey would need a very broad definition of "system administration" in order to truly include all those abilities in that bailiwick.

Casey replied:

Back in the days of the POSIX P1003.1e/2c working group, we struggled with what to do about the things that required privilege but that were not related to the enforcement of security policy. Everyone involved was looking to use capabilities to meet B2 least-privilege requirements in NSA security evaluations. Because those evaluations were of security policy, by far the easiest thing to do was to create a single capability for all the things that didn't show up in the security policy and declare that the people doing the evaluation didn't have to look over there. Since then, people have taken a more practical view that includes security relevance in addition to security policy.

In retrospect, we should have grouped all of the attribute changes (chmod, chown, …) into one capability and broken the non-policy actions into a set of 2 or three.

The way that we think of privilege has evolved. We're not focused on policy the way we used to be. We'll never get everyone to agree on what the right granularity and grouping is, either.

Michael found that bit of history fascinating. But at this point, the discussion veered off into the question of finding the most intuitive names for each capability, and the conversation petered out.

Sometimes it feels as though the history of operating system design does as much to hold back its proper implementation as it does to advance it. There's so much room for discussion at all levels of an issue, and at the same time, there's a world of hardware that's constantly changing to suit a human market that is truly bizarre. It's amazing that something like filesystem capabilities has any rhyme or reason at all.

Cleaning Out FBDev Drivers

The once cutting-edge FBDev drivers have been sinking further and further into the backwaters of the kernel. Recently, Tomi Valkeinen posted a patch to remove the remaining ones (specifically the xgifb, sm750fb, and fbtft drivers) from the staging area of Linux entirely. His reasoning was that all new display drivers should be using the DRM framework, and that the FBDev drivers have been in maintenance mode, with no new drivers or major features coming down the pike. It was time to get rid of them!

Daniel Vetter agreed wholeheartedly, remarking that, "we have the simple pipe helpers in drm-kms, and a few drivers starting to use them; there's really no reasons left anymore to have fbdev drivers." And Tomi agreed.

Geert Uytterhoeven wanted to see an example of a DRM driver that used the simple pipe helper, and Greg Kroah-Hartman also said that it only made sense to remove those remaining FBDev drivers if there were DRM replacements that worked on the same hardware.

But Tomi asked what it meant for code to be in the staging directory of the kernel. If it was the same as being out-of-tree, but just with greater accessibility via the Git repository, then it made sense to remove any FBDev driver for the same reasons no new ones would be added.

But Greg reiterated the need for Linux to continue to support existing hardware. He had no objection to keeping the FBDev drivers in the staging directory until it was safe to remove them, but he neither wanted to migrate them into the kernel proper, nor remove them from the tree entirely. Instead, Greg said, they should remain in staging until suitable replacements could be written.

Meanwhile Daniel replied to Geert, saying that the simple DRM drivers still hadn't appeared, although there were some projects "floating around in various places".

At the same time, Benjamin Herrenschmidt had his own objections to ditching the FBDev drivers. He said, "DRM drivers don't strike me as suitable for small/slow cores with dumb framebuffers or simple 2D only accel, such as the one found in the ASpeed BMCs. With drmfb, you basically have to shadow everything into memory & copy over everything, which locks you out of simple 2D accel. For a simple text console, the result is orders of magnitude slower and more memory hungry than a simple fbdev." And he added, "Not everything has a powerful 3D GPU."

But Tomi replied that if DRM was too heavyweight, then DRM should be fixed. Its weight wasn't a justification for leaving old, crufty FBDev drivers in the kernel forever.

Daniel pointed out, "we have full fbdev emulation, and drivers can implement the 2d accel in there. And a bunch of them do. It's just that most teams decided that this is a pointless waste of their time." He added that "compared to fbdev, there's a very active community who improves and refactors it every kernel release to make it even better. Since about 2 years (when atomic landed) we merge new drivers at a rate of 2-3 per kernel release, and those new drivers get ever simpler and smaller thanks to all this work."

But Geert simply replied: "This has been going on for years: 1. fbdev is obsolete, everybody should use DRM instead! 2. Can you please point me to a small sample driver for a dumb frame buffer? 3. Several are being written, but none of them is upstream yet. 4. Go to 1."

To which Daniel said that there were more than 20 small sample drivers using DRM already. And Geert pointed to Daniel's earlier quote where he said that some were floating around, but none had landed. Geert said he wanted "simple dumb memory-mapped frame buffers, which is what fbdev was initially developed for." And Daniel said, "small drivers like these we have piles now; things exploded a lot after atomic landed two years ago. And they seem to shrink with every release a bit more."

At this point Benjamin realized there was a lot of DRM documentation that had recently gone into the kernel. He ran off to read it, and returned, saying that his objections may be out of date.

Daniel also pointed out the MXSFB DRM driver, which he said was a good example of a simple driver using the display pipe helpers.

At this point, even Geert started to feel like maybe DRM was ready – or at least nearly ready – to replace fbdev fully.

The discussion continued for a bit, with more people joining in to discuss specific abilities of various drivers and specific needs of various sectors of users. But it does seem as though, finally, the main kernel folks who objected to DRM as too heavyweight have been mollified to some extent. The DRM folks have extended their code to begin to handle the simplest cases, which is what the FBDev folks want, and we can probably look forward to the final FBDev drivers being rooted out of the staging directory in the relatively near future.

Plugging Security Holes at the Hardware Level

Paolo Bonzini wanted to lock down KVM security a bit further by preventing users from invoking certain assembly instructions – specifically, SLDT, SGDT, STR, SIDT, and SMSW. Each of these instructions leaks some amount of privileged state to unprivileged code. For example, as Paolo explained, SGDT and SIDT "can leak kernel-mode addresses to userspace, and can be used to defeat kernel ASLR [address space layout randomization]." He went on: "SLDT, STR, and SMSW aren't as bad because SLDT and STR only leak selectors, while SMSW only leaks CR0.TS in practice."

There wasn't much discussion, although Liang Z. Li offered to help with any further assembly lockdown efforts.
