Zack's Kernel News

Article from Issue 249/2021

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

Protecting Filesystems from Themselves

Chao Yu recently tried to revert a kernel commit for an F2FS patch. F2FS is a Samsung filesystem for solid state drives. Chao wanted to revert the patch because one of the kernel's generic tests expected F2FS to fail to mount a read-only partition. Ironically, as pointed out by Jaegeuk Kim, F2FS had no trouble mounting such a partition and giving the user full read access to all its data. So the filesystem failed the test … because it succeeded.

Jaegeuk suggested changing the test rather than reverting the patch, but Chao pointed out that the test was actually important for filesystems in general, not just F2FS. Changing the test for that one case, he said, would mean other filesystems might technically pass the test when they really should fail.

Chao also disagreed with Jaegeuk that F2FS handled the case properly. Walking through the code, he identified a certain point at which, he said, the device was then read-only, so that all writes would fail. Therefore, recovered data would not be able to persist beyond the expiration of the page cache. At that point, he said, the user would see stale data instead of the latest system state.

Jaegeuk argued that – and this became problematic in the conversation – F2FS could synchronize its data in the device core, with the filesystem itself, if the user then mounted the filesystem as read-write. So there was no need for F2FS to fail the test, given that it could mount the filesystem and potentially also save new data.

At this point, Chao began to suspect they were talking about two different things.

There was the one case, where the filesystem is mounted with the read-only option. In that case, the user would be unable to give any command to write to that filesystem. The filesystem itself, on the other hand, would still retain the power to write to the underlying solid state device. So even if the user couldn't write anything, the filesystem would still be able to preserve some state, which was what Jaegeuk had been saying.

The second case, Chao pointed out, was where the device itself is read-only. In that case, there would truly be no way to preserve any state. This was the case Chao was concerned about. He agreed with Jaegeuk that the kernel test wasn't really relevant for the case where the filesystem was read-only while the device was read-write. The key case for Chao was when both the filesystem and the device were read-only. In that case, Chao said, the generic kernel test was correctly failing, which justified his effort to revert the particular patch in question.
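The two cases Chao separated can be told apart from user space. The following is a minimal sketch, not F2FS code: it assumes the mount flag is read with statvfs() and the device state from the "ro" attribute under /sys/block, and the function names are invented for illustration.

```python
import os


def mount_is_read_only(mount_point: str) -> bool:
    """True if the filesystem at mount_point is mounted with the ro option."""
    return bool(os.statvfs(mount_point).f_flag & os.ST_RDONLY)


def device_is_read_only(device_name: str) -> bool:
    """True if the block device (e.g. "sda") reports itself as read-only.

    Reads the sysfs "ro" attribute; device_name is supplied by the caller.
    """
    with open(f"/sys/block/{device_name}/ro") as attr:
        return attr.read().strip() == "1"


def describe_case(mount_ro: bool, device_ro: bool) -> str:
    """Summarize which of the cases from the discussion applies."""
    if device_ro:
        # Chao's case: no write can reach the device at all.
        return "device read-only: nothing can be persisted"
    if mount_ro:
        # Jaegeuk's case: users cannot write, but the filesystem still can.
        return "mount read-only, device writable: filesystem can persist state"
    return "read-write mount: normal operation"
```

For example, describe_case(mount_ro=True, device_ro=True) reports the device-level case, which is the one Chao argued the generic test was right to flag.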

Chao and Jaegeuk descended into a technical implementation discussion for future versions of Chao's patch, and eventually Chao came out with version 2.

The fun part of that debate is the effort to allow the user as much control as possible over their system. If there's some tiny remnant of data that might be preservable, the developers don't want to miss the chance to preserve it, even in odd scenarios where the user mounts a filesystem first as read-only and then as read-write.

Extending chroot() to Regular Users

Mickaël Salaün wanted to extend chroot() so that regular users could also use it. The chroot() system call changes the root directory of the calling process, and of any children it later creates, to a directory of the caller's choosing. It's used in conjunction with lots of other things to create virtual systems that appear to be entirely distinct from the actual running system. Generally, only the root user does this, but Mickaël made the case that there was real value in letting regular users do it, too.

In particular, he wanted to let regular users create sandboxes (secure areas within a system) so they could develop their projects without fear of hostile users taking advantage of momentary bugs.

Mickaël said, "Chroot(2) is not an access-control mechanism per se, but it can be used to limit the absolute view of the filesystem." And, he continued, "Users may not wish to expose namespace complexity to potentially malicious processes, or limit their use because of limited resources. The chroot feature is much more simple (and limited) than the mount namespace but can still be useful."

The security considerations in Mickaël's patch are real. As he put it, "Allowing a task to change its own root directory is not a threat to the system if we can prevent confused deputy attacks, which could be performed through execution of SUID-like binaries. This can be prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with prctl(2). To only affect this task, its filesystem information must not be shared with other tasks, which can be achieved by not passing CLONE_FS to clone(2). A similar no_new_privs check is already used by seccomp to avoid the same kind of security issues. Furthermore, because of its security use – and to avoid giving a new way for attackers to get out of a chroot (e.g., using /proc/<pid>/root, or chroot/chdir) – an unprivileged chroot is only allowed if the calling process is not already chrooted. This limitation is the same as for creating user namespaces."

The goal, from Mickaël's perspective, would be to allow regular users to gain the benefits of using chroot() on their own, while protecting the larger system from any security issues that might then arise.
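The conditions Mickaël lists can be sketched from user space. The following is a minimal illustration, not code from his patch: it sets no_new_privs with prctl(2) through ctypes and then attempts chroot(). The helper names are hypothetical, and the two prctl constants are copied from <linux/prctl.h>. On a kernel without a change like the one proposed, chroot() still requires CAP_SYS_CHROOT, so an unprivileged caller should expect the attempt to be refused.

```python
import ctypes
import os

# Option numbers from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def enable_no_new_privs() -> int:
    """Set no_new_privs on the calling thread and return the flag's value.

    Linux only. The flag cannot be cleared again and is inherited by children.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero)


def try_unprivileged_chroot(new_root: str) -> bool:
    """Attempt the sequence from the quoted description; report success.

    Assumes a single-threaded process, because threads share filesystem
    information (CLONE_FS), which the proposal rules out.
    """
    enable_no_new_privs()
    try:
        os.chroot(new_root)
    except PermissionError:
        # Expected for an unprivileged caller on a kernel without the proposal.
        return False
    os.chdir("/")  # do not keep a working directory outside the new root
    return True
```

Calling try_unprivileged_chroot() as root really does change the process's root directory, so it is best tried in a throwaway process.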

However, Mickaël's patch got a serious smackdown from Casey Schaufler. Casey said, "I don't see that new comments are necessary when I don't see that you've provided compelling counters to some of the old ones." Among other things, Casey felt that the namespaces feature would be sufficient for any of the use cases Mickaël had identified. The Linux namespace feature is a way to hide resources from processes. Like chroot(), namespaces create isolated areas that appear to the process to be an entire running system. Casey felt that namespaces would do the trick, and there was no need to extend chroot() to support regular users.

Mickaël disagreed. It wasn't that he felt namespaces couldn't do the job; rather, he felt that relying on them would be riskier. As he put it, "namespaces bring complexity which may not be required. When designing a secure system, we want to avoid giving access to such complexity to untrusted processes (i.e., more complexity leads to more bugs). An unprivileged chroot would be less complex. Of course it is not enough on its own, but it can be combined with existing (and future) security features."

Casey, for his part, felt that Mickaël's chroot() extensions would require users to behave in such a restricted way that the feature would be virtually unusable.

He also pointed out that if Mickaël's user restrictions were applied across the board, other solutions would work just as well as Mickaël's idea without requiring any extension to chroot().

Casey just didn't buy the argument that namespaces were too complex. As he put it, "I can use a Swiss Army Knife to cut a string even though it has a corkscrew." And he said, "If you're *really* designing a secure system you can design it to use existing mechanisms, like CAP_SYS_CHROOT!"

Mickaël, however, countered, "Not always. For instance, in the case of a web browser, we don't want to give CAP_SYS_CHROOT to every user just because their browser could (legitimately) use it as a security sandbox mechanism. The same principle can be applied to a lot of use cases, e.g., network services, file parsers, etc."

Casey was unmoved. He said, "You've identified a clever hack to justify expanding when chroot() could be done 'safely' without using privilege. Why not learn how to use the existing mechanism properly? And teach the next set of people how to do the same? I am under no delusion that we can tweak here and fiddle there and make security all rainbows and unicorns. Mature mechanisms that are general are safer than tangled heaps of special cases that make individual projects easier."

However, in the midst of this seemingly mega-rejection, Casey also launched this odd little projectile, saying, "In any case, if you can get other people to endorse your change, I'm not all that opposed to it. I think it's gratuitous. It irks me that you're unwilling to use the facilities that are available and instead want to complicate the security mechanisms and policy further. But that hasn't seemed to stop anyone before."

At this point, the debate between the two ended abruptly. Instead, Kees Cook spoke up from the sidelines to say:

"The only part of this design that worries me is that it seems as though it's still possible to escape the chroot if a process didn't set up its fds carefully, as Jann discussed earlier:

"Regardless, I still endorse this change because it doesn't make things worse, since without this, a compromised process wouldn't need any tricks to escape a chroot because it wouldn't be in one. It'd be nice if there were some way to make future openat() calls be unable to resolve outside the chroot, but I view that as an enhancement.

"But, as it stands, I think this makes sense."

As far as Kees was concerned, the code could go directly into the kernel without delay. In terms of exactly who would accept the patch and feed it up to Linus Torvalds, Kees was not sure. He remarked, "If Al is too busy to take it, and James would rather not take VFS, perhaps akpm would carry it? That's where other similar VFS security work has landed."

Al Viro remarked, "Frankly, I'm less than fond of that thing, but right now I'm buried under all kinds of crap [...]. I'll post a review, but for now it very definitely does not get an implicit ACK from me."

And that was the end of the discussion.

Rarely does a security-related patch get such a thrashing as Casey gave Mickaël's and still win an immediate endorsement for inclusion in the kernel. It's still possible Al will raise a serious objection (in which case the patch would be a dead duck), or some other security concerns may come up. But if Kees is right, the main deciding factor could be that the patch doesn't make anything worse and could improve security in general. If a regular user made a mistake with chroot(), it would still only expose that regular user to attack. With no root user behind it all, there would be no serious reward at the end of that attack.

Tracking "Issues"

Thorsten Leemhuis recently proposed a new Linux kernel development mailing list, "linux-issues." The idea would be for developers to CC [carbon copy] their various problems to that list, which would then become a sort of central repository for all kernel-related issues.

The number of existing Linux kernel development mailing lists is truly uncountable. You can see a lot of them listed at, but there are undoubtedly vast numbers of mailing lists used by small groups of kernel developers working in close collaboration. Many of those will also be behind corporate firewalls. Counting them all would truly be impossible.

The ones available at are archived and searchable. This was one of Thorsten's main ideas: a single searchable list for all kernel issues.

The idea itself was fairly flexible, based on discussions at various kernel conferences. Thorsten gave some background, saying, "Back on the maintainers summit in 2017 it was agreed to create a dedicated list for this purpose ( I even requested a a while later but didn't hear anything back; sadly, about the same time, I started having trouble finding spare time for working on regression tracking."

Thorsten tried to anticipate certain objections. For example, he said, "The question 'Why not simply LKML' [Linux kernel mailing list] will likely pop up, but the thing is, searching for reports there will often turn up patches that improve the kernel and don't fix anything. That makes it hard to find issue reports, especially for users that are not used to deal[ing] with mailing lists and their archives."

Thorsten added, "Yes, I'm quite aware that searching list obviously won't turn up reports that are filed in or some other bug-tracking tool. That's okay, as the reporting-issues.rst tells users to look in those places as well."

And, Thorsten said, "reporting issues/bugs by mail has downsides, and maybe instead of creating yet another mailing list, it would be better if all the kernel issues would be reported to a central place like But that tracker doesn't work that well currently, as quite a few of the issues filed there, AFAICS [as far as I can see], never reach the people that need to be handle them. I don't see that changing any time soon (we had a discussion about this recently:"

Lukas Bulwahn said he supported Thorsten's general idea, but Lukas wanted some clarity on what an "issue" really was. For example, would the list include all the automated build warnings, test bot warnings, and other automated kernel testing systems? Or, Lukas asked, "Would you like to keep this list only for reports from single individual human users that need to detect the 'issue' without using one of the tools above?"

Meanwhile, Konstantin Ryabitsev offered an update on the status of He said:

"There will soon be a unified 'search all of regardless of the list/feed source' capability that may make it unnecessary to create a separate list for this purpose. There's active, ongoing work in the public-inbox project to provide parallel ways to follow aggregate topics, including query-based subscriptions (i.e., 'put a thread into my inbox whenever someone mentions my favourite file/function/device name'). This work is not complete yet, but I have great hopes that it will become available in the next little while.

"Once we have this ability, we should be able to plug in multiple sources beyond just mailing lists, including a feed of all changes. This should allow someone an easy way to query specific bugs and may not require the creation of a separate list.

"I'm not opposed to the creation of a new list, of course – just want to make sure it's aligned with the improvements we are working to make available."

Thorsten replied, "Ahh, nice, thanks to everyone working on that!"

James Bottomley also replied to Konstantin, saying, "I suspect the problem is that there's no known useful search string to find a bug report even given a searchable set of lists, so the main purpose of this list would be 'if it's on here, it's a bug report', and the triage team can CC additional lists as appropriate. Then we simply tell everyone to send kernel bugs to this list and ask maintainers to CC it if a bug report shows up on their list?"

Linus Torvalds also had a suggestion for the new list. He said, "I'd much prefer the name 'linux-regressions' as being much more targeted than 'linux-issues'. Make it clear that the list is only for regressions that people can describe some way, rather than some general 'I have issues with xyz'. The more clear-cut the list is, the better, I think."

The discussion continued and ended inconclusively. Essentially, a broad range of developers offered their sense of the various corner cases and problems that might come up or that might be solved by various approaches to Thorsten's new list.

Some commenters, such as Theodore Ts'o, mostly hoped that regular users would read more of the documentation on how to submit bug reports. To that extent, the discussion ranged beyond Thorsten's original question. Theodore, for example, said, "I wonder if it might be useful to have a form which users could be encouraged to fill out so that: (a) The information is available in a structured format so it's easier for developers to find the relevant information, (b) so it is easier for programs to parse, for easier reporting or indexing, and (c) as a nudge so that users remember to report critical bits of information such as the hardware configuration, the exact kernel version, which distribution userspace was in use, etc."
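Theodore's three points can be pictured with a small data structure. The sketch below is purely hypothetical: no such form was specified in the discussion, and the class, field names, and JSON encoding are assumptions chosen for illustration. It keeps the report structured, makes it machine-parseable, and flags the details a reporter left blank.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class IssueReport:
    """Hypothetical structured bug report; the field names are illustrative."""

    kernel_version: str
    distribution: str
    hardware: str
    summary: str
    is_regression: bool = False

    def to_json(self) -> str:
        """Serialize the report so that programs can index it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "IssueReport":
        """Rebuild a report from the output of to_json()."""
        return cls(**json.loads(text))


def missing_fields(report: IssueReport) -> list:
    """Names of text fields left empty, as a nudge to the reporter."""
    return [
        name
        for name, value in asdict(report).items()
        if isinstance(value, str) and not value.strip()
    ]
```

With made-up sample data, a report whose hardware field is left empty would come back from missing_fields() as ["hardware"], which a submission tool could use to prompt the user before sending.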

It's not at all clear what will come out of this discussion. Some new sort of bug tracker, certainly. There were many ideas floating around. It's also true that creating a new mailing list is an extremely non-critical operation. So it would not be weird to see one or two pop up, with rapidly changing goals and definitions, until the idea finally settles into something no one had expected at the start.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
