Zack's Kernel News
Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.
Speeding Up Security Access Checks
Linus Torvalds sent a patch to Al Viro, trying to eke out a bit more speed from the VFS. In particular, he felt that some of the security checks from SELinux could be sped up by moving the relevant data out of SELinux itself and directly into the inode data structure. This way, when looking up file paths, the code could avoid doing an extra pointer dereference.
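To make the cost concrete, here is a minimal sketch in C of the two layouts being contrasted. The struct and function names (toy_inode_indirect, toy_inode_inline, and so on) are hypothetical and are not taken from the kernel or from Linus's patch; they only illustrate why data stored in the inode itself needs one memory access where data behind a blob pointer needs two.

```c
#include <stdint.h>

/* Hypothetical module-owned data that lives outside the inode. */
struct toy_sec_blob {
    uint32_t label;
};

/* Layout A: the inode holds only a pointer to the module's blob. */
struct toy_inode_indirect {
    uint32_t mode;
    struct toy_sec_blob *security;
};

/* Layout B: the frequently checked field sits directly in the inode. */
struct toy_inode_inline {
    uint32_t mode;
    uint32_t label;
};

/* Reads the pointer from the inode, then follows it: two dependent accesses. */
static uint32_t toy_label_indirect(const struct toy_inode_indirect *inode)
{
    return inode->security->label;
}

/* Reads the field straight out of the inode: one access. */
static uint32_t toy_label_inline(const struct toy_inode_inline *inode)
{
    return inode->label;
}
```

Both functions return the same value for the same label; the difference is only in how many cache lines may need to be touched during a path lookup.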
Casey Schaufler responded to this as the maintainer of Smack, a kernel module that implements access controls for data and running processes. He said that some of the security stuff he'd been working on would reintroduce exactly the kind of pointer dereference that Linus had been trying to eliminate.
He also said that this idea of migrating data into the inode was something he hadn't known was an option. He said, "I have been working on the assumption that the single blob pointer was all that could ever go into the inode. If that's not true stacking could get considerably easier and could have less performance impact."
Casey added that, as the Smack maintainer, he would prefer to see performance enhancements go into the generic Linux Security Module (LSM) framework rather than into SELinux alone. He said, "The LSM blob pointer scheme is there so that you (Linus) don't have to see the dreadful things that we security people are doing. Is it time to get past that level of disassociation?"
Linus affirmed that SELinux was the only security solution that had tolerable speed to begin with. Linus remarked, "I used to actually turn it off entirely, because it impacted VFS performance so horribly. We fixed it. I (and Al) spent time to make sure that we don't need to drop RCU lookup just because we call into the security layers etc." But, Linus added, "Last time I tried Ubuntu (they still use apparmor, no?), 'make modules_install ; make install' didn't work for the kernel, and if the Ubuntu people don't want to support kernel engineers, I certainly am not going to bother with them."
At this point, or a couple of email messages later, Linus went on a tirade against security folks. He said, "Btw, it really would be good if security people started realizing that performance matters. It's annoying to see the security lookups cause 50% performance degradations on pathname lookup (and no, I'm not exaggerating, that's literally what it was before we fixed it – and no, by 'we' I don't mean security people)."
But Casey took umbrage at this. He said that security people did care about performance, and moreover were "bombarded with concern over the performance impact of what we're up to. All too often it's not constructive criticism. Sometimes it is downright hostile." So, not only did they care, but they were never allowed to forget about it.
Linus retreated to the technical side of the discussion. He suggested that one way to deal with the performance issue would be to migrate yet more code out of the security system and into the core of the VFS. If the simplest and most common cases could be recognized and dealt with in highly optimized ways, then the very few corner cases that actually did require special security code could call out to the security systems as needed. But in 99 percent of the cases, he estimated, this wouldn't be necessary. He added, "once that happens, we don't care any more what security people do."
In a later email, Linus suggested that this 99 percent case could conceivably be handled by a single bit that just indicated "this inode has no special file permissions outside the normal UNIX ones." He said, "Having to call into the security layer when you cross some special boundary is fine. It's doing it for every single path component, and every single 'stat' of a regular file – *THAT* is what kills us."
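The single-bit idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: the flag name TOY_INODE_PLAIN_PERMS, the toy_inode struct, and the toy_security_permission() hook are all hypothetical stand-ins, not kernel or LSM interfaces. The point is that the common case returns before any call into the security layer is made.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: the inode has nothing beyond plain UNIX permissions. */
#define TOY_INODE_PLAIN_PERMS 0x1u

struct toy_inode {
    unsigned int flags;              /* TOY_INODE_PLAIN_PERMS or 0 */
    unsigned int mode;               /* classic permission bits, e.g. 04 = read */
    const unsigned int *sec_allowed; /* stand-in for module data: an extra mask */
};

/* Counts trips into the slow path, to make the saving visible. */
static unsigned long toy_security_calls;

/* Stand-in for an expensive call into a security module. */
static bool toy_security_permission(const struct toy_inode *inode,
                                    unsigned int mask)
{
    toy_security_calls++;
    return inode->sec_allowed != NULL &&
           (*inode->sec_allowed & mask) == mask;
}

/* Permission check with a one-bit fast path for ordinary inodes. */
static bool toy_inode_permission(const struct toy_inode *inode,
                                 unsigned int mask)
{
    /* The normal UNIX mode bits are always checked first. */
    if ((inode->mode & mask) != mask)
        return false;

    /* Fast path: nothing special here, so skip the security layer. */
    if (inode->flags & TOY_INODE_PLAIN_PERMS)
        return true;

    /* Slow path: only inodes without the flag pay for the extra call. */
    return toy_security_permission(inode, mask);
}
```

In this sketch, checking an inode with the flag set never increments toy_security_calls, which mirrors the goal of not calling out for every path component and every stat of a regular file.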
Casey thought this could be workable, and he started going over some of the technical details. Shortly afterward, the thread ended.
Status of 32-Bit Kernels on 64-Bit Systems
Pierre-Loup A. Griffais was running a 32-bit kernel on a machine with lots and lots of RAM and noticed a big slowdown with memory operations that went beyond 16GB of RAM. Rik van Riel replied, "If you have that much memory in the system, you will want to run a 64 bit kernel to avoid all kinds of memory management corner cases." And Johannes Weiner added, "You can even keep your 32 bit userland, just swap the kernel."
Pierre-Loup replied, "We're in a situation where popular distros ship 32-bit as the default 'use this if you're not sure what to get' option, with PAE also enabled by default. Most modern computers shipping with more than 16G of RAM, especially for gaming. Looking at the Steam HW survey data we have hundreds of users using this combination; this commit means that installing package updates that pull in a new kernel will immediately cause their system to become unusable."
Linus Torvalds replied:
PAE is 'make it barely work'. The whole concept is fundamentally flawed, and anybody who runs a 32-bit kernel with 16GB of RAM doesn't even understand *how* flawed and stupid that is.
Don't do it. Upgrade to 64-bit, or live with the fact that IO performance will suck. The fact that it happened to work better under your particular load with one particular IO size is entirely just 'random noise'.
Yeah, the difference between 'we can cache it' and 'we have to do IO' is huge. With a 32-bit kernel, we do IO much earlier now, just to avoid some really nasty situations. That makes you go from the 'can sit in the cache' to the 'do lots of IO' situation. Tough.
Seriously, you can compile yourself a 64-bit kernel and continue to use your 32-bit user-land. And you can complain to whatever distro you used that it didn't do that in the first place. But we're not going to bother with trying to tune PAE for some particular load. It's just not worth it to anybody.
Pierre-Loup replied that he personally did exactly that – he used a 64-bit kernel on his systems and already knew why that was much better. But he said, "my goal is to avoid ending up with a variety of end-users that don't necessarily understand this getting bitten by it and breaking their systems by upgrading their kernels. I will indeed bring this up with distributors and point out that shipping PAE kernels by default is not a good idea given these problems and your stance on the matter."
Rik van Riel gave this a bit of thought and came up with the following suggestion:
Limit the memory that a 32 bit PAE kernel uses, to something small enough where the user will not encounter random breakage. Maybe 8 or 12GB?
It could also print out a friendly message, to inform the user they should upgrade to a 64 bit kernel to enjoy the use of all of their memory.
It is a bit of a heavy stick, but I suspect that it would clue in all of the affected users.
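Rik's suggestion amounts to a clamp plus a warning. The following C sketch shows the shape of that logic only; it is not kernel code. The 8GB cap is one of the two values Rik floated, and the names TOY_PAE_CAP_BYTES and toy_usable_memory() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed cap: 8GB, the lower of the two values suggested in the thread. */
#define TOY_PAE_CAP_BYTES (8ULL << 30)

/*
 * Returns how much memory the (hypothetical) kernel would agree to use.
 * A 32-bit PAE kernel is clamped to the cap and tells the user why;
 * every other configuration gets all of the installed memory.
 */
static uint64_t toy_usable_memory(uint64_t installed_bytes, int is_32bit_pae)
{
    if (is_32bit_pae && installed_bytes > TOY_PAE_CAP_BYTES) {
        fprintf(stderr,
                "Using %llu GB of %llu GB; "
                "install a 64-bit kernel to use all of your memory.\n",
                (unsigned long long)(TOY_PAE_CAP_BYTES >> 30),
                (unsigned long long)(installed_bytes >> 30));
        return TOY_PAE_CAP_BYTES;
    }
    return installed_bytes;
}
```

With these assumptions, a 16GB machine running a 32-bit PAE kernel would be limited to 8GB and see the message, while the same machine with a 64-bit kernel, or a 4GB machine with either kernel, would be unaffected.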
Pierre-Loup loved this idea and thought it would definitely do the trick.
So, it seems likely that something like this will go into the kernel, but it also seems very clear that if you're running a 32-bit kernel on a 64-bit system, you should stop right away. As H. Peter Anvin concluded, "We kernel guys have been asking the distros to ship 64-bit kernels even in their 32-bit distros for many years, but concerns of compat issues and the desire to deprecate 32-bit userspace seems to have kept that from happening."
Status of Tarball Releases
Linus Torvalds went on vacation to the International Domed Lunar Colony (IDLC), which only had a very low speed Internet connection. And, while there, he realized that he hadn't installed his tarball-making utilities before leaving Earth. He wrote to the mailing list saying that the poor connection was still good enough for email, and it was still good enough to update his terran Git tree, but it wasn't good enough to download software in a reasonable amount of time. He apologized, but added, "I suspect nobody actually uses the tar-balls and patches, since git is so much more convenient and efficient, so hopefully nobody cares. But I'll rectify the lack eventually. Hopefully within a day or two, as my 'yum update' actually completes. And if not in a day or two, then when I get back home a few days later."
Randy Dunlap replied that he actually did still rely on tarballs and patches. But, he said, "I do expect that some day you will just completely drop doing those and I will handle that (with some changes)."
Mikael Pettersson said that he also used the tarballs. And Linus replied, "I won't discontinue the tar-balls/patch-files as long as there are users. At least not in the near future."