Zack's Kernel News
Zack Brown reports on enhancing KUnit, arguing over a nonexistent problem, and Cgroup core wars.
Enhancing KUnit
Since its introduction by Brendan Higgins in October 2018, the KUnit unit testing framework has seen a lot of activity, and Brendan has implemented a lot of enhancements and documentation. Recently, he submitted some patches to support the bailing out of any given unit test, which he said was needed for implementing assertions. An assertion is a useful and cute debugging tool, where at a certain point in your code you claim that a particular value or condition is true, because at that point in the code you really expect it to be true. You're not branching execution based on a conditional; you're just saying, "I think the world looks like this." Then, if it doesn't look like "this," the program can crash or throw an exception or do whatever you told it to do in case an assertion turns out to be false.
You can see how unit tests and assertions might get in each other's way. If you write a unit test that feeds bogus inputs into a function to make sure it handles that case properly, the bogus inputs could end up triggering the assertion and short-circuiting your goal, which is to test the actual code, not the assertion.
Brendan wanted unit tests to abort in the event of triggering an assertion within the code. At that point, there would be no reason to continue that particular test. As he put it at some point deep in the conversation, "The idea with assertions is that you use them to state all the preconditions for your test. Logically speaking, these are the premises of the test case, so if a premise isn't true, there is no point in continuing the test case, because there are no conclusions that can be drawn without the premises."
Near the periphery of his goals, however, in the error-handling portion of his code, he tested to see whether the unit test had failed to actually abort as intended, in which case his patch called BUG(). To that, Frank Rowand remarked, "BUG(), which is a panic, which is crashing the system, is not acceptable in the Linux kernel. You will just annoy Linus if you submit this."
Brendan replied that he thought BUG() would be OK, since KUnit would never be used in a production kernel. Consequently, there was no risk of panicking any production system. Also, he said, if KUnit ever did get to this particular point of execution, everything would go kablooey and trying to proceed normally would be a no-no.
But Frank replied that, in fact, even in a non-production kernel, it would be unacceptable to intentionally produce a panic unless it were the only way to avoid data corruption. He also added that it would be wrong to assume KUnit would not be compiled into production systems. It could very well happen, he said.
Even if the unit test failed, Frank also said that didn't mean it would be best to abort the test as a whole. He said, "The remainder of the unit tests can still run. There may be cascading failures, but that is OK."
Brendan said he didn't want to dispute Frank's point, but he did also reply for clarity that "the path where BUG() gets called only happens if there is a bug in KUnit itself, not just because a test case fails catastrophically." And he went on, "This should never actually happen, even if the assertion that called it was violated. KUNIT_ASSERT_* just says, 'this is a prerequisite property for this test case'; if it is violated, the test case should fail and bail, because the preconditions for the test case cannot be satisfied. Nevertheless, other test cases will still run."
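The assert-and-bail behavior Brendan describes can be sketched in plain userspace C. This is a hypothetical toy runner using setjmp()/longjmp() to illustrate the idea only; KUnit's real KUNIT_ASSERT_* macros are implemented quite differently inside the kernel:

```c
#include <setjmp.h>
#include <stdbool.h>

/* Toy model: a failed assertion aborts the current test case, but the
 * runner survives and can move on to the next case. */

static jmp_buf test_env;
static bool test_failed;

#define SKETCH_ASSERT(cond)          \
    do {                             \
        if (!(cond)) {               \
            test_failed = true;      \
            longjmp(test_env, 1);    \
        }                            \
    } while (0)

/* Runs one test case; returns true if it passed. */
static bool run_case(void (*test_fn)(void))
{
    test_failed = false;
    if (setjmp(test_env) == 0)
        test_fn();
    return !test_failed;
}

static void case_ok(void)  { SKETCH_ASSERT(1 + 1 == 2); }
static void case_bad(void) { SKETCH_ASSERT(1 + 1 == 3); }
```

Running case_bad() trips the assertion and bails out of that one case, yet run_case() returns normally, so other test cases still run -- exactly the "fail and bail" semantics described above, with no BUG() needed.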
There was a little more discussion, but nothing conclusive. Several folks went back and forth on how best to handle various cases, like what null pointers traditionally meant in a testing context, and whatnot.
I love watching the Linux kernel ecosystem as it attracts more and more symbiotic entities to it. One of the greatest of these is of course the C compiler, but so are all aspects of revision control, debugging, and testing, including KUnit. It's clear this is a project that's still in the relatively early stages of what will be a long and elaborate life. For me, it's fun to watch KUnit's tendrils descending into the various inner reaches of the kernel development process.
Arguing over a Nonexistent Problem
A strange argument arose over a nonexistent problem in an inscrutable part of the kernel that turns out to be utterly relevant to all things. Andrea Parri started the whole debate with a patch to affect the ordering of memory reads during concurrent operation. Under a hypothetical circumstance, one read might depend on another, yet be performed in the reverse order. Andrea's patch was intended to fix this.
Paul McKenney had no objection and was all set to apply the patch and feed it up to Linus Torvalds, when Peter Zijlstra pointed out that, in fact, the particular hypothetical situation being fixed by Andrea's patch could never be part of an actual physical computer.
Andrea agreed wholeheartedly that there was no way that even the evilest and most depraved mind could ever construct a monstrous perversion of space and time so as to produce a computer capable of triggering the bug that he had fixed. By way of compromise, he said that in the future the bug could always be reinserted into the code if that's what the developers really wanted.
Peter was having none of this sophistry. He said flatly, "What I do object to is a model that's weaker than any possible sane hardware."
And in support of this, Will Deacon remarked, "Whilst I completely agree that relying on the ordering provided by dep; rfi is subtle and error prone, having it forbid the outcome above appeals to a hardware-based mindset of how memory ordering works. In the kernel community, I would posit that the majority of developers are writing code with the underlying hardware in mind and so allowing behaviors in the memory model, which are counter to how a real machine operates, is likely to make things more confusing, rather than simplifying them!"
Alan Stern suggested letting Andrea's impossible-to-hit bug fix go into the kernel, but to also retain the code comment for the behavior that was being replaced, in order to "explain that the operational model predicts this ordering, but it has been left out of the LKMM for practical (not technical) reasons."
Andrea backed off, saying, "I won't try to push this patch further." But he did feel that Peter in particular was being a bit of a stickler, always insisting that kernel features should only be implemented if there were actual users to use them.
Meanwhile, Paul veered into untapped realms of theory, saying that since Andrea's patch was primarily related to issues of concurrent code execution, and reading and writing:
"We had some debates about this sort of thing at the C++ Standards Committee meeting last week.
"Pointer provenance and concurrent algorithms, though for once not affecting RCU! We might actually be on the road to a fix that preserves the relevant optimizations while still allowing most (if not all) existing concurrent C/C++ code to continue working correctly. (The current thought is that loads and stores involving inline assembly, C/C++ atomics, or volatile get their provenance stripped. There may need to be some other mechanisms for plain C-language loads and stores in some cases as well.)
"But if you know of any code in the Linux kernel that needs to compare pointers, one of which might be in the process of being freed, please do point me at it. I thought that the smp_call_function() code fit, but it avoids the problem, because only the sending CPU is allowed to push onto the stack of pending smp_call_function() invocations.
"That same concurrent linked stack pattern using cmpxchg() to atomically push and xchg() to atomically pop the full list would have this problem. The old pointer passed to cmpxchg() might reference an object that was freed between the time that the old pointer was loaded and the time that the cmpxchg() executed. One way to avoid this is to do the push operation in an RCU read-side critical section and use kfree_rcu() instead of kfree(). Of course, code in the idle loop or that might run on offline CPUs cannot use RCU, plus some data structures are not happy with kfree_rcu() delays, so…."
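The stack pattern Paul describes can be sketched with userspace C11 atomics. The names are illustrative, and the kernel's cmpxchg() and xchg() are different primitives; this sketch just shows the shape of the pattern and where the hazard lives:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Sketch of a lock-free linked stack: push with a compare-and-swap
 * loop, pop the entire list at once with an exchange. The hazard Paul
 * points out: between loading 'old' and the CAS, another thread could
 * pop the list and free the node 'old' points to, so even comparing
 * against the stale pointer involves a freed object. */

struct node {
    struct node *next;
    int val;
};

static _Atomic(struct node *) top;

static void push(struct node *n)
{
    struct node *old = atomic_load(&top);
    do {
        n->next = old;  /* 'old' may already have been freed elsewhere */
    } while (!atomic_compare_exchange_weak(&top, &old, n));
}

/* Pop the whole list atomically, as in the smp_call_function() pattern. */
static struct node *pop_all(void)
{
    return atomic_exchange(&top, NULL);
}
```

In the smp_call_function() case only the sending CPU pushes, so the race cannot arise; in the general case, the fix Paul sketches is to run push() inside an RCU read-side critical section and defer frees with kfree_rcu().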
Peter tried to wrap his head around what Paul was asking, but had no sort of luck until he found the scholarly article, "Exploring C Semantics and Pointer Provenance" [1], which begins with the lovely and encouraging pronouncement:
"C values cannot be treated as either purely abstract or purely concrete entities: The language exposes their representations, but compiler optimizations rely on analyses that reason about provenance and initialization status, not just runtime representations. The ISO WG14 standard leaves much of this unclear, and in some respects differs with de facto standard usage – which itself is difficult to investigate."
Peter read this and the rest of that baleful article and then staggered back to the Linux Kernel Mailing List saying, "that's all massive bong-hits. That's utterly insane. Even the proposed semantics are crazy; they include UB and are therefore broken on principle."
Peter cast about desperately right and left, saying, "We need to find a GCC person to find/give us a knob to kill this entire class of nonsense. This is just horrible broken shit."
Paul calmly replied:
"Yes, this all is a bit on the insane side from a kernel viewpoint. But the paper you found does not impose this; it has instead been there for about 20 years, back before C and C++ admitted to the existence of concurrency. But of course compilers are getting more aggressive, and yes, some of the problems show up in single-threaded code.
"The usual response is 'then cast the pointers to intptr_t!', but of course that breaks type checking.
"There is an effort to claw back the concurrency pieces, and I would be happy to run the resulting paper past you guys.
"I must confess to not being all that sympathetic to code that takes advantage of happenstance stack-frame layout. Is there some reason we need that?"
Peter replied:
"Not that I'm aware; but if it gets this 'obvious' case wrong, I worry what else it gets wrong.
"At the very least, we should get this fixed and compile a kernel with the fixed compiler to see what (if anything) changes in the generated code and analyze the changes (if any) to make sure we were OK (or not).
"I mean, yes that example is UB, but the result is also clearly batshit insane."
So, there you have it – from fixing a nonexistent bug, to questioning the nature of reality and one's own place in the universe.
Cgroup Core Wars
Andrey Ryabinin had a bone to pick with the cgroup code. In particular, he didn't like the way Linux handled memory limits. Any time there was more than one virtual machine running, if anything at all hit any kind of memory limit, Linux would attempt to reclaim memory from all cgroups. Andrey didn't like this at all, because "the cgroup that allocates more often always wins. E.g., job that allocates a lot of clean rarely used page cache will push out of memory other jobs with active relatively small all in memory working set."
He pointed out that he knew about the memcg controls, designed to help avoid that very circumstance. But he said these were insufficient, because "memory cgroups are very hard to configure right, because it requires precise knowledge of the workload, which may vary during the execution. E.g., setting memory limit means that job won't be able to use all memory in the system for page cache even if the rest [of] the system is idle."
Essentially, as Andrey said, the current situation required painstaking, error-prone configuration of each and every cgroup.
He submitted a patch that he felt would do better. Instead of manually tweaking each cgroup, he wanted to fix the way memory was reclaimed in the first place. In particular, instead of Linux reclaiming RAM from all cgroups equally, it would reclaim only inactive memory pages – and only from cgroups that were hoarding a lot of inactive pages. Only when everyone's list of inactive pages was low would the kernel then revert to its previous behavior, shrinking all cgroups equally.
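The proposed policy can be modeled roughly as follows. This is a toy userspace sketch with made-up counters and an arbitrary threshold, not the actual patch; the real kernel code operates on per-cgroup LRU lists, not simple counts:

```c
/* Toy model of the proposed reclaim policy. Pass 1: reclaim inactive
 * pages only, and only from cgroups hoarding many of them. Pass 2
 * (fallback, when inactive pages are scarce everywhere): the old
 * behavior, putting pressure on every cgroup. */

struct cg {
    long active;
    long inactive;
};

#define INACTIVE_SURPLUS 64  /* hypothetical "hoarding" threshold */

static long take(long *pool, long want)
{
    long got = *pool < want ? *pool : want;
    *pool -= got;
    return got;
}

static long reclaim(struct cg *cgs, int n, long want)
{
    long got = 0;

    /* Pass 1: only cgroups with a surplus of inactive pages. */
    for (int i = 0; i < n && got < want; i++)
        if (cgs[i].inactive > INACTIVE_SURPLUS)
            got += take(&cgs[i].inactive, want - got);

    /* Pass 2: fall back to shrinking everyone, as before the patch. */
    for (int i = 0; i < n && got < want; i++) {
        got += take(&cgs[i].inactive, want - got);
        got += take(&cgs[i].active, want - got);
    }
    return got;
}
```

In this model, an idle cgroup sitting on a huge unused page cache absorbs the reclaim pressure first, while a busy cgroup with a small, hot working set is left alone -- exactly the outcome Andrey argues for below.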
Rik van Riel liked this whole concept, but had some technical suggestions for Andrey's patch. Johannes Weiner also strongly supported Andrey's patch and also offered some technical suggestions for improvements.
Roman Gushchin, however, had doubts. As he put it, "What's considered active and inactive depends on memory pressure inside a cgroup. Actually active pages in one cgroup (e.g., just deleted) can be colder than inactive pages in another (e.g., a memory-hungry cgroup with a tight memory.max)." He added, "Also a workload inside a cgroup can to some extent control what's going to the active LRU. So it opens a way to get more memory unfairly by artificially promoting more pages to the active LRU. So a cgroup can get an unfair advantage over other cgroups."
Andrey objected to this description of fairness. He said, "If fair means to put equal reclaim pressure on all cgroups, then, yes, the patch increases such unfairness, but such unfairness is a good thing." He presented, as an example, "Cgroup A with active job loading a big and active working set, which creates high memory pressure, and cgroup B – idle (no memory pressure) with a huge not used cache. It's definitely preferable to reclaim from B rather than from A."
Roman insisted that while Andrey was logically correct, there was still no clear way to distinguish between active and inactive pages, and thus, no clear way to know that Andrey's implementation would do the right thing – or better than existing code – in any given instance. He concluded, "So it means that depending on memory usage pattern, some workloads will benefit from your change, and some will suffer."
There was no conclusion during the thread. And it's not at all clear whether Andrey's patch will ultimately make it into the kernel. Anytime something depends on average use-case patterns across the known universe, coupled with uncertain mechanisms to distinguish between the thing we want and the thing we don't want, we're probably looking at a long and controversial debate.
At the same time, the memory-theft scenario Andrey initially described may indeed be real: giving preference to the cgroup that allocates and wastes the most memory, and pushing other cgroups out entirely. Depending on how you look at it, that could be considered a security hole. If so, there may start to be real pressure to address the problem one way or another, either with something along the lines of Andrey's patch, or something else that avoids the exploit.
Infos
- "Exploring C Semantics and Pointer Provenance": http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2311.pdf