Zack's Kernel News
Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.
Cheaper Oops Output
David Herrmann pointed out that the VT (virtual terminal) subsystem was not really needed for system consoles anymore, but trying to save space in the kernel binary by disabling the VT subsystem compile-time option would cause another problem – the kernel wouldn't send oops output to the console in the event of a kernel panic. David wanted to find a way to preserve oops output while disabling the VT subsystem.
He posted a patch implementing DRM-log, a kernel log console that lives in the DRM (Direct Rendering Manager) subsystem. DRM-log registered its own console and pushed kernel log messages, including oops output, to it as needed.
One of the fun things about code that outputs oops logs is that it can't really rely on the full infrastructure of the system. After all, the system just crashed and is using its dying gasp to send a message to the future. Fires and explosions could be everywhere, so DRM-log had to worry about things like wrapping the lines of the log and fitting them visually into the framebuffer, pixel by pixel, drawing everything out onto a screen that might be the only thing left standing.
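DRM-log's own code isn't reproduced in the discussion, but the layout work described above can be sketched in plain C. The example below is a hypothetical illustration, not David's patch: `struct fb`, the 8x8 glyph format, and the placeholder `box_font()` are assumptions standing in for a real framebuffer and font. It wraps a message into character cells and writes every glyph one pixel at a time, which also hints at why such a console is slow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GLYPH_W 8
#define GLYPH_H 8

/* Hypothetical framebuffer description: 32 bits per pixel. */
struct fb {
    uint8_t *base;
    unsigned int width, height;   /* visible size in pixels */
    unsigned int stride;          /* bytes per scanline, may exceed width * 4 */
};

/* Write one pixel, silently clipping anything that falls off the screen. */
static void fb_put_pixel(struct fb *fb, unsigned int x, unsigned int y, uint32_t color)
{
    if (x >= fb->width || y >= fb->height)
        return;
    uint32_t *row = (uint32_t *)(fb->base + (size_t)y * fb->stride);
    row[x] = color;
}

/* Paint one 8x8 glyph stored as 8 row bitmaps (MSB = leftmost pixel). */
static void fb_draw_glyph(struct fb *fb, unsigned int x, unsigned int y,
                          const uint8_t *bits, uint32_t fg, uint32_t bg)
{
    for (unsigned int gy = 0; gy < GLYPH_H; gy++)
        for (unsigned int gx = 0; gx < GLYPH_W; gx++)
            fb_put_pixel(fb, x + gx, y + gy, (bits[gy] & (0x80u >> gx)) ? fg : bg);
}

/* Placeholder "font": a solid block for every character except space. */
static const uint8_t *box_font(unsigned char c)
{
    static const uint8_t box[GLYPH_H] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    static const uint8_t blank[GLYPH_H] = {0};
    return c == ' ' ? blank : box;
}

/* Lay out a message in character cells: wrap at the right edge, honor '\n',
 * and stop once the screen is full. Returns the number of glyphs drawn. */
static size_t fb_draw_text(struct fb *fb, const char *msg,
                           const uint8_t *(*font)(unsigned char),
                           uint32_t fg, uint32_t bg)
{
    assert(fb != NULL && msg != NULL && font != NULL);
    unsigned int cols = fb->width / GLYPH_W;
    unsigned int rows = fb->height / GLYPH_H;
    unsigned int col = 0, row = 0;
    size_t drawn = 0;

    if (cols == 0)
        return 0;                     /* screen narrower than one glyph */
    for (; *msg != '\0' && row < rows; msg++) {
        if (*msg == '\n') {           /* explicit line break */
            col = 0;
            row++;
            continue;
        }
        if (col == cols) {            /* line full: wrap to the next one */
            col = 0;
            if (++row == rows)
                break;
        }
        fb_draw_glyph(fb, col * GLYPH_W, row * GLYPH_H,
                      font((unsigned char)*msg), fg, bg);
        col++;
        drawn++;
    }
    return drawn;
}
```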
In his post, David made it clear that DRM-log was very slow and should only be used for debugging. Presumably DRM-log would be used during development on embedded systems with limited space, while the final product would disable that feature as well and include no support for oops output at all.
Bruno Prémont pointed out that DRM-log's font selection might be too small for the user to see, depending on the type of display being used. Something like Apple's Retina display might be too high-res for DRM-log's available fonts. He suggested giving DRM-log the ability to select a larger font to suit the screen size.
David didn't like this idea, because he thought that glyphs were too high-level for DRM-log; everything should be done at the pixel level. He did agree that size mattered, though. David said he could add integer scaling, whereby the output text would be drawn at two, three, or four times its normal size, and so on, but not at a fractional size such as two-thirds larger. Bruno said this would be acceptable.
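Integer scaling is simple to illustrate. In the hypothetical C sketch below (the names and the 8x8 glyph format are assumptions, not DRM-log's API), each pixel of the source glyph becomes a `scale` x `scale` block, so text stays sharp at any whole-number size; a fractional factor would require resampling the glyph, which this approach avoids. The `pick_scale()` helper, which chooses the largest factor at which a given text grid (for example, 80x25) still fits on the screen, is likewise only an illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GLYPH_W 8
#define GLYPH_H 8

/* Draw an 8x8 bitmap glyph at (x, y) in a 32-bpp buffer 'pix' of w x h
 * pixels, enlarged by the whole-number factor 'scale'. Each source pixel
 * becomes a scale x scale block of identical pixels. */
static void draw_glyph_scaled(uint32_t *pix, unsigned int w, unsigned int h,
                              unsigned int x, unsigned int y,
                              const uint8_t *bits, unsigned int scale,
                              uint32_t fg, uint32_t bg)
{
    assert(scale >= 1);
    for (unsigned int gy = 0; gy < GLYPH_H; gy++) {
        for (unsigned int gx = 0; gx < GLYPH_W; gx++) {
            uint32_t color = (bits[gy] & (0x80u >> gx)) ? fg : bg;
            for (unsigned int dy = 0; dy < scale; dy++) {
                for (unsigned int dx = 0; dx < scale; dx++) {
                    unsigned int px = x + gx * scale + dx;
                    unsigned int py = y + gy * scale + dy;
                    if (px < w && py < h)      /* clip at the screen edge */
                        pix[py * w + px] = color;
                }
            }
        }
    }
}

/* Largest whole-number scale at which 'cols' x 'rows' text cells fit on a
 * w x h screen; never returns less than 1. */
static unsigned int pick_scale(unsigned int w, unsigned int h,
                               unsigned int cols, unsigned int rows)
{
    assert(cols > 0 && rows > 0);
    unsigned int sx = w / (cols * GLYPH_W);
    unsigned int sy = h / (rows * GLYPH_H);
    unsigned int s = sx < sy ? sx : sy;
    return s ? s : 1;
}
```

On a 2560x1600 panel, for instance, an 80x25 grid of 8x8 cells fits at four times the normal size.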
Alan Cox also commented on David's patch. He mentioned that speed was important for debugging (as opposed to simply displaying panic output). But, he added, "What I am more dubious about is tying it to DRM. Yes it uses DRM constants but it doesn't appear functionally to have a single tie to either DRM or even framebuffer. It's potentially useful in cases where neither framebuffer or DRM are compiled into the system."
David agreed that there was nothing especially DRM-specific in his code, but he replied bluntly, "I've spent enough time trying to get the attention of core maintainers for simple fixes, I really don't want to waste my time pinging on feature-patches every 5 days to get any attention. If someone outside of DRM wants to use it, I'd be happy to discuss any code-sharing. Until then, I'd like to keep it here as people are willing to take it through their tree."
That ended the discussion. It's unusual to see a potentially generalizable kernel feature relegate itself to just one part of the tree, solely because it's easier to get the feature through the maintainers of that part of the tree. It's also unusual to see a heavy-hitter like Alan suggest a more general approach and be rejected, because presumably his interest alone could be enough to motivate maintainers of other parts of the tree to take a set of patches more seriously.
Rooting Out Race Conditions
Eugene Shatokhin announced KernelStrider version 0.3, a tool to detect race conditions in kernel modules running on the x86 architecture. A race condition occurs when two threads running concurrently must do something in a particular order, and sometimes the wrong one gets there first. As Eugene put it, KernelStrider was essentially a statistics-gathering tool that would track memory accesses, function calls, and other events, and then feed them to a third-party analyzer, Google's ThreadSanitizer tool. Eugene gave a link to the KernelStrider development page, which had tutorials and presentation slides.
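To make the definition concrete, here is a small user-space C program (not kernel code, and unrelated to KernelStrider itself) in which two threads increment shared counters. Incrementing `plain_counter` is a separate load, add, and store, so the threads can interleave and lose updates; the `atomic_counter` version cannot. A user-space race detector such as ThreadSanitizer, enabled in GCC or Clang with `-fsanitize=thread`, would be expected to report the accesses to `plain_counter`. All names here are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITERATIONS 1000000

/* 'volatile' only stops the compiler from folding the loop into one add;
 * it does NOT make the increment atomic, so this counter is deliberately racy. */
static volatile long plain_counter;
static atomic_long atomic_counter;   /* read-modify-write is indivisible */

static void *racy_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        plain_counter++;             /* load, add, store: threads can interleave */
    return NULL;
}

static void *safe_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        atomic_fetch_add(&atomic_counter, 1);
    return NULL;
}

/* Run two threads executing 'fn' concurrently and wait for both. */
static void run_pair(void *(*fn)(void *))
{
    pthread_t a, b;
    int rc = pthread_create(&a, NULL, fn, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, fn, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
}
```

After `run_pair(safe_worker)`, `atomic_counter` always equals 2,000,000; after `run_pair(racy_worker)`, `plain_counter` is often smaller, and the shortfall varies from run to run, which is exactly what makes races hard to reproduce.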
Andi Kleen asked what races Eugene had found in the kernel so far, and Eugene pointed him to the KernelStrider wiki, which listed several in the 3.10.x kernel series.
A New Spinlock
Waiman Long announced qspinlock, a new type of spinlock that he thought would be a useful alternative to ticket spinlocks in certain massively parallel systems.
A spinlock protects a resource, such as a printer, that only one thread can use at a time. The kernel stores a value that represents the state of the lock. When a thread wants the resource, it checks the value of the lock. If the value indicates the resource is available, the thread changes the value to show that it now holds the resource; the check and the change happen as a single atomic operation, so two threads can never both see the lock as free. If the value indicates the resource is not available, the thread "spins" in a tight loop, repeatedly checking the value of the lock.
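The scheme described above is essentially a test-and-set spinlock. The following sketch uses C11 atomics in user space; it illustrates the idea and is not the kernel's architecture-specific implementation. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplest possible spinlock: false = free, true = held. */
struct tas_spinlock {
    atomic_bool locked;
};

static void tas_init(struct tas_spinlock *l)
{
    atomic_init(&l->locked, false);
}

static void tas_lock(struct tas_spinlock *l)
{
    /* Check and set in one atomic step: the exchange stores 'true' and
     * returns the previous value. 'true' means someone else holds it. */
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire)) {
        /* Spin with plain loads until the lock looks free, so waiters
         * don't hammer the cache line with writes. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

/* Single attempt: returns true if the lock was acquired. */
static bool tas_trylock(struct tas_spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void tas_unlock(struct tas_spinlock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The uncontended case costs a single atomic exchange; under contention, though, nothing decides which waiter wins next, which is one motivation for the fairer designs discussed below.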
Different spinlocks are good for different situations. Clearly, spinning in a tight loop can be resource-heavy. Some spinlocks try to optimize for the case when no other thread contends for the resource; others optimize for heavy contention.
Waiman's qspinlock implementation represented an alternative to the ticket spinlock that Nick Piggin had written around the 2.6.25 time frame. A qspinlock was actually slower than a ticket spinlock in the case where only a few threads were in contention for a shared resource.
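For comparison, a ticket spinlock works like the number dispenser at a deli counter: each arriving thread takes the next ticket and waits until the "now serving" value matches it, which guarantees first-come, first-served ordering. The C11 sketch below is illustrative only (hypothetical names, user-space atomics, not the kernel's code). Note that every waiter spins on the same `owner` field, so each unlock has to reach all waiting CPUs.

```c
#include <assert.h>
#include <stdatomic.h>

/* 'next' hands out tickets; 'owner' is the ticket now being served. */
struct ticket_spinlock {
    atomic_uint next;
    atomic_uint owner;
};

static void ticket_init(struct ticket_spinlock *l)
{
    atomic_init(&l->next, 0u);
    atomic_init(&l->owner, 0u);
}

/* Threads holding or waiting for the lock (a racy snapshot, for diagnostics). */
static unsigned int ticket_queue_len(struct ticket_spinlock *l)
{
    return atomic_load(&l->next) - atomic_load(&l->owner);
}

static void ticket_lock(struct ticket_spinlock *l)
{
    /* Take a ticket: waiters are served strictly in arrival order. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1u, memory_order_relaxed);

    /* All waiters poll the same 'owner' word, so each unlock must
     * propagate that cache line to every spinning CPU. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_unlock(struct ticket_spinlock *l)
{
    assert(ticket_queue_len(l) > 0);    /* caller must hold the lock */
    /* Only the holder writes 'owner', so a load plus a store is enough. */
    unsigned int cur = atomic_load_explicit(&l->owner, memory_order_relaxed);
    atomic_store_explicit(&l->owner, cur + 1u, memory_order_release);
}
```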
But, in massively parallel NUMA systems, Waiman said, the qspinlock was faster than ticket spinlocks. He said, "The idea behind this spinlock implementation is the fact that spinlocks are acquired with preemption disabled. In other words, the process will not be migrated to another CPU while it is trying to get a spinlock. Ignoring interrupt handling, a CPU can only be contending in one spinlock at any one time."
This fact enabled qspinlocks to arrange their queue on a per-CPU basis, which sped things up and allowed for more efficient memory use. He profiled some workloads and posted some encouraging stats.
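Queued spinlocks of this kind build on the classic MCS lock, in which each waiter brings its own queue node and spins only on a flag inside that node, so an unlock touches just the next waiter's memory rather than one location shared by everyone. The sketch below is a textbook MCS lock in user-space C11, not Waiman's patch, and its names are hypothetical. Waiman's observation that a CPU contends for only one spinlock at a time (ignoring interrupts) is what makes it possible to keep such nodes per CPU instead of allocating them per lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter; the caller supplies it (per CPU or per thread). */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;                   /* true while this waiter must spin */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;      /* last waiter in line, NULL if free */
};

static void mcs_init(struct mcs_lock *l)
{
    atomic_init(&l->tail, NULL);
}

static void mcs_acquire(struct mcs_lock *l, struct mcs_node *me)
{
    assert(me != NULL);
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Append ourselves to the queue in one atomic step. */
    struct mcs_node *prev = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                           /* queue was empty: lock is ours */

    /* Link behind the predecessor, then spin on OUR OWN flag only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

static void mcs_release(struct mcs_lock *l, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to mark the lock free. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                                                    memory_order_release,
                                                    memory_order_relaxed))
            return;
        /* A new waiter swapped itself in but hasn't linked yet; wait for it. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock directly to the next waiter by clearing its flag. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

The cost of this design is a little more work on the lock and unlock paths, which fits the discussion's finding that the queued approach pays off mainly when many CPUs contend at once.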
With certain Linux machines running thousands and thousands of CPU cores, an improvement in spinlock efficiency could result in a noticeable speedup on those systems.
Andi Kleen, however, pointed out that smaller systems would be likely to experience just a few threads contending for a resource, which Waiman said was a case that ran slower than ticket spinlocks. So, this could have a negative effect on more common systems, even if it sped up large NUMA systems.
Waiman agreed that there was a slight speed loss in that case and said he'd do more testing. George Spelvin pointed out that the most common case was not several threads in contention but actually just one thread requesting a resource and finding it available – in other words, no contention at all. In that case, said George, qspinlocks ran just as fast as ticket spinlocks.
Rik van Riel liked Waiman's code as it was and gave his "Signed-Off-By" line to Waiman's patch.
Peter Zijlstra was concerned that "light to moderate contention should be the most common case." He wanted to see more profiling numbers under normal workloads – although he acknowledged that it was essentially impossible to define what a normal workload might be.
The technical discussion proceeded, with various suggestions as to how to figure out the true performance impact of Waiman's code under various conditions and how to mitigate the negative issues. Peter, in particular, wanted Waiman to explain why he claimed his code ran faster or slower under certain conditions; Waiman could only say that such tests were difficult and that he was attempting to come up with meaningful benchmarks.
Watching this sort of debate play out is fun, because there is so much uncertainty over whether a given piece of code is actually better than the thing it wants to replace. The same sort of issue plagued the "scheduler wars" years ago. Several developers had written their own process schedulers, but no one could say for certain which one was actually better – there was no way to identify a "standard" load with which to compare them. It made for many powerful flame wars.
Maybe we need something like Debian's "popularity contest," where users could record their use-patterns and send an anonymized set of stats back to a central server for analysis. Maybe that would be useful in defining some "standard loads" that could then be applied to testing kernels.