Zack's Kernel News
Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.
Compiler bugs are rough on the Linux kernel because the kernel needs to compile on as many different systems as possible. The more constraints Linux imposes on the toolchain, the more difficult it becomes for absolutely anyone to build a kernel on their strange and unpredictable hardware setup. Traditionally, Linux makes a big effort to avoid losing compatibility with any version of the GCC compiler.
Fengguang Wu recently reported a memory paging error in the kernel and used
git bisect to trace the problem to a patch by Peter Zijlstra. Peter had implemented an optimization suggested by Linus Torvalds to use the
asm goto instruction in the
modify_and_test() functions. In theory,
asm goto was cleaner than the previous implementation, which needed an extra hardware register.
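To make the difference concrete, here is a minimal sketch of the two styles for a test-and-set bit operation on x86. It is not the kernel's actual code: the function names, the operand constraints, and the portable fallback branch are assumptions made for illustration, and the "sbb" variant only approximates the older approach of copying the carry flag into a scratch register.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)

/* Older style: "sbb" copies the carry flag into a scratch register,
 * which the C code then has to test a second time. */
static inline bool test_and_set_bit_sbb(long nr, volatile unsigned long *addr)
{
    int oldbit;

    __asm__ volatile("lock; bts %2, %0\n\t"
                     "sbb %1, %1"
                     : "+m" (*addr), "=r" (oldbit)
                     : "r" (nr)
                     : "memory", "cc");
    return oldbit != 0;
}

/* asm goto style: branch straight on the carry flag, no scratch register. */
static inline bool test_and_set_bit_goto(long nr, volatile unsigned long *addr)
{
    __asm__ goto("lock; bts %1, %0\n\t"
                 "jc %l[was_set]"
                 : /* no output operands */
                 : "m" (*addr), "r" (nr)
                 : "memory", "cc"
                 : was_set);
    return false;
was_set:
    return true;
}

#else

/* Portable fallback so the sketch still builds on other architectures. */
static inline bool test_and_set_bit_sbb(long nr, volatile unsigned long *addr)
{
    unsigned long mask = 1UL << nr;

    return (__atomic_fetch_or(addr, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}
#define test_and_set_bit_goto test_and_set_bit_sbb

#endif

static void bitop_demo(void)
{
    unsigned long word = 0;

    assert(!test_and_set_bit_goto(3, &word));  /* bit 3 was clear */
    assert(test_and_set_bit_goto(3, &word));   /* now it is set */
    assert(!test_and_set_bit_sbb(4, &word));
    assert(test_and_set_bit_sbb(4, &word));
    assert(word == 0x18UL);                    /* bits 3 and 4 */
}
```

In the asm goto version, the compiler can turn the conditional jump directly into the caller's if/else, which is where the cleaner generated code comes from.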
All seemed well until Fengguang reported that surprising memory error. Peter wasn't able to reproduce the bug on his system and couldn't see anything wrong with the patch itself. He suggested Fengguang test whether other compilers produced the same problem. Apparently, Fengguang had compiled the broken kernel under GCC version 4.8.1; he also tried the earlier GCC 4.6.1 and was still able to reproduce the breakage.
Linus Torvalds also couldn't find anything wrong with the code. Like Peter, he thought a compiler bug was possible. He suggested a test for Fengguang that might confirm the bug but also suggested simply disabling the use of
asm goto, and seeing if that had any effect.
Fengguang replied that indeed, disabling
asm goto fixed the problem. Linus said:
Ok. So it looks very much like 'asm goto()' is simply buggered. Too bad, since it generated nice clear code.
I suspect it's the memory clobber – maybe it only marks memory as clobbered for the fall-through case, and the actual 'goto' case might used old cached values? What do I know, it's just a theory.
We do have 'asm goto' with memory clobbers elsewhere (our x86 version of __mutex_fastpath_lock()), but that use is very limited and only gets expanded in a single place. The new bitop cases get expanded *everywhere*, so if there is something subtly wrong wrt code generation that requires some particular pattern, they'd trigger it much more easily.
Linus also admitted not knowing what the true underlying problem was and asked if anyone else had better ideas.
Jakub Jelinek remarked that the patch was "just an optimization, where object files compiled without and with the patch should actually coexist fine in the same kernel." He suggested that by selectively disabling each occurrence of
asm goto, it might be possible to narrow the problem down to a single object file, and a single code routine, at which point a suitable one-line test could be written, instead of using the whole kernel to confirm the bug each time.
Linus replied that Jakub was correct in theory, but that regarding narrowing down the problem to a single code routine, "we don't have any sane way to really do that."
Linus tried to reproduce the problem on his own hardware but was unable to and remarked at one point that, "Fengguang is the only one seeing this in his automated tests."
Meanwhile, Fengguang tested the problem using the older GCC version 4.4.7 and discovered that the problem went away. He posted the
diff output between the source code of those two versions (it was surprisingly small). Peter took a look and noticed that GCC 4.4.7 simply didn't support
asm goto, so the code automatically fell back to the previous known-to-work implementation. His discovery offered a further indication of a compiler issue directly related to asm goto.
Oleg Nesterov also looked at Fengguang's output and identified a snatch of asm code that looked wrong to him and that might explain Fengguang's memory error.
Meanwhile, Ingo Molnár was trying to reproduce the bug on his own system with no success. He suggested, "Unless my testing is off it might be a bug in GCC 4.8, or a pre-existing bug gets exposed by GCC 4.8."
Peter was finally able to reproduce the bug on his own system, under GCC version 4.8.1 using some test code from Fengguang. He found that if he forced a 64-bit build instead of a 32-bit build, the problem would go away. Ingo replied, "this at least opens up the possibility that we can create a not too painful quirk and only use the 'asm goto' optimization tricks on 64-bit kernels."
Peter did some more testing and found that, like the 64-bit build, compiling for i386-SMP also would not reproduce the bug. Later, he identified the
-march=winchip2 GCC option as a means to trigger the bug.
Ingo finally managed to reproduce the bug under GCC versions 4.7.3 and 4.7.2. He suggested, "I think we might need to turn off asm goto for all things 32-bit x86."
Jakub confirmed that the problem was indeed a GCC bug. He gave a link to the newly created bugzilla ticket, which he had also been assigned to fix. He confirmed that all GCC versions from 4.6 through 4.9 had the same problem:
asm goto produced bad code.
At this point, Linus also confirmed that the problem did indeed show up on his 64-bit system. He said, "Apparently we just have a harder time hitting it in practice in the kernel on x86-64." He added, "Too bad. It makes me nervous about all our _traditional_ uses of asm goto too, never mind the new ones."
Ingo asked Jakub to please let the kernel folks know when the bug was well-enough understood to implement a decent workaround in the kernel, and the discussion migrated to considerations of possible workarounds. In a later thread, various folks posted patches to fix all occurrences of
asm goto to add the workaround, and it looked as though GCC 4.8.2 would no longer have the problem. Because of the small number of problem cases, the workaround ended up adding no bloat at all to the kernel.
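A minimal sketch of what such a workaround can look like, assuming it takes the form of a wrapper macro that follows every asm goto with an empty asm statement (the shape of the asm_volatile_goto() macro found in kernels of that era). The dec_and_test() helper and the non-x86 fallback are illustrative only and are not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)

/* Follow each asm goto with an empty asm statement; on affected
 * compilers this quirk sidesteps the miscompilation. */
#define asm_volatile_goto(...) \
    do { __asm__ goto(__VA_ARGS__); __asm__(""); } while (0)

/* Illustrative user: decrement a counter and report whether it hit zero. */
static inline bool dec_and_test(int *counter)
{
    asm_volatile_goto("lock; decl %0\n\t"
                      "je %l[hit_zero]"
                      : /* no output operands */
                      : "m" (*counter)
                      : "memory", "cc"
                      : hit_zero);
    return false;
hit_zero:
    return true;
}

#else

/* Portable fallback so the sketch still builds on other architectures. */
static inline bool dec_and_test(int *counter)
{
    return __atomic_sub_fetch(counter, 1, __ATOMIC_SEQ_CST) == 0;
}

#endif
```

Because the extra statement emits no instructions, routing every use through one macro costs nothing in generated code, which fits the observation that the workaround added no bloat.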
Later, when announcing Linux 3.12-rc5, Linus said, "the most excitement we had this week wasn't even a kernel bug, it was a compiler bug wrt 'asm goto' that was found because of code that is pending to be merged in 3.13. But the (happily fairly straightforward) workaround for the bug was merged early, because we _do_ use asm goto, and it's unclear whether our existing use might already trigger the bug, just not enough to be as obviously noticeable."
A day or so later, Greg Kroah-Hartman fed the fix down into the 3.11 stable tree.
No Longer Registering Internal Filesystems
Al Viro wanted to clean up some legacy interface issues that he thought no one would miss. He said the kernel required code to call
register_filesystem() before mounting a filesystem, so the kernel could identify it by name and understand how to deal with it. For userland filesystems, Al said this kind of registration was important because it allowed user code to look up the internal filesystem type on the basis of the filesystem's name.
However, for internal filesystems, such as ia64 pfmfs, anon_inodes, bdev, pipefs, and sockfs, Al said
register_filesystem() had been superfluous for about 10 years already, and he wanted to get rid of it for those cases. He pointed out that for those filesystems in recent years, "kern_mount() takes a pointer to file_system_type and doesn't bother with searching for it by name."
He added that the
kern_mount() thing by itself wasn't quite enough to justify his proposed change, although it helped. Also,
register_filesystem() traditionally initialized a crucial data structure belonging to each filesystem type. A couple of years ago, he said, that data structure was reworked to no longer need
register_filesystem(). He continued, "These days there's no reason to register the filesystem types that can only be used for internal mounts."
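The distinction Al draws can be modeled in a few lines of ordinary C. This is a userspace toy, not kernel code: struct toy_fs_type and the toy_* functions are hypothetical stand-ins for file_system_type, register_filesystem(), a name-based mount, and kern_mount().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the kernel's struct file_system_type. */
struct toy_fs_type {
    const char *name;
    struct toy_fs_type *next;
};

/* Registered types: the set a name lookup (or /proc/filesystems) can see. */
static struct toy_fs_type *toy_registry;

/* Toy register_filesystem(): make a type findable by name. */
static int toy_register(struct toy_fs_type *fs)
{
    struct toy_fs_type *p;

    for (p = toy_registry; p != NULL; p = p->next)
        if (strcmp(p->name, fs->name) == 0)
            return -1;              /* name already registered */
    fs->next = toy_registry;
    toy_registry = fs;
    return 0;
}

/* Toy user-initiated mount: only a name is available, so search for it. */
static struct toy_fs_type *toy_mount_by_name(const char *name)
{
    struct toy_fs_type *p;

    for (p = toy_registry; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Toy kern_mount(): the caller already holds the type, so no lookup. */
static struct toy_fs_type *toy_kern_mount(struct toy_fs_type *fs)
{
    return fs;
}

static void toy_fs_demo(void)
{
    static struct toy_fs_type diskfs = { "diskfs", NULL };
    static struct toy_fs_type pipefs = { "pipefs", NULL };

    assert(toy_register(&diskfs) == 0);
    /* pipefs is deliberately never registered. */
    assert(toy_mount_by_name("diskfs") == &diskfs);
    assert(toy_mount_by_name("pipefs") == NULL);
    assert(toy_kern_mount(&pipefs) == &pipefs);
}
```

The internal type works without ever entering the registry, because the only code that mounts it already has the pointer in hand.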
Al added, "The only user-visible effect of dropping those register_filesystem() would be shorter /proc/filesystems – that bunch wouldn't be mentioned there anymore."
He posted a short patch, saying that this could only cause a problem if any user code depended on seeing those internal filesystems listed in
/proc/filesystems, which he doubted.
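For reference, the kind of user code that could in principle be affected looks something like the following sketch. It relies on the /proc/filesystems line format ("nodev", a tab, and the name for virtual filesystems; just a tab and the name otherwise); the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Does one listing line ("nodev\t<name>\n" or "\t<name>\n") name `name`? */
static bool fs_line_matches(const char *line, const char *name)
{
    const char *fs = strrchr(line, '\t');   /* the name follows the tab */
    size_t len;

    fs = (fs != NULL) ? fs + 1 : line;
    len = strcspn(fs, "\n");                /* ignore the trailing newline */
    return len == strlen(name) && strncmp(fs, name, len) == 0;
}

/* Scan a listing file such as /proc/filesystems for `name`. */
static bool fs_is_listed(const char *path, const char *name)
{
    char line[256];
    bool found = false;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return false;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = fs_line_matches(line, name);
    fclose(f);
    return found;
}
```

A program calling fs_is_listed("/proc/filesystems", "pipefs") would start getting false after Al's patch, which is exactly the dependency he doubted anyone had.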
Linus Torvalds replied that he liked the patch but did worry that
/proc/filesystems would no longer contain a complete list of filesystems on any running system and that nothing else had that list either. The
/proc/modules file would have the information, but only if the filesystems had been compiled as modules.
Linus concluded, "These particular filesystems I really don't think people would ever possibly check for, so I think it's fine."
Support for Programmable Instruction Sets
Michal Simek submitted a new version of his FPGA (field-programmable gate array) subsystem for consideration. FPGAs are integrated circuits whose logic can be configured by the user after manufacture. The goal of Michal's code was "to unify all fpga drivers which in general do the same things."
He added that after a discussion with Greg Kroah-Hartman, the subsystem would support a firmware interface to load configuration bitstreams into FPGA chips. To this, H. Peter Anvin replied, "As I have previously stated, I think this is a mistake simply because the firmware interface is a bad mapping on requirements for an FPGA, especially once you account for the vast number of ways an FPGA can get loaded and you take partial reconfiguration into account."
Later, Peter said, "The essential question is if the firmware interface really is appropriate for FPGAs. It definitely has a feel of a 'square peg in a round hole', especially when you consider the myriad ways FPGAs can be configured (some persistent, some not, some which take effect now, some which come later, some which involve bytecode interpreters …) and considering reconfiguration and partial reconfiguration."
Michal replied:
If you look at it in general I believe that there is [a] wide range of applications which just contain one bitstream per fpga and the bitstream is replaced by [a] newer version in upgrade. For them [a] firmware interface should be pretty useful. Just set up [a] firmware name with [the] bitstream and it will be automatically loaded in [the] startup phase.
Then there is another set of applications, especially in connection to partial reconfiguration, where this can be done statically by pregenerated partial bitstreams or automatically generated on [a] target cpu. … doing everything on the target firmware interface is not the best because everything can be handled by user application[s,] and it is easier just to push this bitstream to do device and not to save it to the fs.
I think the question here is if this subsystem could have several interfaces. For example, Alan is asking for adding char support. Does it even make sense to have more interfaces with the same backend driver? When this is answered, then we can talk [about] which one[s] make sense …. In v2 is sysfs and firmware one. Adding char is also easy to do.
Alan Tull clarified, "I'm asking for just one way to write the fpga image data, not two or three. I like being able to directly write the fpga image buffer from userspace; that will support the superset of use cases."
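From the userspace side, the approach Alan describes amounts to opening a device node and writing the image buffer to it. The sketch below shows only the generic write loop; the write_image() name is made up, and no particular device path or driver behavior is implied by the discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole image to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is left as set by write()). */
static int write_image(int fd, const unsigned char *image, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, image, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try again */
            return -1;
        }
        image += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would open whatever node the driver exposes (a hypothetical /dev/fpga0, say), pass the descriptor to write_image(), and close it; nothing has to be saved to the filesystem first.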
To Michal, Peter agreed that "the firmware interface makes sense when the use of the FPGA is an implementation detail in a fixed hardware configuration, but that is a fairly restricted use case." He asked for Greg's input.
I thought this would be just like 'firmware'; you dump the file to the FPGA, it validates it and away you go with a new image running in the chip.
But, it sounds like this is much more complicated, so much so that configfs might be the correct interface for it, as you can do lots of things there, and it is very flexible (some say too flexible …).
A char device, with a zillion different custom ioctls is also a way to do it, but one that I really want to avoid as that gets messy really quickly.
Peter replied that it might not be necessary to implement a full zillion custom ioctls. But he said, "we really need to get a better understanding of the various usage cases," including cases where an FPGA driver wasn't needed at all.
Jason Gunthorpe also replied to Greg, saying that Greg's initial concept (dumping the file to the FPGA and away you go) accounted for virtually all use cases. Everything else, he said, was "fringe."
I've been doing FPGAs for >10 years and I've never once used read back via the config bus. In fact all my FPGAs turn that feature off once they are loaded. Partial reconfiguration is very specialized, and hard to use from a FPGA design standpoint.
I also think it is sensible to focus this interface on simple SRAM FPGAs, not FLASH based stuff, or whatever complex device required a byte code interpreter.
But, Peter replied, "Every FPGA toolchain I know of has a way to emit JAM/STAPL bytecode files … and a fair number of programming scenarios need them."
Jason said, "but now you are talking about JTAG. JTAG is a very different problem than configuring over the configuration bus. I don't think it makes much sense to try and combine those two things into a single subsystem. The majority use of JAM/STAPL output is for manufacturing automation. In system, in field programming of SRAM FPGAs via JTAG is uncommon."
But Peter replied, "I do it all the time. JAM/STAPL seems to me to be more used for exotic connections to serial flash for persistent programming."
Michal asked Peter to share some of his code to be a test case for further FPGA development, but Peter replied that the code was owned by his employer and couldn't be shared. They continued discussing the technical details in general terms, though, identifying a variety of use cases.
In general, the level of interest among people like Peter and Greg seems to indicate that an FPGA subsystem is likely to get into the kernel at some point, in one form or another. There also seems to be general interest in casting as wide a net as possible in terms of the feature set. (In spite of Jason's warning that "… loading the wrong FPGA could permanently destroy the system. This is why we have meta-data encoded with the bitstream.") At the same time, there's a reluctance to support absolutely every possible case if a more general interface could accomplish the same thing.
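Jason's point about meta-data can be illustrated with a small check that runs before any image reaches the chip. Everything here is hypothetical: the header layout, the magic value, and the notion of a numeric board ID are assumptions made for the sketch, not a format used by any vendor or proposed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header placed in front of the raw bitstream. */
struct image_header {
    uint32_t magic;        /* identifies the container format */
    uint32_t board_id;     /* board the design was built for */
    uint32_t payload_len;  /* bytes of bitstream after the header */
};

#define IMAGE_MAGIC 0x46504741u  /* arbitrary value chosen for the example */

/* Refuse images that are truncated, malformed, or built for another board. */
static bool image_ok_for_board(const unsigned char *buf, size_t len,
                               uint32_t board_id)
{
    struct image_header h;

    if (len < sizeof h)
        return false;
    memcpy(&h, buf, sizeof h);  /* copy out to avoid unaligned access */
    return h.magic == IMAGE_MAGIC &&
           h.board_id == board_id &&
           h.payload_len == len - sizeof h;
}
```

A loader that calls a check like this first can reject a bitstream meant for different hardware before it does any damage.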
The discussion continued along technical lines but without settling on anything. Michal's patch likely will have to go through some more iterations to figure out the right scope.