Zack's Kernel News
Chronicler Zack Brown reports on I3C support, fixing mmap(), and tracing RAM usage in OOM conditions.
I3C Support
Boris Brezillon posted some patches to implement a portion of the I3C core infrastructure. I3C is a wholesale upgrade of I2C, a two-wire serial bus protocol. A lot of sensor devices communicate over I2C, because it's a simple two-wire interface. However, as that simplicity brings a proliferation of sensor devices, it becomes more important to manage the increased bandwidth and interrupt needs they create. I3C is designed to do that.
Boris' approach would transparently handle I2C backward compatibility for minimal user pain, but he also made certain compromises that would make using his APIs more difficult, and he left a fair chunk of the I3C API unimplemented for now, although he intends to fill it out in the future.
One implementation compromise was to require user code to run in a non-atomic context (i.e., only where the calling code is allowed to sleep and be preempted by something else). That's a slight annoyance, because it requires user code to be aware of its current state when calling the I3C API. However, Boris indicated he'd be fine with changing that. He'd mainly done it as a shortcut.
Among the missing API calls, Boris left out hot plugging support, which might be a deal-breaker for some users. Of course, the missing APIs would all be added later as the I3C code was fleshed out.
Wolfram Sang took a look at the code, but had no serious objections and generally approved the patches.
Arnd Bergmann asked why Boris had created a whole new subsystem for I3C, instead of simply extending the existing I2C code to support I3C as well. Boris replied:
I3C and I2C are really different. I'm not talking about the physical layer here, but the way the bus has to be handled by the software layer. Actually, I thin[k] the I3C bus is philosophically closer to auto-discoverable buses like USB than I2C or SPI.
Indeed, all I3C devices can be discovered and do not need to be described at the board level (using DT, board files, ACPI, or whatever). Also, some I3C devices are hot pluggable, and most importantly, all I3C devices describe themselves during the discovery procedure (called DAA in the I3C world).
There is some kind of "device class" concept. In the I3C world it's called DCR (Device Characteristic Register), but it plays the same role: It's a set of generic interfaces devices have to comply with when they declare themselves as being compatible with a DCR ID (like accelerometer, gyroscope, or whatever). [...]
Devices also expose a 48-bit Provisional ID which is made of sub-fields. Two of them are particularly interesting: the manufacturer ID and the part ID, which are comparable to the vendor and product ID in the USB world.
These three [pieces of] information (DCR, ManufacturerID, and PartID) can be used to match drivers instead of the compatible string or driver-name used for I2C devices.
So, as you can imagine, dealing with an I3C bus is really different from dealing with an I2C bus.
Boris added, "Of course, I can move all the code in drivers/i2c/, but that won't change the fact that I3C and I2C buses are completely different with little to share between them."
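To make Boris's point about self-describing devices a little more concrete, here is a minimal user-space sketch of matching a driver against the DCR, manufacturer ID, and part ID. It is not the code from Boris's patches. The demo_* names and the example ID values are hypothetical, and the bit positions of the two sub-fields inside the 48-bit Provisional ID (manufacturer ID in bits 47 to 33, part ID in bits 31 to 16) are an assumption that should be checked against the I3C specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout of the 48-bit Provisional ID:
 *   bits 47..33  manufacturer ID (15 bits)
 *   bits 31..16  part ID (16 bits)
 * Verify against the I3C specification before relying on it.
 */
static uint16_t pid_manuf_id(uint64_t pid)
{
    return (uint16_t)((pid >> 33) & 0x7fff);
}

static uint16_t pid_part_id(uint64_t pid)
{
    return (uint16_t)((pid >> 16) & 0xffff);
}

/* What a device reports about itself during discovery (hypothetical type). */
struct demo_i3c_dev {
    uint64_t pid; /* 48-bit Provisional ID */
    uint8_t dcr;  /* Device Characteristic Register value */
};

/* One entry in a driver's match table (hypothetical type). */
struct demo_i3c_id {
    bool match_dcr;    /* compare the DCR */
    uint8_t dcr;
    bool match_pid;    /* compare manufacturer ID and part ID */
    uint16_t manuf_id;
    uint16_t part_id;
};

/* Return the first table entry that matches the device, or NULL. */
static const struct demo_i3c_id *
demo_match(const struct demo_i3c_id *table, size_t n,
           const struct demo_i3c_dev *dev)
{
    for (size_t i = 0; i < n; i++) {
        const struct demo_i3c_id *id = &table[i];

        if (!id->match_dcr && !id->match_pid)
            continue; /* an entry that constrains nothing matches nothing */
        if (id->match_dcr && id->dcr != dev->dcr)
            continue;
        if (id->match_pid &&
            (id->manuf_id != pid_manuf_id(dev->pid) ||
             id->part_id != pid_part_id(dev->pid)))
            continue;
        return id;
    }
    return NULL;
}
```

A driver for one specific part would list its manufacturer and part IDs, while a generic class driver would list only a DCR value. Nothing in the table has to come from a device tree or board file, which is the contrast with I2C that Boris was drawing.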
Arnd didn't want to let the idea go quite yet, though. He agreed that there were reasons to oppose extending I2C to cover I3C as well, but he felt there were also reasons to go through with it, even if it involved creating an ugly mess behind the scenes. He said, "there is value in representing the physical bus hierarchy in the software model, and if I2C and I3C devices can be attached to the same host bus, a good abstraction should show them under the same parent. This is true for both the kernel representation (in sysfs and the data structures) as well as the device tree binding (assuming we will need to represent I3C devices at all). The two don't have to use the same model, but it's easier if they do."
He also added, "We have discussed whether I2C and SPI should be merged into a single bus_type in the past, as a lot of devices can be attached to either of them. If it's common enough for I3C devices to support an I2C fallback mode, having a common bus_type might noticeably simplify device drivers by only requiring a single i2c_driver structure. Simplifying many drivers a little bit can in turn offset the added complexity in the subsystem."
Boris didn't really see the benefit of these ideas and expressed surprise that I2C and SPI might ever be merged. He asked Arnd for an explanation, and Arnd replied, "well, we never changed it, so at least the work required to merge the two was considered too much to justify any advantages." But he described the rationale as simplifying kernel build-time config options. He said:
The main problem with having one driver that can operate on different bus types (I2C plus either SPI or I3C) is the handling for the various combinations in configurations (e.g., I2C=m, SPI=y).
The easy case is having a module_init function that registers two device drivers, but that requires having a Kconfig dependency on both subsystems, and you can't use the module_i2c_driver() helper.
The second way is to have a number of #ifdef and complex Kconfig dependencies for the driver to only register the device_driver objects for the buses that are enabled. This is also doable, but everyone gets the logic wrong the first time.
What we end up doing to work around this for other drivers is to have the base driver in one library module, and separate modules for the bus-specific portions, which can then use module_i2c_driver again. There are many instances for combined I2C/SPI drivers in the kernel, and it works fine, but it adds a fair bit of overhead compared to having one driver that would, e.g., use regmap to abstract the differences in the probe() function and otherwise keeps everything in one place.
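The split Arnd describes, a bus-independent core plus thin bus-specific glue, can be sketched in user space. This is only an analogy, not kernel code: every demo_* name is hypothetical, the two "buses" are fake in-memory register files, and the function-pointer table merely stands in for the kind of abstraction regmap provides.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NREGS    4
#define DEMO_REG_ID   0x00
#define DEMO_REG_CTRL 0x01
#define DEMO_CHIP_ID  0x5a

/* Bus-neutral register access: the only thing the core driver sees. */
struct demo_regops {
    int (*read)(void *bus, uint8_t reg, uint8_t *val);
    int (*write)(void *bus, uint8_t reg, uint8_t val);
};

struct demo_sensor {
    const struct demo_regops *ops;
    void *bus;
};

/* Shared "library" part: identical no matter which bus the chip sits on. */
static int demo_sensor_probe(struct demo_sensor *s,
                             const struct demo_regops *ops, void *bus)
{
    uint8_t id;

    s->ops = ops;
    s->bus = bus;
    if (ops->read(bus, DEMO_REG_ID, &id) != 0 || id != DEMO_CHIP_ID)
        return -1;                              /* not our chip */
    return ops->write(bus, DEMO_REG_CTRL, 0x01); /* enable the sensor */
}

/* Fake bus: a small register file plus a transfer counter. */
struct demo_fake_bus {
    uint8_t regs[DEMO_NREGS];
    unsigned transfers;
};

/* "I2C" glue: a register read costs two transfers (address, then data). */
static int demo_i2c_read(void *bus, uint8_t reg, uint8_t *val)
{
    struct demo_fake_bus *b = bus;

    if (reg >= DEMO_NREGS)
        return -1;
    b->transfers += 2;
    *val = b->regs[reg];
    return 0;
}

static int demo_i2c_write(void *bus, uint8_t reg, uint8_t val)
{
    struct demo_fake_bus *b = bus;

    if (reg >= DEMO_NREGS)
        return -1;
    b->transfers += 1;
    b->regs[reg] = val;
    return 0;
}

/* "SPI" glue: one full-duplex transfer per access. */
static int demo_spi_read(void *bus, uint8_t reg, uint8_t *val)
{
    struct demo_fake_bus *b = bus;

    if (reg >= DEMO_NREGS)
        return -1;
    b->transfers += 1;
    *val = b->regs[reg];
    return 0;
}

static int demo_spi_write(void *bus, uint8_t reg, uint8_t val)
{
    struct demo_fake_bus *b = bus;

    if (reg >= DEMO_NREGS)
        return -1;
    b->transfers += 1;
    b->regs[reg] = val;
    return 0;
}

static const struct demo_regops demo_i2c_ops = {
    .read = demo_i2c_read, .write = demo_i2c_write,
};
static const struct demo_regops demo_spi_ops = {
    .read = demo_spi_read, .write = demo_spi_write,
};
```

In the kernel arrangement Arnd mentions, the core would live in one library module and each set of glue functions in its own module, so each glue module depends on only one bus subsystem. The cost he points to is the extra modules and indirection compared with a single driver.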
This made sense to Boris, and he could see the value of having a single subsystem, but he still hated the ugliness of the potential merged implementation. He asked, "Can't we solve this problem with a module_i3c_i2c_driver() macro that would hide all this complexity from I2C/I3C drivers?"
Wolfram, who had earlier approved Boris's patches, felt that merging I2C with I3C was possibly significantly different from merging I2C with SPI. In the latter case, there was a decent likelihood that the system might only have one or the other, in which case supporting both would cover the most possibilities. But he doubted there was any hardware currently implementing both I2C and I3C. And since the I3C code was backward compatible with I2C, that seemed to obviate the need for a merger – the user could simply plug the I2C device into the port and use it.
Boris replied that although he didn't know of any devices implementing both protocols at the moment, "the spec clearly describe[s] what legacy/static addresses are for and one of their use case[s] is to connect an I3C device on an I2C bus and let it act as an I2C device," which Wolfram agreed made it more likely that a device would one day implement both protocols.
At some point, Boris took a whack at merging I2C and I3C, but he found it difficult to conceptualize a workable design. He also added, "It's kind of hard to design something when you don't have real devices. I guess I can mimic I2C for now and make it evolve based on users' needs."
Elsewhere, Greg Kroah-Hartman asked Boris to split the documentation out from the patch and submit it separately, to make the code slightly easier to go through, but he also did go through the code and offered several technical criticisms, which Boris then said he'd address.
Greg also deduced from a missing data type that Boris had never actually tested removing an I3C device once it had been installed. Boris replied, "You got me, never tried to remove a device." He asked some technical questions about how best to handle that case.
At this point, the discussion veered off into technical locking issues, as Greg and Boris worked on resolving the need to remove devices from a running system.
There was no ultimate resolution to any of the questions raised in the discussion, but it seems clear that Boris's first attempt will probably be reworked to address Greg's and Arnd's objections. It also seems like the task might be bigger than Boris had first anticipated, and he may not have the physical equipment he needs. (He remarked at one point, "all my tests have been done with dummy/fake I3C slaves emulated with a slave IP.") So, although it seems like I3C will definitely be getting into the kernel sooner rather than later, there are still some significant hurdles to overcome.
Fixing mmap()
Dan Williams had a thorny conundrum. The mmap() system call didn't validate unknown flags, which meant that an application requesting a new mmap behavior on an older kernel would have the flag silently ignored, instead of getting an error it could fall back from gracefully. Dan wanted to implement a new system call, mmap3(), that would validate all flags.
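The difference between ignoring and validating unknown flags can be shown with a small user-space model. This is a sketch of the idea only, not kernel code and not Dan's patch: the demo_* names and flag values are hypothetical, and returning -EINVAL for an unknown flag is an assumption made for illustration.

```c
#include <errno.h>

/* Hypothetical flag values for the model; not the real MAP_* constants. */
#define DEMO_MAP_SHARED   0x01u
#define DEMO_MAP_PRIVATE  0x02u
#define DEMO_MAP_NEWFLAG  0x80000u /* introduced after the "old" kernel shipped */

/* The set of flags the modeled "old" kernel knows about. */
#define DEMO_OLD_KNOWN    (DEMO_MAP_SHARED | DEMO_MAP_PRIVATE)

/*
 * Legacy behavior: unknown bits are silently dropped. The call succeeds,
 * and the caller cannot tell that its new flag was not honored.
 */
static int demo_legacy_flags(unsigned flags, unsigned known, unsigned *honored)
{
    *honored = flags & known;
    return 0;
}

/*
 * Validating behavior: any unknown bit fails the call, so a new
 * application can detect the missing feature and fall back.
 */
static int demo_validating_flags(unsigned flags, unsigned known,
                                 unsigned *honored)
{
    if (flags & ~known)
        return -EINVAL;
    *honored = flags;
    return 0;
}
```

With the legacy function, a request for DEMO_MAP_SHARED | DEMO_MAP_NEWFLAG succeeds but only DEMO_MAP_SHARED is honored. With the validating function, the same request fails outright, which is the signal an application needs in order to take a different path.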
There were a couple of problems. For one thing, Christoph Hellwig pointed out that "Adding new syscalls is extremely painful; it will take forever to trickle this through all architectures (especially with the various 32-bit architectures having all kinds of different granularities for the offset) and then the various C libraries, never mind applications."
Christoph suggested just using an existing __MAP_VALID hack as a workaround.
Dan replied, "I agree with the mess and delays it causes for other archs and libc, but at the same time this is for new applications and libraries that know to look for the new flag, so they need to do the extra work to check for the new syscall."
Regarding the possibility of using __MAP_VALID, he also pointed out that "any new mmap flag with this hack must be documented to only work with MAP_SHARED and that MAP_PRIVATE is silently ignored."
However, he said he wasn't totally opposed to doing things this way if it turned out not to be too onerous.
Christoph went digging around in the mmap code and thought he found a way to avoid the problem with ignoring MAP_PRIVATE. If so, he felt, it certainly beat the hell out of adding a new system call.
However, Kirill A. Shutemov looked at Christoph's discovery and felt that certain architectures, in particular PA-RISC, wouldn't be able to support the fix Christoph had in mind. Christoph rejoined, "I'd be happy to say that we should not care about PA-RISC for persistent memory. We'll just have to find a way to exclude PA-RISC without making life too ugly." Kirill hated this idea, though, saying that system call interfaces should be universal and not have different behaviors depending on which machine they were run on.
The debate continued, although it seemed to be getting further afield from the original issue. Eventually Dan brought the conversation back around to the real question, saying:
The problem here is that to support [the] new mmap flags the arch needs to find a flag that is guaranteed to fail on older kernels. Defining MAP_DIRECT to 0x8 on PA-RISC doesn't work because it will simply be ignored on older PA-RISC kernels.
However, it's already the case that several archs have their own sys_mmap entry points. Those archs that can't follow the common scheme (only PA-RISC it seems) will need to add a new mmap syscall. I think that's a reasonable tradeoff to allow every other architecture to add this support with their existing mmap syscall paths.
Helge Deller objected to this plan, saying, "I don't want other architectures to suffer just because of PA-RISC. But adding a new syscall just for usage on PA-RISC won't work either, because nobody will add code to call it then."
Helge proposed breaking the ABI for PA-RISC in this case. This, he said, would allow a proper fix without having to do any major contortions. He added that there were not a lot of PA-RISC users, anyway, and that most of them updated their kernels regularly, because of all the recent fixes that had gone into the code.
However, Dan replied, "The whole point is to avoid an ABI regression and the chance for false positive results. We're immediately stuck if some application was expecting 0x8 to be ignored, or conversely, an application that absolutely needs to rely on MAP_SYNC/MAP_DIRECT semantics assumes the wrong result on a PA-RISC kernel where they are ignored."
The debate petered out at that point, and it's not yet clear what solution they'll ultimately use for mmap(). I found it interesting to watch how each proposed solution ran into its own stumbling blocks. New system calls take time to trickle down. The proposed workaround would only work for certain cases – or if it could be fixed to work for all cases, there was still a single holdout architecture that wouldn't go along with it. Choosing to break ABI compatibility for that one case was a nonstarter because the whole point was not to do that in the first place.
Of course, many parts of the kernel would love the opportunity to break ABI compatibility, but this seems to be one of the most sacrosanct elements of the kernel. Almost the only thing capable of trumping ABI compatibility is a security hole. Security trumps all. But one of these days, it would be wonderful to have a special ABI-breaking kernel release, where every part of the kernel is free to break the ABI for just that one time. Oh the feeding frenzy! Oh the carnage! And afterward … oh the regret! Oh the recrimination! It would be glorious.
Tracing RAM Usage in OOM Conditions
Yang Shi was annoyed when his kernel ran out of memory, and the out-of-memory (OOM) killer couldn't find a process to kill to stave off the kernel panic. If there's no process to kill, how could the system be out of memory? It turned out that the system had put all its memory into unreclaimable slabs. Yang posted a patch to add a new -U option to the slabinfo program to cause that program to output only data about unreclaimable slabs. This, he said, would at least allow the user to troubleshoot the problem and possibly find a way to deal with it properly.
Michal Hocko had no objection to the patch going into the kernel, but he did note that Yang's code might produce a metric ton of output, on top of the already highly verbose OOM killer report. As such, he suggested leaving the feature disabled by default.
Yang disagreed, saying the output would only be produced in the event of catastrophic failure and not as a part of normal operations. He added that the code could easily add a file to the proc filesystem (procfs), to control the amount of output, even under that circumstance.
Michal said he didn't care that much about this particular issue, because "most OOM reports I have seen were simply user space pinned memory." He was fine leaving the output more verbose rather than less.
They kept talking, and eventually they seemed to reach a compromise when Yang suggested, "Maybe we can set a[n] unreclaimable slab/total mem ratio. For example, when unreclaimable slab size >= 50% total memory size, then we print out slab stats in OOM? And, the ratio might be adjustable in /proc."
This made sense to Michal, and the thread came to an end.
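Yang's proposed threshold boils down to a single comparison. The following is a minimal sketch of that check, not code from his patch. The function name and the use of kilobyte counts are assumptions, and the ratio parameter simply stands in for whatever tunable might be exposed in /proc.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the OOM report should include slab statistics.
 * Sizes are in kilobytes; ratio_percent is the adjustable threshold
 * (50 in Yang's example). The comparison is cross-multiplied to stay
 * in integer arithmetic; this is safe for realistic memory sizes,
 * since the products remain far below 2^64.
 */
static bool demo_should_dump_slabs(uint64_t unreclaim_kb, uint64_t total_kb,
                                   unsigned ratio_percent)
{
    if (total_kb == 0)
        return false; /* nothing sensible to compare against */
    return unreclaim_kb * 100 >= total_kb * (uint64_t)ratio_percent;
}
```

With 8GB of total memory and a 50 percent threshold, slab statistics would be printed once unreclaimable slabs reached 4GB, and the report would stay at its usual length otherwise.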