Zack's Kernel News
Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.
Delaying Patches to the Stable Tree
Greg Kroah-Hartman recently decided to make a small change to the way the stable kernels accepted patches. Typically, any patch intended for the stable series had to get into Linus Torvalds's tree first, thus guaranteeing that it wouldn't just become an aging artifact of the stable series but would persist in the official tree as well. This policy has drawn its share of complaints, because certain relatively obvious fixes had to be delayed by Linus's -rc release cycle.
Greg recently applied a patch to the stable series to fix a race condition in the genetlink code. However, although the patch had gone to Linus's tree first, it hadn't been tested long enough and turned out to have a significant bug. The problem wasn't caught until Greg had released the official 3.0.92 kernel.
Linus suggested, "I really think there should be delay before stable picks up stuff from mainline, unless it's something particularly critical and well-discussed. Maybe a week or so." He added that the buggy code was already in the 3.0 and 3.10 kernels, so it would actually have to be reverted in multiple places.
Greg said he'd take care of pushing out a new debugged 3.0 kernel and, in a separate thread, he proposed some changes to how he would accept patches in the future. The primary change was that, instead of simply waiting for a patch to appear in Linus's Git repository, he'd wait until the corresponding -rc version came out. Only then would Greg consider including it in an upcoming stable kernel release.
Greg added, however, that under certain conditions, he wouldn't need to follow this delay. If the official maintainer of the feature in question acknowledged the patch as being good, or sent it to Greg directly, for example, that would be sufficient to get the code in before it appeared in an -rc version.
Likewise, if Greg saw enough discussion of the patch to ensure that it had been getting proper testing and was probably good, then he wouldn't have to wait for the -rc version. Finally, if the patch was just obviously fine and clearly too simple to cause a problem, Greg would accept it before the -rc version as well.
So, a maintainer, a discussion, or Greg's own eyes could bring a patch early to the stable series. Anything beyond that would have to bake in Linus's Git tree until a new -rc version came out.
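Greg's proposed rule can be read as a simple predicate: a patch qualifies once it has appeared in an -rc release, or earlier if any one of the three exceptions applies. The following is only an illustrative sketch of that logic. The `Patch` record and its field names are hypothetical and are not part of any real stable-tree tooling.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """Hypothetical record of what is known about a stable candidate."""
    in_rc_release: bool = False      # already shipped in one of Linus's -rc versions
    maintainer_acked: bool = False   # maintainer approved it or sent it to Greg directly
    well_discussed: bool = False     # enough discussion to suggest real testing
    obviously_trivial: bool = False  # too simple to plausibly cause a problem


def eligible_for_stable(patch: Patch) -> bool:
    """Return True if the patch may be considered for a stable release."""
    # Any one of the three exceptions lets a patch skip the -rc wait.
    early_exception = (
        patch.maintainer_acked
        or patch.well_discussed
        or patch.obviously_trivial
    )
    return patch.in_rc_release or early_exception


# A fresh patch with no ack, no discussion, and no -rc yet has to wait.
print(eligible_for_stable(Patch()))
# The same patch with a maintainer's blessing can go in early.
print(eligible_for_stable(Patch(maintainer_acked=True)))
```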
Nicholas A. Bellinger thought that this new proposed delay would be acceptable, as long as the really important fixes were not delayed too long.
Willy Tarreau proposed an even more extreme approach. He suggested not just waiting for a new -rc version but actually waiting for a new -rc1 version, after a full development cycle had completed. The only exception, beyond those Greg already proposed, was that patches marked "urgent" would be considered sooner. Willy said, "I tend to think it will be hard at the beginning but will quickly motivate maintainers to care more about their fixes flow towards -stable depending on their severity."
Greg replied, "Waiting 3 months is too long, in my opinion, sorry."
Josh Boyer liked Greg's initial proposal, although he suggested waiting for -rc2 for all patches marked "stable" that got into Linus's tree during the merge window. Greg said, "Maintainers can always tag their patches to have me hold off until -rc2 for that." However, he added, "given the huge number of patches for stable that come in during the -rc1 merge window, there's no way I can get to them all before -rc2 comes out, so this will probably end up being the case for most of them anyway."
Josh pointed out that relying on the sheer pressure of numbers to slow down patch acceptance was perhaps not the ideal approach. He offered to help if Greg had any ideas about how. Greg replied, "Letting me know when something breaks is always good as well. Right now that doesn't seem to happen much, so either not much is breaking, or I'm just not told about it, I don't know which."
Josh rattled off a list of a bunch of broken stuff, although he acknowledged that most were things Greg already knew about or had already been fixed. Regarding his work on Rawhide, he said, "In the future, if we can get the information from the end user in time, I'll be happy to forward issues that aren't already reported onwards. Or if you still want to hear about it, I can chime in on the existing threads with bugzilla numbers. I'm also willing to do a monthly 'patches we're carrying not in stable' report if people find that helpful. I'll likely be doing that within Fedora already and I'm happy to send it to stable@, even if those patches aren't exactly stable-rules matching. We did that when kernel.org went down and it helped then, just not sure how much it would help now or if people care."
Regarding Josh's "Patches We're Carrying That Are Not In Stable" report, Greg replied, "I would love that report, one of the things I keep asking for is for people to send the patches that distros have that are not in stable to me, as those obviously are things that are needed for a valid reason that everyone should be able to benefit from."
Elsewhere, the discussion continued on what constituted an appropriate delay between a patch going into Linus's tree and then into Greg's tree. Steven Rostedt suggested that it would be good to have some statistics on how long it took a serious flaw to be discovered in patches that were intended for the stable series. Was it a single -rc release? Was it two? He said, "We should be using past data to determine these heuristics. But I don't have that data, but Greg should."
Steven also suggested, "A patch ideally should simmer in linux-next for a bit, then go into mainline. It should also simmer there for a bit before it goes into stable. Except for those very rare patches that can cause the system to easily crash, let an exploit happen, or corrupt data. Those do need to go into stable ASAP. But luckily, those are also rare and far between."
Stephen Warren also responded to Greg's initial proposal. Stephen suggested that instead of waiting until the next -rc version from Linus, Greg should wait for the second -rc. Stephen said that once the first -rc came out, people would start testing it, and presumably any problems would be found before the second -rc. He said, "That's still only an average of 1.5 weeks delay, with a min-max of 1-2, ignoring the merge window and assuming bugfix patches go quickly upstream from subsystem maintainers."
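Stephen's figures follow from simple arithmetic, sketched below under two assumptions that are not stated in the thread: -rc versions come out exactly one week apart, and a fix is equally likely to land at any point between two of them. The function name is made up for illustration.

```python
def delay_until_second_rc(weeks_since_last_rc: float, rc_interval: float = 1.0) -> float:
    """Weeks a patch waits if stable picks it up at the second -rc after it lands.

    weeks_since_last_rc is how long after the previous -rc the patch hit
    mainline; it must lie between 0 and rc_interval.
    """
    if not 0.0 <= weeks_since_last_rc <= rc_interval:
        raise ValueError("patch must land between two consecutive -rc releases")
    # The first following -rc is (rc_interval - t) away; the second is one
    # further interval beyond that.
    return 2 * rc_interval - weeks_since_last_rc


# Sample landing times evenly across one week and summarize the delays.
samples = [i / 1000 for i in range(1001)]
delays = [delay_until_second_rc(t) for t in samples]
average = sum(delays) / len(delays)

print("min:", min(delays), "max:", max(delays), "average:", round(average, 6))
```

A patch that lands just before an -rc waits about one week, one that lands just after an -rc waits about two, and the midpoint of that range is the 1.5 weeks Stephen quotes.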
At this point, Linus entered the discussion, saying:
"I don't think the -rc releases are all that important either. The important part is to _wait_. Not 'wait for an -rc'. There are reasonable number of developers and users who actually run git kernels, just because they want to help. At rc points, you tend to get a few more of those.
In contrast, when patches get moved from the development tree to stable within a day or two, that patch has gotten basically _no_ testing in some cases (or rather, it's been tested to fix the thing it was supposed to fix, but then there are surprising new problems that it introduces that nobody even thought about, and wasn't tested for).
So I don't think 'is in an rc release' is the important thing. I think 'has been in the standard git tree for at least a week' is what we should aim for.
Will it catch all cases? Hell no. We don't have *that* many people who run git kernels, and even people who do don't tend to update daily anyway. But at least this kind of embarrassing 'We found a bug within almost minutes of it hitting mainline' should not make it into stable."
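Linus's criterion depends only on how long a patch has been in mainline, not on which -rc it shipped in. A minimal sketch of such a check follows. The function name and the idea of passing timestamps in explicitly are assumptions made for illustration, and the dates in the usage lines are arbitrary.

```python
from datetime import datetime, timedelta, timezone

# "At least a week" in the standard git tree, per Linus's suggestion.
MIN_SOAK = timedelta(days=7)


def soaked_long_enough(merged_at: datetime, now: datetime,
                       min_soak: timedelta = MIN_SOAK) -> bool:
    """Return True if the patch has sat in mainline for at least min_soak."""
    return now - merged_at >= min_soak


# Arbitrary example timestamps, not taken from any real commit.
merged = datetime(2013, 8, 1, 12, 0, tzinfo=timezone.utc)
print(soaked_long_enough(merged, merged + timedelta(days=2)))
print(soaked_long_enough(merged, merged + timedelta(days=8)))
```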
Regarding Linus's statement that some developers did run Git kernels, and that this was good, Borislav Petkov asked if Linus meant that *more* people should run Git kernels. He said, "We don't want to run daily snapshots of your tree though, right? Only -rcs because the daily states are kinda arbitrary and they can be broken in various ways. Or are we at a point in time where we can amend that rule?"
Borislav added, "What I do currently is, I take your -rcX something and merge tip/master on top of it and this is running on my machines for that week. Come next week, I rinse and repeat. Or does it make sense to do that more than once a week?"
Tony Luck pointed out, "If *nobody* runs daily snapshots – then problems just sit latent all week until the -rc is released and people start testing. Doesn't sound optimal." He added, "Running daily git snapshots can be 'exciting' during the merge window. But I rarely see problems running a random build after -rc1. If you are still running that ancient 3.11-rc6 released on Sunday – then you are missing out on 28 commits worth of goodness since then :-)"
The discussion petered out shortly afterward, but it looks as if Greg's initial proposal – or something nearly identical to it, such as Linus's "wait a week" idea – will be adopted. So, for the people who complain that patches take too long to get into the stable tree because they have to hit Linus's tree first … there will be more to complain about. It does seem as though some slight delay will help prevent stable releases from having bugs like the one that inspired Greg's initial proposal.
Un-Behemothing the Device Tree Behemoth
Grant Likely announced that Linux's device tree (DT) bindings needed a larger group of maintainers to keep up with the load. DT bindings are machine-parsable descriptions of certain aspects of hardware. By relying on DT bindings, the kernel doesn't have to hard code every aspect of a piece of hardware.
Grant said, "Neither Rob [Herring] nor I can keep up with the load and there are a lot of poorly designed bindings appearing in the tree." Grant had already sent in a patch listing more people as co-maintainers, and he invited anyone else interested in the project to volunteer.
Grant also raised the issue of schema validation. Each device tree binding follows a standard format – or at least it should – but this wasn't really checked anywhere. Grant announced that Tomasz Figa was going to work on enforcing valid schemas. Ideally, invalid schemas would get flagged at compile time, whereas valid schemas would be human-readable, as well as machine-parsable, and would double as their own documentation.
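To give a rough idea of what checking a device tree node against a binding schema could involve, here is a toy sketch. It is not the format Tomasz or Grant proposed, and it has nothing to do with how dtc works internally: the dictionary-based schema layout, the `validate_node` function, and the `example,uart` device are all invented for illustration. Only the property names `compatible`, `reg`, and `interrupts` are standard device tree properties.

```python
from typing import Dict, List


def validate_node(node: Dict[str, object], schema: Dict[str, List[str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    required = schema.get("required", [])
    optional = schema.get("optional", [])

    # Every required property must be present in the node.
    for name in required:
        if name not in node:
            errors.append(f"missing required property '{name}'")

    # Properties the binding does not mention are flagged as unknown.
    allowed = set(required) | set(optional)
    for name in node:
        if name not in allowed:
            errors.append(f"unknown property '{name}'")

    return errors


# Hypothetical binding for a made-up UART.
uart_schema = {"required": ["compatible", "reg"], "optional": ["interrupts"]}

good_node = {"compatible": "example,uart", "reg": [0x1000, 0x100]}
bad_node = {"compatible": "example,uart", "clock-speed": 115200}

print(validate_node(good_node, uart_schema))
print(validate_node(bad_node, uart_schema))
```

In this sketch, the schema doubles as a short description of the binding, which is the spirit of Grant's wish that valid schemas serve as their own documentation.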
Grant also raised the issue of the development process itself. As of that time, there was no real process in place, just a frantic scurrying. Should there be a distinction between staged bindings and stable bindings? How should bindings be merged into the kernel – through subsystem trees or only through a centralized DT bindings tree?
Tomasz replied with his thoughts as well. He pointed out that once a DT binding had been marked "stable," it should be considered part of the official application binary interface (ABI); thus, any further changes could never be allowed to break backward compatibility.
He also said that DT bindings should enter a staging state at first and that there should be periodic meetings (perhaps on IRC, he suggested) to decide which staged bindings were ready to be marked as stable.
David Gibson was interested in following Tomasz's work. Regarding schema validation in particular, David had tried to implement something along similar lines in the past but hadn't gotten very far. He was interested in watching how the new effort proceeded.
David also added, "It seems to me that the kernel tree has become the informal repository for board dts files is in itself a problem. It encourages people to think the two are closely linked and that all that matters is that a specific dts works with its corresponding kernel and vice versa, rather than fdts being the general description they're supposed to be."
However, he acknowledged that he had no solution for that.
Elsewhere, regarding schema validation, Alison Chaiken asked how this was going to be done. She said, "Since device-tree source looks a bit like XML (or maybe more like JSON), will the schemas be similar in spirit to DTDs, and is it helpful to think of the validator in this spirit? Or will the checker be more like "gcc -Wall," since it will be invoked by a compiler?"
Tomasz replied, "My idea is to implement compile time verification in dtc, so I guess it will be more like the latter. Since dts is what dtc can already parse, my plan is to keep the schemas in spirit to dts, just modifying/extending it to allow specifying bindings with them, rather than static values."
Jon Loeliger replied, "It is possible to add some-damn XML DTD parsing and rule glomming even in DTC if that is what is wanted." But Grant retched into his hand, and Tomasz said that he'd prefer not to introduce XML in addition to the other DT binding format already in use.
Grant also got back to Tomasz's question about distinguishing between staged and stable bindings. He said that simply using two different directories in the repository would probably be the most manageable solution. A less desirable option, he said, would be to add a text tag within the bindings files themselves.
Grant reiterated, "Once a binding is moved into the stable directory, only backwards compatible changes should be allowed." Stephen Warren responded to this, saying that it might be important to mark only certain parts of a DT binding as stable and allow the details to stabilize or remain in the staging state as needed. He suggested that it might be a good idea to use Grant's idea of separate directories, not just for individual bindings, but for the stable and staged portions of a single DT binding.
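One way to read "only backwards compatible changes" is that a device tree written against the old version of a binding must still be valid against the new one. The sketch below checks that under a deliberately simplified model in which a binding is just a set of required and optional property names. The model, the function name, and the sample bindings are assumptions for illustration, not anything agreed on in the thread.

```python
from typing import Dict, List


def is_backward_compatible(old: Dict[str, List[str]], new: Dict[str, List[str]]) -> bool:
    """Return True if every tree valid under `old` stays valid under `new`."""
    old_required, old_optional = set(old["required"]), set(old["optional"])
    new_required, new_optional = set(new["required"]), set(new["optional"])

    # Properties that existing trees may contain must still be recognized.
    if not (old_required | old_optional) <= (new_required | new_optional):
        return False

    # Nothing new may become mandatory: existing trees are not guaranteed
    # to carry it, even if it used to be optional.
    if not new_required <= old_required:
        return False

    return True


# Hypothetical versions of a binding.
v1 = {"required": ["compatible", "reg"], "optional": ["interrupts"]}
v2_adds_optional = {"required": ["compatible", "reg"], "optional": ["interrupts", "clocks"]}
v2_adds_required = {"required": ["compatible", "reg", "clocks"], "optional": ["interrupts"]}

print(is_backward_compatible(v1, v2_adds_optional))
print(is_backward_compatible(v1, v2_adds_required))
```

Stephen's suggestion of stabilizing only part of a binding would correspond, in this model, to applying the check to the stable subset of properties and leaving the staged ones free to change.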
Grant also gave a relatively detailed explanation of the sort of DT bindings schema he'd like to see emerge from Tomasz's work – which made it clear that the discussion was not just about validation, but also about creating an entirely new way of expressing DT bindings. See his post for the full(ish) description.
The discussion ended without firm conclusions on most of the issues. However, several outcomes now seem clear: there are many more DT bindings maintainers; the bindings format will become more standardized and readable; patches will make their way into the kernel in a more orderly fashion than before; and bindings will be organized in such a way that one can recognize when backward-incompatible changes are and are not acceptable.
- The future of DT binding maintainership: https://lkml.org/lkml/2013/7/25/8