From: David Newall on 12 Jul 2010 16:00

Stefan Richter wrote:
> David Newall wrote:
>
>> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I
>> doubt anybody honestly thinks otherwise.
>>
>
> It works stably for what I use it for.
>

Mea culpa. I didn't mean that 2.6.34 is unstable, but that the term "stable" is not appropriate for a newly released kernel; "gamma" should be used instead. Merely six months ago 2.6.32 was released; today we're preparing for 2.6.35; a new kernel every two months! Perhaps 2.6.31 is truly the latest stable kernel; or else 2.6.27 is, which is the other 2.6 on the front page of kernel.org. I'm pretty sure 2.4 is stable (which might explain why I see it embedded *much* more frequently than 2.6).

> If it doesn't for you, then I hope you are already in contact with the
> respective subsystem developers to get the regressions that you
> experience fixed.
>

(Segue to a problem which follows from calling bleeding-edge kernels "stable".)

When reporting bugs, the first response is often, "we're not interested in such an old kernel; try it with the latest." That's not hugely useful when the latest kernels are not suitable for production use. If kernels weren't marked stable until they had earned the moniker, for example 2.6.27, then the expectations of developers and of users would be consistent: developers could expect users to try again with the latest stable kernel, and users could reasonably expect that trying it wouldn't break their system.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Nix on 12 Jul 2010 16:30

On 11 Jul 2010, Martin Steigerwald said:
> 2.6.34 was a disaster for me: bug #15969 - patch was available before
> 2.6.34 already, bug #15788, also reported with 2.6.34-rc2 already, as well
> as, most importantly, two complete lockups - well, maybe just X.org and radeon
> KMS, I didn't start my second laptop to SSH into the locked-up one - on my
> ThinkPad T42. I fixed the first one with the patch, but after the lockups I
> just downgraded to 2.6.33 again.
[...]
> hang on hibernation with kernel 2.6.34.1 and TuxOnIce 3.1.1.1
>
> on this mailing list just a moment ago. But then 2.6.33 did hang with
> TuxOnIce, which apparently (!) wasn't a TuxOnIce problem either, since
> 2.6.34 did not hang with it anymore, which was a reason for me to try
> 2.6.34 earlier.

To introduce yet more anecdata into this thread, I too had problems with TuxOnIce-driven suspend/resume from just post-2.6.32 to just pre-2.6.34. The solution was, surprise surprise, to *raise a bug report*, whereupon in short order I had a workaround. In 2.6.34, the problem vanished as mysteriously as it appeared, as did the bug whereby X coredumped and the screen stayed dark forever upon quitting X. 2.6.34 and 2.6.34.1 have worked better for me than any kernel I've used since 2.6.30, with no bugs noticeable on any of my machines (that's a first since 2.6.26).

I speculate that there may be some subtle memory overwrite inside the Radeon KMS and/or DRM code, obscure enough that it is relatively easily perturbed by changes elsewhere in the kernel. Nonetheless, one cannot extrapolate from a single bug in a subsystem as complex as DRM/KMS to the quality of the entire kernel. This is doubly true given how much cards labelled as Radeons differ from one another: I'd venture that most of the Radeon bugs I've seen flow past over the last year or so affect only a small subset of cards, but if you add them all up, it's likely that most users have been bitten by at least one. The problem here is not the kernel developers, nor the kernel quality: it's that ATI Radeons are a horrifically complicated and tangled web of slightly variable hardware. (In this they are no different from any other modern graphics card.)

Martin, might I suggest considering stable kernels 'experimental' until at least .1 is out? Before Linus releases a kernel, its only users are dedicated masochists and developers: after the release, piles of regular early adopters pour in, heaps of bug reports head to lkml, and fixes head to -stable. The .1 kernels, with fixes for some of those, are the first you can really call *stable*, as they've got fixes for bugs isolated after testing by a much larger userbase of suckers.

--
N., dedicated sucker and masochist
From: Stefan Richter on 12 Jul 2010 17:20

David Newall wrote:
> Stefan Richter wrote:
>> If it doesn't for you, then I hope you are already in contact with the
>> respective subsystem developers to get the regressions that you
>> experience fixed.
>>
> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
>
> When reporting bugs, the first response is often, "we're not interested
> in such an old kernel; try it with the latest."

Because bug fixes are continuously going into the new kernels.

> That's not hugely useful when the latest kernels are not suitable for
> production use.

"I have this bug here." - "It might be fixed in 2.6.mn. Try it." - "I don't want to because I got burned by 2.6.jk." Well, then don't, and keep using the old buggy kernel. Or use a forked kernel where somebody adds bugfix backports and feature backports as you require them, if that somebody does a really good job.

> If kernels weren't marked stable until they had earned the moniker,
> for example 2.6.27, then the expectation of developers and of users
> would be consistent:

2.6.27.y is what you call stable exactly because none of the boatloads of bug fixes and improvements of each subsequent 2.6.x release goes into it anymore. That's the nature of the beast. You can't have your cake and eat it too.

Which is why it is important that we keep the regression count in new kernels low and try to detect and fix regressions as early as possible. I admit that I do not really help with this myself outside the subsystem which I maintain. I usually start running -rc kernels only at the later -rcs (say, -rc5, only sometimes earlier) and don't test them beyond the one or two configurations that I use personally. There were occasional regressions in the subsystem that I maintain, but they were few and always fixed quickly, and each one was a lesson in how to do better. So, for that subsystem, the "Latest Stable Kernel" that is advertised on the front page of kernel.org really and truly /is/ the latest stable release that is recommended for production use.

--
Stefan Richter
-=====-==-=- -=== -==--
http://arcgraph.de/sr/
From: Martin Steigerwald on 12 Jul 2010 17:50

On Monday, 12 July 2010, David Newall wrote:
> Stefan Richter wrote:
> > David Newall wrote:
> >> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I
> >> doubt anybody honestly thinks otherwise.
> >
> > It works stably for what I use it for.
>
> Mea culpa. I didn't mean that 2.6.34 is unstable, but that the term
> "stable" is not appropriate for a newly released kernel; "gamma" should
> be used instead.

I do think "stable" should mean "stable for the majority of users". That's difficult to estimate, but I doubt that every dot-0 release qualifies.

> Merely six months ago 2.6.32 was released; today we're preparing for
> 2.6.35; a new kernel every two months! Perhaps 2.6.31 is truly the
> latest stable kernel; or else 2.6.27 is, which is the other 2.6 on
> the front page of kernel.org. I'm pretty sure 2.4 is stable (which
> might explain why I see it embedded *much* more frequently than 2.6).

I have these metrics:

    martin(a)shambhala:~> uprecords -m 20 | cut -c1-70
         #               Uptime | System
    ----------------------------+-----------------------------------------
         1    36 days, 09:57:31 | Linux 2.6.32.3-tp42-toi- Tue Jan 12 09:
         2    31 days, 01:07:24 | Linux 2.6.26.5-tp42-toi- Tue Sep 30 13:
         3    24 days, 13:29:07 | Linux 2.6.33.2-tp42-toi- Mon May 31 22:
         4    21 days, 15:08:21 | Linux 2.6.29.2-tp42-toi- Tue Apr 28 22:
         5    19 days, 21:22:14 | Linux 2.6.33.2-tp42-toi- Tue May 11 17:
         6    19 days, 09:49:05 | Linux 2.6.32.8-tp42-toi- Fri Mar 5 11:
         7    18 days, 02:31:41 | Linux 2.6.29.6-tp42-toi- Thu Jul 9 09:
         8    17 days, 12:38:36 | Linux 2.6.28.8-tp42-toi- Wed Mar 18 10:
         9    16 days, 16:10:28 | Linux 2.6.31-tp42-toi-3. Tue Sep 22 21:
        10    15 days, 14:39:26 | Linux 2.6.28.4-tp42-toi- Mon Feb 9 22:
        11    15 days, 13:58:12 | Linux 2.6.27.7-tp42-toi- Tue Dec 9 22:
        12    13 days, 21:11:06 | Linux 2.6.31-rc7-tp42-to Mon Aug 31 21:
        13    13 days, 18:34:00 | Linux 2.6.29.2-tp42-toi- Wed May 27 19:
        14    12 days, 21:54:18 | Linux 2.6.26.5-tp42-toi- Fri Oct 31 13:
        15    10 days, 22:02:14 | Linux 2.6.28.7-tp42-toi- Thu Feb 26 16:
        16    10 days, 16:29:02 | Linux 2.6.33.2-tp42-toi- Fri Jun 25 19:
        17    10 days, 08:04:52 | Linux 2.6.26.2-tp42-toi- Thu Sep 18 14:
        18    10 days, 03:52:30 | Linux 2.6.31.3-tp42-toi- Thu Oct 15 09:
        19     9 days, 22:03:29 | Linux 2.6.31.5-tp42-toi- Tue Nov 3 11:
        20     9 days, 00:24:22 | Linux 2.6.29.2-tp42-toi- Thu Jun 25 14:
    ----------------------------+-----------------------------------------
    -> 116     0 days, 00:52:03 | Linux 2.6.33.6-tp42-toi- Mo
    ----------------------------+-----------------------------------------
    1up in     0 days, 00:31:56 | at Mon Jul 12 23:
    t10 in    15 days, 13:47:24 | at Wed Jul 28 12:
    no1 in    36 days, 09:05:29 | at Wed Aug 18 08:
        up   608 days, 02:40:08 | since Thu Sep 18 14:
      down    54 days, 06:12:57 | since Thu Sep 18 14:
       %up               91.808 | since Thu Sep 18 14:

That's 228 entries in total since 2.6.26, with

    martin(a)shambhala:~> uprecords -m 300 | cut -c1-70 | grep "0 days" | wc -l
    148

of them shorter than one day. Of course, these numbers shouldn't be read without knowing my experiences and the reasons for rebooting, since sometimes I had just messed up some kernel option and compiled another one.

AFAIR 2.6.26 up to 2.6.32 have been fine, except 2.6.30, where TuxOnIce just didn't work, but I am not yet sure whether this was caused by TuxOnIce or by some problem with the general hibernation infrastructure. I then just skipped 2.6.30. Since I only tried 2.6.31 on my T42, I got a whopping uptime of over 100 days for 2.6.29 on my T23! That's stable. Well, any kernel that reproducibly reaches more than 15 or 30 days is quite stable by my own subjective standard.
Most kernels that got that far would likely have lasted much longer if I hadn't just compiled the next one, be it a dot release or a major release. All this without Radeon KMS!

2.6.33.2 was only stable when I used Radeon KMS without TuxOnIce. OK, so it might be a TuxOnIce problem, but at least the quite frequent hangs on hibernation, at the point where the screen goes black for a few seconds and then comes back, which I had with 2.6.33.2, were gone with 2.6.34. Maybe they are gone with 2.6.33.6 too, since it carries some more radeon DRM fixes. 2.6.34 has not reached an uptime of more than 2 or 3 days yet. Well, maybe Nix is right and it's just that Radeon KMS has not been stabilized enough, while the rest of the kernel is quite stable.

And if the combination of 2.6.33 (now .6) and userspace software suspend works for me (for the first time; often it was TuxOnIce that worked, but not any in-kernel method I tried from time to time), so be it for the time being, even if userspace software suspend is way slower and doesn't saturate the disk when writing the image.

> > If it doesn't for you, then I hope you are already in contact with
> > the respective subsystem developers to get the regressions that you
> > experience fixed.
>
> (Segue to a problem which follows from calling bleeding-edge kernels
> "stable".)
>
> When reporting bugs, the first response is often, "we're not interested
> in such an old kernel; try it with the latest." That's not hugely
> useful when the latest kernels are not suitable for production use. If
> kernels weren't marked stable until they had earned the moniker, for
> example 2.6.27, then the expectations of developers and of users would
> be consistent: developers could expect users to try again with the
> latest stable kernel, and users could reasonably expect that trying it
> wouldn't break their system.

I think that's really a question of how to attract more widespread testing.
For more widespread testing, a kernel needs to be stable enough that enough users are willing to run it. But without more widespread testing it might not get there. I have just dropped 2.6.34 for now, and I will wait for more dot releases. Maybe I really am the only one for whom 2.6.34 doesn't work; maybe other people just did the same, too frustrated to say so here or in Bugzilla.

Maybe providing better ways to report bugs and gather information, even on freeze bugs, without too much manual setup could help. I certainly think that the enhanced DrKonqi crash reporter in KDE 4.3 and up has helped users provide *good bug reports*. Maybe there could be something like that for the kernel, along with an easy option to have the kernel store backtraces even for hard crashes. Unfortunately there is no reset button on notebooks, so memory might be the wrong place. One could dedicate ring-buffer space on the swap partition for that, or something similar; that area should be writable even when no filesystem is working anymore. On the next reboot, the bug-report application would recover the crash data from there. This would carry the risk that, with severe memory corruption, the kernel writes crash data somewhere it shouldn't. A USB stick comes to mind, but what if the USB stack doesn't work anymore?

Well, not every bug is a freeze bug, and maybe something could be done for non-freeze bugs, like an application that records selected data while the user reproduces the bug, just as enhanced DrKonqi collects crash data and even helps the user install the necessary debug packages.

But I think when a kernel behaves too unstably for lots of users, they just drop it. Some bugs are okay, but freeze bugs especially, and filesystem corruption bugs even more so, scare away everyone who isn't a die-hard kernel debugger bisecting a kernel a day. Maybe I just had lots of bad luck, so I would love to hear about other experiences; some have already said 2.6.34 works pretty stably for them.
I will leave 2.6.34.1 on my T23, which has a Savage that may never get KMS, who knows, and on the workstation at work, which doesn't use Radeon KMS due to its rock-solid Debian Lenny userspace. Maybe this will at least shed some light on whether most of my issues have been Radeon KMS related.

As a side note: Ext4 is absolutely rock solid for me! As is XFS on my T23, and even BTRFS for /home on the T23 and some work directory on the workstation (not yet on my production T42).

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
From: Stefan Richter on 12 Jul 2010 18:50
Martin Steigerwald wrote:
> And if the combination of 2.6.33 (now .6) and userspace software suspend
> works for me (for the first time; often it was TuxOnIce that worked, but
> not any in-kernel method I tried from time to time), so be it for the time
> being, even if userspace software suspend is way slower and doesn't
> saturate the disk when writing the image.

BTW, relying in the long term on a fairly fundamental kernel component that is not in mainline (for whatever reason) almost guarantees you a lot of recurring pain, one way or another.

--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/