From: Jens Axboe on 10 Aug 2010 20:00

On 08/10/2010 12:35 PM, Jeff Layton wrote:
> On Tue, 10 Aug 2010 12:10:05 -0400
> Jens Axboe <axboe(a)kernel.dk> wrote:
>
>> On 08/10/2010 10:27 AM, Jeff Layton wrote:
>>> On Tue, 10 Aug 2010 10:22:41 -0400
>>> Jeff Moyer <jmoyer(a)redhat.com> wrote:
>>>
>>>> Jeff Layton <jlayton(a)redhat.com> writes:
>>>>
>>>>> Saw this oops on my test machine this morning. I rebooted the machine
>>>>> last night and hadn't done anything on it other than log in this
>>>>> morning. The kernel here is based on Steve French's git tree, which is
>>>>> based on Linus' as of Sunday Aug 8th. Last non-cifs commit is:
>>>>
>>>> This looks a lot like this bug:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=577968
>>>>
>>>> See also:
>>>> http://kerneloops.org/guilty.php?guilty=cfq_free_io_context&version=2.6.34-rc&start=2228224&end=2260991&class=oops
>>>>
>>>> It's been around since 2.6.30.8 according to kerneloops.org. If you
>>>> find that you have a reliable way of reproducing the issue, that would
>>>> be great.
>>>>
>>>
>>> Ok, thanks -- no clear reproducer so far. This morning was the
>>> first time I've seen it and it was on the console of my rawhide
>>> machine. The last thing I did with it was reboot it last night. I
>>> suspect that the gzip process came from a cron job or something.
>>
>> What version did you hit it on?
>>
>
> It was a kernel built out of git, based on Steve French's git tree. The
> last commit from Linus in it was
> 45d7f32c7a43cbb9592886d38190e379e2eb2226. Everything else on top of
> that was patches that only touched cifs code. cifs.ko hadn't been
> plugged in since it was rebooted.

OK. That bug is pretty elusive, so far I haven't been able to figure
out what the heck is going on here and my attempts at reproducing
have all failed. The reports so far seem to have the cron component
in common. Does fedora ionice some cron jobs or anything like that?
Or use CLONE_IO?
--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jeff Layton on 10 Aug 2010 21:30

On Tue, 10 Aug 2010 19:58:41 -0400
Jens Axboe <axboe(a)kernel.dk> wrote:

> OK. That bug is pretty elusive, so far I haven't been able to figure
> out what the heck is going on here and my attempts at reproducing
> have all failed. The reports so far seem to have the cron component
> in common. Does fedora ionice some cron jobs or anything like that?
> Or use CLONE_IO?
>

Yes. I sort of doubt anything there would use CLONE_IO, but ionice is
definitely used. Fedora uses anacron. I don't see any explicit calls to
gzip in there, but it's possible something else is calling it:

# grep ionice /etc/cron.*/*
/etc/cron.daily/mlocate.cron:ionice -c2 -n7 -p $$ >/dev/null 2>&1
/etc/cron.daily/readahead.cron:ionice -c3 -p $$ >/dev/null 2>&1

# cat /etc/anacrontab
# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1                 5                  cron.daily       nice run-parts /etc/cron.daily
7                 25                 cron.weekly      nice run-parts /etc/cron.weekly
@monthly          45                 cron.monthly     nice run-parts /etc/cron.monthly

--
Jeff Layton <jlayton(a)redhat.com>
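
For readers following the ionice angle: the two cron scripts above select the best-effort class at its lowest level (-c2 -n7) and the idle class (-c3). Below is a minimal sketch of how a class and level are packed into the single I/O priority value handed to the kernel, assuming the encoding used by the kernel's ioprio header (class in the upper bits, shifted by 13). The helper names are hypothetical and are not part of ionice or any library, and the level used for the idle class is an assumption, since the tool may pass a different value there.

```python
# Sketch only: mirrors the (class << 13) | data packing of I/O priorities.
# Helper names are hypothetical; nothing here talks to the kernel.
IOPRIO_CLASS_SHIFT = 13

IOPRIO_CLASS_NONE = 0
IOPRIO_CLASS_RT = 1    # ionice -c1
IOPRIO_CLASS_BE = 2    # ionice -c2 (best-effort)
IOPRIO_CLASS_IDLE = 3  # ionice -c3


def ioprio_value(io_class, data=0):
    """Pack a scheduling class and a priority level into one value."""
    return (io_class << IOPRIO_CLASS_SHIFT) | data


def ioprio_decode(value):
    """Split a packed value back into (class, level)."""
    mask = (1 << IOPRIO_CLASS_SHIFT) - 1
    return value >> IOPRIO_CLASS_SHIFT, value & mask


# mlocate.cron: ionice -c2 -n7
mlocate = ioprio_value(IOPRIO_CLASS_BE, 7)

# readahead.cron: ionice -c3 (level left at 0 here as an assumption)
readahead = ioprio_value(IOPRIO_CLASS_IDLE)

print(ioprio_decode(mlocate))    # class and level for the mlocate job
print(ioprio_decode(readahead))  # class and level for the readahead job
```

Both jobs therefore move the calling shell away from the default I/O priority, which fits the question of whether ionice use is the common factor in the reports.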
From: Jens Axboe on 11 Aug 2010 09:30

On 08/10/2010 09:23 PM, Jeff Layton wrote:
> Yes. I sort of doubt anything there would use CLONE_IO, but ionice is
> definitely used. Fedora uses anacron. I don't see any explicit calls to
> gzip in there, but it's possible something else is calling it:
>
> # grep ionice /etc/cron.*/*
> /etc/cron.daily/mlocate.cron:ionice -c2 -n7 -p $$ >/dev/null 2>&1
> /etc/cron.daily/readahead.cron:ionice -c3 -p $$ >/dev/null 2>&1

ionice must be a deciding factor in this, perhaps coupled with something
else. Otherwise we would be seeing a lot more of these.

--
Jens Axboe
From: Jeff Moyer on 11 Aug 2010 11:50

Jens Axboe <axboe(a)kernel.dk> writes:

>> #period in days   delay in minutes   job-identifier   command
>> 1                 5                  cron.daily       nice run-parts /etc/cron.daily
>> 7                 25                 cron.weekly      nice run-parts /etc/cron.weekly
>> @monthly          45                 cron.monthly     nice run-parts /etc/cron.monthly
>
> ionice must be a deciding factor in this, perhaps coupled with something
> else. Otherwise we would be seeing a lot more of these.

Well, what's really strange is that this is only affecting f14. I'm
installing a system and I'll see if I can't reproduce it.

Cheers,
Jeff