From: Cong Wang on 22 Jul 2010 02:40

On 07/22/10 14:28, Eric W. Biederman wrote:
> Amerigo Wang <amwang@redhat.com> writes:
>
>> Currently KEXEC_SEGMENT_MAX is only 16, which is too small for machines
>> with many memory ranges. Increase this hard limit to 1024, which is
>> reasonably large, and change ->segment from a static array to
>> dynamically allocated memory.
>
> ???
>
> This should be about segments in the executable being loaded. What
> executable has one segment for each range of physical memory?
>
> Not that generalizing this is a bad idea, but with a comment that
> seems entirely wrong I am wondering what the problem really is.

Ah, I think Neil should explain this.

He made a patch which includes many memory ranges, causing kexec to fail
to load the kernel. Increasing this limit and the corresponding one in
kexec-tools fixes the problem. His patch is not in upstream kexec-tools,
AFAIK.

However, even if we don't consider that patch, isn't 16 too small anyway?

Thanks.

--
The opposite of love is not hate, it's indifference. - Elie Wiesel
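For context on "many memory ranges": the number of disjoint "System RAM"
ranges is visible in /proc/iomem. A minimal user-space sketch (not part of
the thread; the parsing is simplified and illustrative) shows how to check
whether a machine would exceed the old limit of 16:

#include <stdio.h>
#include <string.h>

/* Hedged sketch: count the disjoint "System RAM" ranges in /proc/iomem.
 * Under the usage discussed in this thread (e.g. Neil's patch, or
 * hibernation images), each such range needs its own kexec segment, so
 * a count above 16 overflows the old KEXEC_SEGMENT_MAX. */
int main(void)
{
	FILE *f = fopen("/proc/iomem", "r");
	char line[256];
	int ranges = 0;

	if (!f) {
		perror("fopen /proc/iomem");
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		if (strstr(line, "System RAM"))
			ranges++;
	fclose(f);

	printf("System RAM ranges: %d (old limit: 16)\n", ranges);
	return 0;
}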
From: Eric W. Biederman on 24 Jul 2010 23:00

huang ying <huang.ying.caritas@gmail.com> writes:

> On Thu, Jul 22, 2010 at 3:08 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Cong Wang <amwang@redhat.com> writes:
>>
>>> On 07/22/10 14:28, Eric W. Biederman wrote:
>>>> Amerigo Wang <amwang@redhat.com> writes:
>>>>
>>>>> Currently KEXEC_SEGMENT_MAX is only 16, which is too small for machines
>>>>> with many memory ranges. Increase this hard limit to 1024, which is
>>>>> reasonably large, and change ->segment from a static array to
>>>>> dynamically allocated memory.
>>>>
>>>> ???
>>>>
>>>> This should be about segments in the executable being loaded. What
>>>> executable has one segment for each range of physical memory?
>>>>
>>>> Not that generalizing this is a bad idea, but with a comment that
>>>> seems entirely wrong I am wondering what the problem really is.
>>>
>>> Ah, I think Neil should explain this.
>>>
>>> He made a patch which includes many memory ranges, causing kexec to fail
>>> to load the kernel. Increasing this limit and the corresponding one in
>>> kexec-tools fixes the problem. His patch is not in upstream kexec-tools,
>>> AFAIK.
>>>
>>> However, even if we don't consider that patch, isn't 16 too small anyway?
>>
>> Generally you just need one physical hunk for the code, maybe a second
>> for the initrd.
>>
>> It is perfectly fine to raise the number of segments as it doesn't
>> affect the ABI, but it wants a good explanation of what kind of weird
>> application wants to write all over memory when it is loaded.
>
> kexec can be used to load not only kernel images, but also more
> complex images such as hibernation images. So I think it is good to
> raise the number of segments.

Totally reasonable. And in all fairness the patch does a good job of
raising the limit. However, if that is the goal, 1024 is probably a bit
low, as I believe SGI has built machines with that many nodes. Still,
after the patch under discussion 1024 is only a limit in a header file,
so it can be trivially changed.

Eric
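Eric's "one hunk for the code, maybe a second for the initrd" corresponds
to the common two-segment kexec_load(2) call. A hedged sketch of that shape
follows; the buffers and physical addresses are placeholders (a real call
needs genuine image data and CAP_SYS_BOOT), so it illustrates the segment
count, not a working boot:

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/kexec.h>

/* Sketch of the common case: an ordinary kexec load passes only a
 * couple of segments -- kernel image plus initrd -- so a limit of 16
 * never hurt normal boots. Expect this to fail cleanly when run. */
int main(void)
{
	static char kernel_buf[4096], initrd_buf[4096];   /* placeholders */
	struct kexec_segment segs[2] = {
		{ .buf = kernel_buf, .bufsz = sizeof(kernel_buf),
		  .mem = (void *)0x100000,  .memsz = sizeof(kernel_buf) },
		{ .buf = initrd_buf, .bufsz = sizeof(initrd_buf),
		  .mem = (void *)0x2000000, .memsz = sizeof(initrd_buf) },
	};

	if (syscall(SYS_kexec_load, 0x100000UL, 2UL, segs, 0UL) != 0)
		perror("kexec_load");
	return 0;
}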
From: Cong Wang on 27 Jul 2010 04:20

On 07/26/10 20:19, Eric W. Biederman wrote:
> Cong Wang <amwang@redhat.com> writes:
>
>> So, what is a better number? 2048? :)
>
> I was thinking something a little ridiculous like 10K or 64K.
>
> Assuming the usage you care about is something like hibernate on a
> machine with disjoint memory, where you truly need one segment for
> each memory region.
>
> The only point of a limit at all, once we introduce dynamic allocation,
> is to catch buggy apps.

Ok, I will change that number and improve the changelog.

Thanks.

--
The opposite of love is not hate, it's indifference. - Elie Wiesel
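In code terms, the end state Eric describes might look like the following
stand-alone model (ordinary calloc stands in for the kernel's allocator;
the names mirror the kernel's, but this is a sketch, not the actual patch):
the hard limit survives only as a sanity bound, and the real storage
scales with the request.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define KEXEC_SEGMENT_MAX 16384   /* the "16K" Cong settles on below */

struct kexec_segment {
	const void *buf;
	unsigned long bufsz;
	unsigned long mem;
	unsigned long memsz;
};

struct kimage {
	unsigned long nr_segments;
	struct kexec_segment *segment;   /* was: segment[16], fixed array */
};

static int kimage_alloc_segments(struct kimage *image, unsigned long n)
{
	if (n > KEXEC_SEGMENT_MAX)
		return -EINVAL;            /* catch buggy apps */
	image->segment = calloc(n, sizeof(*image->segment));
	if (!image->segment)
		return -ENOMEM;
	image->nr_segments = n;
	return 0;
}

int main(void)
{
	struct kimage image = { 0, NULL };

	if (kimage_alloc_segments(&image, 100) == 0)
		printf("allocated %lu segments\n", image.nr_segments);
	free(image.segment);
	return 0;
}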
From: Cong Wang on 29 Jul 2010 02:40

On 07/27/10 18:00, Milton Miller wrote:
> [ Added kexec@lists.infradead.org and linuxppc-dev@lists.ozlabs.org ]
>
>> Currently KEXEC_SEGMENT_MAX is only 16, which is too small for machines
>> with many memory ranges. When hibernating on a machine with disjoint
>> memory we need one segment for each memory region. Increase this hard
>> limit to 16K, which is reasonably large.
>>
>> And change ->segment from a static array to dynamically allocated memory.
>>
>> Cc: Neil Horman <nhorman@redhat.com>
>> Cc: huang ying <huang.ying.caritas@gmail.com>
>> Cc: Eric W. Biederman <ebiederm@xmission.com>
>> Signed-off-by: WANG Cong <amwang@redhat.com>
>>
>> ---
>> diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
>> index ed31a29..f115585 100644
>> --- a/arch/powerpc/kernel/machine_kexec_64.c
>> +++ b/arch/powerpc/kernel/machine_kexec_64.c
>> @@ -131,10 +131,7 @@ static void copy_segments(unsigned long ind)
>>  void kexec_copy_flush(struct kimage *image)
>>  {
>>  	long i, nr_segments = image->nr_segments;
>> -	struct kexec_segment ranges[KEXEC_SEGMENT_MAX];
>> -
>> -	/* save the ranges on the stack to efficiently flush the icache */
>> -	memcpy(ranges, image->segment, sizeof(ranges));
>> +	struct kexec_segment range;
>
> I'm glad you found our copy on the stack and removed the stack overflow
> that comes with this bump, but ...
>
>>  	/*
>>  	 * After this call we may not use anything allocated in dynamic
>> @@ -148,9 +145,11 @@ void kexec_copy_flush(struct kimage *image)
>>  	 * we need to clear the icache for all dest pages sometime,
>>  	 * including ones that were in place on the original copy
>>  	 */
>> -	for (i = 0; i < nr_segments; i++)
>> -		flush_icache_range((unsigned long)__va(ranges[i].mem),
>> -			(unsigned long)__va(ranges[i].mem + ranges[i].memsz));
>> +	for (i = 0; i < nr_segments; i++) {
>> +		memcpy(&range, &image->segment[i], sizeof(range));
>> +		flush_icache_range((unsigned long)__va(range.mem),
>> +			(unsigned long)__va(range.mem + range.memsz));
>> +	}
>>  }
>
> This is executed after the copy, so as it says,
> "we may not use anything allocated in dynamic memory".
>
> We could allocate control pages to copy the segment list into.
> Actually ppc64 doesn't use the existing control page, but that
> is only 4kB today.
>
> We need the list to icache flush all the pages in all the segments,
> as the indirect list doesn't have pages that were allocated at
> their destination.
>
> Or maybe the icache flush should be done in the generic code,
> like it is for crash load segments?

I don't get the point here; according to the comments,
it is copied onto the stack for efficiency.

--
The opposite of love is not hate, it's indifference. - Elie Wiesel
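The overflow Milton mentions is easy to quantify. Assuming the 64-bit
layout of struct kexec_segment (four 8-byte fields, mirrored here only
for sizing), the old on-stack snapshot grows from a harmless 512 bytes
to roughly 512 KiB at the new 16K limit, dwarfing a kernel thread's
fixed stack:

#include <stdio.h>

/* Sizing-only mirror of struct kexec_segment; assumed 64-bit layout. */
struct kexec_segment {
	const void *buf;
	unsigned long bufsz;
	unsigned long mem;
	unsigned long memsz;
};

int main(void)
{
	/* Old limit: a 512-byte on-stack array -- harmless. */
	printf("16 segments:    %6zu bytes\n",
	       16 * sizeof(struct kexec_segment));

	/* New limit: ~512 KiB, far beyond a kernel thread's fixed stack
	 * (16 KiB on ppc64) -- the overflow Milton is glad to see gone. */
	printf("16384 segments: %6zu bytes\n",
	       16384 * sizeof(struct kexec_segment));
	return 0;
}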
From: Cong Wang on 4 Aug 2010 22:30
(Ping Milton...)

On 07/29/10 14:42, Cong Wang wrote:
> On 07/27/10 18:00, Milton Miller wrote:
>> [ Added kexec@lists.infradead.org and linuxppc-dev@lists.ozlabs.org ]
>>
>>> Currently KEXEC_SEGMENT_MAX is only 16, which is too small for machines
>>> with many memory ranges. When hibernating on a machine with disjoint
>>> memory we need one segment for each memory region. Increase this hard
>>> limit to 16K, which is reasonably large.
>>>
>>> And change ->segment from a static array to dynamically allocated memory.
>>>
>>> Cc: Neil Horman <nhorman@redhat.com>
>>> Cc: huang ying <huang.ying.caritas@gmail.com>
>>> Cc: Eric W. Biederman <ebiederm@xmission.com>
>>> Signed-off-by: WANG Cong <amwang@redhat.com>
>>>
>>> ---
>>> diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
>>> index ed31a29..f115585 100644
>>> --- a/arch/powerpc/kernel/machine_kexec_64.c
>>> +++ b/arch/powerpc/kernel/machine_kexec_64.c
>>> @@ -131,10 +131,7 @@ static void copy_segments(unsigned long ind)
>>>  void kexec_copy_flush(struct kimage *image)
>>>  {
>>>  	long i, nr_segments = image->nr_segments;
>>> -	struct kexec_segment ranges[KEXEC_SEGMENT_MAX];
>>> -
>>> -	/* save the ranges on the stack to efficiently flush the icache */
>>> -	memcpy(ranges, image->segment, sizeof(ranges));
>>> +	struct kexec_segment range;
>>
>> I'm glad you found our copy on the stack and removed the stack overflow
>> that comes with this bump, but ...
>>
>>>  	/*
>>>  	 * After this call we may not use anything allocated in dynamic
>>> @@ -148,9 +145,11 @@ void kexec_copy_flush(struct kimage *image)
>>>  	 * we need to clear the icache for all dest pages sometime,
>>>  	 * including ones that were in place on the original copy
>>>  	 */
>>> -	for (i = 0; i < nr_segments; i++)
>>> -		flush_icache_range((unsigned long)__va(ranges[i].mem),
>>> -			(unsigned long)__va(ranges[i].mem + ranges[i].memsz));
>>> +	for (i = 0; i < nr_segments; i++) {
>>> +		memcpy(&range, &image->segment[i], sizeof(range));
>>> +		flush_icache_range((unsigned long)__va(range.mem),
>>> +			(unsigned long)__va(range.mem + range.memsz));
>>> +	}
>>>  }
>>
>> This is executed after the copy, so as it says,
>> "we may not use anything allocated in dynamic memory".
>>
>> We could allocate control pages to copy the segment list into.
>> Actually ppc64 doesn't use the existing control page, but that
>> is only 4kB today.
>>
>> We need the list to icache flush all the pages in all the segments,
>> as the indirect list doesn't have pages that were allocated at
>> their destination.
>>
>> Or maybe the icache flush should be done in the generic code,
>> like it is for crash load segments?
>
> I don't get the point here; according to the comments,
> it is copied onto the stack for efficiency.

--
The opposite of love is not hate, it's indifference. - Elie Wiesel
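Milton's objection is about correctness, not efficiency: the comment in
kexec_copy_flush() warns that nothing in dynamic memory may be used after
the copy, yet the patched loop reads the now dynamically allocated
image->segment exactly there. Below is a stand-alone model of the
snapshot-before-copy direction he suggests; all names are illustrative,
and in the kernel the snapshot would live in a reserved control page
rather than on the heap:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for the kernel struct; not a real kernel API. */
struct kexec_segment {
	unsigned long mem;
	unsigned long memsz;
};

/* Model of the ordering constraint in kexec_copy_flush(): snapshot the
 * dynamically allocated segment list *before* the big copy, into
 * storage known to survive it, then flush from the snapshot after. */
int main(void)
{
	unsigned long i, n = 100;
	struct kexec_segment *dynamic = calloc(n, sizeof(*dynamic));
	struct kexec_segment *snapshot = calloc(n, sizeof(*snapshot));

	if (!dynamic || !snapshot)
		return 1;

	/* Taken while dynamic memory is still trustworthy. */
	memcpy(snapshot, dynamic, n * sizeof(*snapshot));

	/* ... copy_segments() would run here; afterwards "dynamic", like
	 * anything kmalloc'd, may have been overwritten ... */

	for (i = 0; i < n; i++) {
		/* In the kernel, per segment:
		 * flush_icache_range(__va(snapshot[i].mem),
		 *     __va(snapshot[i].mem + snapshot[i].memsz));
		 */
	}
	printf("flushed %lu segments via the snapshot\n", n);

	free(snapshot);
	free(dynamic);
	return 0;
}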