Prev: perf: Provide PERF_SAMPLE_REGS
Next: [PATCH] staging/dt3155: fix build error caused by {write|read}l change
From: Masami Hiramatsu on 3 Mar 2010 13:10 Peter Zijlstra wrote: > PEBS always reports the IP+1, that is the instruction after the one > that got sampled, cure this by using the LBR to reliably rewind the > instruction stream. Hmm, does PEBS always report one byte after the end address of the sampled instruction? Or the instruction which will be executed next step? [...] > +#include <asm/insn.h> > + > +#define MAX_INSN_SIZE 16 Hmm, we'd better integrate these kinds of definitions into asm/insn.h... (several features define it) > + > +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs) > +{ > +#if 0 > + /* > + * Borken, makes the machine expode at times trying to > + * derefence funny userspace addresses. > + * > + * Should we always fwd decode from @to, instead of trying > + * to rewind as implemented? > + */ > + > + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); > + unsigned long from = cpuc->lbr_entries[0].from; > + unsigned long to = cpuc->lbr_entries[0].to; Ah, I see. For branch instruction case, we can use LBR to find previous IP... > + unsigned long ip = regs->ip; > + u8 buf[2*MAX_INSN_SIZE]; > + u8 *kaddr; > + int i; > + > + if (from && to) { > + /* > + * We sampled a branch insn, rewind using the LBR stack > + */ > + if (ip == to) { > + regs->ip = from; > + return; > + } > + } > + > + if (user_mode(regs)) { > + int bytes = copy_from_user_nmi(buf, > + (void __user *)(ip - MAX_INSN_SIZE), > + 2*MAX_INSN_SIZE); > + maybe, you'd better check the source address range is within the user address range. e.g. ip < MAX_INSN_SIZE. > + /* > + * If we fail to copy the insn stream, give up > + */ > + if (bytes != 2*MAX_INSN_SIZE) > + return; > + > + kaddr = buf; > + } else > + kaddr = (void *)(ip - MAX_INSN_SIZE); It also needs to be checked this address within kernel text. > + > + /* > + * Try to find the longest insn ending up at the given IP > + */ > + for (i = MAX_INSN_SIZE; i > 0; i--) { > + struct insn insn; > + > + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i); > + insn_get_length(&insn); > + if (insn.length == i) { > + regs->ip -= i; > + return; > + } > + } Hmm, this will not work correctly on x86, since the decoder can miss-decode the tail bytes of previous instruction as prefix bytes. :( Thus, if you want to rewind instruction stream, you need to decode a function (or basic block) entirely. Thank you, > + > + /* > + * We failed to find a match for the previous insn.. give up > + */ > +#endif > +} > + > static int intel_pmu_save_and_restart(struct perf_event *event); > static void intel_pmu_disable_event(struct perf_event *event); > > @@ -458,6 +532,8 @@ static void intel_pmu_drain_pebs_core(st > > PEBS_TO_REGS(at, ®s); > > + intel_pmu_pebs_fixup_ip(®s); > + > if (perf_event_overflow(event, 1, data, ®s)) > intel_pmu_disable_event(event); > > @@ -519,6 +595,7 @@ static void intel_pmu_drain_pebs_nhm(str > data->period = event->hw.last_period; > > PEBS_TO_REGS(at, ®s); > + intel_pmu_pebs_fixup_ip(®s); > > if (perf_event_overflow(event, 1, data, ®s)) > intel_pmu_disable_event(event); > > -- -- Masami Hiramatsu e-mail: mhiramat(a)redhat.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 3 Mar 2010 14:40 On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote: > Peter Zijlstra wrote: > > PEBS always reports the IP+1, that is the instruction after the one > > that got sampled, cure this by using the LBR to reliably rewind the > > instruction stream. > > Hmm, does PEBS always report one byte after the end address of the > sampled instruction? Or the instruction which will be executed next > step? The next instruction, its trap like. > [...] > > +#include <asm/insn.h> > > + > > +#define MAX_INSN_SIZE 16 > > Hmm, we'd better integrate these kinds of definitions into > asm/insn.h... (several features define it) Agreed, I'll look at doing a patch to collect them all into asm/insn.h if nobody beats me to it :-) > > + > > +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs) > > +{ > > +#if 0 > > + /* > > + * Borken, makes the machine expode at times trying to > > + * derefence funny userspace addresses. > > + * > > + * Should we always fwd decode from @to, instead of trying > > + * to rewind as implemented? > > + */ > > + > > + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); > > + unsigned long from = cpuc->lbr_entries[0].from; > > + unsigned long to = cpuc->lbr_entries[0].to; > > Ah, I see. For branch instruction case, we can use LBR to > find previous IP... Right, we use the LBR to find the basic block. > > + unsigned long ip = regs->ip; > > + u8 buf[2*MAX_INSN_SIZE]; > > + u8 *kaddr; > > + int i; > > + > > + if (from && to) { > > + /* > > + * We sampled a branch insn, rewind using the LBR stack > > + */ > > + if (ip == to) { > > + regs->ip = from; > > + return; > > + } > > + } > > + > > + if (user_mode(regs)) { > > + int bytes = copy_from_user_nmi(buf, > > + (void __user *)(ip - MAX_INSN_SIZE), > > + 2*MAX_INSN_SIZE); > > + > > maybe, you'd better check the source address range is within > the user address range. e.g. ip < MAX_INSN_SIZE. Not only that, I realized user_mode() checks regs->cs, which is not set by the PEBS code, so I added some helpers. > > + > > + /* > > + * Try to find the longest insn ending up at the given IP > > + */ > > + for (i = MAX_INSN_SIZE; i > 0; i--) { > > + struct insn insn; > > + > > + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i); > > + insn_get_length(&insn); > > + if (insn.length == i) { > > + regs->ip -= i; > > + return; > > + } > > + } > > Hmm, this will not work correctly on x86, since the decoder can > miss-decode the tail bytes of previous instruction as prefix bytes. :( > > Thus, if you want to rewind instruction stream, you need to decode > a function (or basic block) entirely. Something like the below? #ifdef CONFIG_X86_32 static bool kernel_ip(unsigned long ip) { return ip > TASK_SIZE; } #else static bool kernel_ip(unsigned long ip) { return (long)ip < 0; } #endif static int intel_pmu_pebs_fixup_ip(unsigned long *ipp) { struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); unsigned long from = cpuc->lbr_entries[0].from; unsigned long old_to, to = cpuc->lbr_entries[0].to; unsigned long ip = *ipp; int i; /* * We don't need to fixup if the PEBS assist is fault like */ if (!x86_pmu.intel_perf_capabilities.pebs_trap) return 0; if (!cpuc->lbr_stack.nr || !from || !to) return 0; if (ip < to) return 0; /* * We sampled a branch insn, rewind using the LBR stack */ if (ip == to) { *ipp = from; return 1; } do { struct insn insn; u8 buf[MAX_INSN_SIZE]; void *kaddr; old_to = to; if (!kernel_ip(ip)) { int bytes = copy_from_user_nmi(buf, (void __user *)to, MAX_INSN_SIZE); if (bytes != MAX_INSN_SIZE) return 0; kaddr = buf; } else kaddr = (void *)to; kernel_insn_init(&insn, kaddr); insn_get_length(&insn); to += insn.length; } while (to < ip); if (to == ip) { *ipp = old_to; return 1; } return 0; } I thought about exposing the success of this fixup as a PERF_RECORD_MISC bit. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Masami Hiramatsu on 3 Mar 2010 16:20 Peter Zijlstra wrote: > On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote: >> Peter Zijlstra wrote: >>> PEBS always reports the IP+1, that is the instruction after the one >>> that got sampled, cure this by using the LBR to reliably rewind the >>> instruction stream. >> >> Hmm, does PEBS always report one byte after the end address of the >> sampled instruction? Or the instruction which will be executed next >> step? > > The next instruction, its trap like. > >> [...] >>> +#include <asm/insn.h> >>> + >>> +#define MAX_INSN_SIZE 16 >> >> Hmm, we'd better integrate these kinds of definitions into >> asm/insn.h... (several features define it) > > Agreed, I'll look at doing a patch to collect them all into asm/insn.h > if nobody beats me to it :-) At least kprobes doesn't :) >>> + >>> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs) >>> +{ >>> +#if 0 >>> + /* >>> + * Borken, makes the machine expode at times trying to >>> + * derefence funny userspace addresses. >>> + * >>> + * Should we always fwd decode from @to, instead of trying >>> + * to rewind as implemented? >>> + */ >>> + >>> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); >>> + unsigned long from = cpuc->lbr_entries[0].from; >>> + unsigned long to = cpuc->lbr_entries[0].to; >> >> Ah, I see. For branch instruction case, we can use LBR to >> find previous IP... > > Right, we use the LBR to find the basic block. Hm, that's a good idea :) >>> + unsigned long ip = regs->ip; >>> + u8 buf[2*MAX_INSN_SIZE]; >>> + u8 *kaddr; >>> + int i; >>> + >>> + if (from && to) { >>> + /* >>> + * We sampled a branch insn, rewind using the LBR stack >>> + */ >>> + if (ip == to) { >>> + regs->ip = from; >>> + return; >>> + } >>> + } >>> + >>> + if (user_mode(regs)) { >>> + int bytes = copy_from_user_nmi(buf, >>> + (void __user *)(ip - MAX_INSN_SIZE), >>> + 2*MAX_INSN_SIZE); >>> + >> >> maybe, you'd better check the source address range is within >> the user address range. e.g. ip < MAX_INSN_SIZE. > > Not only that, I realized user_mode() checks regs->cs, which is not set > by the PEBS code, so I added some helpers. > >>> + >>> + /* >>> + * Try to find the longest insn ending up at the given IP >>> + */ >>> + for (i = MAX_INSN_SIZE; i > 0; i--) { >>> + struct insn insn; >>> + >>> + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i); >>> + insn_get_length(&insn); >>> + if (insn.length == i) { >>> + regs->ip -= i; >>> + return; >>> + } >>> + } >> >> Hmm, this will not work correctly on x86, since the decoder can >> miss-decode the tail bytes of previous instruction as prefix bytes. :( >> >> Thus, if you want to rewind instruction stream, you need to decode >> a function (or basic block) entirely. > > Something like the below? Great! it looks good to me. Yeah, LBR.to may always smaller than current ip (if no one disabled LBR). Thank you, > > #ifdef CONFIG_X86_32 > static bool kernel_ip(unsigned long ip) > { > return ip > TASK_SIZE; > } > #else > static bool kernel_ip(unsigned long ip) > { > return (long)ip < 0; > } > #endif > > static int intel_pmu_pebs_fixup_ip(unsigned long *ipp) > { > struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); > unsigned long from = cpuc->lbr_entries[0].from; > unsigned long old_to, to = cpuc->lbr_entries[0].to; > unsigned long ip = *ipp; > int i; > > /* > * We don't need to fixup if the PEBS assist is fault like > */ > if (!x86_pmu.intel_perf_capabilities.pebs_trap) > return 0; > > if (!cpuc->lbr_stack.nr || !from || !to) > return 0; > > if (ip < to) > return 0; > > /* > * We sampled a branch insn, rewind using the LBR stack > */ > if (ip == to) { > *ipp = from; > return 1; > } > > do { > struct insn insn; > u8 buf[MAX_INSN_SIZE]; > void *kaddr; > > old_to = to; > if (!kernel_ip(ip)) { > int bytes = copy_from_user_nmi(buf, (void __user *)to, > MAX_INSN_SIZE); > > if (bytes != MAX_INSN_SIZE) > return 0; > > kaddr = buf; > } else kaddr = (void *)to; > > kernel_insn_init(&insn, kaddr); > insn_get_length(&insn); > to += insn.length; > } while (to < ip); > > if (to == ip) { > *ipp = old_to; > return 1; > } > > return 0; > } > > I thought about exposing the success of this fixup as a PERF_RECORD_MISC > bit. > -- Masami Hiramatsu e-mail: mhiramat(a)redhat.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Stephane Eranian on 3 Mar 2010 17:00 I think systematically and transparently using LBR to correct PEBS off-by-one problem is not such a good idea. You've basically highjacked LBR and user cannot use it in a different way. There are PEBS+LBR measurements where you care about extracting the LBR data. There are PEBS measurements where you don't care about getting the correct IP. I don't necessarily want to pay the price, especially when this could be done offline in the tool. On Wed, Mar 3, 2010 at 10:11 PM, Masami Hiramatsu <mhiramat(a)redhat.com> wrote: > Peter Zijlstra wrote: >> On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote: >>> Peter Zijlstra wrote: >>>> PEBS always reports the IP+1, that is the instruction after the one >>>> that got sampled, cure this by using the LBR to reliably rewind the >>>> instruction stream. >>> >>> Hmm, does PEBS always report one byte after the end address of the >>> sampled instruction? Or the instruction which will be executed next >>> step? >> >> The next instruction, its trap like. >> >>> [...] >>>> +#include <asm/insn.h> >>>> + >>>> +#define MAX_INSN_SIZE 16 >>> >>> Hmm, we'd better integrate these kinds of definitions into >>> asm/insn.h... (several features define it) >> >> Agreed, I'll look at doing a patch to collect them all into asm/insn.h >> if nobody beats me to it :-) > > At least kprobes doesn't :) > >>>> + >>>> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs) >>>> +{ >>>> +#if 0 >>>> + /* >>>> + * Borken, makes the machine expode at times trying to >>>> + * derefence funny userspace addresses. >>>> + * >>>> + * Should we always fwd decode from @to, instead of trying >>>> + * to rewind as implemented? >>>> + */ >>>> + >>>> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); >>>> + unsigned long from = cpuc->lbr_entries[0].from; >>>> + unsigned long to = cpuc->lbr_entries[0].to; >>> >>> Ah, I see. For branch instruction case, we can use LBR to >>> find previous IP... >> >> Right, we use the LBR to find the basic block. > > Hm, that's a good idea :) > >>>> + unsigned long ip = regs->ip; >>>> + u8 buf[2*MAX_INSN_SIZE]; >>>> + u8 *kaddr; >>>> + int i; >>>> + >>>> + if (from && to) { >>>> + /* >>>> + * We sampled a branch insn, rewind using the LBR stack >>>> + */ >>>> + if (ip == to) { >>>> + regs->ip = from; >>>> + return; >>>> + } >>>> + } >>>> + >>>> + if (user_mode(regs)) { >>>> + int bytes = copy_from_user_nmi(buf, >>>> + (void __user *)(ip - MAX_INSN_SIZE), >>>> + 2*MAX_INSN_SIZE); >>>> + >>> >>> maybe, you'd better check the source address range is within >>> the user address range. e.g. ip < MAX_INSN_SIZE. >> >> Not only that, I realized user_mode() checks regs->cs, which is not set >> by the PEBS code, so I added some helpers. >> >>>> + >>>> + /* >>>> + * Try to find the longest insn ending up at the given IP >>>> + */ >>>> + for (i = MAX_INSN_SIZE; i > 0; i--) { >>>> + struct insn insn; >>>> + >>>> + kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i); >>>> + insn_get_length(&insn); >>>> + if (insn.length == i) { >>>> + regs->ip -= i; >>>> + return; >>>> + } >>>> + } >>> >>> Hmm, this will not work correctly on x86, since the decoder can >>> miss-decode the tail bytes of previous instruction as prefix bytes. :( >>> >>> Thus, if you want to rewind instruction stream, you need to decode >>> a function (or basic block) entirely. >> >> Something like the below? > > Great! it looks good to me. > Yeah, LBR.to may always smaller than current ip (if no one disabled LBR). > > Thank you, > >> >> #ifdef CONFIG_X86_32 >> static bool kernel_ip(unsigned long ip) >> { >> return ip > TASK_SIZE; >> } >> #else >> static bool kernel_ip(unsigned long ip) >> { >> return (long)ip < 0; >> } >> #endif >> >> static int intel_pmu_pebs_fixup_ip(unsigned long *ipp) >> { >> struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); >> unsigned long from = cpuc->lbr_entries[0].from; >> unsigned long old_to, to = cpuc->lbr_entries[0].to; >> unsigned long ip = *ipp; >> int i; >> >> /* >> * We don't need to fixup if the PEBS assist is fault like >> */ >> if (!x86_pmu.intel_perf_capabilities.pebs_trap) >> return 0; >> >> if (!cpuc->lbr_stack.nr || !from || !to) >> return 0; >> >> if (ip < to) >> return 0; >> >> /* >> * We sampled a branch insn, rewind using the LBR stack >> */ >> if (ip == to) { >> *ipp = from; >> return 1; >> } >> >> do { >> struct insn insn; >> u8 buf[MAX_INSN_SIZE]; >> void *kaddr; >> >> old_to = to; >> if (!kernel_ip(ip)) { >> int bytes = copy_from_user_nmi(buf, (void __user *)to, >> MAX_INSN_SIZE); >> >> if (bytes != MAX_INSN_SIZE) >> return 0; >> >> kaddr = buf; >> } else kaddr = (void *)to; >> >> kernel_insn_init(&insn, kaddr); >> insn_get_length(&insn); >> to += insn.length; >> } while (to < ip); >> >> if (to == ip) { >> *ipp = old_to; >> return 1; >> } >> >> return 0; >> } >> >> I thought about exposing the success of this fixup as a PERF_RECORD_MISC >> bit. >> > > -- > Masami Hiramatsu > e-mail: mhiramat(a)redhat.com > -- Stephane Eranian | EMEA Software Engineering Google France | 38 avenue de l'Opéra | 75002 Paris Tel : +33 (0) 1 42 68 53 00 This email may be confidential or privileged. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it went to the wrong person. Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 4 Mar 2010 04:00 On Wed, 2010-03-03 at 22:50 +0100, Stephane Eranian wrote: > I think systematically and transparently using LBR to correct PEBS off-by-one > problem is not such a good idea. You've basically highjacked LBR and user > cannot use it in a different way. Well, they could, it just makes scheduling the stuff more interesting. > There are PEBS+LBR measurements where you care about extracting the LBR data. > There are PEBS measurements where you don't care about getting the correct IP. > I don't necessarily want to pay the price, especially when this could > be done offline in the tool. There are some people who argue that fixing up that +1 insn issue is critical, sadly they don't appear to want to argue their case in public. What we can do is make it optional I guess. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
|
Next
|
Last
Pages: 1 2 Prev: perf: Provide PERF_SAMPLE_REGS Next: [PATCH] staging/dt3155: fix build error caused by {write|read}l change |