Prev: [tip:x86/mrst] x86, mrst: make mrst_identify_cpu() an inline returning enum
Next: drivers/media/video/uvc: Use kmemdup
From: Corey Ashford on 19 May 2010 18:10 Hi Robert, On 5/19/2010 2:20 PM, Robert Richter wrote: > This patch introduces a method to specify the type of a raw sample. > This can be used to setup hardware events other than generic > performance counters by passing special config data to the pmu. The > config data can be interpreted different from generic events and thus > can be used for other purposes. > > The raw_type attribute is an extension of the ABI. It reuses the > unused bp_type space for this. Generic performance counters can be > setup by setting the raw_type attribute to null. Thus special raw > events must have a type other than null. > > Raw types can be defined as needed for cpu models or architectures. > To keep backward compatibility all architectures must return an error > for an event with a raw_type other than null that is not supported. > > E.g., raw_type can be used to setup IBS on an AMD cpu. IBS is not > common to pmu features from other vendors or architectures. The pmu > must be setup with a special config value. Sample data is returned in > a certain format back to the userland. An IBS event is created by > setting a raw event and encoding the IBS type in raw_type. The pmu > handles this raw event then and passes raw sample data back. > > Raw type could be architecure specific, e.g. for x86: > > enum perf_raw_type { > PERF_RAW_PERFCTR = 0, > PERF_RAW_IBS_FETCH = 1, > PERF_RAW_IBS_OP = 2, > > PERF_RAW_MAX, > }; > > Null is the architecture's default, meaning for x86 a perfctr. > > Maybe the raw type definition could also be part of the ABI with one > definition for all architectures. > > To use raw events with perf, the raw event syntax could be suffixed by > the type (as for breakpoints): > > -e rNNN[:TYPE] > > Example: > > perf record -e r186A:1 # ... meaning IBS fetch, cycle count 100000 > perf record -e r0:1 -c 100000 # ... the same > > Or with named types: > > perf record -e r186A:IBS_FETCH ... > perf record -e r0:IBS_FETCH -c 100000 ... Should this raw value have been 186A0 instead of 186A? Where is the named type translation coming from? Is this something that needs to be hard-coded into perf? Have you looked at Lin Ming's patch series? I think it offers another way to support IBS and other arch-specific and off-chip PMUs in a more general way, though it's not quite fully-baked yet. - Corey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on 20 May 2010 03:00 * Corey Ashford <cjashfor(a)linux.vnet.ibm.com> wrote: > [...] > > Have you looked at Lin Ming's patch series? I think it > offers another way to support IBS and other > arch-specific and off-chip PMUs in a more general way, > though it's not quite fully-baked yet. Robert, check out this thread on lkml: [RFC][PATCH v2 06/11] perf: core, export pmus via sysfs Here is Peter's post that describes the high level design: http://groups.google.com/group/linux.kernel/msg/ab9aa075016c639e And please help out Lin Ming with this work - we dont want to add new, special-purpose ABI extensions for enumeration purposes, it should be introduced using a new event_source node via the sysfs enumeration. Whether IBS fits that category is an open question - ideally it should have similar high level usage as the PEBS/LBR bits on Intel CPUs. (at least as far as tooling goes) Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on 20 May 2010 10:00 On 20.05.10 05:23:33, Peter Zijlstra wrote: > On Thu, 2010-05-20 at 10:10 +0200, Stephane Eranian wrote: > > I still don't understand why you need all of this to encode IBS. > > I still believe that with attr.config there is plenty of bits to choose > > from. I do understand the need for PERF_SAMPLE_RAW. I think > > there is no other way. > > > > You simply need to pick an encoding to mark the config as IBS. You > > need two bits for this: 00 regular counters, 01 IBS Fetch, 10 IBS op. > > Regular counters use 43 bits, IBS fetch uses 58, IBS op uses 52. > > So you could use bits 62-63 for instance. You don't need to encode > > the sampling period in attr.config for either IBS. You can use > > attr.sample_period, so you free up 16 bits. > > > > I understand that IBS may evolve and thus may use more bits. But > > you still have at least 16 bits of margin. We have some bits in the msrs that are reserved now, for perfctr and also for ibs. We could start reusing it to mark a special sample type and so on. So far, ok. Somewhen in the future there is a hw extensions and these bits are not reserved anymore, what now? In case the register layout changes, you start workaround this shifting and masking bits back and forth. This will end in a mess. We don't know what will change and when, but *reserved* means it could be subject to change. So, actually you have to assume arbitrary 64 bit config values for perfctrs and ibs. Thus, you cannot use reserved bits for encoding, doing this would be a hack. The approach is also not extendable for new pmu features. The question will rise again then how to encode it. So encoding this information in some other attribute like raw_type or any other field is a much cleaner and a more stable solution, easier to program and to maintain. And there is enough space in the attribute structure. > > Users and tools would rely on an library to provide the event encoding. > > No need to come up with some raw hex number on the cmdline. We can pack everything in tools and libraries and hide direct access to the user. Now, one reads the IBS spec and wants to start using it. He will have to learn (by reading the source or documentation) now, how to pass parameters using the tool or library to set certain bits in the registers. This adds an additional i/f layer what needs to be defined, documented, implemented and maintained. Why not keep things simple and let the user pass a register value to the pmu? Only the config has to be validated in the kernel and that's it. The hw spec is then the reference for using the interface. We have already raw config values for counters and there was a reason for it. > For Instruction-Fetch: > > 0:15 sample-period (r/w) > 57 randomized (r/w) > For Instruction-Execution: > > 0:15 sample-period (r/w) > 0x87 Instruction Fetch Stall -- Ins-Fetch > 0xC0 Retired Instructions -- Ins-Exec I agree, where possible we should reuse attributes or existing events for configuration. But since the pmu behavior differs, there is the risk of mixing different things together. "precise" would be different on Intel and AMD. This will be hidden to the user leading to wrong assumptions and results. And, what about new options that don't have equivalent attribute flags? But it's not the question how to pass the config, the question is how to mark the event configuration as a certain IBS event. > Furthermore, these counters will have to deal with sample-period > 2^16 > by 'ignoring' interrupts until we get ->period_left down to 0. No question here, ibs counters should be extended to 64 bit counters, but this is not addressed with this patch set, I have kept this for later work. > The extra data could possibly be exposed through attaching non-sampling > group events and using SAMPLE_READ, like L1-misses, although > reconstructing the count from just one bit seems 'interesting'. > > The IbsFetchLinAd/IbsOpRip would go straight into PERF_SAMPLE_IP by > replacing pt_regs->ip I guess. > > IbsDcLinAd goes into PERF_SAMPLE_ADDR It's the same for sampling data as with config data. We can spread it all over its data structures, but a user will then recollect all of it again needing to know where the information can be found. And, it will never be the same kind of information if the pmus are different. -Robert -- Advanced Micro Devices, Inc. Operating System Research Center email: robert.richter(a)amd.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on 20 May 2010 10:10 On 20.05.10 06:37:39, Peter Zijlstra wrote: > So yeah, you might as well expose it as a whole separate PMU using Lin's > stuff. This seems to be the most promising alternative to raw_type. But it's also much more bloated requiring sysfs and additional file descriptors. -Robert -- Advanced Micro Devices, Inc. Operating System Research Center email: robert.richter(a)amd.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Robert Richter on 20 May 2010 11:30
On 20.05.10 08:13:43, Stephane Eranian wrote: > What's wrong with creating pseudo-events for IBS? We'd have to pick > two unused event codes. That would have the advantage of making it > explicit you're using IBS. I think we could still use the precise_ip field > if people are only interested in the IP. They would use PERF_SAMPLE_RAW > if they need more. For some key events this is fine and useful. But, if this has to be used to programm all IBS features, it will introduce too much implementation and maintenance effort. What you get then will probably of less interest since users/tools want to have the raw sample data for further processing. But maybe we agree already in this point, if there is a demand we should provide pseudo events for important or offen used events. > >> > The Ins-Exec will have to re-construct the actual event->count by adding > >> > sample-period on each interrupt, as it seems we lack an actual counter > >> > in hardware. > >> > > >> For what? counting mode? > > > > Yeah, events are supposed to count. > > > IBS is a sampling only feature. I suspect it would be okay to return 0 here > or do as you said, count the number of IBS interrupts and multiply by the > sampling period. True, ibs can be used as an additional counter using the interrupt only and dropping sampling data. I was already thinking of adding it somehow to the list of available counters. But this is also later work as its 64 bit extension. > > Dunno, reconstruct sensible counters? Surely the software that uses IBS > > does something useful with that data? What does libpfm do with the IBS > > data? Here are some use cases described: http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf http://developer.amd.com/cpu/CodeAnalyst/assets/ISPASS2010_IBS_CA_abstract.pdf -Robert -- Advanced Micro Devices, Inc. Operating System Research Center email: robert.richter(a)amd.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |