Prev: BKL: Explicitly add BKL around get_sb/fill_super
Next: [PATCH 2/3] sysctl: Remove CTL_NONE and CTL_UNNUMBERED
From: Paul Mackerras on 21 Dec 2009 20:20 On Fri, Dec 11, 2009 at 12:59:16PM +0100, stephane eranian wrote: > There is a major difference between PPC and X86 here. PPC has a centralized > register to control start/stop. This register uses bitmask to enable > or disable counters. Thus, in hw_perf_enable(), if n_added=0, then you > just need to > use the pre-computed bitmask. Otherwise, you need to recompute the bitmask to > include the new registers. The assignment of events and validation is done in > hw_group_sched_in(). That's not entirely accurate. Yes there is a global start/stop bit, but there isn't a bitmask to enable or disable counters. There is a selector bitfield for each counter (except the limited-function counters) and you can set the selector to the 'count nothing' value if you don't want a particular counter to count. Validation is done in hw_group_sched_in() but not the assignment of events to counters. That's done in hw_perf_enable(), via the model-specific ppmu->compute_mmcr() call. > In X86, assignment and validation is done in hw_group_sched_in(). Activation is > done individually for each counter. There is no centralized register > used here, thus > no bitmask to update. > > Disabling a counter does not trigger a complete reschedule of events. > This happens > only when hw_group_sched_in() is called. > > The n_events = 0 in hw_perf_disable() is used to signal that something > is changing. > It should not be here but here. The meaning of "It should not be here but here" is quite unclear to me. > The problem is that > hw_group_sched_in() needs a way > to know that it is called for a completely new series of group > scheduling so it can > discard any previous assignment. This goes back to the issue I raised > in my previous > email. You could add a parameter to hw_group_sched_in() that would > indicate this is > the first group. that would cause n_events =0 and the function would > start accumulating > events for the new scheduling period. I don't think hw_group_sched_in is ever called for a completely new series of group scheduling. If you have per-cpu counters active, they don't get scheduled out and in again with each task switch. So you will tend to get a hw_pmu_disable call, then a series of disable calls for the per-task events for the old task, then a series of hw_group_sched_in calls for the per-task events for the new task, then a hw_pmu_enable call. On powerpc we maintain an array with pointers to all the currently active events. That makes it easy to know at hw_pmu_enable() time what events need to be put on the PMU. Also it means that at hw_group_sched_in time you can look at the whole set of events, including the ones just added, to see if it's feasible to put them all on. At that point we just check feasibility, which is quite quick and easy using the bit-vector encoding of constraints. The bit-vector encoding lets us represent multiple constraints of various forms in one pair of 64-bit values per event. We can express constraints such as "you can have at most N events in a class X" or "you can't have events in all of classes A, B, C and D" or "control register bitfield X must be set to Y", and then check that a set of events satisfies all the constraints with some simple integer arithmetic. I don't know exactly what constraints you have on x86 but I would be surprised if you couldn't handle them the same way. Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Stephane Eranian on 29 Dec 2009 09:50 Paul, So if I understand what both of you are saying, it seems that event scheduling has to take place in the pmu->enable() callback which is per-event. In the case of X86, you can chose to do a best-effort scheduling, i.e., only assign the new event if there is a compatible free counter. That would be incremental. But the better solution would be to re-examine the whole situation and potentially move existing enabled events around to free a counter if the new event is more constrained. That would require stopping the PMU, rewriting config and data registers and re-enabling the PMU. This latter solution is the only possibility to avoid ordering side effects, i.e., the assignment of events to counters depends on the order in which events are created (or enabled). The register can be considered freed by pmu->disable() if scheduling takes place in pmu->enable(). From what Paul was saying about hw_perf_group_sched_in(), it seems like this function can be used to check if a new group is compatible with the existing enabled events. Compatible means that there is a possible assignment of events to counters. As for the n_added logic, it seems like perf_disable() resets n_added to zero. N_added is incremented in pmu->enable(), i.e., add one event, or the hw_perf_group_sched_in(), i.e., add a whole group. Scheduling is based on n_events. The point of n_added is to verify whether something needs to be done, i.e., event scheduling, if an event or group was added between perf_disable() and perf_enable(). In pmu->disable(), the list of enabled events is compacted and n_events is decremented. Did I get this right? All the enable and disable calls can be called from NMI interrupt context and thus must be very careful with locks. On Tue, Dec 22, 2009 at 2:02 AM, Paul Mackerras <paulus(a)samba.org> wrote: > On Mon, Dec 21, 2009 at 04:40:40PM +0100, Peter Zijlstra wrote: > >> I'm not really seeing the problem here... >> >> >> perf_disable() <-- shut down the full pmu >> >> pmu->disable() <-- hey someone got removed (easy free the reg) >> pmu->enable() <-- hey someone got added (harder, check constraints) >> >> hw_perf_group_sched_in() <-- hey a full group got added >> (better than multiple ->enable) >> >> perf_enable() <-- re-enable pmu >> >> >> So ->disable() is used to track freeing, ->enable is used to add >> individual counters, check constraints etc.. >> >> hw_perf_group_sched_in() is used to optimize the full group enable. >> >> Afaict that is what power does (Paul?) and that should I think be >> sufficient to track x86 as well. > > That sounds right to me. > >> Since sched_in() is balanced with sched_out(), the ->disable() calls >> should provide the required information as to the occupation of the pmu. >> I don't see the need for more hooks. >> >> Paul, could you comment, since you did all this for power? > > On powerpc we maintain a list of currently enabled events in the arch > code. Does x86 do that as well? > > If you have the list (or array) of events easily accessible, it's > relatively easy to check whether the whole set is feasible at any > point, without worrying about which events were recently added. The > perf_event structure has a spot where the arch code can store which > PMU register is used for that event, so you can easily optimize the > case where the event doesn't move. > > Like you, I'm not seeing where the difficulty lies. Perhaps Stephane > could give us a detailed example if he still thinks there's a > difficulty. > > Paul. > -- Stephane Eranian | EMEA Software Engineering Google France | 38 avenue de l'Opéra | 75002 Paris Tel : +33 (0) 1 42 68 53 00 This email may be confidential or privileged. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it went to the wrong person. Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Paul Mackerras on 6 Jan 2010 23:20 Stephane, > So if I understand what both of you are saying, it seems that > event scheduling has to take place in the pmu->enable() callback > which is per-event. How much you have to do in the pmu->enable() function depends on whether the PMU is already stopped (via a call hw_perf_disable()) or not. If it is already stopped you don't have to do the full event scheduling computation, you only have to do enough to work out if the current set of events plus the event being enabled is feasible, and record the event for later. In other words, you can defer the actual scheduling until hw_perf_enable() time. Most calls to the pmu->enable() function are with the PMU already stopped, so it's a worthwhile optimization. > In the case of X86, you can chose to do a best-effort scheduling, > i.e., only assign > the new event if there is a compatible free counter. That would be incremental. > > But the better solution would be to re-examine the whole situation and > potentially > move existing enabled events around to free a counter if the new event is more > constrained. That would require stopping the PMU, rewriting config and > data registers > and re-enabling the PMU. This latter solution is the only possibility > to avoid ordering > side effects, i.e., the assignment of events to counters depends on > the order in which > events are created (or enabled). On powerpc, the pmu->enable() function always stops the PMU if it wasn't already stopped. That simplifies the code a lot because it means that I can do all the actual event scheduling (i.e. deciding on which event goes on which counter and working out PMU control register values) in hw_perf_enable(). > The register can be considered freed by pmu->disable() if scheduling takes place > in pmu->enable(). > > >From what Paul was saying about hw_perf_group_sched_in(), it seems like this > function can be used to check if a new group is compatible with the existing > enabled events. Compatible means that there is a possible assignment of > events to counters. The hw_perf_group_sched_in() function is an optimization so I can add all the counters in the group to the set under consideration and then do just one feasibility check, rather than adding each counter in the group in turn and doing the feasibility check for each one. > As for the n_added logic, it seems like perf_disable() resets n_added to zero. > N_added is incremented in pmu->enable(), i.e., add one event, or the > hw_perf_group_sched_in(), i.e., add a whole group. Scheduling is based on > n_events. The point of n_added is to verify whether something needs to be > done, i.e., event scheduling, if an event or group was added between > perf_disable() > and perf_enable(). In pmu->disable(), the list of enabled events is > compacted and > n_events is decremented. > > Did I get this right? The main point of n_added was so that hw_perf_enable() could know whether the current set of events is a subset of the last set. If it is a subset, the scheduling decisions are much simpler. > All the enable and disable calls can be called from NMI interrupt context > and thus must be very careful with locks. I didn't think the pmu->enable() and pmu->disable() functions could be called from NMI context. Also, I would expect that if hw_perf_enable and hw_perf_disable are called from NMI context, that the calls would be balanced. That simplifies things a lot. Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 7 Jan 2010 04:00 On Thu, 2010-01-07 at 15:13 +1100, Paul Mackerras wrote: > > On powerpc, the pmu->enable() function always stops the PMU if it > wasn't already stopped. I just looked at the generic code and it looks like kernel/perf_event.c always calls perf_disable() before calling ->enable(). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Peter Zijlstra on 7 Jan 2010 04:10 On Thu, 2010-01-07 at 15:13 +1100, Paul Mackerras wrote: > > > All the enable and disable calls can be called from NMI interrupt context > > and thus must be very careful with locks. > > I didn't think the pmu->enable() and pmu->disable() functions could be > called from NMI context. I don't think they're called from NMI context either, most certainly not from the generic code. The x86 calls the raw disable from nmi to throttle the counter, but all that (should) do is disable that counter, which is limited to a single msr write. After that it schedules a full disable by sending a self-ipi. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
First
|
Prev
|
Next
|
Last
Pages: 1 2 3 4 Prev: BKL: Explicitly add BKL around get_sb/fill_super Next: [PATCH 2/3] sysctl: Remove CTL_NONE and CTL_UNNUMBERED |