From: Masami Hiramatsu on 7 Aug 2010 06:00

Peter Zijlstra wrote:
> On Fri, 2010-08-06 at 15:18 +0900, Masami Hiramatsu wrote:
>> Peter Zijlstra wrote:
>>> On Wed, 2010-08-04 at 10:45 -0400, Mathieu Desnoyers wrote:
>>>
>>>> How do you plan to read the data concurrently with the writer overwriting the
>>>> data while you are reading it without corruption ?
>>> I don't consider reading while writing (in overwrite mode) a valid case.
>>>
>>> If you want to use overwrite, stop the writer before reading it.
>> For example, would you like to always have to stop the audit before you can
>> read the system audit log?
>>
>> No, this is one of the most important requirements for tracers, especially for
>> system admins (they're the most important users of Linux), who need to check
>> system health and catch system trouble.
>>
>> For performance measurement and finding hotspots, one-shot tracing is enough,
>> but that is just for developers. In real-world computing, Linux is just an OS;
>> users want to run their systems, middleware and applications without trouble.
>> But when they do hit a problem, they want to shoot it ASAP.
>> The flight-recorder mode is mainly for those users.
>
> You cannot over-write and consistently read the buffer, that's plain
> impossible. With sub-buffers you can swivel a sub-buffer and
> consistently read that, but there is no guarantee the next sub-buffer
> you steal was indeed adjacent to the previous buffer you stole, as that
> might have gotten over-written by the active writer while you were
> stealing the previous one.

Right, we cannot ensure that. In overwrite mode the reader may lose some data
because the writer overwrites it (and in non-overwrite mode the writer may
fail to write new data into the buffer). However, I don't think that makes
this mode useless. If we can know when (and where) data was lost, the rest of
the data is still quite useful in many cases.

> If you want to snapshot buffers, do that, simply swivel the whole trace
> buffer, and continue tracing in a new one, then consume the old trace in
> a consistent manner.

Hmm, wouldn't that consume much more memory than a sub-buffer ring buffer if
we keep spare buffers around? And if we allocate the spare buffer only after a
reader opens the buffer, won't that slow the reader down?

> I really see no value in being able to read unrelated bits and pieces of
> a buffer.

I think there is a trade-off between a perfect snapshot and memory
consumption, and which side matters more depends on the use case.

> So no, I will _not_ support reading an over-write buffer while there is
> an active reader.

I hope you will reconsider how useful an overwrite buffer can be, even if it
is far from perfect.

Thank you,

--
Masami HIRAMATSU
2nd Research Dept. Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt(a)hitachi.com
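The flight-recorder argument above hinges on the loss being accounted for
rather than avoided. A minimal, single-threaded C sketch of that idea, with
purely illustrative names (flight_ring, overwritten counter) rather than any
kernel API, might look like this:

    /*
     * Toy flight-recorder ring: the writer overwrites the oldest record when
     * the buffer is full, and a counter lets the reader report exactly how
     * many records were lost.  Illustrative only; not a kernel API.
     */
    #include <stdio.h>

    #define RING_SIZE 4

    struct flight_ring {
        int slot[RING_SIZE];
        unsigned long head;         /* next write position (monotonic) */
        unsigned long tail;         /* next read position (monotonic)  */
        unsigned long overwritten;  /* records lost to the writer      */
    };

    static void ring_write(struct flight_ring *r, int val)
    {
        if (r->head - r->tail == RING_SIZE) {
            /* full: drop the oldest record, but remember the loss */
            r->tail++;
            r->overwritten++;
        }
        r->slot[r->head++ % RING_SIZE] = val;
    }

    static int ring_read(struct flight_ring *r, int *val)
    {
        if (r->tail == r->head)
            return 0;                           /* empty */
        *val = r->slot[r->tail++ % RING_SIZE];
        return 1;
    }

    int main(void)
    {
        struct flight_ring r = { .head = 0 };
        int v;

        for (v = 1; v <= 10; v++)   /* the writer runs ahead of the reader */
            ring_write(&r, v);

        printf("lost %lu records\n", r.overwritten);  /* prints: lost 6 records */
        while (ring_read(&r, &v))
            printf("event %d\n", v);              /* events 7, 8, 9, 10 survive */
        return 0;
    }

A real lockless implementation would of course need atomic updates and memory
barriers on head/tail/overwritten; the point here is only that the reader can
be told precisely how much it missed.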
From: Frederic Weisbecker on 9 Aug 2010 13:00

On Fri, Aug 06, 2010 at 11:50:40AM +0200, Peter Zijlstra wrote:
> On Fri, 2010-08-06 at 15:18 +0900, Masami Hiramatsu wrote:
> > Peter Zijlstra wrote:
> > > On Wed, 2010-08-04 at 10:45 -0400, Mathieu Desnoyers wrote:
> > >
> > >> How do you plan to read the data concurrently with the writer overwriting the
> > >> data while you are reading it without corruption ?
> > >
> > > I don't consider reading while writing (in overwrite mode) a valid case.
> > >
> > > If you want to use overwrite, stop the writer before reading it.
> > For example, would you like to always have to stop the audit before you can
> > read the system audit log?
> >
> > No, this is one of the most important requirements for tracers, especially for
> > system admins (they're the most important users of Linux), who need to check
> > system health and catch system trouble.
> >
> > For performance measurement and finding hotspots, one-shot tracing is enough,
> > but that is just for developers. In real-world computing, Linux is just an OS;
> > users want to run their systems, middleware and applications without trouble.
> > But when they do hit a problem, they want to shoot it ASAP.
> > The flight-recorder mode is mainly for those users.
>
> You cannot over-write and consistently read the buffer, that's plain
> impossible. With sub-buffers you can swivel a sub-buffer and
> consistently read that, but there is no guarantee the next sub-buffer
> you steal was indeed adjacent to the previous buffer you stole, as that
> might have gotten over-written by the active writer while you were
> stealing the previous one.
>
> If you want to snapshot buffers, do that, simply swivel the whole trace
> buffer, and continue tracing in a new one, then consume the old trace in
> a consistent manner.
>
> I really see no value in being able to read unrelated bits and pieces of
> a buffer.

It all depends on the frequency of your events and on the amount of memory
used for the buffer. If you are tracing syscalls on a semi-idle box with a
ring buffer of 500 MB per CPU, you really don't care about the writer catching
up with the reader: it will simply never happen. OTOH, if you are tracing
function graphs, no buffer size will ever be enough: the writer will always be
faster and will catch up with the reader.

Using the sub-buffer scheme, though, and allowing a concurrent writer and
reader in overwrite mode, we can easily tell the user that the writer was
faster and that content has been lost. Given that information, the user can
choose what to do: try again with a larger buffer, for example. See?

It's not our role to reject this just because the result might be unreliable
when the user picks silly settings (not enough memory, a reader that is too
slow for whatever reason, too-high-frequency events, and so on). Let the user
deal with that and simply inform him when the results are unreliable. This is
what ftrace does currently.

Also, the snapshot thing doesn't look like a replacement. If you are tracing
on a low-memory embedded system, you consume a lot of memory to keep the
snapshot alive, which means the live buffer may be critically shrunk and you
might in turn lose traces there. That said, it's an interesting feature that
may fit other kinds of environments or other needs.

Off-topic: it's sad that, when it comes to tracing, we often have to figure
out the needs of the embedded world indirectly, or learn about them from
secondhand sources. We rarely hear from those users directly. Except maybe at
conferences....
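To put rough, purely illustrative numbers on that trade-off (the event sizes
and rates below are assumptions, not measurements): with ~64-byte syscall
events at ~10,000 events/s per CPU, a 500 MB per-CPU buffer holds about

    500 * 2^20 bytes / (64 bytes * 10,000 events/s) ≈ 820 seconds

of history, so a reader that drains it every few seconds is never lapped. A
function-graph tracer emitting a few million ~32-byte entries per second wraps
the same buffer in about

    500 * 2^20 bytes / (32 bytes * 5,000,000 events/s) ≈ 3 seconds

so the writer will lap any realistic reader, and the only honest option is to
report how much content was lost.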
From: Steven Rostedt on 11 Aug 2010 10:40

Egad! Go on vacation and the world falls apart.

On Wed, 2010-08-04 at 08:27 +0200, Peter Zijlstra wrote:
> On Tue, 2010-08-03 at 11:56 -0700, Linus Torvalds wrote:
> > On Tue, Aug 3, 2010 at 10:18 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> > >
> > > FWIW I really utterly detest the whole concept of sub-buffers.
> >
> > I'm not quite sure why. Is it something fundamental, or just an
> > implementation issue?
>
> The sub-buffer thing that both ftrace and lttng have is creating a large
> buffer from a lot of small buffers. I simply don't see the point of
> doing that. It adds complexity and limitations for very little gain.

So, say I want to allocate a 10 MB buffer. I need to make sure the kernel has
10 MB of contiguous memory available. If memory is badly fragmented, then too
bad, I lose out. Oh wait, I could also use vmalloc. But then I'm blasting
valuable TLB entries for a tracing utility, making the tracer have an even
bigger impact on the entire system. BAH!

I originally wanted to go with a contiguous buffer, but after trying to
implement it I was convinced it was a bad choice, specifically because of
needing to 1) get large amounts of contiguous memory, or 2) eat up TLB entries
and make the system perform worse.

I chose page-size "sub-buffers" to solve the above. It also made implementing
splice trivial. OK, I admit, I never thought about mmapping the buffers, just
because I figured splice was faster. But I do have patches that allow a user
to mmap the entire ring buffer, though only in a "producer/consumer" mode.

Note, I use page-size sub-buffers, but the design could work with any size of
sub-buffer. I just never implemented that (even though, when I wrote the code,
it was secretly on my todo list).

> Their benefit is known synchronization points into the stream, you can
> parse each sub-buffer independently, but you can always break up a
> continuous stream into smaller parts or use a transport that includes
> index points or whatever.
>
> Their down side is that you can never have individual events larger than
> the sub-buffer, you need to be aware of the sub-buffer when reserving
> space etc..

The answer to that is to make a macro to do the assignment of the event, and
add a new API:

    event = ring_buffer_reserve_unlimited();
    ring_buffer_assign(event, data1);
    ring_buffer_assign(event, data2);
    ring_buffer_commit(event);

ring_buffer_reserve_unlimited() could reserve a bunch of space beyond one
sub-buffer, reserving the data in fragments. Then ring_buffer_assign() could
either copy directly into the event (if the event fits in one sub-buffer) or
do a piecewise copy if the space was fragmented.

Of course, userspace would need to know how to read it. And it can get complex
due to interrupts coming in and also reserving space between fragments, or due
to what happens if a partial fragment is overwritten. But none of these
problems are impossible to solve.

-- Steve
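A rough userspace model of that reserve/assign idea, just to make the fragment
handling concrete. The names mirror the API sketched above, but the layout,
sizes, and single-threaded logic are purely illustrative assumptions, not the
ftrace implementation:

    #include <stdio.h>
    #include <string.h>

    #define SUBBUF_SIZE 16      /* tiny "pages" so fragmentation shows */
    #define NR_SUBBUF   4

    struct fragment {
        char   *data;
        size_t  len;
    };

    struct event {
        struct fragment frag[NR_SUBBUF];
        int             nr_frags;
        int             cur;    /* fragment currently being filled */
        size_t          off;    /* offset into that fragment       */
    };

    static char   subbuf[NR_SUBBUF][SUBBUF_SIZE];
    static size_t subbuf_used[NR_SUBBUF];
    static int    cur_subbuf;

    /* Reserve 'len' bytes, possibly split across several sub-buffers. */
    static int ring_buffer_reserve_unlimited(struct event *ev, size_t len)
    {
        memset(ev, 0, sizeof(*ev));
        while (len) {
            size_t room = SUBBUF_SIZE - subbuf_used[cur_subbuf];
            size_t take;

            if (!room) {
                if (++cur_subbuf == NR_SUBBUF)
                    return -1;                  /* out of space */
                continue;
            }
            take = len < room ? len : room;
            ev->frag[ev->nr_frags].data =
                &subbuf[cur_subbuf][subbuf_used[cur_subbuf]];
            ev->frag[ev->nr_frags].len = take;
            ev->nr_frags++;
            subbuf_used[cur_subbuf] += take;
            len -= take;
        }
        return 0;
    }

    /* Copy caller data into the reservation, crossing fragment boundaries. */
    static void ring_buffer_assign(struct event *ev, const void *src, size_t len)
    {
        const char *p = src;

        while (len) {
            struct fragment *f = &ev->frag[ev->cur];
            size_t room = f->len - ev->off;
            size_t take = len < room ? len : room;

            memcpy(f->data + ev->off, p, take);
            ev->off += take;
            p += take;
            len -= take;
            if (ev->off == f->len) {    /* fragment full, move on */
                ev->cur++;
                ev->off = 0;
            }
        }
    }

    int main(void)
    {
        struct event ev;
        const char msg[] = "an event larger than one sub-buffer";

        if (ring_buffer_reserve_unlimited(&ev, sizeof(msg)))
            return 1;
        ring_buffer_assign(&ev, msg, sizeof(msg));
        /* a real API would finish with ring_buffer_commit(event) */
        printf("stored in %d fragments\n", ev.nr_frags);
        return 0;
    }

The interrupt-safety and partial-overwrite questions Steve raises are exactly
the parts this toy ignores; it only shows the fragmented reserve and the
boundary-crossing copy.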
From: Steven Rostedt on 11 Aug 2010 10:50
On Fri, 2010-08-06 at 10:13 -0400, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz(a)infradead.org) wrote:
> Less code = less instruction cache overhead. I've also shown that the LTTng
> code is at least twice as fast. In terms of complexity, it is not much more
> complex; I also took the extra care of doing the formal proofs to make sure
> the corner cases were dealt with, which I reckon neither Steven nor yourself
> have done.

Yes, Mathieu, you did a formal proof. Good for you. But honestly, it is
starting to get very annoying to hear you constantly stating that, because to
most kernel developers it is meaningless. Any slight modification of your
algorithm renders the proof invalid.

You are not the only one who has done a proof of an algorithm in the kernel,
but you are definitely the only one who constantly reminds people of it.
Congrats on your PhD; in academia, proofs are important.

But this is a ring buffer, not a critical part of the workings of the kernel.
There are much more critical and fragile parts of the kernel that work fine
without a formal proof.

Paul McKenney did a proof for RCU not for us, but just to give himself a warm
fuzzy about it. RCU is much more complex than the ftrace ring buffer, and it
is also much more critical. If Paul gets it wrong, a machine will crash. He's
right to worry. And even Paul told me that no formal proof makes up for
large-scale testing. Which, BTW, the ftrace ring buffer has gone through.

Someday I may go ahead and do that proof, but I did do a very intensive state
diagram, and I'm quite confident that it works. It's been deployed for quite a
while, and the design has yet to be a factor in any bug report against the
ring buffer.

-- Steve