From: David Howells on 12 May 2010 04:40

Out of interest, does it make the code smaller if you mark
ioat2_get_ring_ent() and ioat2_ring_mask() with __attribute_const__?

I'm not sure whether it'll affect how long gcc is willing to cache these, but
once computed, I would guess they won't change within the calling function.

Also, is the device you're driving watching the ring and its indices? If so,
does it modify the indices? If that is the case, you might need to use
read_barrier_depends() rather than smp_read_barrier_depends().

> +		prefetch(ioat2_get_ring_ent(ioat, idx + i + 1));
> +		desc = ioat2_get_ring_ent(ioat, idx + i);
>  		dump_desc_dbg(ioat, desc);
>  		tx = &desc->txd;
>  		if (tx->cookie) {

Is this right, I wonder? You're prefetching [i+1] before reading [i]? Doesn't
this mean that you might have to wait for [i+1] to be retrieved from RAM before
[i] can be read? Should you instead read tx->cookie before issuing the
prefetch? Admittedly, this is only likely to affect the reading of the head of
the queue - subsequent reads in the same loop will, of course, have been
prefetched.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
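For readers following the prefetch question, here is a minimal user-space sketch of the ordering David suggests: load the current entry's cookie first, then issue the prefetch hint for the next entry. Everything in it is an assumption made for illustration. struct ring_ent, struct ring, ring_mask(), ring_get() and count_cookies() are hypothetical stand-ins and not the ioat driver's code, and __builtin_prefetch() (a GCC/Clang builtin) stands in for the kernel's prefetch(). The mask helper takes its argument by value so that marking it const is valid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types; not the real ioat structures. */
struct ring_ent {
	uint32_t cookie;	/* non-zero when the descriptor carries a transaction */
};

struct ring {
	struct ring_ent *ents;
	uint16_t order;		/* ring holds 1 << order entries */
};

/* Depends only on its by-value argument, so the const attribute is valid. */
__attribute__((const))
static inline uint16_t ring_mask(uint16_t order)
{
	return (uint16_t)((1u << order) - 1);
}

/* Map a free-running index onto a ring slot. */
static inline struct ring_ent *ring_get(const struct ring *r, uint16_t idx)
{
	return &r->ents[idx & ring_mask(r->order)];
}

/* Count entries with a non-zero cookie among n entries starting at idx. */
static unsigned int count_cookies(const struct ring *r, uint16_t idx, uint16_t n)
{
	unsigned int found = 0;
	uint16_t i;

	assert(r->order < 16);
	for (i = 0; i < n; i++) {
		struct ring_ent *desc = ring_get(r, (uint16_t)(idx + i));
		uint32_t cookie = desc->cookie;	/* demand-load entry i first */

		/* Only then hint entry i+1, so the load above is not queued behind it. */
		__builtin_prefetch(ring_get(r, (uint16_t)(idx + i + 1)));
		if (cookie)
			found++;
	}
	return found;
}
```

As David notes, the difference should mostly matter for the first entry read; later entries will already have been requested by the previous iteration in either ordering.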
From: Dan Williams on 13 May 2010 19:50
On Wed, May 12, 2010 at 1:36 AM, David Howells <dhowells(a)redhat.com> wrote:
>
> Out of interest, does it make the code smaller if you mark
> ioat2_get_ring_ent() and ioat2_ring_mask() with __attribute_const__?
>
> I'm not sure whether it'll affect how long gcc is willing to cache these, but
> once computed, I would guess they won't change within the calling function.

Unfortunately, it does not make a difference, but I'll keep this in mind if
ioat2_get_ring_ent() ever gets more complicated (which it might in the
future).

> Also, is the device you're driving watching the ring and its indices? If so,
> does it modify the indices? If that is the case, you might need to use
> read_barrier_depends() rather than smp_read_barrier_depends().

The device does not observe the indices directly. Instead we increment a
free running 'count' register by the distance between ioat->pending and
ioat->head.

>
>> +		prefetch(ioat2_get_ring_ent(ioat, idx + i + 1));
>> +		desc = ioat2_get_ring_ent(ioat, idx + i);
>>  		dump_desc_dbg(ioat, desc);
>>  		tx = &desc->txd;
>>  		if (tx->cookie) {
>
> Is this right, I wonder? You're prefetching [i+1] before reading [i]? Doesn't
> this mean that you might have to wait for [i+1] to be retrieved from RAM before
> [i] can be read? Should you instead read tx->cookie before issuing the
> prefetch? Admittedly, this is only likely to affect the reading of the head of
> the queue - subsequent reads in the same loop will, of course, have been
> prefetched.

Yes, it should be the other way around. Thanks!

--
Dan
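The 'count' register scheme Dan describes can be modelled roughly as follows. This is only a sketch under stated assumptions: struct chan_model, ring_distance() and issue_pending() are hypothetical names, the 16-bit width of the indices and of the count is assumed and not taken from the thread, and the real driver would write the value to hardware where this model just returns it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the channel state. The field names follow the
 * thread (head, pending), but this is not the real ioat structure and the
 * 16-bit widths are an assumption.
 */
struct chan_model {
	uint16_t head;		/* next slot software will fill */
	uint16_t pending;	/* first slot not yet announced to the device */
	uint16_t count;		/* software copy of the free running count */
};

/* Distance from pending to head; the cast makes it wrap modulo 2^16. */
static inline uint16_t ring_distance(uint16_t head, uint16_t pending)
{
	return (uint16_t)(head - pending);
}

/*
 * Advance the free running count by the number of newly queued entries and
 * return the value that would be written to the device register.
 */
static uint16_t issue_pending(struct chan_model *c)
{
	uint16_t n = ring_distance(c->head, c->pending);

	c->count = (uint16_t)(c->count + n);
	c->pending = c->head;
	return c->count;
}
```

Because the device only ever sees the running total, the software indices stay private to the CPU side, which is why the device never reads or modifies them in this scheme.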