From: Pekka Enberg on 9 Aug 2010 15:00

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta(a)vflare.org> wrote:
> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>                                zram_test_flag(zram, index, ZRAM_ZERO))
>                        zram_free_page(zram, index);
>
> -               mutex_lock(&zram->lock);
> +               preempt_disable();
> +               zbuffer = __get_cpu_var(compress_buffer);
> +               zworkmem = __get_cpu_var(compress_workmem);
> +               if (unlikely(!zbuffer || !zworkmem)) {
> +                       preempt_enable();
> +                       goto out;
> +               }

The per-CPU buffer thing with this preempt_disable() trickery looks
overkill to me. Most block device drivers seem to use mempool_alloc()
for this sort of thing. Is there some reason you can't use that here?
From: Nitin Gupta on 10 Aug 2010 00:50

On 08/10/2010 12:27 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta(a)vflare.org> wrote:
>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>>                                zram_test_flag(zram, index, ZRAM_ZERO))
>>                        zram_free_page(zram, index);
>>
>> -               mutex_lock(&zram->lock);
>> +               preempt_disable();
>> +               zbuffer = __get_cpu_var(compress_buffer);
>> +               zworkmem = __get_cpu_var(compress_workmem);
>> +               if (unlikely(!zbuffer || !zworkmem)) {
>> +                       preempt_enable();
>> +                       goto out;
>> +               }
>
> The per-CPU buffer thing with this preempt_disable() trickery looks
> overkill to me. Most block device drivers seem to use mempool_alloc()
> for this sort of thing. Is there some reason you can't use that here?

Other block drivers are allocating relatively small structs using
mempool_alloc(). However, in case of zram, these buffers are quite
large (compress_workmem is 64K!). So, allocating them on every write
would probably be much slower than using a pre-allocated per-cpu buffer.

Thanks,
Nitin
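[A rough sketch of the per-CPU scheme being defended here; names are
illustrative, and the actual patch populates the pointers via CPU
hotplug callbacks and unwinds on failure, both omitted:]

#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/lzo.h>

/* one scratch pair per CPU, allocated once at module init */
static DEFINE_PER_CPU(void *, compress_buffer);		/* output, 2 pages */
static DEFINE_PER_CPU(void *, compress_workmem);	/* LZO scratch, 64K */

static int zram_init_cpu_buffers(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		per_cpu(compress_buffer, cpu) =
			kmalloc(2 * PAGE_SIZE, GFP_KERNEL);
		per_cpu(compress_workmem, cpu) =
			vmalloc(LZO1X_MEM_COMPRESS);
		if (!per_cpu(compress_buffer, cpu) ||
		    !per_cpu(compress_workmem, cpu))
			return -ENOMEM;	/* real code must free on error */
	}
	return 0;
}

[The write path then pins the task to a CPU with preempt_disable() and
uses that CPU's pair via __get_cpu_var(), as in the hunk quoted above.]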
From: Pekka Enberg on 10 Aug 2010 01:20

Hi Nitin,

On 10.8.2010 7.47, Nitin Gupta wrote:
> On 08/10/2010 12:27 AM, Pekka Enberg wrote:
>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta(a)vflare.org> wrote:
>>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>>>                                zram_test_flag(zram, index, ZRAM_ZERO))
>>>                        zram_free_page(zram, index);
>>>
>>> -               mutex_lock(&zram->lock);
>>> +               preempt_disable();
>>> +               zbuffer = __get_cpu_var(compress_buffer);
>>> +               zworkmem = __get_cpu_var(compress_workmem);
>>> +               if (unlikely(!zbuffer || !zworkmem)) {
>>> +                       preempt_enable();
>>> +                       goto out;
>>> +               }
>>
>> The per-CPU buffer thing with this preempt_disable() trickery looks
>> overkill to me. Most block device drivers seem to use mempool_alloc()
>> for this sort of thing. Is there some reason you can't use that here?
>
> Other block drivers are allocating relatively small structs using
> mempool_alloc(). However, in case of zram, these buffers are quite
> large (compress_workmem is 64K!). So, allocating them on every write
> would probably be much slower than using a pre-allocated per-cpu buffer.

The mempool API is precisely for that - using pre-allocated buffers
instead of allocating every time. The preempt_disable() games make the
code complex and have the downside of higher scheduling latencies, so
why not give mempools a try?

Pekka
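[A minimal sketch of the mempool approach being suggested; names,
reserve sizes, and the helper functions are illustrative, not from the
thread:]

#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/lzo.h>

static mempool_t *workmem_pool;

static void *workmem_alloc(gfp_t gfp_mask, void *pool_data)
{
	/* 64K is an order-4 kmalloc; honors the caller's gfp_mask */
	return kmalloc(LZO1X_MEM_COMPRESS, gfp_mask);
}

static void workmem_free(void *element, void *pool_data)
{
	kfree(element);
}

static int zram_create_workmem_pool(void)
{
	/* keep one 64K work area per online CPU in reserve */
	workmem_pool = mempool_create(num_online_cpus(), workmem_alloc,
				      workmem_free, NULL);
	return workmem_pool ? 0 : -ENOMEM;
}

/*
 * Write path: take a work area, compress, give it back.  dst must be
 * at least lzo1x_worst_compress(PAGE_SIZE) bytes.  No preemption games;
 * mempool_alloc() may simply sleep until an element is available.
 */
static int zram_compress(const u8 *src, u8 *dst, size_t *dst_len)
{
	void *workmem = mempool_alloc(workmem_pool, GFP_NOIO);
	int ret;

	if (!workmem)
		return -ENOMEM;
	ret = lzo1x_1_compress(src, PAGE_SIZE, dst, dst_len, workmem);
	mempool_free(workmem, workmem_pool);
	return ret;
}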
From: Nitin Gupta on 10 Aug 2010 01:40

Hi,

On 08/10/2010 10:35 AM, Pekka Enberg wrote:
> On 10.8.2010 7.47, Nitin Gupta wrote:
>> On 08/10/2010 12:27 AM, Pekka Enberg wrote:
>>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta(a)vflare.org> wrote:
>>>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>>>>                                zram_test_flag(zram, index, ZRAM_ZERO))
>>>>                        zram_free_page(zram, index);
>>>>
>>>> -               mutex_lock(&zram->lock);
>>>> +               preempt_disable();
>>>> +               zbuffer = __get_cpu_var(compress_buffer);
>>>> +               zworkmem = __get_cpu_var(compress_workmem);
>>>> +               if (unlikely(!zbuffer || !zworkmem)) {
>>>> +                       preempt_enable();
>>>> +                       goto out;
>>>> +               }
>>>
>>> The per-CPU buffer thing with this preempt_disable() trickery looks
>>> overkill to me. Most block device drivers seem to use mempool_alloc()
>>> for this sort of thing. Is there some reason you can't use that here?
>>
>> Other block drivers are allocating relatively small structs using
>> mempool_alloc(). However, in case of zram, these buffers are quite
>> large (compress_workmem is 64K!). So, allocating them on every write
>> would probably be much slower than using a pre-allocated per-cpu buffer.
>
> The mempool API is precisely for that - using pre-allocated buffers
> instead of allocating every time. The preempt_disable() games make the
> code complex and have the downside of higher scheduling latencies, so
> why not give mempools a try?

mempool_alloc() first calls alloc_fn with __GFP_WAIT and __GFP_IO
cleared and *then* falls back to the pre-allocated buffers. So, it will
always be slower than directly using pre-allocated buffers, as is done
currently.

One trick we can use is an alloc_fn that always returns failure when
__GFP_WAIT is not set and does the actual allocation otherwise. But
that still seems like unnecessary cost.

Thanks,
Nitin
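[A sketch of the alloc_fn "trick" just described; illustrative only,
not code from the thread:]

/*
 * mempool_alloc() first tries alloc_fn with __GFP_WAIT and __GFP_IO
 * stripped from the mask.  By failing that non-waiting attempt, we
 * make the element always come straight from the pre-allocated pool;
 * a real allocation happens only when the caller is allowed to sleep.
 */
static void *workmem_alloc(gfp_t gfp_mask, void *pool_data)
{
	if (!(gfp_mask & __GFP_WAIT))
		return NULL;	/* force fallback to pool elements */
	return kmalloc(LZO1X_MEM_COMPRESS, gfp_mask);
}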
From: Pekka Enberg on 10 Aug 2010 03:40

Hi Nitin,

On 8/10/10 8:32 AM, Nitin Gupta wrote:
>>> Other block drivers are allocating relatively small structs using
>>> mempool_alloc(). However, in case of zram, these buffers are quite
>>> large (compress_workmem is 64K!). So, allocating them on every write
>>> would probably be much slower than using a pre-allocated per-cpu buffer.
>>
>> The mempool API is precisely for that - using pre-allocated buffers
>> instead of allocating every time. The preempt_disable() games make the
>> code complex and have the downside of higher scheduling latencies, so
>> why not give mempools a try?
>
> mempool_alloc() first calls alloc_fn with __GFP_WAIT and __GFP_IO
> cleared and *then* falls back to the pre-allocated buffers. So, it
> will always be slower than directly using pre-allocated buffers, as
> is done currently.
>
> One trick we can use is an alloc_fn that always returns failure when
> __GFP_WAIT is not set and does the actual allocation otherwise. But
> that still seems like unnecessary cost.

We can always extend the mempool API with a mempool_prealloc() function
if that turns out to be a problem. The per-CPU buffer with
preempt_disable() trickery isn't really the proper thing to do here. It
doesn't make much sense to disable preemption for compression that's
purely CPU bound.

Pekka