From: Herbert Xu on 1 Jun 2010 00:40

On Mon, May 31, 2010 at 10:44:30PM -0400, Mikulas Patocka wrote:
> Questions:
>
> If you are optimizing it,
>
> 1) why don't you optimize it in such a way that if one CPU submits
> requests, the crypto work is spread among all the CPUs? Currently it
> spreads the work only if different CPUs submit it.

Because the crypto layer already provides that functionality,
through pcrypt. By instantiating pcrypt for a given algorithm,
you can parallelise that algorithm across CPUs.

This would be inappropriate for upper layer code as they do not
know whether the underlying algorithm should be parallelised,
e.g., a PCI offload board certainly should not be parallelised.

> 2) why not optimize software async crypto daemon (crypto/cryptd.c) instead
> of dm-crypt, so that all kernel subsystems can actually take advantage of
> those multi-CPU optimizations, not just dm-crypt?

Because you cannot do what Andi is doing here in the crypto layer.
What dm-crypt does today (which hasn't always been the case BTW)
hides information away (the original submitting CPU) that we cannot
recreate.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andi Kleen on 1 Jun 2010 02:50

, Mikulas Patocka wrote:
> Questions:
>
> If you are optimizing it,
>
> 1) why don't you optimize it in such a way that if one CPU submits
> requests, the crypto work is spread among all the CPUs? Currently it
> spreads the work only if different CPUs submit it.

This case is only useful with very slow CPUs and is handled by pcrypt
in theory (but I haven't tested it).

> 2) why not optimize software async crypto daemon (crypto/cryptd.c) instead
> of dm-crypt, so that all kernel subsystems can actually take advantage of
> those multi-CPU optimizations, not just dm-crypt?

Normally most subsystems are multi-CPU already, unless they limit
themselves artificially like dm-crypt. For dm-crypt it would be wasteful
to funnel everything through two single-CPU threads just to spread it
out again. That is also why I used per-CPU IO threads.

-Andi
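Andi's argument can be illustrated with a toy model. This is not dm-crypt code. Requests arrive from several submitting CPUs, and the model counts how many requests each worker ends up encrypting. The `Request` tuple, the policy functions, and the CPU numbering are hypothetical. The single shared worker only stands in for the single-CPU threads Andi mentions, and queueing delay and cipher cost are ignored.

```python
from collections import Counter
from typing import List, NamedTuple


class Request(NamedTuple):
    submit_cpu: int  # CPU that issued the I/O (hypothetical model field)
    sector: int


def funnel(reqs: List[Request]) -> Counter:
    """Every request goes to one shared worker, modelled as CPU 0."""
    return Counter(0 for _ in reqs)


def per_cpu(reqs: List[Request]) -> Counter:
    """Each request is encrypted by a worker on the CPU that submitted it."""
    return Counter(r.submit_cpu for r in reqs)


# Four CPUs each submit 25 requests.
reqs = [Request(cpu, n) for cpu in range(4) for n in range(25)]
print(funnel(reqs))   # Counter({0: 100})
print(per_cpu(reqs))  # Counter({0: 25, 1: 25, 2: 25, 3: 25})
```

With the funnel, one worker carries the whole load no matter how many CPUs submitted work. With per-CPU workers, the load follows the submitters.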
From: Mikulas Patocka on 2 Jun 2010 01:20

On Tue, 1 Jun 2010, Herbert Xu wrote:

> On Mon, May 31, 2010 at 10:44:30PM -0400, Mikulas Patocka wrote:
> > Questions:
> >
> > If you are optimizing it,
> >
> > 1) why don't you optimize it in such a way that if one CPU submits
> > requests, the crypto work is spread among all the CPUs? Currently it
> > spreads the work only if different CPUs submit it.
>
> Because the crypto layer already provides that functionality,
> through pcrypt. By instantiating pcrypt for a given algorithm,
> you can parallelise that algorithm across CPUs.

And how can I use pcrypt for dm-crypt? After a quick look at pcrypt
sources, it seems to be dependent on aead and not useable for general
encryption algorithms at all.

I tried cryptd --- in theory it should work by requesting the algorithm
like cryptd(cbc(aes)) --- but if I replace "%s(%s)" with "cryptd(%s(%s))"
in dm-crypt sources it locks up and doesn't work.

> This would be inappropriate for upper layer code as they do not
> know whether the underlying algorithm should be parallelised,
> e.g., a PCI offload board certainly should not be parallelised.

The upper layer should ideally request "cbc(aes)" and the crypto routine
should select the most efficient implementation --- sync on single-core
system, async with cryptd on multi-core system and async with hardware
implementation if you have HIFN crypto card.

> > 2) why not optimize software async crypto daemon (crypto/cryptd.c) instead
> > of dm-crypt, so that all kernel subsystems can actually take advantage of
> > those multi-CPU optimizations, not just dm-crypt?
>
> Because you cannot do what Andi is doing here in the crypto layer.
> What dm-crypt does today (which hasn't always been the case BTW)
> hides information away (the original submitting CPU) that we cannot
> recreate.

It is pointless to track the submitting CPU. The majority of the time is
consumed by raw encryption/decryption. And you must optimize that --- i.e.
on an SMP system, make sure that cryptd distributes the work across all
available cores.

When you get this right --- i.e. when reading an encrypted disk, you get
either read speed equivalent to a non-encrypted disk or all the cores are
saturated --- then you can start thinking about other optimizations.

Mikulas
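The naming trick Mikulas describes relies on the crypto API's template syntax, in which one algorithm name wraps another. The sketch below only manipulates strings. It splits a dm-crypt style cipher spec such as `aes-cbc-essiv:sha256` into cipher, chaining mode and IV mode. This is a simplified parse assumed here for illustration, not dm-crypt's actual parser. It then builds the name with the same `"%s(%s)"` pattern and can optionally wrap the result in an outer template such as `cryptd` or `pcrypt`. The helper names are hypothetical, and the sketch says nothing about whether the kernel can actually instantiate the resulting name.

```python
from typing import Optional, Tuple


def parse_cipher_spec(spec: str) -> Tuple[str, str, Optional[str]]:
    """Split "cipher-chainmode[-ivmode]" (simplified, hypothetical parser)."""
    parts = spec.split("-", 2)
    if len(parts) < 2:
        raise ValueError("expected at least cipher-chainmode: %r" % spec)
    ivmode = parts[2] if len(parts) > 2 else None
    return parts[0], parts[1], ivmode


def template_name(cipher: str, chainmode: str, wrapper: Optional[str] = None) -> str:
    name = "%s(%s)" % (chainmode, cipher)  # e.g. "cbc(aes)"
    if wrapper:
        name = "%s(%s)" % (wrapper, name)  # e.g. "cryptd(cbc(aes))"
    return name


cipher, chainmode, ivmode = parse_cipher_spec("aes-cbc-essiv:sha256")
print(template_name(cipher, chainmode))            # cbc(aes)
print(template_name(cipher, chainmode, "cryptd"))  # cryptd(cbc(aes))
print(template_name("aes", "xts", "pcrypt"))       # pcrypt(xts(aes))
```

As Herbert's reply notes, cryptd is a different mechanism from pcrypt. Producing a syntactically valid name is not the same as getting parallel encryption.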
From: Herbert Xu on 2 Jun 2010 01:20

On Wed, Jun 02, 2010 at 01:10:00AM -0400, Mikulas Patocka wrote:
>
> And how can I use pcrypt for dm-crypt? After a quick look at pcrypt
> sources, it seems to be dependent on aead and not useable for general
> encryption algorithms at all.

You instantiate a pcrypt variant of whatever algorithm that you're
using. For example, if you're using XTS then you should instantiate
pcrypt(xts(aes)). Currently you must use tcrypt to instantiate.

> I tried cryptd --- in theory it should work by requesting the algorithm
> like cryptd(cbc(aes)) --- but if I replace "%s(%s)" with "cryptd(%s(%s))"
> in dm-crypt sources it locks up and doesn't work.

cryptd is something else altogether. However, it certainly should
not lock up. What kernel version is this?

> > This would be inappropriate for upper layer code as they do not
> > know whether the underlying algorithm should be parallelised,
> > e.g., a PCI offload board certainly should not be parallelised.
>
> The upper layer should ideally request "cbc(aes)" and the crypto routine
> should select the most efficient implementation --- sync on single-core
> system, async with cryptd on multi-core system and async with hardware
> implementation if you have HIFN crypto card.

That's exactly what will happen when the admin instantiates pcrypt.
dm-crypt simply needs to specify cbc(aes) and it will get pcrypt
automatically.

The point is that on a modern processor like Nehalem you don't
need pcrypt.

> It is pointless to track the submitting CPU.

No, you are wrong.
Cheers,
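Herbert's point that dm-crypt "simply needs to specify cbc(aes)" rests on the crypto API resolving a generic algorithm name to the highest-priority registered implementation of that name. The following is a simplified user-space model of that lookup, not the kernel's code. The `Impl` records, the driver names, and the priority numbers are all made up for illustration. The model shows that registering a wrapper instance which advertises the same generic name with a higher priority changes what a caller requesting `cbc(aes)` receives, with no change in the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Impl:
    name: str         # generic name the caller asks for, e.g. "cbc(aes)"
    driver_name: str  # name identifying this particular implementation
    priority: int     # higher wins when several share the same generic name


class Registry:
    def __init__(self) -> None:
        self._impls: List[Impl] = []

    def register(self, impl: Impl) -> None:
        self._impls.append(impl)

    def lookup(self, name: str) -> Impl:
        # Among implementations with this generic name, pick the highest priority.
        candidates = [i for i in self._impls if i.name == name]
        if not candidates:
            raise KeyError(name)
        return max(candidates, key=lambda i: i.priority)


reg = Registry()
reg.register(Impl("cbc(aes)", "cbc(aes-generic)", 100))  # hypothetical priority
print(reg.lookup("cbc(aes)").driver_name)  # cbc(aes-generic)

# The admin instantiates a parallel wrapper advertising the same generic name.
reg.register(Impl("cbc(aes)", "pcrypt(cbc(aes-generic))", 200))
print(reg.lookup("cbc(aes)").driver_name)  # pcrypt(cbc(aes-generic))
```

In this model, the decision to parallelise belongs to whoever registers the wrapper. On a CPU with fast AES, the admin can simply leave it out.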
From: Herbert Xu on 2 Jun 2010 01:30

On Wed, Jun 02, 2010 at 01:15:43AM -0400, Mikulas Patocka wrote:
>
> Almost every CPU is "very slow" so that it lags behind disk when
> encrypting. CPUs with hardware AES may be the exception.

I would not call a platform like Nehalem the exception.

> If one CPU submits I/O for 10MB of data, your patch makes no
> parallelization at all. Because all those 10MB will be encrypted by the
> same CPU that submitted it.

He doesn't need to. This is already solved by pcrypt.

Cheers,
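The last exchange turns on the single-submitter case. The toy model below compares two policies: keeping each request on its submitting CPU, and spreading requests round-robin over all CPUs. It is hypothetical and is not dm-crypt or pcrypt code. Round-robin is only a stand-in for the kind of spreading Herbert attributes to pcrypt, and pcrypt's real scheduling is not modelled.

```python
from collections import Counter
from typing import List


def by_submitter(submit_cpus: List[int]) -> Counter:
    """Worker = the CPU that submitted the request."""
    return Counter(submit_cpus)


def round_robin(submit_cpus: List[int], ncpus: int) -> Counter:
    """Worker = request index modulo the number of CPUs, ignoring the submitter."""
    return Counter(i % ncpus for i in range(len(submit_cpus)))


# CPU 2 alone submits 10 MiB as 2560 requests of 4 KiB each.
single = [2] * 2560
print(by_submitter(single))    # Counter({2: 2560})
print(round_robin(single, 4))  # Counter({0: 640, 1: 640, 2: 640, 3: 640})
```

When many CPUs submit I/O, both policies spread the load. They differ sharply only when one CPU produces most of the I/O, which is the case Mikulas raises.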