From: Shirley Ma on 29 Jul 2010 18:40

Hello Xiaohui,

On Thu, 2010-07-29 at 19:14 +0800, xiaohui.xin(a)intel.com wrote:
> The idea is simple: just pin the guest VM user space and then
> let the host NIC driver have the chance to DMA directly to it.
> The patches are based on the vhost-net backend driver. We add a device
> which provides proto_ops such as sendmsg/recvmsg to vhost-net to
> send/recv directly to/from the NIC driver. A KVM guest that uses the
> vhost-net backend may bind any ethX interface on the host side to
> get copyless data transfer through the guest virtio-net frontend.

Since vhost-net already supports macvtap/tun backends, do you think it
would be better to implement zero copy in macvtap/tun rather than
introducing a new media passthrough device here?

> Our goal is to improve the bandwidth and reduce the CPU usage.
> Exact performance data will be provided later.

I did some vhost performance measurements over 10Gb ixgbe and found that
in order to get consistent BW results, the smp affinities of the
netperf/netserver, qemu, and vhost threads have to be set.

Looking forward to these results for small message sizes. For large
message sizes, 10Gb ixgbe BW is already reached by setting vhost smp
affinity with offloading support, so we will see how much the CPU
utilization can be reduced.

Please provide latency results as well. I did some experiments with
macvtap zero-copy sendmsg, and found that the get_user_pages latency is
pretty high.

Thanks
Shirley
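To make the zero-copy idea concrete: a sendmsg path along these lines pins the
user (guest) buffer pages and attaches them to the skb as page fragments, so
the NIC can DMA straight from guest memory without an intermediate copy. The
sketch below is not the posted patch; the helper name zc_pin_to_skb is made
up, and it assumes the 2.6.3x-era get_user_pages_fast() signature (write flag
as an int).

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/skbuff.h>

/*
 * Hypothetical helper (not from the posted patches): pin a user buffer
 * with get_user_pages_fast() and hand the pinned pages to an skb as
 * fragments, so the NIC can DMA from them without a copy.
 */
static int zc_pin_to_skb(struct sk_buff *skb, unsigned long uaddr, size_t len)
{
	int off = offset_in_page(uaddr);
	int npages = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
	struct page *pages[MAX_SKB_FRAGS];
	int i, pinned;

	if (npages > MAX_SKB_FRAGS)
		return -EMSGSIZE;

	/* Fast path when the pages are already faulted in and present;
	 * otherwise gup_fast falls back internally to the slow path. */
	pinned = get_user_pages_fast(uaddr & PAGE_MASK, npages, 1, pages);
	if (pinned < npages) {
		while (pinned > 0)
			put_page(pages[--pinned]);
		return -EFAULT;
	}

	for (i = 0; i < npages; i++) {
		int frag = min_t(size_t, len, PAGE_SIZE - off);

		/* The gup page reference is handed over to the skb frag
		 * and is dropped when the skb is freed. */
		skb_fill_page_desc(skb, i, pages[i], off, frag);
		skb->len += frag;
		skb->data_len += frag;
		skb->truesize += frag;
		len -= frag;
		off = 0;
	}
	return 0;
}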
From: Xin, Xiaohui on 30 Jul 2010 05:00

> Hello Xiaohui,
>
> On Thu, 2010-07-29 at 19:14 +0800, xiaohui.xin(a)intel.com wrote:
>> The idea is simple: just pin the guest VM user space and then
>> let the host NIC driver have the chance to DMA directly to it.
>> The patches are based on the vhost-net backend driver. We add a device
>> which provides proto_ops such as sendmsg/recvmsg to vhost-net to
>> send/recv directly to/from the NIC driver. A KVM guest that uses the
>> vhost-net backend may bind any ethX interface on the host side to
>> get copyless data transfer through the guest virtio-net frontend.
>
> Since vhost-net already supports macvtap/tun backends, do you think it
> would be better to implement zero copy in macvtap/tun rather than
> introducing a new media passthrough device here?

I'm not sure whether that would mean more duplicated code in the kernel.

>> Our goal is to improve the bandwidth and reduce the CPU usage.
>> Exact performance data will be provided later.
>
> I did some vhost performance measurements over 10Gb ixgbe and found that
> in order to get consistent BW results, the smp affinities of the
> netperf/netserver, qemu, and vhost threads have to be set.
>
> Looking forward to these results for small message sizes. For large
> message sizes, 10Gb ixgbe BW is already reached by setting vhost smp
> affinity with offloading support, so we will see how much the CPU
> utilization can be reduced.
>
> Please provide latency results as well. I did some experiments with
> macvtap zero-copy sendmsg, and found that the get_user_pages latency is
> pretty high.

OK, I will try that.

> Thanks
> Shirley
From: Shirley Ma on 30 Jul 2010 12:00

Hello Avi,

On Fri, 2010-07-30 at 08:02 +0300, Avi Kivity wrote:
> get_user_pages() is indeed slow. But what about
> get_user_pages_fast()?
>
> Note that when the page is first touched, get_user_pages_fast() falls
> back to get_user_pages(), so the latency needs to be measured after
> quite a bit of warm-up.

Yes, I used get_user_pages_fast; however, it falls back to
get_user_pages() when the application doesn't allocate the buffer on the
same page. If I run a single ping, the RTT is extremely high; when
running multiple pings the RTT drops significantly, but it is still not
as fast as the copy path in my initial test. I am thinking that we might
need a pre-pinned memory pool.

Shirley
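A rough way to see the warm-up effect Avi describes is to time repeated
get_user_pages_fast() calls on the same buffer: the first iteration pays the
fault-in/slow-path cost, later ones show the real fast-path latency. The probe
below is only a sketch (the function name is made up, and it has to run in the
context of the process that owns the buffer):

#include <linux/ktime.h>
#include <linux/mm.h>

/*
 * Hypothetical warm-up probe: time gup_fast over several iterations on
 * the same user buffer and log the per-iteration latency.
 */
static void gup_fast_warmup_probe(unsigned long uaddr, int npages,
				  struct page **pages)
{
	int iter, i, got;
	ktime_t t0, t1;

	for (iter = 0; iter < 5; iter++) {
		t0 = ktime_get();
		got = get_user_pages_fast(uaddr & PAGE_MASK, npages, 1, pages);
		t1 = ktime_get();

		pr_info("gup_fast iter %d: %d pages in %lld ns\n",
			iter, got, ktime_to_ns(ktime_sub(t1, t0)));

		/* Drop the references so each iteration starts clean. */
		for (i = 0; i < got; i++)
			put_page(pages[i]);
	}
}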
From: Michael S. Tsirkin on 1 Aug 2010 04:40

On Thu, Jul 29, 2010 at 03:31:22PM -0700, Shirley Ma wrote:
> I did some vhost performance measurements over 10Gb ixgbe and found that
> in order to get consistent BW results, the smp affinities of the
> netperf/netserver, qemu, and vhost threads have to be set.

Could you provide an example of a good setup? Specifically, is it a good
idea for the vhost thread to inherit CPU affinities from qemu?

> Looking forward to these results for small message sizes.

I think we should explore the idea of having the driver fall back on data
copy for small message sizes. The benefit of zero copy would then be the
CPU utilization on large messages.

> For large message sizes, 10Gb ixgbe BW is already reached by setting
> vhost smp affinity with offloading support, so we will see how much the
> CPU utilization can be reduced.
>
> Please provide latency results as well. I did some experiments with
> macvtap zero-copy sendmsg, and found that the get_user_pages latency is
> pretty high.
>
> Thanks
> Shirley
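For reference, pinning any of the involved threads (netperf/netserver, the
qemu vcpu threads, the vhost kernel thread) comes down to sched_setaffinity()
on the thread's pid, i.e. what taskset -pc does. A minimal helper is sketched
below; which CPUs to choose (same core or socket as the NIC interrupt, whether
vhost should share qemu's mask) is exactly the open question here and is left
to the caller.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Pin an existing pid/tid to a single CPU (usage: ./pin <pid> <cpu>). */
int main(int argc, char **argv)
{
	cpu_set_t mask;
	pid_t pid;
	int cpu;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <cpu>\n", argv[0]);
		return 1;
	}
	pid = (pid_t)atoi(argv[1]);
	cpu = atoi(argv[2]);

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	if (sched_setaffinity(pid, sizeof(mask), &mask)) {
		perror("sched_setaffinity");
		return 1;
	}
	return 0;
}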
From: Shirley Ma on 2 Aug 2010 12:10

Hello Avi,

On Sun, 2010-08-01 at 11:18 +0300, Avi Kivity wrote:
> I don't understand. Under what conditions do you use
> get_user_pages() instead of get_user_pages_fast()? Why?

The code always calls get_user_pages_fast; however, the page is unpinned
in skb_free if the same page is not used again for a new buffer. The
reason for unpinning the page is that we don't want to pin all of the
guest kernel memory (memory overcommit). So get_user_pages_fast ends up
taking the slow get_user_pages path. Is your previous comment suggesting
that we keep the page pinned so that get_user_pages_fast stays on the
fast path?

Thanks
Shirley
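A pre-pinned pool along the lines Shirley mentions would pay the
get_user_pages cost once, up front, and keep the page references for the
device's lifetime, at the price of giving up overcommit for that region. A
minimal sketch, with made-up names and no locking:

#include <linux/mm.h>
#include <linux/slab.h>

/*
 * Hypothetical pre-pinned pool: pin a fixed user region once so the
 * per-packet path never calls gup; the memory stays pinned (and thus
 * not overcommittable) until the pool is destroyed.
 */
struct zc_pin_pool {
	struct page **pages;
	int npages;
};

static struct zc_pin_pool *zc_pool_create(unsigned long uaddr, int npages)
{
	struct zc_pin_pool *pool;
	int got = 0;

	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
	if (!pool)
		return NULL;
	pool->pages = kcalloc(npages, sizeof(*pool->pages), GFP_KERNEL);
	if (!pool->pages)
		goto err_pool;

	/* Pay the gup cost once, up front. */
	got = get_user_pages_fast(uaddr & PAGE_MASK, npages, 1, pool->pages);
	if (got < npages)
		goto err_pages;

	pool->npages = npages;
	return pool;

err_pages:
	while (got > 0)
		put_page(pool->pages[--got]);
	kfree(pool->pages);
err_pool:
	kfree(pool);
	return NULL;
}

static void zc_pool_destroy(struct zc_pin_pool *pool)
{
	int i;

	/* Drop the long-held references only at teardown. */
	for (i = 0; i < pool->npages; i++)
		put_page(pool->pages[i]);
	kfree(pool->pages);
	kfree(pool);
}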