From: Paul Gortmaker on 25 Feb 2010 20:00

On Thu, Feb 25, 2010 at 12:49 PM, Anton Vorontsov <avorontsov(a)ru.mvista.com> wrote:
> On Thu, Feb 25, 2010 at 07:51:41PM +0300, Anton Vorontsov wrote:
>> On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote:
>> [...]
>> > > nfs: server 192.168.0.1 not responding, still trying
>> > >
>> >
>> > Further testing has shown that this isn't restricted to warm reboots, it
>> > happens from cold as well. In addition, the exact timing of the failure
>> > seems to vary, some boots have got further before failing.
>>
>> Unfortunately I don't have any 8641 boards near me, so I can't
>> debug this myself. Though, I tested gianfar on MPC8568E-MDS with
>> 2.6.33 kernel, and it seems to work just fine.
>>
>> I see you use SMP. Can you try to turn it off? If that will fix
>> the issue, then it'll be a good data point.
>>
>> Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
>> know the results.
>
> Nope, no luck. Can't trigger the issue. :-/
> Tested with NFS boot, TCP and UDP netperf tests.

I was able to reproduce it on an 8641D and bisected it down to this:

-----------
commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
Author: Anton Vorontsov <avorontsov(a)ru.mvista.com>
Date: Tue Nov 10 14:11:10 2009 +0000

    gianfar: Revive SKB recycling

    Before calling gfar_clean_tx_ring() the driver grabs an irqsave
    spinlock, and then tries to recycle skbs. But since
    skb_recycle_check() returns 0 with IRQs disabled, we'll never
    recycle any skbs.

    It appears that gfar_clean_tx_ring() and gfar_start_xmit() are
    mostly idependent and can work in parallel, except when they
    modify num_txbdfree.

    So we can drop the lock from most sections and thus fix the skb
    recycling.
-----------

...which probably explains why you weren't seeing it on non-SMP.
I'd imagine it would show up on any of the e500mc boards too.

I'd done a rev-list on gianfar.[ch] from 32 to 33-rc1, and then
cherry-picked those onto a 32 baseline to reduce the scale of the
bisection, but I don't think that should impact the final result
I got in any meaningful way.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Anton Vorontsov on 25 Feb 2010 22:20

On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
[...]
> I was able to reproduce it on an 8641D and bisected it down to this:
>
> -----------
> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
> Author: Anton Vorontsov <avorontsov(a)ru.mvista.com>
> Date: Tue Nov 10 14:11:10 2009 +0000
>
> gianfar: Revive SKB recycling

Thanks for the bisect. I have a guess why tx hangs in
SMP case. Could anyone try the patch down below?

[...]
> ...which probably explains why you weren't seeing it on non-SMP.
> I'd imagine it would show up on any of the e500mc boards too.

Yeah.. Pity, I don't have SMP boards anymore. I'll try
to get one though.


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8bd3c9f..3ff3bd0 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 			tx_queue = priv->tx_queue[rx_queue->qindex];
 
 			tx_cleaned += gfar_clean_tx_ring(tx_queue);
+			if (!tx_cleaned && !tx_queue->num_txbdfree)
+				tx_cleaned += 1; /* don't complete napi */
 			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
 							budget_per_queue);
 			rx_cleaned += rx_cleaned_per_queue;
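For readers following the patch above, here is a minimal userspace sketch of the check it adds. It is not driver code: `struct txq_model`, `adjust_tx_cleaned()` and `should_complete_napi()` are hypothetical names made up for illustration, and the completion rule (keep polling whenever any TX work was reported, otherwise complete once RX used less than its budget) is an assumption about how the surrounding `gfar_poll()` behaves, not something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's TX queue state. */
struct txq_model {
	int num_txbdfree;	/* number of free TX buffer descriptors */
};

/*
 * Mirrors the two lines added by the patch: if the cleanup pass reclaimed
 * nothing and the ring has no free descriptors left, report one unit of
 * work so the caller keeps polling instead of completing NAPI.
 */
static int adjust_tx_cleaned(int tx_cleaned, const struct txq_model *q)
{
	assert(tx_cleaned >= 0);	/* a cleanup count is never negative */

	if (!tx_cleaned && !q->num_txbdfree)
		tx_cleaned += 1;	/* don't complete napi */
	return tx_cleaned;
}

/*
 * Assumed model of the poll decision: polling completes only when no TX
 * work was reported and RX consumed less than its budget.
 */
static bool should_complete_napi(int tx_cleaned, int rx_cleaned, int budget)
{
	return !tx_cleaned && rx_cleaned < budget;
}
```

Under this model, a full ring with nothing reclaimed no longer lets polling complete, which seems to be the stall the patch is guessing at; a ring that still has free descriptors behaves as before.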
From: Kumar Gopalpet-B05799 on 26 Feb 2010 00:00

>-----Original Message-----
>From: Anton Vorontsov [mailto:avorontsov(a)ru.mvista.com]
>Sent: Friday, February 26, 2010 8:45 AM
>To: Paul Gortmaker
>Cc: Martyn Welch; linuxppc-dev list; netdev(a)vger.kernel.org;
>linux-kernel(a)vger.kernel.org; Kumar Gopalpet-B05799;
>davem(a)davemloft.net; Kumar Gala
>Subject: Re: Gianfar driver failing on MPC8641D based board
>
>On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
>[...]
>> I was able to reproduce it on an 8641D and bisected it down to this:
>>
>> -----------
>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>> Author: Anton Vorontsov <avorontsov(a)ru.mvista.com>
>> Date: Tue Nov 10 14:11:10 2009 +0000
>>
>> gianfar: Revive SKB recycling
>
>Thanks for the bisect. I have a guess why tx hangs in SMP
>case. Could anyone try the patch down below?
>
>[...]
>> ...which probably explains why you weren't seeing it on non-SMP.
>> I'd imagine it would show up on any of the e500mc boards too.
>
>Yeah.. Pity, I don't have SMP boards anymore. I'll try to get
>one though.
>
>
>diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
>index 8bd3c9f..3ff3bd0 100644
>--- a/drivers/net/gianfar.c
>+++ b/drivers/net/gianfar.c
>@@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
> 			tx_queue = priv->tx_queue[rx_queue->qindex];
>
> 			tx_cleaned += gfar_clean_tx_ring(tx_queue);
>+			if (!tx_cleaned && !tx_queue->num_txbdfree)
>+				tx_cleaned += 1; /* don't complete napi */
> 			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
> 							budget_per_queue);
> 			rx_cleaned += rx_cleaned_per_queue;
>

Anton,

There is one more issue that I have been observing with the patch
"gianfar: Revive SKB recycling". It shows up when I run an IPv4
forwarding test with bidirectional flows in an SMP environment. I am
using Spirent SmartBits (SmartFlow) for automated testing, and I
frequently see SmartFlow report "Rx packet count greater than Tx packet
count. Duplicate packets might have been received".

To get around the issue I removed this patch, and the problem went away.

To a certain extent I could also get over the problem by using atomic_t
for num_txbdfree (atomic_add and atomic_dec for updating num_txbdfree)
and completely removing the spin_locks in the tx routines.

Also, I feel we might want to make some more changes to the
gfar_clean_tx_ring() and gfar_start_xmit() routines so that they can
operate in parallel.

I am really sorry for not posting this earlier; I have been caught up
with some urgent issues.

--
Thanks
Sandeep
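The atomic-counter idea mentioned above can be sketched in plain userspace C11 as follows. This is an illustration only, not the change that was tested: it uses `<stdatomic.h>` rather than the kernel's `atomic_t` API, it uses a compare-and-swap loop for claiming descriptors (a swapped-in technique, chosen so the count can never go negative) instead of a bare `atomic_dec`, and the names `struct txring_model`, `txring_init()`, `txring_claim()` and `txring_release()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a TX ring that tracks only its free-descriptor count. */
struct txring_model {
	atomic_int num_txbdfree;
};

static void txring_init(struct txring_model *r, int ring_size)
{
	atomic_init(&r->num_txbdfree, ring_size);
}

/*
 * Transmit side: claim n descriptors without taking a lock.
 * Returns false, leaving the count untouched, if fewer than n are free.
 */
static bool txring_claim(struct txring_model *r, int n)
{
	int cur;

	assert(n > 0);
	cur = atomic_load(&r->num_txbdfree);
	while (cur >= n) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->num_txbdfree, &cur, cur - n))
			return true;
	}
	return false;
}

/* Cleanup side: hand n reclaimed descriptors back to the ring. */
static void txring_release(struct txring_model *r, int n)
{
	assert(n > 0);
	atomic_fetch_add(&r->num_txbdfree, n);
}
```

The sketch only covers the counter. Whether the rest of the ring state touched by the transmit and cleanup paths is safe to modify concurrently is a separate question, and nothing here addresses it.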
From: Martyn Welch on 26 Feb 2010 06:50

Anton Vorontsov wrote:
> On Thu, Feb 25, 2010 at 04:46:54PM +0000, Martyn Welch wrote:
> [...]
>
>>> nfs: server 192.168.0.1 not responding, still trying
>>>
>>>
>> Further testing has shown that this isn't restricted to warm reboots, it
>> happens from cold as well. In addition, the exact timing of the failure
>> seems to vary, some boots have got further before failing.
>>
>
> Unfortunately I don't have any 8641 boards near me, so I can't
> debug this myself. Though, I tested gianfar on MPC8568E-MDS with
> 2.6.33 kernel, and it seems to work just fine.
>
> I see you use SMP. Can you try to turn it off? If that will fix
> the issue, then it'll be a good data point.
>
> Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
> know the results.
>
> Thanks

I removed the second core from the dts file rather than truly disabling
SMP in the kernel config. Doing this allowed the board to boot reliably.

Martyn

--
Martyn Welch (Principal Software Engineer)   |   Registered in England and
GE Intelligent Platforms                     |   Wales (3828642) at 100
T +44(0)127322748                            |   Barbirolli Square, Manchester,
E martyn.welch(a)ge.com                      |   M2 3AB  VAT:GB 927559189
From: Martyn Welch on 26 Feb 2010 07:10

Anton Vorontsov wrote:
> On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
> [...]
>
>> I was able to reproduce it on an 8641D and bisected it down to this:
>>
>> -----------
>> commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
>> Author: Anton Vorontsov <avorontsov(a)ru.mvista.com>
>> Date: Tue Nov 10 14:11:10 2009 +0000
>>
>> gianfar: Revive SKB recycling
>>
>
> Thanks for the bisect. I have a guess why tx hangs in
> SMP case. Could anyone try the patch down below?
>

Yup, no problem. I'm afraid it doesn't resolve the problem for me.

> [...]
>
>> ...which probably explains why you weren't seeing it on non-SMP.
>> I'd imagine it would show up on any of the e500mc boards too.
>>
>
> Yeah.. Pity, I don't have SMP boards anymore. I'll try
> to get one though.
>
>
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 8bd3c9f..3ff3bd0 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
>  			tx_queue = priv->tx_queue[rx_queue->qindex];
>
>  			tx_cleaned += gfar_clean_tx_ring(tx_queue);
> +			if (!tx_cleaned && !tx_queue->num_txbdfree)
> +				tx_cleaned += 1; /* don't complete napi */
>  			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
>  							budget_per_queue);
>  			rx_cleaned += rx_cleaned_per_queue;
>

--
Martyn Welch (Principal Software Engineer)   |   Registered in England and
GE Intelligent Platforms                     |   Wales (3828642) at 100
T +44(0)127322748                            |   Barbirolli Square, Manchester,
E martyn.welch(a)ge.com                      |   M2 3AB  VAT:GB 927559189