From: Jarek Poplawski on 6 Jan 2010 16:20

On Wed, Jan 06, 2010 at 03:33:05PM -0500, Michael Breuer wrote:
> On 1/6/2010 3:22 PM, Jarek Poplawski wrote:
> >On Wed, Jan 06, 2010 at 02:49:38PM -0500, Michael Breuer wrote:
> >>On 1/6/2010 2:22 AM, Jarek Poplawski wrote:
> >>>On Tue, Jan 05, 2010 at 09:36:28PM -0500, Michael Breuer wrote:
> >>>>On 1/5/2010 6:07 PM, Jarek Poplawski wrote:
> >>>>>----------------->
> >>>>>
> >>>>>Changing an skb after dev_queue_xmit() is illegal. And since it's
> >>>>>inconsistent to treat specially net_xmit_errno() non-zero return,
> >>>>>while ignoring other dev_queue_xmit() errors, there is no reason
> >>>>>to break the loop in tpacket_snd() in this case.
> >>>>>
> >>>>>With debugging by: Stephen Hemminger<shemminger(a)linux-foundation.org>
> >>>>>
> >>>>>Reported-by: Michael Breuer<mbreuer(a)majjas.com>
> >>>>>Signed-off-by: Jarek Poplawski<jarkao2(a)gmail.com>
> >>>>>---
> >>>>>
> >>>>> net/packet/af_packet.c |    8 +++-----
> >>>>> 1 files changed, 3 insertions(+), 5 deletions(-)
> >>>>>
> >>>>>diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> >>>>>index e0516a2..984a1fa 100644
> >>>>>--- a/net/packet/af_packet.c
> >>>>>+++ b/net/packet/af_packet.c
> >>>>>@@ -1021,8 +1021,9 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> >>>>>
> >>>>> 		status = TP_STATUS_SEND_REQUEST;
> >>>>> 		err = dev_queue_xmit(skb);
> >>>>>-		if (unlikely(err > 0 && (err = net_xmit_errno(err)) != 0))
> >>>>>-			goto out_xmit;
> >>>>>+		if (unlikely(err > 0))
> >>>>>+			err = net_xmit_errno(err);
> >>>>>+
> >>>>> 		packet_increment_head(&po->tx_ring);
> >>>>> 		len_sum += tp_len;
> >>>>> 	} while (likely((ph != NULL) ||
> >>>>>@@ -1033,9 +1034,6 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> >>>>> 	err = len_sum;
> >>>>> 	goto out_put;
> >>>>>
> >>>>>-out_xmit:
> >>>>>-	skb->destructor = sock_wfree;
> >>>>>-	atomic_dec(&po->tx_ring.pending);
> >>>>> out_status:
> >>>>> 	__packet_set_status(po, ph, status);
> >>>>> 	kfree_skb(skb);
> >>>>>--
...
> >>This patch at first behaved similarly to the previous one - seemed
> >>to be running a bit better... until the adapter went down :(
> >I'm not sure: do you mean this patch above vs previous one by Stephen,
> >or did you manage to try my "alternative #2" patch already?
> >
> >BTW, I forgot to mention, and maybe it doesn't matter here, but it
> >would be better to (always) use my sky2 patch from Berck Nash's
> >thread.
> >
> >Jarek P.
> This was using "alternative #2" patch. I didn't get the hang with
> alternative #1. Your sky2 patch from Berck Nash's thread was
> included in both cases; Stephen's was not.

OK, so I guess "alternative #1" (above) seems safer to recommend for
now (as I assumed earlier).

On the other hand, we really don't know whether that's only because
it's nicer to your hardware (or there is still some other bug around),
so as before: let David choose ;-)

BTW, I think you could still use Stephen's patch too (there might
still be something more like this). The network manager was also
mentioned again. I might be wrong, but IMHO there could be some
interaction even if it doesn't use this device; so could/did you try
to disable it entirely?

Thanks for testing!
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michael Breuer on 6 Jan 2010 16:30

On 1/6/2010 4:10 PM, Stephen Hemminger wrote:
> On Wed, 06 Jan 2010 14:49:38 -0500
> Michael Breuer<mbreuer(a)majjas.com> wrote:
>
>> This patch at first behaved similarly to the previous one - seemed to be
>> running a bit better... until the adapter went down :(
>>
>> This is the syslog output at the time the network failed:
>> Jan 6 14:11:01 mail kernel: sky2 0000:06:00.0: error interrupt
>> status=0x40000008
>> Jan 6 14:11:01 mail kernel: sky2 software interrupt status 0x40000008
>>
> Could you go back to baseline sky2 driver. The display code might be buggy.
> These bits indicate an error in the MAC. The interrupt source enabled
> is Transmit FIFO underrun.
>
> Looking at how vendor driver handles this.
> It looks like the Yukon EC_U chip doesn't really do Jumbo frames correctly.
> Maybe not enough internal buffering to ensure that the whole packet
> is in the chip. Of course, none of this is in the chip manual.
>
> Does this help
> --------------
> --- a/drivers/net/sky2.c	2010-01-06 12:48:43.012318966 -0800
> +++ b/drivers/net/sky2.c	2010-01-06 13:05:31.273987255 -0800
> @@ -792,33 +792,21 @@ static void sky2_set_tx_stfwd(struct sky
>  {
>  	struct net_device *dev = hw->dev[port];
>
> -	if ( (hw->chip_id == CHIP_ID_YUKON_EX &&
> -	      hw->chip_rev != CHIP_REV_YU_EX_A0) ||
> -	     hw->chip_id >= CHIP_ID_YUKON_FE_P) {
> -		/* Yukon-Extreme B0 and further Extreme devices */
> -		/* enable Store & Forward mode for TX */
> -
> -		if (dev->mtu <= ETH_DATA_LEN)
> -			sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
> -				     TX_JUMBO_DIS | TX_STFW_ENA);
> -
> -		else
> -			sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
> -				     TX_JUMBO_ENA| TX_STFW_ENA);
> -	} else {
> -		if (dev->mtu <= ETH_DATA_LEN)
> -			sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_ENA);
> -		else {
> -			/* set Tx GMAC FIFO Almost Empty Threshold */
> -			sky2_write32(hw, SK_REG(port, TX_GMF_AE_THR),
> -				     (ECU_JUMBO_WM << 16) | ECU_AE_THR);
> -
> -			sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_DIS);
> -
> -			/* Can't do offload because of lack of store/forward */
> -			dev->features &= ~(NETIF_F_TSO | NETIF_F_SG | NETIF_F_ALL_CSUM);
> -		}
> -	}
> +	if ( (hw->chip_id == CHIP_ID_YUKON_EX && hw->chip_rev != CHIP_REV_YU_EX_A0) ||
> +	     hw->chip_id >= CHIP_ID_YUKON_FE_P) {
> +		/* Yukon-Extreme B0 and further Extreme devices */
> +		/* enable Store & Forward mode for TX */
> +		sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_ENA);
> +	} else if (dev->mtu > ETH_DATA_LEN) {
> +		/* set Tx GMAC FIFO Almost Empty Threshold */
> +		sky2_write32(hw, SK_REG(port, TX_GMF_AE_THR),
> +			     (ECU_JUMBO_WM << 16) | ECU_AE_THR);
> +		/* disable Store & Forward mode for TX */
> +		sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_DIS);
> +	} else {
> +		/* enable Store & Forward mode for TX */
> +		sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T), TX_STFW_ENA);
> +	}
>  }
>
>  static void sky2_mac_init(struct sky2_hw *hw, unsigned port)
> @@ -2185,11 +2173,16 @@ static int sky2_change_mtu(struct net_de
>  	if (new_mtu < ETH_ZLEN || new_mtu > ETH_JUMBO_MTU)
>  		return -EINVAL;
>
> +	/* MTU > 1500 on yukon FE and FE+ not allowed */
>  	if (new_mtu > ETH_DATA_LEN &&
>  	    (hw->chip_id == CHIP_ID_YUKON_FE ||
>  	     hw->chip_id == CHIP_ID_YUKON_FE_P))
>  		return -EINVAL;
>
> +	/* TSO on Yukon Ultra and MTU > 1500 not supported */
> +	if (new_mtu > ETH_DATA_LEN && hw->chip_id == CHIP_ID_YUKON_EC_U)
> +		dev->features &= ~NETIF_F_TSO;
> +
>  	if (!netif_running(dev)) {
>  		dev->mtu = new_mtu;
>  		return 0;
> @@ -2233,6 +2226,15 @@ static int sky2_change_mtu(struct net_de
>  	if (err)
>  		dev_close(dev);
>  	else {
> +		/* WA for dev. #4.209 */
> +		if (hw->chip_id == CHIP_ID_YUKON_EC_U &&
> +		    hw->chip_rev == CHIP_REV_YU_EC_U_A1) {
> +			/* enable/disable Store & Forward mode for TX */
> +			sky2_write32(hw, SK_REG(port, TX_GMF_CTRL_T),
> +				     sky2->speed != SPEED_1000
> +				     ? TX_STFW_ENA : TX_STFW_DIS);
> +		}
> +
>  		gma_write16(hw, port, GM_GP_CTRL, ctl);
>
>  		netif_wake_queue(dev);
> --- a/drivers/net/sky2.h	2010-01-06 12:48:48.632247424 -0800
> +++ b/drivers/net/sky2.h	2010-01-06 12:59:57.322078964 -0800
> @@ -1901,8 +1901,8 @@ enum {
>  	TX_VLAN_TAG_ON	= 1<<25,/* enable VLAN tagging */
>  	TX_VLAN_TAG_OFF	= 1<<24,/* disable VLAN tagging */
>
> -	TX_JUMBO_ENA	= 1<<23,/* PCI Jumbo Mode enable (Yukon-EC Ultra) */
> -	TX_JUMBO_DIS	= 1<<22,/* PCI Jumbo Mode enable (Yukon-EC Ultra) */
> +	TX_PCI_JUM_ENA	= 1<<23,/* Enable PCI Jumbo Mode (Yukon-EC Ultra) */
> +	TX_PCI_JUM_DIS	= 1<<22,/* Disable PCI Jumbo Mode (Yukon-EC Ultra) */
>
>  	GMF_WSP_TST_ON	= 1<<18,/* Write Shadow Pointer Test On */
>  	GMF_WSP_TST_OFF	= 1<<17,/* Write Shadow Pointer Test Off */
>

I'll try this a bit later today. However, early on, I saw the same
issues with MTU=1500.

Also, maybe I'm missing something, but I can only recreate the issue
with a high receive rate. Given the interaction with DHCP, for example,
I'm thinking that there is some precondition that is as yet unknown.
May be buggy hardware, or perhaps a race condition resulting in a
corrupt i/o buffer somewhere.

I'm wondering whether there's some useful place to insert some
diagnostics on the RX side - at least we can see if there are any
consistent events on the RX side preceding the TX error.
From: Michael Breuer on 6 Jan 2010 16:40

On 1/6/2010 4:09 PM, Jarek Poplawski wrote:
> On Wed, Jan 06, 2010 at 03:33:05PM -0500, Michael Breuer wrote:
>> On 1/6/2010 3:22 PM, Jarek Poplawski wrote:
>>> On Wed, Jan 06, 2010 at 02:49:38PM -0500, Michael Breuer wrote:
> ...
>>>> This patch at first behaved similarly to the previous one - seemed
>>>> to be running a bit better... until the adapter went down :(
>>>>
>>> I'm not sure: do you mean this patch above vs previous one by Stephen,
>>> or did you manage to try my "alternative #2" patch already?
>>>
>>> BTW, I forgot to mention, and maybe it doesn't matter here, but it
>>> would be better to (always) use my sky2 patch from Berck Nash's
>>> thread.
>>>
>>> Jarek P.
>>>
>> This was using "alternative #2" patch. I didn't get the hang with
>> alternative #1. Your sky2 patch from Berck Nash's thread was
>> included in both cases; Stephen's was not.
>>
> OK, so I guess "alternative #1" (above) seems safer to recommend for
> now (as I assumed earlier).
>
> On the other hand, we really don't know if it's only because it's
> nicer for your hardware (or still some other bug around),
> so as before: let David choose ;-)
>
> BTW, I think you could still use Stephen's patch too (there might be
> still something more like this). There was also mentioned this network
> manager again. I might be wrong, but IMHO there could be some
> interaction even if it doesn't use this device; so could/did you try
> to disable it entirely?
>
> Thanks for testing!
> Jarek P.
>

Just reran without the network manager - no change. Going to rerun
with Stephen's new patch, alternative #1, and the patch from Berck
Nash's thread.
From: Michael Breuer on 6 Jan 2010 18:30

On 1/6/2010 4:10 PM, Stephen Hemminger wrote:
> On Wed, 06 Jan 2010 14:49:38 -0500
> Michael Breuer<mbreuer(a)majjas.com> wrote:
>
>> This patch at first behaved similarly to the previous one - seemed to be
>> running a bit better... until the adapter went down :(
>>
>> This is the syslog output at the time the network failed:
>> Jan 6 14:11:01 mail kernel: sky2 0000:06:00.0: error interrupt
>> status=0x40000008
>> Jan 6 14:11:01 mail kernel: sky2 software interrupt status 0x40000008
>>
> Could you go back to baseline sky2 driver. The display code might be buggy.
> These bits indicate an error in the MAC. The interrupt source enabled
> is Transmit FIFO underrun.
>
> Looking at how vendor driver handles this.
> It looks like the Yukon EC_U chip doesn't really do Jumbo frames correctly.
> Maybe not enough internal buffering to ensure that the whole packet
> is in the chip. Of course, none of this is in the chip manual.
>
> Does this help
> --------------
> ...

Ok ... results - and maybe some more clues...

Running with this patch, Jarek's "alternative 1", and the patch from
the other thread. Not so good.

No reported errors (sky2, etc.) - however with mtu=9000, lots of stuff
broke: XDMCP; http via MASQ/netfilter; ssh connections intermittently
(when large frames involved perhaps), etc. Tried to change mtu to 1500
on the fly, got a bunch of errors - and network watchdog kicked in.
Have now rebooted with the same patches and mtu=1500.

... with mtu=1500, everything is again working (i.e., XDMCP,
netfilter, etc.)

Load test with mtu=1500 went well for a while - high throughput
sustained for a few minutes - then similar crash as before... but no
interrupt error messages this time until after the oops:

<nothing of note before this>
Jan 6 18:17:54 mail kernel: DRHD: handling fault status reg 2
Jan 6 18:17:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr 1bbfe000
Jan 6 18:17:54 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Jan 6 18:17:54 mail kernel: sky2 0000:06:00.0: error interrupt status=0x80000000
Jan 6 18:17:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 6 18:18:04 mail kernel: ------------[ cut here ]------------
Jan 6 18:18:04 mail kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
Jan 6 18:18:04 mail kernel: Hardware name: System Product Name
Jan 6 18:18:04 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Jan 6 18:18:04 mail kernel: Modules linked in: ip6table_filter ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat nf_nat iptable_mangle iptable_raw bridge stp appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device gspca_spca505 gspca_main videodev v4l1_compat snd_pcm v4l2_compat_ioctl32 pcspkr asus_atk0110 hwmon i2c_i801 iTCO_wdt firewire_ohci iTCO_vendor_support firewire_core crc_itu_t snd_timer snd sky2 soundcore wmi snd_page_alloc fbcon tileblit font bitblit softcursor raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfil
Jan 6 18:18:04 mail kernel: lrect [last unloaded: microcode]
Jan 6 18:18:04 mail kernel: Pid: 0, comm: swapper Tainted: G W 2.6.32-00840-gec8257c-dirty #41
Jan 6 18:18:04 mail kernel: Call Trace:
Jan 6 18:18:04 mail kernel: <IRQ> [<ffffffff8105365a>] warn_slowpath_common+0x7c/0x94
Jan 6 18:18:04 mail kernel: [<ffffffff810536c9>] warn_slowpath_fmt+0x41/0x43
Jan 6 18:18:04 mail kernel: [<ffffffff813e12bf>] ? netif_tx_lock+0x44/0x6c
Jan 6 18:18:04 mail kernel: [<ffffffff813e1427>] dev_watchdog+0xf3/0x164
Jan 6 18:18:04 mail kernel: [<ffffffff81077696>] ? sched_clock_cpu+0x47/0xd1
Jan 6 18:18:04 mail kernel: [<ffffffff8106316b>] run_timer_softirq+0x1c8/0x270
Jan 6 18:18:04 mail kernel: [<ffffffff8105ae3b>] __do_softirq+0xf8/0x1cd
Jan 6 18:18:04 mail kernel: [<ffffffff8107ef33>] ? tick_program_event+0x2a/0x2c
Jan 6 18:18:04 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 6 18:18:04 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 6 18:18:04 mail kernel: [<ffffffff8105aa1b>] irq_exit+0x4a/0x8c
Jan 6 18:18:04 mail kernel: [<ffffffff8146dd32>] smp_apic_timer_interrupt+0x86/0x94
Jan 6 18:18:04 mail kernel: [<ffffffff810127e3>] apic_timer_interrupt+0x13/0x20
Jan 6 18:18:04 mail kernel: <EOI> [<ffffffff812c4a06>] ? acpi_idle_enter_c1+0xb2/0xd0
Jan 6 18:18:04 mail kernel: [<ffffffff812c49ff>] ? acpi_idle_enter_c1+0xab/0xd0
Jan 6 18:18:04 mail kernel: [<ffffffff813a43b8>] ? cpuidle_idle_call+0x9e/0xfa
Jan 6 18:18:04 mail kernel: [<ffffffff81010c90>] ? cpu_idle+0xb4/0xf6
Jan 6 18:18:04 mail kernel: [<ffffffff81463312>] ? start_secondary+0x201/0x242
Jan 6 18:18:04 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 6 18:18:04 mail kernel: sky2 eth0: tx timeout
Jan 6 18:18:04 mail kernel: sky2 eth0: transmit ring 21 .. 108 report=21 done=21
Jan 6 18:18:04 mail kernel: sky2 eth0: disabling interface
Jan 6 18:18:04 mail kernel: sky2 eth0: enabling interface
<eth0 dead after this>
From: Michael Breuer on 6 Jan 2010 21:50

On 1/6/2010 6:26 PM, Michael Breuer wrote:
> On 1/6/2010 4:10 PM, Stephen Hemminger wrote:
>> ...
> Ok ... results - and maybe some more clues...
>
> Running with this patch; Jarek's "alternative 1", and the patch from
> the other thread. Not so good.
> ...
> Load test with mtu=1500 went well for a while - high throughput
> sustained for a few minutes - then similar crash as before... but no
> interrupt error messages this time until after the oops:
> ...
> <eth0 dead after this>

Walked through the code based on Jarek's patches... came upon
NET_CLS_ACT. At least in some cases (sch_cbq.c for example), the net
transmit error could be returned from here... after releasing the skb.
A quick scan of the various files in net/sched suggests that with
NET_CLS_ACT the skb may or may not have been freed in the event of an
error.

If I have time later I'll see if I can bypass NET_CLS_ACT and see
whether this is even relevant.