Prev: Staging: cxt1e1: Fix build error
Next: [PATCH 2.6.34-rc6] mxc: gadget: Fix bitfield for calculating maximum packet size
From: Nils Radtke on 31 May 2010 16:20 Hi Reinette, First off: Linux mypole 2.6.33.4 #23 SMP PREEMPT Sat May 15 20:27:33 CEST 2010 i686 GNU/Linux On Thu 2010-05-20 @ 11-33-28AM -0700, reinette chatre wrote: # On Thu, 2010-05-20 at 05:15 -0700, Nils Radtke wrote: # > # # > # To address (1), could you please run with attached debug patch and also # > # enable rate scaling debugging. That will be "modprobe iwlagn # > # debug=0x143fff). # > drivers/net/wireless/iwlwifi/iwl-agn-rs.c: In function ârs_collect_tx_dataâ: # > drivers/net/wireless/iwlwifi/iwl-agn-rs.c:364: error: âprivâ undeclared (first use in this function) # > drivers/net/wireless/iwlwifi/iwl-agn-rs.c:364: error: (Each undeclared identifier is reported only once # > drivers/net/wireless/iwlwifi/iwl-agn-rs.c:364: error: for each function it appears in.) # > # > This happens when compiling w/ the patch applied cleanly against .33.3 # > I'll try to fix it, then conduct the field test. # # Sorry ... and thanks. # # > For the latter, do # > you need the same kind of log as for the previous one? # The goal of this patch is to find the reason behind the error # "expected_tpt should have been calculated by now". From what I # understand you only encountered that in one of your tests, not all. Any # test you can run to reproduce that error will be welcome. Yep. This expected_tpt stuff happens IRC (see mails for certainty, though) exclusively on site B. Ok, so that's the goal. What could I do to advance us an additional step at a time avoiding pushing hundreds of kilobytes of logs uplink? To reproduce it, I have to be on site B and just start surfing. It's a matter of _short_ time until the driver hits the wall. I'm suspecting still getting those expected_tpt thingies (see below). Though today it seems I've been lucky, it (maybe) only happened once (or never today). # Thinking about your question more ... I believe your previous debug logs # were created with debug flag 0x43fff. For this iteration, please use # debug flag 0x143fff. If you insist on a test using .33.3, I will do so but that will have to wait. Meanwhile I used this patch for .34 to fix the build err from your dbg patch: drivers/net/wireless/iwlwifi/iwl-agn-rs.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) Index: linux/drivers/net/wireless/iwlwifi/iwl-agn-rs.c =================================================================== --- linux.orig/drivers/net/wireless/iwlwifi/iwl-agn-rs.c 2010-05-31 13:21:48.000000000 +0200 +++ linux/drivers/net/wireless/iwlwifi/iwl-agn-rs.c 2010-05-31 13:25:52.000000000 +0200 @@ -365,7 +365,8 @@ * packets. */ static int rs_collect_tx_data(struct iwl_scale_tbl_info *tbl, - int scale_index, int attempts, int successes) + int scale_index, int attempts, int successes, + struct iwl_priv *priv) { struct iwl_rate_scale_data *window = NULL; static const u64 mask = (((u64)1) << (IWL_RATE_MAX_WINDOW - 1)); @@ -868,7 +869,7 @@ &rs_index); rs_collect_tx_data(curr_tbl, rs_index, info->status.ampdu_ack_len, - info->status.ampdu_ack_map); + info->status.ampdu_ack_map, priv); /* Update success/fail counts if not searching for new mode */ if (lq_sta->stay_in_tbl) { @@ -902,7 +903,7 @@ else continue; rs_collect_tx_data(tmp_tbl, rs_index, 1, - i < retries ? 0 : legacy_success); + i < retries ? 0 : legacy_success, priv); } /* Update success/fail counts if not searching for new mode */ Test conducted using debug flag 0x143fff. Test on Site B resulted immediately in a hard crash upon resume, that is I got to X, activated wireless, echoed 0x143fff to the sysfile and that was it. But I didn't count on that one so I had no console to watch for it. So right now I got no clue what caused the crash, there's nothing in the logs, of course.. Next thing was to redo test on site B, but this time I switched to console beforehand. Sure enough, this time nothing happened.. Log appended as bz2. Hopefully lines like the following don't indicate essential info getting dropped: May 31 17:23:17 localhost kernel: [91800.091565] net_ratelimit: 70 callbacks suppressed This line indicates the first timestamp _after_ the crash: May 31 17:35:19 localhost kernel: [ 69.488456] The crash happened after site A and on site B. Just arrived, opened lid and *crash*. I noticed in iwl-agn-rs.c:2080: BUG_ON(window->average_tpt != ((window->success_ratio * tbl->expected_tpt[index] + 64) / 128)); Could that be again the point that hit me today when the machine crashed once? Would you mind changing this into a milder WARN? That way I wouldn't hit the wall that hard. And I would notice it anyway while skimming the logs as we still are on the hunt. It's more maintainable if it's a WARN in the src instead of me patching it w/ any update.. Wasn't this BUG_ON a WARNING in .33.3? (didn't check..) The dbg log contains all types of the errs happening here: - "deauthenticated" msgs w/ "reason 2" - "request scan called when driver not ready" - iwl_tx_agg_start - of course the "expected_tpt should have been[..]" don't show up anymore, the source has no more WARN regarding this.. And the rest of the 9Mb dbg log.. Could you tell me a bit about your idea of how to track those down? Maybe we can speed things a little up. The logging and testing stuff takes a lot of time and most of it I have no clue why that might help or what the goal is.. Are you online in some #chan? # > # Regarding (2): This is a common issue in busy environments where AP # > # decides to deathenticate station after it does not receive an ack for # > # data sent after a few retries. Was this test done in busy environment? # > Both. This happens in busy environment as well as in an idle one. Can't tell # > yet whether there're more of those msgs in busy env. I start to feel against # > Cisco APs.. # # I don't know ... perhaps these APs have been set up to be strict wrt # delays. Sure. May well be.. For sure I'm no fan of this config/policy.. But wait. I noticed no such delays or disassociations using another notebook. Well, seems I got to investigate in this, too. # > # Regarding (3): Seems like driver is getting a request to scan after a # > # request to remove interface. I am still inquiring about this. # > Probably due to me switching of via RF_KILLSWITCH. But anyway I assume this # > msg should not happen.. # # Absolutely. What are the exact steps you run when you encounter this # issue? Nothing particular. I.e. after the tests conducted toggle the hw kill switch. Then a script gets called via acpi callback that in turn 1) kills the respective dhclient3, 2) terminates wpa_supplicant and 3) removes the wifi modules (for power saving). Cheers, Nils
From: reinette chatre on 8 Jun 2010 13:50 On Fri, 2010-06-04 at 09:57 -0700, Nils Radtke wrote: > I haven't yet received a comment of yours regarding my many other questions in > my previous message. I am willing to help investigate more, assist in other ways > than testing only (always only doing testing isn't a way to keep up fun..) Your messages contain references to many issues and it is becoming increasingly hard to keep track of them all in a single email thread. Since the system crash is clearly the big issue I would like to focus on that and get that resolved. This is why I proposed that you create bug reports to help track your various issues better. Reinette -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Nils Radtke on 10 Jun 2010 10:30 Hi Reinette, Thanks for your message. Yes, you're right about the multiple bugs one thread thing. Just today I got registered w/ the wireless ml because the system just did not send me a registration message. For the bug reports to be created it will take me some time. I'll firstly report the main issue, the 2 other ones afterwards. Would it be ok cross referencing i.e. to the log and such between the reports? Should I paste all the mail messages in separate report messages (belonging to one bug report, of course) or should I paste some links to the thread? Cheers, Nils @John: Yes, you're right but the 2.6.33.4 tree which for me still has the bug_on in. On Tue 2010-06-08 @ 10-46-29AM -0700, reinette chatre wrote: # On Fri, 2010-06-04 at 09:57 -0700, Nils Radtke wrote: # > I haven't yet received a comment of yours regarding my many other questions in # > my previous message. I am willing to help investigate more, assist in other ways # > than testing only (always only doing testing isn't a way to keep up fun..) # # Your messages contain references to many issues and it is becoming # increasingly hard to keep track of them all in a single email thread. # Since the system crash is clearly the big issue I would like to focus on # that and get that resolved. This is why I proposed that you create bug # reports to help track your various issues better. # # Reinette # # -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: reinette chatre on 10 Jun 2010 12:20
On Thu, 2010-06-10 at 07:22 -0700, Nils Radtke wrote: > For the bug reports to be created it will take me some time. > I'll firstly report the main issue, the 2 other ones afterwards. Sounds great. Thanks > Would it be ok cross referencing i.e. to the log and such > between the reports? > Should I paste all the mail messages in separate report messages > (belonging to one bug report, of course) or should I paste some > links to the thread? I find it most convenient if all information related to the bug is contained in the bug report. Links can be used. Reinette -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |