From cperciva@xps.daemonology.net Sun Aug 14 17:22:26 2011 Return-Path: Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id 4FFC8106564A for ; Sun, 14 Aug 2011 17:22:26 +0000 (UTC) (envelope-from cperciva@xps.daemonology.net) Received: from xps.daemonology.net (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx2.freebsd.org (Postfix) with SMTP id 16CE314EAC5 for ; Sun, 14 Aug 2011 17:22:26 +0000 (UTC) Received: (qmail 16997 invoked by uid 1001); 14 Aug 2011 17:22:25 -0000 Message-Id: <20110814172225.16996.qmail@xps.daemonology.net> Date: 14 Aug 2011 17:22:25 -0000 From: Colin Percival Reply-To: Colin Percival To: FreeBSD-gnats-submit@freebsd.org Cc: Subject: [iwn] [panic] iwn_auth being called with wrong ni_chan X-Send-Pr-Version: 3.113 X-GNATS-Notify: >Number: 159768 >Category: kern >Synopsis: [iwn] [panic] iwn_auth being called with wrong ni_chan >Confidential: no >Severity: serious >Priority: medium >Responsible: bschmidt >State: closed >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sun Aug 14 17:30:09 UTC 2011 >Closed-Date: Sat Jan 28 10:31:10 UTC 2012 >Last-Modified: Sat Jan 28 10:31:10 UTC 2012 >Originator: Colin Percival >Release: FreeBSD 9.0-BETA1 amd64 >Organization: >Environment: Dell Latitude E5420, containing an Intel Centrino Advanced-N 6205 9.0-BETA1, amd64, GENERIC minus debug options. iwn0 is wlan0; wlan0 is running with wpa_supplicant. >Description: panic reading 0xffff. kgdb shows iwn0 taskq at line 5294 of if_iwn.c: if (IEEE80211_IS_CHAN_2GHZ(ni->ni_chan)) and ni_chan is 0xffff (aka. IEEE80211_CHAN_ANY). The last entry in dmesg before the panic is wlan0: ieee80211_new_state_locked: pending INIT -> AUTH transition lost which may explain how we got into iwn_auth without having ni_chan changed from IEEE80211_CHAN_ANY to a valid channel. >How-To-Repeat: Only happened once so far. My laptop was sitting at the edge of the access point's range and wlan0 lost its connection and re-connected 239 times before the panic; so this might be a disconnection/reconnection race condition of some sort. I can provide the crashdump (or load it up in kgdb here) if there's any more information to be gleaned from it. >Fix: >Release-Note: >Audit-Trail: Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Sun Aug 14 21:00:27 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=159768 Responsible-Changed-From-To: freebsd-net->bschmidt Responsible-Changed-By: bschmidt Responsible-Changed-When: Mon Aug 15 07:27:51 UTC 2011 Responsible-Changed-Why: grab http://www.freebsd.org/cgi/query-pr.cgi?pr=159768 From: Bernhard Schmidt To: bug-followup@freebsd.org, cperciva@freebsd.org Cc: Subject: Re: kern/159768: [iwn] [panic] iwn_auth being called with wrong ni_chan Date: Mon, 15 Aug 2011 09:31:44 +0200 --Boundary-01=_htMSOui1b3QOFSo Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi, > I can provide the crashdump (or load it up in kgdb here) if there's > any more information to be gleaned from it. Can you provide a 'ps' from the core? I'm especially looking for the number of wpa_supplicant instances running at that time. -- Bernhard --Boundary-01=_htMSOui1b3QOFSo Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: 7bit

Hi,


> I can provide the crashdump (or load it up in kgdb here) if there's
> any more information to be gleaned from it.


Can you provide a 'ps' from the core? I'm especially looking for

the number of wpa_supplicant instances running at that time.


--

Bernhard

--Boundary-01=_htMSOui1b3QOFSo-- From: Colin Percival To: bschmidt@freebsd.org Cc: bug-followup@freebsd.org Subject: Re: kern/159768: [iwn] [panic] iwn_auth being called with wrong ni_chan Date: Mon, 15 Aug 2011 01:10:13 -0700 On 08/15/11 00:31, Bernhard Schmidt wrote: >> I can provide the crashdump (or load it up in kgdb here) if there's >> any more information to be gleaned from it. > > Can you provide a 'ps' from the core? I'm especially looking for > the number of wpa_supplicant instances running at that time. Only one wpa_supplicant process. It, devd, and syslogd are all marked as runnable, so presumably something just happened to wake them up. (Do you still want the ps output, or does that answer your question?) -- Colin Percival Security Officer, FreeBSD | freebsd.org | The power to serve Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid From: Bernhard Schmidt To: Colin Percival Cc: bug-followup@freebsd.org Subject: Re: kern/159768: [iwn] [panic] iwn_auth being called with wrong ni_chan Date: Mon, 15 Aug 2011 10:55:53 +0200 --Boundary-01=_Z8NSOti5rRRXNBK Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit On Monday, August 15, 2011 10:10:13 Colin Percival wrote: > On 08/15/11 00:31, Bernhard Schmidt wrote: > >> I can provide the crashdump (or load it up in kgdb here) if there's > >> any more information to be gleaned from it. > > > > Can you provide a 'ps' from the core? I'm especially looking for > > the number of wpa_supplicant instances running at that time. > > Only one wpa_supplicant process. It, devd, and syslogd are all marked as > runnable, so presumably something just happened to wake them up. > > (Do you still want the ps output, or does that answer your question?) Ok, thanks. Just wanted to know if wpa_supplicant is to blame, there is a known issue which might have caused this. That seems not be the case here. Hm.. have to figure out way to reproduce that actually. Can't think of a scenario where a INIT -> AUTH transition might get lost. If you say it re-connected 239 times, which message did you use to determine that? -- Bernhard --Boundary-01=_Z8NSOti5rRRXNBK Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: 7bit

On Monday, August 15, 2011 10:10:13 Colin Percival wrote:

> On 08/15/11 00:31, Bernhard Schmidt wrote:

> >> I can provide the crashdump (or load it up in kgdb here) if there's

> >> any more information to be gleaned from it.

> >

> > Can you provide a 'ps' from the core? I'm especially looking for

> > the number of wpa_supplicant instances running at that time.

>

> Only one wpa_supplicant process. It, devd, and syslogd are all marked as

> runnable, so presumably something just happened to wake them up.

>

> (Do you still want the ps output, or does that answer your question?)


Ok, thanks. Just wanted to know if wpa_supplicant is to blame, there

is a known issue which might have caused this. That seems not be the

case here. Hm.. have to figure out way to reproduce that actually.

Can't think of a scenario where a INIT -> AUTH transition might get

lost.


If you say it re-connected 239 times, which message did you use to

determine that?


--

Bernhard

--Boundary-01=_Z8NSOti5rRRXNBK-- From: Colin Percival To: bschmidt@freebsd.org Cc: bug-followup@freebsd.org Subject: Re: kern/159768: [iwn] [panic] iwn_auth being called with wrong ni_chan Date: Mon, 15 Aug 2011 02:01:53 -0700 On 08/15/11 01:55, Bernhard Schmidt wrote: > Ok, thanks. Just wanted to know if wpa_supplicant is to blame, there > is a known issue which might have caused this. That seems not be the > case here. Hm.. have to figure out way to reproduce that actually. > Can't think of a scenario where a INIT -> AUTH transition might get > lost. > > If you say it re-connected 239 times, which message did you use to > determine that? That was based on 'link state changed to UP' messages. -- Colin Percival Security Officer, FreeBSD | freebsd.org | The power to serve Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid From: Colin Percival To: bschmidt@freebsd.org Cc: bug-followup@freebsd.org Subject: Re: kern/159768: [iwn] [panic] iwn_auth being called with wrong ni_chan Date: Sun, 04 Sep 2011 21:16:10 -0700 I have another iwn panic, which I'm guessing might be related: Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x0 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80517620 stack pointer = 0x28:0xffffff8000257360 frame pointer = 0x28:0xffffff8000257580 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (swi4: clock) trap number = 12 panic: page fault The relevant bits of backtrace are: softclock -> tcp_timer_rexmt -> if_transmit -> lagg_start -> if_transmit -> ieee80211_start -> if_transmit -> iwn_start -> iwn_start_locked -> iwn_tx_data. The panic occurs on line 3321 of if_iwn.c 1.115, due to txa_private being NULL. The last lines printed to the console before the panic -- late enough that they didn't get written to disk by syslog -- are: wpa_supplicant[470]: CTRL-EVENT-DISCONNECTED bssid=94:44:52:e2:aa:4a reason=0 kernel: wlan0: link state changed to DOWN kernel: lagg0: link state changed to DOWN wpa_supplicant[470]: Associated with 94:44:52:e2:aa:4a kernel: wlan0: link state changed to UP kernel: lagg0: link state changed to UP so it seems very likely that there's something going wrong in the interface start code (and possibly a race condition -- tcp timers are good at hitting those...) -- Colin Percival Security Officer, FreeBSD | freebsd.org | The power to serve Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid From: Bernhard Schmidt To: Colin Percival Cc: bug-followup@freebsd.org Subject: Re: kern/159768: [iwn] [panic] iwn_auth being called with wrong ni_chan Date: Thu, 13 Oct 2011 15:08:34 +0200 --Boundary-00=_TLulOnwAZb9JeWF Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit On Monday 05 September 2011 06:16:10 Colin Percival wrote: > I have another iwn panic, which I'm guessing might be related: > > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0x0 > fault code = supervisor read data, page not present > instruction pointer = 0x20:0xffffffff80517620 > stack pointer = 0x28:0xffffff8000257360 > frame pointer = 0x28:0xffffff8000257580 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 12 (swi4: clock) > trap number = 12 > panic: page fault > > The relevant bits of backtrace are: > softclock -> tcp_timer_rexmt -> if_transmit -> lagg_start -> if_transmit -> > ieee80211_start -> if_transmit -> iwn_start -> iwn_start_locked -> iwn_tx_data. > > The panic occurs on line 3321 of if_iwn.c 1.115, due to txa_private being NULL. > > The last lines printed to the console before the panic -- late enough that they > didn't get written to disk by syslog -- are: > > wpa_supplicant[470]: CTRL-EVENT-DISCONNECTED bssid=94:44:52:e2:aa:4a reason=0 > kernel: wlan0: link state changed to DOWN > kernel: lagg0: link state changed to DOWN > wpa_supplicant[470]: Associated with 94:44:52:e2:aa:4a > kernel: wlan0: link state changed to UP > kernel: lagg0: link state changed to UP > > so it seems very likely that there's something going wrong in the interface > start code (and possibly a race condition -- tcp timers are good at hitting > those...) Been able to reproduce the panic with an added hack (iwn_bmiss.diff) to simulate a beacon miss scenario. Once in RUN state and an AMPDU TX session has been started (check debug output of wlandebug +action), doing for x in $(seq 1 100); do sysctl dev.iwn.0.bmiss=1 sleep 0.01 done while trigger it. Attached patch (iwn_addba_stop.diff) fixed the issue for me. Wanna give it a shot? Thanks -- Bernhard --Boundary-00=_TLulOnwAZb9JeWF Content-Type: text/x-patch; charset="ISO-8859-1"; name="iwn_addba_stop.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="iwn_addba_stop.diff" Index: if_iwn.c =================================================================== --- if_iwn.c (revision 226336) +++ if_iwn.c (working copy) @@ -5641,6 +5641,7 @@ iwn_ampdu_tx_stop(struct ieee80211_node *ni, struc sc->qid2tap[qid] = NULL; free(tap->txa_private, M_DEVBUF); tap->txa_private = NULL; + sc->sc_addba_stop(ni, tap); } static void --Boundary-00=_TLulOnwAZb9JeWF Content-Type: text/x-patch; charset="ISO-8859-1"; name="iwn_bmiss.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="iwn_bmiss.diff" Index: if_iwnvar.h =================================================================== --- if_iwnvar.h (revision 226336) +++ if_iwnvar.h (working copy) @@ -200,6 +200,7 @@ struct iwn_softc { struct ifnet *sc_ifp; int sc_debug; + int sc_bmiss; struct mtx sc_mtx; Index: if_iwn.c =================================================================== --- if_iwn.c (revision 226336) +++ if_iwn.c (working copy) @@ -841,6 +841,10 @@ iwn_sysctlattach(struct iwn_softc *sc) SYSCTL_ADD_INT(ctx, SYSCTL_CHILDREN(tree), OID_AUTO, "debug", CTLFLAG_RW, &sc->sc_debug, 0, "control debugging printfs"); #endif + + sc->sc_bmiss = 0; + SYSCTL_ADD_INT(ctx, SYSCTL_CHILDREN(tree), OID_AUTO, + "bmiss", CTLFLAG_RW, &sc->sc_bmiss, 0, "simulate beacon miss"); } static struct ieee80211vap * @@ -3138,6 +3148,20 @@ iwn_intr(void *arg) /* Disable interrupts. */ IWN_WRITE(sc, IWN_INT_MASK, 0); + if (sc->sc_bmiss) { + struct ieee80211com *ic = ifp->if_l2com; + struct ieee80211vap *vap = TAILQ_FIRST(&ic->ic_vaps); + + sc->sc_bmiss = 0; + + if (vap->iv_state == IEEE80211_S_RUN && + (ic->ic_flags & IEEE80211_F_SCAN) == 0) { + IWN_UNLOCK(sc); + ieee80211_beacon_miss(ic); + IWN_LOCK(sc); + } + } + /* Read interrupts from ICT (fast) or from registers (slow). */ if (sc->sc_flags & IWN_FLAG_USE_ICT) { tmp = 0; --Boundary-00=_TLulOnwAZb9JeWF-- From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/159768: commit references a PR Date: Mon, 24 Oct 2011 07:37:16 +0000 (UTC) Author: bschmidt Date: Mon Oct 24 07:37:01 2011 New Revision: 226679 URL: http://svn.freebsd.org/changeset/base/226679 Log: Let net80211 also know about stopped BA sessions. This fixes some issues where the driver assumed that BA resources are still available due to net80211 saying so. PR: 161407, 159768 Tested by: cperciva, rene MFC after: 3 days Modified: head/sys/dev/iwn/if_iwn.c Modified: head/sys/dev/iwn/if_iwn.c ============================================================================== --- head/sys/dev/iwn/if_iwn.c Mon Oct 24 05:26:40 2011 (r226678) +++ head/sys/dev/iwn/if_iwn.c Mon Oct 24 07:37:01 2011 (r226679) @@ -5641,6 +5641,7 @@ iwn_ampdu_tx_stop(struct ieee80211_node sc->qid2tap[qid] = NULL; free(tap->txa_private, M_DEVBUF); tap->txa_private = NULL; + sc->sc_addba_stop(ni, tap); } static void _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/159768: commit references a PR Date: Wed, 16 Nov 2011 17:40:17 +0000 (UTC) Author: bschmidt Date: Wed Nov 16 17:39:59 2011 New Revision: 227570 URL: http://svn.freebsd.org/changeset/base/227570 Log: MFC r226679: Let net80211 also know about stopped BA sessions. This fixes some issues where the driver assumed that BA resources are still available due to net80211 saying so. PR: 161407, 159768 Tested by: cperciva, rene Approved by: re (kib) Modified: stable/9/sys/dev/iwn/if_iwn.c Directory Properties: stable/9/sys/ (props changed) Modified: stable/9/sys/dev/iwn/if_iwn.c ============================================================================== --- stable/9/sys/dev/iwn/if_iwn.c Wed Nov 16 17:11:13 2011 (r227569) +++ stable/9/sys/dev/iwn/if_iwn.c Wed Nov 16 17:39:59 2011 (r227570) @@ -5641,6 +5641,7 @@ iwn_ampdu_tx_stop(struct ieee80211_node sc->qid2tap[qid] = NULL; free(tap->txa_private, M_DEVBUF); tap->txa_private = NULL; + sc->sc_addba_stop(ni, tap); } static void _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/159768: commit references a PR Date: Wed, 16 Nov 2011 17:41:40 +0000 (UTC) Author: bschmidt Date: Wed Nov 16 17:41:31 2011 New Revision: 227571 URL: http://svn.freebsd.org/changeset/base/227571 Log: MFC r226679: Let net80211 also know about stopped BA sessions. This fixes some issues where the driver assumed that BA resources are still available due to net80211 saying so. PR: 161407, 159768 Tested by: cperciva, rene Approved by: re (kib) Modified: releng/9.0/sys/dev/iwn/if_iwn.c Directory Properties: releng/9.0/sys/ (props changed) Modified: releng/9.0/sys/dev/iwn/if_iwn.c ============================================================================== --- releng/9.0/sys/dev/iwn/if_iwn.c Wed Nov 16 17:39:59 2011 (r227570) +++ releng/9.0/sys/dev/iwn/if_iwn.c Wed Nov 16 17:41:31 2011 (r227571) @@ -5641,6 +5641,7 @@ iwn_ampdu_tx_stop(struct ieee80211_node sc->qid2tap[qid] = NULL; free(tap->txa_private, M_DEVBUF); tap->txa_private = NULL; + sc->sc_addba_stop(ni, tap); } static void _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" State-Changed-From-To: open->closed State-Changed-By: bschmidt State-Changed-When: Sat Jan 28 10:30:18 UTC 2012 State-Changed-Why: Fix committed and MFCed. http://www.freebsd.org/cgi/query-pr.cgi?pr=159768 >Unformatted: