Date: Thu, 26 Jan 2017 10:20:17 -0500
From: "J.R. Oldroyd"
To: Adrian Chadd
Cc: "freebsd-wireless@freebsd.org"
Subject: Re: Boot freeze 11.0p3 during network initialization
Message-ID: <20170126102017.26e9a3eb@shibato>

Sorry for the time gap, I had to deal with family matters.

OK, I patched if_lagg.c to drop and re-acquire the lock around the
call to init the underlying driver.  I've been running this for some
weeks now and haven't seen the boot-hang since.  Hopefully I have
tested long enough.

Someone more familiar with this driver and its use of this lock
should review this patch and comment.

	-jr

Index: sys/net/if_lagg.c
===================================================================
--- sys/net/if_lagg.c	(revision 307319)
+++ sys/net/if_lagg.c	(working copy)
@@ -995,6 +995,21 @@
 		LAGG_RUNLOCK(sc, &tracker);
 		break;
 
+	case SIOCADDMULTI:
+	case SIOCDELMULTI:
+		/*
+		 * Drivers like if_re.c cause a LOR on WLOCK, so we must
+		 * drop and re-acquire the lock around the call.
+		 */
+		if (lp->lp_ioctl == NULL) {
+			error = EINVAL;
+			break;
+		}
+		LAGG_WUNLOCK(sc);
+		error = (*lp->lp_ioctl)(ifp, cmd, data);
+		LAGG_WLOCK(sc);
+		break;
+
 	case SIOCSIFCAP:
 		if (lp->lp_ioctl == NULL) {
 			error = EINVAL;


On Wed, 28 Dec 2016 00:24:09 -0800 Adrian Chadd wrote:
>
> hi,
>
> yes, the LOR is why the boot hang occurs :(
>
>
>
> -a
>
>
> On 27 December 2016 at 14:30, J.R. Oldroyd wrote:
> > Sorry, Adrian, I'm missing the back-story here and I'm not that
> > familiar with the lagg code.
> >
> > Are you saying that this LOR is likely relevant to this boot hang,
> > or are you saying that this is a known problem that's not relevant?
> >
> > Jan Kokemüller posted some lagg patches.  I don't know if they are
> > likely applicable to this problem, but I could try those.
> >
> > https://reviews.freebsd.org/D6845
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211689#c4
> >
> > The first removes an RLOCK, but not the one referenced in the LOR
> > report.  The second is a patch for the ath/iwm panic.  If you're
> > unfamiliar with them, I will study up on this code and patches
> > to get up to speed on it.
> >
> > -jr
> >
> >
> > On Fri, 23 Dec 2016 11:41:33 -0800 Adrian Chadd wrote:
> >>
> >> Right, that's the known lock order issue with lagg. :(
> >>
> >>
> >> -adrian
> >>
> >>
> >> On 23 December 2016 at 11:37, J.R. Oldroyd wrote:
> >> > On Fri, 23 Dec 2016 10:17:34 -0800 Adrian Chadd wrote:
> >> >>
> >> >> On 20 December 2016 at 08:18, J.R. Oldroyd wrote:
> >> >> > On Thu, 8 Dec 2016 17:19:26 -0500 "J.R. Oldroyd" wrote:
> >> >> >>
> >> >> >> On Thu, 08 Dec 2016 21:29:32 +0200 "Andriy Voskoboinyk" wrote:
> >> >> >> >
> >> >> >> > Thu, 08 Dec 2016 16:57:19 +0200, J.R. Oldroyd wrote:
> >> >> >> >
> >> >> >> > Is there any additional output with
> >> >> >> > wlandebug_wlan0="scan+state+auth+assoc"
> >> >> >> > in /etc/rc.conf ?
> >> >> >> >
> >> >> >>
> >> >> >> I have put that in and rebooted several times, all times OK.
> >> >> >> I will report back again in due course when it next hangs.
> >> >> >>
> >> >> >> -jr
> >> >> >>
> >> >> >
> >> >> > The boot hang occurred again today.  I noted the point of the hang and
> >> >> > rebooted; the log from the good boot with annotation of the previous hang
> >> >> > point is here [1].
> >> >> >
> >> >> > -jr
> >> >> >
> >> >> > [1] http://opal.com/jr/freebsd/20161220-fbsd11.3-boot_hang_wlan_debug.txt
> >> >>
> >> >>
> >> >> can you compile with witness and invariants? I'd like to see if it's
> >> >> locking related.
> >> >>
> >> >> thanks
> >> >>
> >> >>
> >> >> -adrian
> >> >>
> >> >>
> >> >
> >> > Hmm, maybe:
> >> >
> >> > Dec 23 14:30:34 shibato kernel: wlan0: ieee80211_swscan_add_scan: chan 11g min dwell met (2146895553 > 2146895553)
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_mindwell: called
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: loop start; scandone=0
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: chan 11g -> 7g [active, dwell min 20ms max 200ms]
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan: calling; maxdwell=200
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: waiting
> >> > Dec 23 14:30:34 shibato kernel: re0: link state changed to UP
> >> > Dec 23 14:30:34 shibato kernel: lagg0: link state changed to UP
> >> > Dec 23 14:30:34 shibato kernel: lock order reversal:
> >> > Dec 23 14:30:34 shibato kernel: 1st 0xfffff800095d2208 if_lagg rmlock (if_lagg rmlock) @ /usr/src/sys/modules/if_lagg/../../net/if_lagg.c:1530
> >> > Dec 23 14:30:34 shibato kernel: 2nd 0xfffffe0000e10218 re0 (network driver) @ dev/re/if_re.c:3433
> >> > Dec 23 14:30:34 shibato kernel: stack backtrace:
> >> > Dec 23 14:30:34 shibato kernel: #0 0xffffffff80a98b60 at witness_debugger+0x70
> >> > Dec 23 14:30:34 shibato kernel: #1 0xffffffff80a98a54 at witness_checkorder+0xe54
> >> > Dec 23 14:30:34 shibato kernel: #2 0xffffffff80a1c794 at __mtx_lock_flags+0xa4
> >> > Dec 23 14:30:34 shibato kernel: #3 0xffffffff8078c279 at re_ioctl+0x3a9
> >> > Dec 23 14:30:34 shibato kernel: #4 0xffffffff8222428e at lagg_port_ioctl+0xde
> >> > Dec 23 14:30:34 shibato kernel: #5 0xffffffff80b20bbf at if_addmulti+0x39f
> >> > Dec 23 14:30:34 shibato kernel: #6 0xffffffff82224708 at lagg_ether_cmdmulti+0x158
> >> > Dec 23 14:30:34 shibato kernel: #7 0xffffffff822219dd at lagg_ioctl+0xdd
> >> > Dec 23 14:30:34 shibato kernel: #8 0xffffffff80b20bbf at if_addmulti+0x39f
> >> > Dec 23 14:30:34 shibato kernel: #9 0xffffffff80c35a97 at in6_mc_join_locked+0x1d7
> >> > Dec 23 14:30:34 shibato kernel: #10 0xffffffff80c35715 at in6_joingroup+0x75
> >> > Dec 23 14:30:34 shibato kernel: #11 0xffffffff80c2f9e9 at in6_update_ifa+0x1339
> >> > Dec 23 14:30:34 shibato kernel: #12 0xffffffff80c33eb3 at in6_ifattach+0x413
> >> > Dec 23 14:30:34 shibato kernel: #13 0xffffffff80b1fd84 at ifioctl+0xfe4
> >> > Dec 23 14:30:34 shibato kernel: #14 0xffffffff80a9d946 at kern_ioctl+0x246
> >> > Dec 23 14:30:34 shibato kernel: #15 0xffffffff80a9d691 at sys_ioctl+0x171
> >> > Dec 23 14:30:34 shibato kernel: #16 0xffffffff80e9d40b at amd64_syscall+0x2db
> >> > Dec 23 14:30:34 shibato kernel: #17 0xffffffff80e7d8ab at Xfast_syscall+0xfb
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: loop start; scandone=0
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: chan 7g -> 36a [active, dwell min 20ms max 200ms]
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan: calling; maxdwell=200
> >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: waiting
> >> >
> >> > This boot then continued normally, no hang.
> >> >
> >> > -jr
> >
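
For readers less familiar with the pattern in the patch, here is a
minimal userspace sketch of "drop and re-acquire the lock around the
call", assuming POSIX mutexes in place of the kernel's rmlock and
driver mutex.  All names in it (agg_lock, drv_lock, drv_ioctl,
port_ioctl, drv_mcast_count) are hypothetical stand-ins for
illustration only; this is not the if_lagg.c or if_re.c code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical stand-ins: agg_lock plays the role of the aggregate
 * interface's lock, drv_lock the role of a member driver's own mutex.
 */
pthread_mutex_t agg_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t drv_lock = PTHREAD_MUTEX_INITIALIZER;
int drv_mcast_count;		/* protected by drv_lock */

/* Driver entry point: takes only its own lock. */
int
drv_ioctl(int delta)
{
	pthread_mutex_lock(&drv_lock);
	drv_mcast_count += delta;
	pthread_mutex_unlock(&drv_lock);
	return (0);
}

/*
 * Must be called with agg_lock held; returns with agg_lock held.
 * The lock is released for the duration of the driver call, so the
 * two locks are never held at the same time and no order between
 * them is established.
 */
int
port_ioctl(int delta)
{
	int error, rc;

	rc = pthread_mutex_unlock(&agg_lock);
	assert(rc == 0);
	error = drv_ioctl(delta);
	rc = pthread_mutex_lock(&agg_lock);
	assert(rc == 0);
	return (error);
}
```

A caller would lock agg_lock, call port_ioctl(1), and unlock again.
The cost of this approach is that anything protected by agg_lock can
change while the lock is dropped, so state read before the call has
to be re-checked after it.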