Date: Mon, 09 Jun 2008 11:52:35 -0400 From: Coleman Kane <cokane@FreeBSD.org> To: "Paul B. Mahol" <onemda@gmail.com> Cc: freebsd-current@freebsd.org, thompsa@FreeBSD.org, jhb@FreeBSD.org Subject: Re: Call for Testers: Convert watchdog spinlock into a sleepable mutex in if_ndis Message-ID: <1213026755.1628.25.camel@localhost> In-Reply-To: <1213013020.25641.3.camel@localhost> References: <1212870158.1724.25.camel@localhost> <3a142e750806071855wb9f4cddk9be25e8c0f3d4984@mail.gmail.com> <1213013020.25641.3.camel@localhost>
next in thread | previous in thread | raw e-mail | index | archive | help
--=-CPStII3j/BeupPEhpxKt Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Mon, 2008-06-09 at 08:03 -0400, Coleman Kane wrote: > On Sun, 2008-06-08 at 03:55 +0200, Paul B. Mahol wrote: > > On 6/7/08, Coleman Kane <cokane@freebsd.org> wrote: > > > Hi, > > > > > > I've got another patch to if_ndis and I'd like to know if any NDIS us= ers > > > out there can test it for me and give me some feedback. I've converte= d > > > the ndis_spinlock in if_ndis into a normal sleepable mutex, and would > > > like to get some testers. The idea here is to eliminate an unnecessar= y > > > "spinning" in the kernel, where it would be preferable to sleep (for = CPU > > > usage, interactivity, etc...). > > > > > > You can download it here: > > > * > > > http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx.pat= ch > > > > > > -- > > > Coleman Kane > > > > >=20 > > I did not found any stability regressions in my environment with linked= patch. > >=20 > > But I noticed new LOR (it appears as soon as radio of device is tur= ned on): > >=20 > > lock order reversal: > > 1st 0xc401726c ndis0 (network driver) @ > > /usr/local/src/sys/modules/if_ndis/../../dev/if_ndis/if_ndis.c:1648 > > 2nd 0xc09b3e74 HAL preemption lock (HAL lock) @ > > /usr/local/src/sys/modules/ndis/../../compat/ndis/subr_hal.c:423 > > KDB: stack backtrace: > > db_trace_self_wrapper(c0708534,c3985bcc,c0541a6e,c070adf6,c09b3e74,...) > > at db_trace_self_wrapper+0x26 > > kdb_backtrace(c070adf6,c09b3e74,c09b0a51,c09b0a48,c09b09f8,...) at > > kdb_backtrace+0x29 > > witness_checkorder(c09b3e74,9,c09b09f8,1a7,c04f7364,...) at > > witness_checkorder+0x6de > > _mtx_lock_flags(c09b3e74,0,c09b09f8,1a7,c457ebd0,...) at _mtx_lock_flag= s+0xbc > > KfRaiseIrql(2,c457ebd0,c3985c48,c09aacdd,c3d45354,...) at KfRaiseIrql+0= x63 > > KfAcquireSpinLock(c3d45354,c094888a,c4017200,c3d6cc40,c4017200,...) at > > KfAcquireSpinLock+0x13 > > IoQueueWorkItem(c457ebd0,c3d6cc40,0,c4017200,c07040ef,...) at > > IoQueueWorkItem+0x2d > > ndis_tick(c4017200,0,c070699c,166,c077e594,...) at ndis_tick+0x143 > > softclock(c077e560,0,c0701ca6,4d4,c3cda6ec,...) at softclock+0x24a > > ithread_loop(c3cdb2e0,c3985d38,c0701a09,324,c3c99a70,...) at ithread_lo= op+0x1b5 > > fork_exit(c04e74f0,c3cdb2e0,c3985d38) at fork_exit+0xb8 > > fork_trampoline() at fork_trampoline+0x8 > > --- trap 0, eip =3D 0, esp =3D 0xc3985d70, ebp =3D 0 --- >=20 > Try this one: > * http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx= 2.patch >=20 > There's the same LOR in the existing code as well, but it is hidden by > WITNESS_SKIPSPIN. >=20 > If you can verify that this works, I'll just commit them both as a > single update. >=20 Paul, Ignore the previous patch (#2) and try this one instead: * http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx3.= patch I analyzed the code a bit more and decided to push the mtx acquisition so that it is inside the test for the TX-watchdog case. This should result in having less traffic on that mutex under normal running conditions. In this case, the mutex is only locked inside the ndis_ticktask, and also inside the TX-watchdog handling code. In general, when neither of these cases are true and the only action is a simple re-schedule of the callout, the lock never gets acquired. thompsa/jhb: What do you think of these last tweaks? I did some analysis and I don't think that the ndis_hang_timer (rw), nmb_checkforhangsecs (ro), and ndis_tx_timer (rw) are themselves subject to any races in the current design (the rw/ro indicate their use within the unlocked ndis_tick context). Furthermore, as callout_drain(9) is the only method acting upon the callout in any multithreaded context, it should be safe to perform callout_reset(9) on the unlocked structure. I also did some more research (just now) and it looks like ndis_start (called by ndis_starttask) and ndis_reset_nic (in sys/compat/ndis/kern_ndis.c, called by ndis_resettask) both perform their own locking. It is looking to me like we may not even need to acquire NDIS_LOCK() from inside ndis_tick() at all, but instead just use the existing locking in the task-specific functions that it calls in wd/timeout cases. --=20 Coleman Kane --=-CPStII3j/BeupPEhpxKt Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEABECAAYFAkhNUbwACgkQcMSxQcXat5doZACfd8KgxspwVDZ9WZVxLxgl2uDh UW0An0ZB+jYQ6mdhcXPy5WDmhAygXKIM =wORa -----END PGP SIGNATURE----- --=-CPStII3j/BeupPEhpxKt--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1213026755.1628.25.camel>