Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 09 Jun 2008 11:52:35 -0400
From:      Coleman Kane <cokane@FreeBSD.org>
To:        "Paul B. Mahol" <onemda@gmail.com>
Cc:        freebsd-current@freebsd.org, thompsa@FreeBSD.org, jhb@FreeBSD.org
Subject:   Re: Call for Testers: Convert watchdog spinlock into a sleepable mutex in if_ndis
Message-ID:  <1213026755.1628.25.camel@localhost>
In-Reply-To: <1213013020.25641.3.camel@localhost>
References:  <1212870158.1724.25.camel@localhost> <3a142e750806071855wb9f4cddk9be25e8c0f3d4984@mail.gmail.com> <1213013020.25641.3.camel@localhost>

next in thread | previous in thread | raw e-mail | index | archive | help

--=-CPStII3j/BeupPEhpxKt
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Mon, 2008-06-09 at 08:03 -0400, Coleman Kane wrote:
> On Sun, 2008-06-08 at 03:55 +0200, Paul B. Mahol wrote:
> > On 6/7/08, Coleman Kane <cokane@freebsd.org> wrote:
> > > Hi,
> > >
> > > I've got another patch to if_ndis and I'd like to know if any NDIS us=
ers
> > > out there can test it for me and give me some feedback. I've converte=
d
> > > the ndis_spinlock in if_ndis into a normal sleepable mutex, and would
> > > like to get some testers. The idea here is to eliminate an unnecessar=
y
> > > "spinning" in the kernel, where it would be preferable to sleep (for =
CPU
> > > usage, interactivity, etc...).
> > >
> > > You can download it here:
> > >       *
> > > http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx.pat=
ch
> > >
> > > --
> > > Coleman Kane
> > >
> >=20
> > I did not found any stability regressions in my environment with linked=
 patch.
> >=20
> > But I noticed   new   LOR (it appears as soon as radio of device is tur=
ned on):
> >=20
> > lock order reversal:
> >  1st 0xc401726c ndis0 (network driver) @
> > /usr/local/src/sys/modules/if_ndis/../../dev/if_ndis/if_ndis.c:1648
> >  2nd 0xc09b3e74 HAL preemption lock (HAL lock) @
> > /usr/local/src/sys/modules/ndis/../../compat/ndis/subr_hal.c:423
> > KDB: stack backtrace:
> > db_trace_self_wrapper(c0708534,c3985bcc,c0541a6e,c070adf6,c09b3e74,...)
> > at db_trace_self_wrapper+0x26
> > kdb_backtrace(c070adf6,c09b3e74,c09b0a51,c09b0a48,c09b09f8,...) at
> > kdb_backtrace+0x29
> > witness_checkorder(c09b3e74,9,c09b09f8,1a7,c04f7364,...) at
> > witness_checkorder+0x6de
> > _mtx_lock_flags(c09b3e74,0,c09b09f8,1a7,c457ebd0,...) at _mtx_lock_flag=
s+0xbc
> > KfRaiseIrql(2,c457ebd0,c3985c48,c09aacdd,c3d45354,...) at KfRaiseIrql+0=
x63
> > KfAcquireSpinLock(c3d45354,c094888a,c4017200,c3d6cc40,c4017200,...) at
> > KfAcquireSpinLock+0x13
> > IoQueueWorkItem(c457ebd0,c3d6cc40,0,c4017200,c07040ef,...) at
> > IoQueueWorkItem+0x2d
> > ndis_tick(c4017200,0,c070699c,166,c077e594,...) at ndis_tick+0x143
> > softclock(c077e560,0,c0701ca6,4d4,c3cda6ec,...) at softclock+0x24a
> > ithread_loop(c3cdb2e0,c3985d38,c0701a09,324,c3c99a70,...) at ithread_lo=
op+0x1b5
> > fork_exit(c04e74f0,c3cdb2e0,c3985d38) at fork_exit+0xb8
> > fork_trampoline() at fork_trampoline+0x8
> > --- trap 0, eip =3D 0, esp =3D 0xc3985d70, ebp =3D 0 ---
>=20
> Try this one:
>       * http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx=
2.patch
>=20
> There's the same LOR in the existing code as well, but it is hidden by
> WITNESS_SKIPSPIN.
>=20
> If you can verify that this works, I'll just commit them both as a
> single update.
>=20

Paul,

Ignore the previous patch (#2) and try this one instead:
      * http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx3.=
patch

I analyzed the code a bit more and decided to push the mtx acquisition
so that it is inside the test for the TX-watchdog case. This should
result in having less traffic on that mutex under normal running
conditions. In this case, the mutex is only locked inside the
ndis_ticktask, and also inside the TX-watchdog handling code. In
general, when neither of these cases are true and the only action is a
simple re-schedule of the callout, the lock never gets acquired.

thompsa/jhb: What do you think of these last tweaks? I did some analysis
and I don't think that the ndis_hang_timer (rw), nmb_checkforhangsecs
(ro), and ndis_tx_timer (rw) are themselves subject to any races in the
current design (the rw/ro indicate their use within the unlocked
ndis_tick context).

Furthermore, as callout_drain(9) is the only method acting upon the
callout in any multithreaded context, it should be safe to perform
callout_reset(9) on the unlocked structure. I also did some more
research (just now) and it looks like ndis_start (called by
ndis_starttask) and ndis_reset_nic (in sys/compat/ndis/kern_ndis.c,
called by ndis_resettask) both perform their own locking. It is looking
to me like we may not even need to acquire NDIS_LOCK() from inside
ndis_tick() at all, but instead just use the existing locking in the
task-specific functions that it calls in wd/timeout cases.

--=20
Coleman Kane

--=-CPStII3j/BeupPEhpxKt
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (FreeBSD)

iEYEABECAAYFAkhNUbwACgkQcMSxQcXat5doZACfd8KgxspwVDZ9WZVxLxgl2uDh
UW0An0ZB+jYQ6mdhcXPy5WDmhAygXKIM
=wORa
-----END PGP SIGNATURE-----

--=-CPStII3j/BeupPEhpxKt--




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1213026755.1628.25.camel>