From owner-freebsd-current@FreeBSD.ORG Mon Jun 9 15:54:32 2008 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 45FC91065683 for ; Mon, 9 Jun 2008 15:54:32 +0000 (UTC) (envelope-from cokane@freebsd.org) Received: from QMTA02.westchester.pa.mail.comcast.net (qmta02.westchester.pa.mail.comcast.net [76.96.62.24]) by mx1.freebsd.org (Postfix) with ESMTP id D6C698FC13 for ; Mon, 9 Jun 2008 15:54:31 +0000 (UTC) (envelope-from cokane@freebsd.org) Received: from OMTA13.westchester.pa.mail.comcast.net ([76.96.62.52]) by QMTA02.westchester.pa.mail.comcast.net with comcast id boNY1Z00H17dt5G520d400; Mon, 09 Jun 2008 15:54:31 +0000 Received: from mail.cokane.org ([24.60.133.163]) by OMTA13.westchester.pa.mail.comcast.net with comcast id bruS1Z00K3Xh0XL3ZruSrt; Mon, 09 Jun 2008 15:54:27 +0000 X-Authority-Analysis: v=1.0 c=1 a=CK24QABrBtkA:10 a=RwsSqX3J4hAA:10 a=6I5d2MoRAAAA:8 a=FF83IL09uoHXGw9gLswA:9 a=PpmsRWhHudGY848VJjoA:7 a=wogEtuuXR_2mY6TvcBCqGl9xkqEA:4 a=SV7veod9ZcQA:10 a=LY0hPdMaydYA:10 a=FhdDn_9PNY0He1rkusIA:9 a=CR2BPuoWrF9lI5vQnQBDDDxbaMcA:4 a=rPt6xJ-oxjAA:10 Received: by mail.cokane.org (Postfix, from userid 103) id B8DAE35A7D5; Mon, 9 Jun 2008 11:54:26 -0400 (EDT) X-Spam-Checker-Version: SpamAssassin 3.1.8-gr1 (2007-02-13) on discordia X-Spam-Level: X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.1.8-gr1 Received: from [172.20.1.3] (erwin.int.cokane.org [172.20.1.3]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.cokane.org (Postfix) with ESMTP id 8729235A7D4; Mon, 9 Jun 2008 11:54:15 -0400 (EDT) From: Coleman Kane To: "Paul B. Mahol" In-Reply-To: <1213013020.25641.3.camel@localhost> References: <1212870158.1724.25.camel@localhost> <3a142e750806071855wb9f4cddk9be25e8c0f3d4984@mail.gmail.com> <1213013020.25641.3.camel@localhost> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-CPStII3j/BeupPEhpxKt" Organization: FreeBSD Project Date: Mon, 09 Jun 2008 11:52:35 -0400 Message-Id: <1213026755.1628.25.camel@localhost> Mime-Version: 1.0 X-Mailer: Evolution 2.22.2 FreeBSD GNOME Team Port Cc: freebsd-current@freebsd.org, thompsa@FreeBSD.org, jhb@FreeBSD.org Subject: Re: Call for Testers: Convert watchdog spinlock into a sleepable mutex in if_ndis X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jun 2008 15:54:32 -0000 --=-CPStII3j/BeupPEhpxKt Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Mon, 2008-06-09 at 08:03 -0400, Coleman Kane wrote: > On Sun, 2008-06-08 at 03:55 +0200, Paul B. Mahol wrote: > > On 6/7/08, Coleman Kane wrote: > > > Hi, > > > > > > I've got another patch to if_ndis and I'd like to know if any NDIS us= ers > > > out there can test it for me and give me some feedback. I've converte= d > > > the ndis_spinlock in if_ndis into a normal sleepable mutex, and would > > > like to get some testers. The idea here is to eliminate an unnecessar= y > > > "spinning" in the kernel, where it would be preferable to sleep (for = CPU > > > usage, interactivity, etc...). > > > > > > You can download it here: > > > * > > > http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx.pat= ch > > > > > > -- > > > Coleman Kane > > > > >=20 > > I did not found any stability regressions in my environment with linked= patch. > >=20 > > But I noticed new LOR (it appears as soon as radio of device is tur= ned on): > >=20 > > lock order reversal: > > 1st 0xc401726c ndis0 (network driver) @ > > /usr/local/src/sys/modules/if_ndis/../../dev/if_ndis/if_ndis.c:1648 > > 2nd 0xc09b3e74 HAL preemption lock (HAL lock) @ > > /usr/local/src/sys/modules/ndis/../../compat/ndis/subr_hal.c:423 > > KDB: stack backtrace: > > db_trace_self_wrapper(c0708534,c3985bcc,c0541a6e,c070adf6,c09b3e74,...) > > at db_trace_self_wrapper+0x26 > > kdb_backtrace(c070adf6,c09b3e74,c09b0a51,c09b0a48,c09b09f8,...) at > > kdb_backtrace+0x29 > > witness_checkorder(c09b3e74,9,c09b09f8,1a7,c04f7364,...) at > > witness_checkorder+0x6de > > _mtx_lock_flags(c09b3e74,0,c09b09f8,1a7,c457ebd0,...) at _mtx_lock_flag= s+0xbc > > KfRaiseIrql(2,c457ebd0,c3985c48,c09aacdd,c3d45354,...) at KfRaiseIrql+0= x63 > > KfAcquireSpinLock(c3d45354,c094888a,c4017200,c3d6cc40,c4017200,...) at > > KfAcquireSpinLock+0x13 > > IoQueueWorkItem(c457ebd0,c3d6cc40,0,c4017200,c07040ef,...) at > > IoQueueWorkItem+0x2d > > ndis_tick(c4017200,0,c070699c,166,c077e594,...) at ndis_tick+0x143 > > softclock(c077e560,0,c0701ca6,4d4,c3cda6ec,...) at softclock+0x24a > > ithread_loop(c3cdb2e0,c3985d38,c0701a09,324,c3c99a70,...) at ithread_lo= op+0x1b5 > > fork_exit(c04e74f0,c3cdb2e0,c3985d38) at fork_exit+0xb8 > > fork_trampoline() at fork_trampoline+0x8 > > --- trap 0, eip =3D 0, esp =3D 0xc3985d70, ebp =3D 0 --- >=20 > Try this one: > * http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx= 2.patch >=20 > There's the same LOR in the existing code as well, but it is hidden by > WITNESS_SKIPSPIN. >=20 > If you can verify that this works, I'll just commit them both as a > single update. >=20 Paul, Ignore the previous patch (#2) and try this one instead: * http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx3.= patch I analyzed the code a bit more and decided to push the mtx acquisition so that it is inside the test for the TX-watchdog case. This should result in having less traffic on that mutex under normal running conditions. In this case, the mutex is only locked inside the ndis_ticktask, and also inside the TX-watchdog handling code. In general, when neither of these cases are true and the only action is a simple re-schedule of the callout, the lock never gets acquired. thompsa/jhb: What do you think of these last tweaks? I did some analysis and I don't think that the ndis_hang_timer (rw), nmb_checkforhangsecs (ro), and ndis_tx_timer (rw) are themselves subject to any races in the current design (the rw/ro indicate their use within the unlocked ndis_tick context). Furthermore, as callout_drain(9) is the only method acting upon the callout in any multithreaded context, it should be safe to perform callout_reset(9) on the unlocked structure. I also did some more research (just now) and it looks like ndis_start (called by ndis_starttask) and ndis_reset_nic (in sys/compat/ndis/kern_ndis.c, called by ndis_resettask) both perform their own locking. It is looking to me like we may not even need to acquire NDIS_LOCK() from inside ndis_tick() at all, but instead just use the existing locking in the task-specific functions that it calls in wd/timeout cases. --=20 Coleman Kane --=-CPStII3j/BeupPEhpxKt Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEABECAAYFAkhNUbwACgkQcMSxQcXat5doZACfd8KgxspwVDZ9WZVxLxgl2uDh UW0An0ZB+jYQ6mdhcXPy5WDmhAygXKIM =wORa -----END PGP SIGNATURE----- --=-CPStII3j/BeupPEhpxKt--