From owner-freebsd-smp@FreeBSD.ORG Mon Mar 17 22:29:51 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 24950106566B; Mon, 17 Mar 2008 22:29:51 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219]) by mx1.freebsd.org (Postfix) with ESMTP id 2FEDB8FC20; Mon, 17 Mar 2008 22:29:50 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (unverified [66.23.211.162]) by speedfactory.net (SurgeMail 3.8s) with ESMTP id 235838216-1834499 for multiple; Mon, 17 Mar 2008 18:28:01 -0400 Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2HMTbv0081565; Mon, 17 Mar 2008 18:29:37 -0400 (EDT) (envelope-from jhb@freebsd.org) From: John Baldwin To: Alfred Perlstein Date: Mon, 17 Mar 2008 16:59:33 -0400 User-Agent: KMail/1.9.7 References: <20080315024114.GD67856@elvis.mu.org> <200803171127.20561.jhb@freebsd.org> <20080317201014.GA67856@elvis.mu.org> In-Reply-To: <20080317201014.GA67856@elvis.mu.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200803171659.33547.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Mon, 17 Mar 2008 18:29:37 -0400 (EDT) X-Virus-Scanned: ClamAV 0.91.2/6277/Mon Mar 17 14:08:59 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: stable@freebsd.org, freebsd-smp@freebsd.org Subject: Re: timeout/untimeout race conditions/crash [patch] X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Mar 2008 22:29:51 -0000 On Monday 17 March 2008 04:10:14 pm Alfred Perlstein wrote: > * John Baldwin [080317 09:43] wrote: > > > > This is not a bug. Don't use untimeout(9) as it is not guaranteed to be > > reliable. Instead, use callout_*(). Your patch doesn't solve any races as > > the driver detach routine needs to use callout_drain() and not just > > callout_stop/untimeout anyways. Fix your broken drivers. > > I understand that some old Giant locked code can issue timeout/untimeout > without Giant held, which would certainly cause this issue to happen > and is uncorrectable, however, this is with completely Giant locked > code. > > We are not trying to use timeout(9) for mpsafe code, this is old > code and relies upon Giant. > > Giant locked code should be timeout/untimeout safe. As explained > in my email, there exists a condition where the Giant locked code > can have a timer fire even though proper Giant locking is observed. > > For a Giant locked subsystem, one should be able to have the following > code work: > > mtx_lock(&Giant); /* formerly spl higher than softclock */ > untimeout(&func, arg, &sc->handle); > free(sc); > mtx_unlock(&Giant); /* formerly splx() */ > > Normally splsoftclock would completely block the timeout from firing > and this sort of code would be safe. It is no longer safe due to > a BUG in the way that Giant is used. > > Please reread the original mail to better understand the synopsis > of the problem. Hmm. My worry is about leaving the callout structure around while invoking the timeout routine itself, but it is already off the callwheel so it shouldn't be visible via untimeout() to any other code. I guess the patch is ok, but I'll be happy when we can axe timeout/untimeout altogether. -- John Baldwin