From owner-freebsd-smp@FreeBSD.ORG  Mon Mar 17 16:44:10 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0B423106566B;
	Mon, 17 Mar 2008 16:44:10 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id 4BCC58FC1C;
	Mon, 17 Mar 2008 16:44:08 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 235801822-1834499 
	for multiple; Mon, 17 Mar 2008 12:42:28 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2HGi3qT078981;
	Mon, 17 Mar 2008 12:44:05 -0400 (EDT) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-smp@freebsd.org
Date: Mon, 17 Mar 2008 11:27:20 -0400
User-Agent: KMail/1.9.7
References: <20080315024114.GD67856@elvis.mu.org>
In-Reply-To: <20080315024114.GD67856@elvis.mu.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803171127.20561.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Mon, 17 Mar 2008 12:44:05 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6275/Mon Mar 17 10:08:48 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: stable@freebsd.org, Alfred Perlstein <alfred@freebsd.org>
Subject: Re: timeout/untimeout race conditions/crash [patch]
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Mar 2008 16:44:10 -0000

On Friday 14 March 2008 10:41:14 pm Alfred Perlstein wrote:
> We think we tracked down a defect in timeout/untimeout in
> FreeBSD.
> 
> We have reduced the problem to the following scenario:
> 
> 2+ cpu system, one cpu is running softclock at the same time
> another thread is running on another cpu which makes use of
> timeout/untimeout.
> 
> CPU 0 is running "softclock"
> CPU 1 is running "driver" with Giant held.
> 
> softclock: mtx_lock_spin(&callout_lock)
> softclock: CACHES the callout structure's fields.
> softclock: sees that it's a CALLOUT_LOCAL_ALLOC
> softclock: executes this code:
>   if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
>   	c->c_func = NULL;
>   	c->c_flags = CALLOUT_LOCAL_ALLOC;
>   	SLIST_INSERT_HEAD(&callfree, c,
>   		c_links.sle);
>   	curr_callout = NULL;
>   } else {
>   
>   NOTE: that c->c_func has been set to NULL and curr_callout
>         is also NULL.
> softclock: mtx_unlock_spin(&callout_lock)
> driver: calls untimeout(), the following sequence happens:
>         mtx_lock_spin(&callout_lock);
>         if (handle.callout->c_func == ftn && handle.callout->c_arg == arg)
>                 callout_stop(handle.callout);
>         mtx_unlock_spin(&callout_lock);
> 
>   NOTE: untimeout() sees that handle.callout->c_func is not set
>         to the function so it does NOT call callout_stop(9)!
> driver: free's backing structure for c->c_arg.
> softclock: executes callout.
> softclock: likely crashes at this point due to access after free.
> 
> I have a patch I'm trying out here, but I need feedback on it.
> 
> The way the patch works is to treat CALLOUT_LOCAL_ALLOC (timeout/untimeout)
> callouts the same as ~CALLOUT_LOCAL_ALLOC allocs, and moves the
> freelist manipulation to the end of the callout dispatch.
> 
> Some light testing seems to have the system work.
> 
> We are doing some testing in-house to also make sure this works.
> 
> Please provide feedback.
> 
> See attached delta.

This is not a bug.  Don't use untimeout(9) as it is not guaranteed to be 
reliable.  Instead, use callout_*().  Your patch doesn't solve any races as 
the driver detach routine needs to use callout_drain() and not just 
callout_stop/untimeout anyways.  Fix your broken drivers.

-- 
John Baldwin

From owner-freebsd-smp@FreeBSD.ORG  Mon Mar 17 20:26:53 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8D0F11065671
	for <freebsd-smp@freebsd.org>; Mon, 17 Mar 2008 20:26:53 +0000 (UTC)
	(envelope-from bright@elvis.mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.freebsd.org (Postfix) with ESMTP id 7F7EE8FC16
	for <freebsd-smp@freebsd.org>; Mon, 17 Mar 2008 20:26:53 +0000 (UTC)
	(envelope-from bright@elvis.mu.org)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id AE7C51A4D84; Mon, 17 Mar 2008 13:10:14 -0700 (PDT)
Date: Mon, 17 Mar 2008 13:10:14 -0700
From: Alfred Perlstein <alfred@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20080317201014.GA67856@elvis.mu.org>
References: <20080315024114.GD67856@elvis.mu.org>
	<200803171127.20561.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200803171127.20561.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
Cc: stable@freebsd.org, freebsd-smp@freebsd.org
Subject: Re: timeout/untimeout race conditions/crash [patch]
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Mar 2008 20:26:53 -0000

* John Baldwin <jhb@freebsd.org> [080317 09:43] wrote:
> 
> This is not a bug.  Don't use untimeout(9) as it is not guaranteed to be 
> reliable.  Instead, use callout_*().  Your patch doesn't solve any races as 
> the driver detach routine needs to use callout_drain() and not just 
> callout_stop/untimeout anyways.  Fix your broken drivers.

I understand that some old Giant locked code can issue timeout/untimeout
without Giant held, which would certainly cause this issue to happen
and is uncorrectable, however, this is with completely Giant locked
code.

We are not trying to use timeout(9) for mpsafe code, this is old
code and relies upon Giant. 

Giant locked code should be timeout/untimeout safe.  As explained
in my email, there exists a condition where the Giant locked code
can have a timer fire even though proper Giant locking is observed.

For a Giant locked subsystem, one should be able to have the following
code work:

mtx_lock(&Giant);	/* formerly spl higher than softclock */
untimeout(&func, arg, &sc->handle);
free(sc);
mtx_unlock(&Giant);	/* formerly splx() */

Normally splsoftclock would completely block the timeout from firing
and this sort of code would be safe.  It is no longer safe due to
a BUG in the way that Giant is used.

Please reread the original mail to better understand the synopsis
of the problem.

thank you,
-Alfred

From owner-freebsd-smp@FreeBSD.ORG  Mon Mar 17 22:29:51 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 24950106566B;
	Mon, 17 Mar 2008 22:29:51 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id 2FEDB8FC20;
	Mon, 17 Mar 2008 22:29:50 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 235838216-1834499 
	for multiple; Mon, 17 Mar 2008 18:28:01 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2HMTbv0081565;
	Mon, 17 Mar 2008 18:29:37 -0400 (EDT) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: Alfred Perlstein <alfred@freebsd.org>
Date: Mon, 17 Mar 2008 16:59:33 -0400
User-Agent: KMail/1.9.7
References: <20080315024114.GD67856@elvis.mu.org>
	<200803171127.20561.jhb@freebsd.org>
	<20080317201014.GA67856@elvis.mu.org>
In-Reply-To: <20080317201014.GA67856@elvis.mu.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803171659.33547.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Mon, 17 Mar 2008 18:29:37 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6277/Mon Mar 17 14:08:59 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: stable@freebsd.org, freebsd-smp@freebsd.org
Subject: Re: timeout/untimeout race conditions/crash [patch]
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Mar 2008 22:29:51 -0000

On Monday 17 March 2008 04:10:14 pm Alfred Perlstein wrote:
> * John Baldwin <jhb@freebsd.org> [080317 09:43] wrote:
> > 
> > This is not a bug.  Don't use untimeout(9) as it is not guaranteed to be 
> > reliable.  Instead, use callout_*().  Your patch doesn't solve any races 
as 
> > the driver detach routine needs to use callout_drain() and not just 
> > callout_stop/untimeout anyways.  Fix your broken drivers.
> 
> I understand that some old Giant locked code can issue timeout/untimeout
> without Giant held, which would certainly cause this issue to happen
> and is uncorrectable, however, this is with completely Giant locked
> code.
> 
> We are not trying to use timeout(9) for mpsafe code, this is old
> code and relies upon Giant. 
> 
> Giant locked code should be timeout/untimeout safe.  As explained
> in my email, there exists a condition where the Giant locked code
> can have a timer fire even though proper Giant locking is observed.
> 
> For a Giant locked subsystem, one should be able to have the following
> code work:
> 
> mtx_lock(&Giant);	/* formerly spl higher than softclock */
> untimeout(&func, arg, &sc->handle);
> free(sc);
> mtx_unlock(&Giant);	/* formerly splx() */
> 
> Normally splsoftclock would completely block the timeout from firing
> and this sort of code would be safe.  It is no longer safe due to
> a BUG in the way that Giant is used.
> 
> Please reread the original mail to better understand the synopsis
> of the problem.

Hmm.  My worry is about leaving the callout structure around while invoking 
the timeout routine itself, but it is already off the callwheel so it 
shouldn't be visible via untimeout() to any other code.  I guess the patch is 
ok, but I'll be happy when we can axe timeout/untimeout altogether.

-- 
John Baldwin

From owner-freebsd-smp@FreeBSD.ORG  Fri Mar 21 04:32:26 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1E69B1065672
	for <smp@freebsd.org>; Fri, 21 Mar 2008 04:32:26 +0000 (UTC)
	(envelope-from bright@elvis.mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.freebsd.org (Postfix) with ESMTP id 094D08FC1F
	for <smp@freebsd.org>; Fri, 21 Mar 2008 04:32:25 +0000 (UTC)
	(envelope-from bright@elvis.mu.org)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id D77C61A4D7E; Thu, 20 Mar 2008 21:32:25 -0700 (PDT)
Date: Thu, 20 Mar 2008 21:32:25 -0700
From: Alfred Perlstein <alfred@freebsd.org>
To: smp@freebsd.org
Message-ID: <20080321043225.GZ67856@elvis.mu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="6lXr1rPCNTf1w0X8"
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
Cc: 
Subject: request for review, callout fix.
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Mar 2008 04:32:26 -0000


--6lXr1rPCNTf1w0X8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi guys, attached is a fix for the timeout/untimeout race with
Giant locked code.

Basically the old code would make the callout inaccessable
right before calling it inside of softclock.

However only the callout lock is held, so when switching to
the callout's associated mutex (in this case Giant) there's
a race where a "local" (timeout/untimeout) callout would be
fired even if stopped.

This patch fixes that.  We've run several hours of regression
testing on a version of this for 6.x.

People internal to Juniper and iedowse@ helped with this.

Please review/comment.

thank you,
-- 
- Alfred Perlstein

--6lXr1rPCNTf1w0X8
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="kern_timeout.diff"

cvs diff: Diffing .
Index: kern_timeout.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_timeout.c,v
retrieving revision 1.110
diff -u -r1.110 kern_timeout.c
--- kern_timeout.c	12 Mar 2008 06:31:06 -0000	1.110
+++ kern_timeout.c	21 Mar 2008 04:28:35 -0000
@@ -230,11 +230,8 @@
 				c_arg = c->c_arg;
 				c_flags = c->c_flags;
 				if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
-					c->c_func = NULL;
 					c->c_flags = CALLOUT_LOCAL_ALLOC;
-					SLIST_INSERT_HEAD(&callfree, c,
-							  c_links.sle);
-					curr_callout = NULL;
+					curr_callout = c;
 				} else {
 					c->c_flags =
 					    (c->c_flags & ~CALLOUT_PENDING);
@@ -299,6 +296,24 @@
 					class->lc_unlock(c_lock);
 			skip:
 				mtx_lock_spin(&callout_lock);
+				/*
+				 * If the current callout is locally
+				 * allocated (timeout(9)) and if it has not
+				 * been killed by untimeout(9)
+				 * then put it on the freelist.
+				 *
+				 * Note: we need to check the cached
+				 * copy of c_flags because if it was not
+				 * local, then it's not safe to deref the
+				 * callout pointer.
+				 */
+				if (c_flags & CALLOUT_LOCAL_ALLOC &&
+				    !(c->c_flags &
+					(CALLOUT_PENDING | CALLOUT_ACTIVE))) {
+					c->c_func = NULL;
+					SLIST_INSERT_HEAD(&callfree, c,
+					    c_links.sle);
+				}
 				curr_callout = NULL;
 				if (callout_wait) {
 					/*

--6lXr1rPCNTf1w0X8--

From owner-freebsd-smp@FreeBSD.ORG  Fri Mar 21 19:08:03 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6C2581065679;
	Fri, 21 Mar 2008 19:08:03 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id B58408FC1E;
	Fri, 21 Mar 2008 19:08:02 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 236285243-1834499 
	for multiple; Fri, 21 Mar 2008 15:09:00 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2LJ7d5t029644;
	Fri, 21 Mar 2008 15:07:39 -0400 (EDT) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-smp@freebsd.org
Date: Fri, 21 Mar 2008 13:14:03 -0400
User-Agent: KMail/1.9.7
References: <20080321043225.GZ67856@elvis.mu.org>
In-Reply-To: <20080321043225.GZ67856@elvis.mu.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803211314.04002.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Fri, 21 Mar 2008 15:07:40 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6316/Fri Mar 21 10:29:54 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: Alfred Perlstein <alfred@freebsd.org>
Subject: Re: request for review, callout fix.
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Mar 2008 19:08:03 -0000

On Friday 21 March 2008 12:32:25 am Alfred Perlstein wrote:
> Hi guys, attached is a fix for the timeout/untimeout race with
> Giant locked code.
> 
> Basically the old code would make the callout inaccessable
> right before calling it inside of softclock.
> 
> However only the callout lock is held, so when switching to
> the callout's associated mutex (in this case Giant) there's
> a race where a "local" (timeout/untimeout) callout would be
> fired even if stopped.
> 
> This patch fixes that.  We've run several hours of regression
> testing on a version of this for 6.x.
> 
> People internal to Juniper and iedowse@ helped with this.
> 
> Please review/comment.

Curious as to how c->c_flags could change if CALLOUT_LOCAL_ALLOC is set?  
Since it hasn't been enqueued on the callfree list, it isn't visible to any 
other code, so nothing should be able to mark it active or pending.  IOW, you 
should be able to do this in your second hunk:

	if (c_flags & CALLOUT_LOCAL_ALLOC) {
		KASSERT(c->c_flags == CALLOUT_LOCAL_ALLOC, ("corrupted callout"));
		c->c_func = NULL;
		SLIST_INSERT_HEAD(...);
	}

-- 
John Baldwin

From owner-freebsd-smp@FreeBSD.ORG  Fri Mar 21 20:30:40 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 03AA11065701;
	Fri, 21 Mar 2008 20:30:40 +0000 (UTC)
	(envelope-from bright@elvis.mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.freebsd.org (Postfix) with ESMTP id DCF928FC23;
	Fri, 21 Mar 2008 20:30:39 +0000 (UTC)
	(envelope-from bright@elvis.mu.org)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id CF41D1A4D7E; Fri, 21 Mar 2008 13:30:39 -0700 (PDT)
Date: Fri, 21 Mar 2008 13:30:39 -0700
From: Alfred Perlstein <alfred@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20080321203039.GJ67856@elvis.mu.org>
References: <20080321043225.GZ67856@elvis.mu.org>
	<200803211314.04002.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200803211314.04002.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-smp@freebsd.org
Subject: Re: request for review, callout fix.
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Mar 2008 20:30:40 -0000

* John Baldwin <jhb@freebsd.org> [080321 12:08] wrote:
> On Friday 21 March 2008 12:32:25 am Alfred Perlstein wrote:
> > Hi guys, attached is a fix for the timeout/untimeout race with
> > Giant locked code.
> > 
> > Basically the old code would make the callout inaccessable
> > right before calling it inside of softclock.
> > 
> > However only the callout lock is held, so when switching to
> > the callout's associated mutex (in this case Giant) there's
> > a race where a "local" (timeout/untimeout) callout would be
> > fired even if stopped.
> > 
> > This patch fixes that.  We've run several hours of regression
> > testing on a version of this for 6.x.
> > 
> > People internal to Juniper and iedowse@ helped with this.
> > 
> > Please review/comment.
> 
> Curious as to how c->c_flags could change if CALLOUT_LOCAL_ALLOC is set?  
> Since it hasn't been enqueued on the callfree list, it isn't visible to any 
> other code, so nothing should be able to mark it active or pending.  IOW, you 
> should be able to do this in your second hunk:
> 
> 	if (c_flags & CALLOUT_LOCAL_ALLOC) {
> 		KASSERT(c->c_flags == CALLOUT_LOCAL_ALLOC, ("corrupted callout"));
> 		c->c_func = NULL;
> 		SLIST_INSERT_HEAD(...);
> 	}

It's more hairy than that.

Actually, I think you're right...

The confusion is the race for "callout_stop_safe",
but now that I think about it, the softclock will only
"grab" this callout if untimeout has NOT yet been called.

If untimeout HAS been called, then softclock won't even see
the callout.
If untimeout HAS NOT been called and softclock grabs the
callout, it will have cleared CALLOUT_PENDING and then
untimeout (callout_stop_safe) will no longer free it.

Therefore it is safe to omit the check for flags as you
suggest.

Is that right?

-- 
- Alfred Perlstein

From owner-freebsd-smp@FreeBSD.ORG  Fri Mar 21 20:55:12 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A656B106566C;
	Fri, 21 Mar 2008 20:55:12 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id B21528FC13;
	Fri, 21 Mar 2008 20:55:11 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162]) 
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 236295102-1834499 
	for multiple; Fri, 21 Mar 2008 16:56:23 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2LKt8Mf030415;
	Fri, 21 Mar 2008 16:55:08 -0400 (EDT) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: Alfred Perlstein <alfred@freebsd.org>
Date: Fri, 21 Mar 2008 16:54:36 -0400
User-Agent: KMail/1.9.7
References: <20080321043225.GZ67856@elvis.mu.org>
	<200803211314.04002.jhb@freebsd.org>
	<20080321203039.GJ67856@elvis.mu.org>
In-Reply-To: <20080321203039.GJ67856@elvis.mu.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803211654.36299.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Fri, 21 Mar 2008 16:55:08 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6317/Fri Mar 21 13:33:40 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: freebsd-smp@freebsd.org
Subject: Re: request for review, callout fix.
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Mar 2008 20:55:12 -0000

On Friday 21 March 2008 04:30:39 pm Alfred Perlstein wrote:
> * John Baldwin <jhb@freebsd.org> [080321 12:08] wrote:
> > On Friday 21 March 2008 12:32:25 am Alfred Perlstein wrote:
> > > Hi guys, attached is a fix for the timeout/untimeout race with
> > > Giant locked code.
> > > 
> > > Basically the old code would make the callout inaccessable
> > > right before calling it inside of softclock.
> > > 
> > > However only the callout lock is held, so when switching to
> > > the callout's associated mutex (in this case Giant) there's
> > > a race where a "local" (timeout/untimeout) callout would be
> > > fired even if stopped.
> > > 
> > > This patch fixes that.  We've run several hours of regression
> > > testing on a version of this for 6.x.
> > > 
> > > People internal to Juniper and iedowse@ helped with this.
> > > 
> > > Please review/comment.
> > 
> > Curious as to how c->c_flags could change if CALLOUT_LOCAL_ALLOC is set?  
> > Since it hasn't been enqueued on the callfree list, it isn't visible to 
any 
> > other code, so nothing should be able to mark it active or pending.  IOW, 
you 
> > should be able to do this in your second hunk:
> > 
> > 	if (c_flags & CALLOUT_LOCAL_ALLOC) {
> > 		KASSERT(c->c_flags == CALLOUT_LOCAL_ALLOC, ("corrupted callout"));
> > 		c->c_func = NULL;
> > 		SLIST_INSERT_HEAD(...);
> > 	}
> 
> It's more hairy than that.
> 
> Actually, I think you're right...
> 
> The confusion is the race for "callout_stop_safe",
> but now that I think about it, the softclock will only
> "grab" this callout if untimeout has NOT yet been called.
>
> If untimeout HAS been called, then softclock won't even see
> the callout.

Yes.

> If untimeout HAS NOT been called and softclock grabs the
> callout, it will have cleared CALLOUT_PENDING and then
> untimeout (callout_stop_safe) will no longer free it.

Yes.
 
> Therefore it is safe to omit the check for flags as you
> suggest.
>
> Is that right?

Yes, I believe so.  The tricky case is the race you are originally trying to 
fix which is that the untimeout() comes in after the spin lock is dropped but 
before Giant is acquired, but that falls under your second case above.

-- 
John Baldwin

From owner-freebsd-smp@FreeBSD.ORG  Fri Mar 21 23:52:05 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E21301065672;
	Fri, 21 Mar 2008 23:52:05 +0000 (UTC)
	(envelope-from bright@elvis.mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.freebsd.org (Postfix) with ESMTP id C22CC8FC17;
	Fri, 21 Mar 2008 23:52:05 +0000 (UTC)
	(envelope-from bright@elvis.mu.org)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id 330691A4D7E; Fri, 21 Mar 2008 16:52:05 -0700 (PDT)
Date: Fri, 21 Mar 2008 16:52:05 -0700
From: Alfred Perlstein <alfred@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20080321235205.GO67856@elvis.mu.org>
References: <20080321043225.GZ67856@elvis.mu.org>
	<200803211314.04002.jhb@freebsd.org>
	<20080321203039.GJ67856@elvis.mu.org>
	<200803211654.36299.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200803211654.36299.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-smp@freebsd.org
Subject: Re: request for review, callout fix.
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Mar 2008 23:52:06 -0000

* John Baldwin <jhb@freebsd.org> [080321 13:55] wrote:
> On Friday 21 March 2008 04:30:39 pm Alfred Perlstein wrote:
> > * John Baldwin <jhb@freebsd.org> [080321 12:08] wrote:
> > > On Friday 21 March 2008 12:32:25 am Alfred Perlstein wrote:
> > > > Hi guys, attached is a fix for the timeout/untimeout race with
> > > > Giant locked code.
> > > > 
> > > > Basically the old code would make the callout inaccessable
> > > > right before calling it inside of softclock.
> > > > 
> > > > However only the callout lock is held, so when switching to
> > > > the callout's associated mutex (in this case Giant) there's
> > > > a race where a "local" (timeout/untimeout) callout would be
> > > > fired even if stopped.
> > > > 
> > > > This patch fixes that.  We've run several hours of regression
> > > > testing on a version of this for 6.x.
> > > > 
> > > > People internal to Juniper and iedowse@ helped with this.
> > > > 
> > > > Please review/comment.
> > > 
> > > Curious as to how c->c_flags could change if CALLOUT_LOCAL_ALLOC is set?  
> > > Since it hasn't been enqueued on the callfree list, it isn't visible to 
> any 
> > > other code, so nothing should be able to mark it active or pending.  IOW, 
> you 
> > > should be able to do this in your second hunk:
> > > 
> > > 	if (c_flags & CALLOUT_LOCAL_ALLOC) {
> > > 		KASSERT(c->c_flags == CALLOUT_LOCAL_ALLOC, ("corrupted callout"));
> > > 		c->c_func = NULL;
> > > 		SLIST_INSERT_HEAD(...);
> > > 	}
> > 
> > It's more hairy than that.
> > 
> > Actually, I think you're right...
> > 
> > The confusion is the race for "callout_stop_safe",
> > but now that I think about it, the softclock will only
> > "grab" this callout if untimeout has NOT yet been called.
> >
> > If untimeout HAS been called, then softclock won't even see
> > the callout.
> 
> Yes.
> 
> > If untimeout HAS NOT been called and softclock grabs the
> > callout, it will have cleared CALLOUT_PENDING and then
> > untimeout (callout_stop_safe) will no longer free it.
> 
> Yes.
>  
> > Therefore it is safe to omit the check for flags as you
> > suggest.
> >
> > Is that right?
> 
> Yes, I believe so.  The tricky case is the race you are originally trying to 
> fix which is that the untimeout() comes in after the spin lock is dropped but 
> before Giant is acquired, but that falls under your second case above.

Thank you, I'll be committing later tonight.

-Alfred