From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 10 01:41:16 2003
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 8145737B401; Thu, 10 Jul 2003 01:41:16 -0700 (PDT)
Received: from mailhub.fokus.fraunhofer.de (mailhub.fokus.fraunhofer.de
	[193.174.154.14])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id AEEB743FBD; Thu, 10 Jul 2003 01:41:14 -0700 (PDT)
	(envelope-from brandt@fokus.fraunhofer.de)
Received: from beagle (beagle [193.175.132.100])h6A8fDQ21635;
	Thu, 10 Jul 2003 10:41:13 +0200 (MEST)
Date: Thu, 10 Jul 2003 10:41:13 +0200 (CEST)
From: Harti Brandt <brandt@fokus.fraunhofer.de>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <XFMail.20030709174722.jhb@FreeBSD.org>
Message-ID: <20030710103146.R30571@beagle.fokus.fraunhofer.de>
References: <XFMail.20030709174722.jhb@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: hackers@FreeBSD.org
Subject: RE: Race in kevent
Reply-To: harti@FreeBSD.org

On Wed, 9 Jul 2003, John Baldwin wrote:

JB>On 09-Jul-2003 Harti Brandt wrote:
JB>>
JB>> Hi,
JB>>
JB>> I just had a crash while typing ^C to a program that has a kevent timer
JB>> running. The crash was:
JB>>
JB>> callout_stop
JB>> callout_reset
JB>> filt_timerexpire
JB>> softclock
JB>>
JB>> and callout_stop was accessing freed memory (0xdeadc0e2). After looking
JB>> some time at the filt_timerdetach, callout_stop and softclock I think the
JB>> following happened:

JB>This is becoming a common race unfortunately. :(  See the hacks in
JB>msleep() that use TDF_TIMEOUT in cooperation with endtsleep() and
JB>the recent commit to the realtimer callout code for ways to work around
JB>this race.

In both places the thread just sleeps until the timeout has fired (if I
understand this correctly). While this is also a possible workaround for
kevent() (which only holds Giant as far as I can see), it is by no means
a solution for other callers. While looking through the tree I found
several issues with timeouts that should probably be resolved, or they
will hit us with SMP:

- the CALLOUT_ACTIVE flag is not maintained correctly. softclock() fails
to clear this flag after the timeout has fired. callout_stop() clears
CALLOUT_ACTIVE if it finds the callout not PENDING. This is wrong if
the callout is just about to be called (in this case it is !PENDING
but ACTIVE). This makes callout_active() useless.

- using callout_active() on a callout_handle is unsafe. Callouts for
callout_handles (timeout(9)) are allocated from a common pool, so you may
end up checking the wrong callout if yours has already fired and has been
reallocated to another user. Handles allocated with timeout(9) can only
be passed to untimeout(9).
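
To make the two points above concrete, here is a small user-space model of
the flag handling as I described it. It is only a sketch: the flag names
are the ones from sys/callout.h, but struct callout_model, the model_*
helpers and the demo_* functions are made up for illustration, and the
interleavings are written out sequentially instead of using real threads.
It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names as in sys/callout.h; the values only matter inside this model. */
#define CALLOUT_ACTIVE  0x0002
#define CALLOUT_PENDING 0x0004

/* Hypothetical stand-in for struct callout: just the flags and a fire count. */
struct callout_model {
	int flags;
	int fired;
};

/* Models callout_reset(): the callout is queued and marked active. */
static void model_reset(struct callout_model *c)
{
	c->flags |= CALLOUT_ACTIVE | CALLOUT_PENDING;
}

/* Models the first half of softclock(): dequeue, PENDING cleared, ACTIVE kept. */
static void model_softclock_dequeue(struct callout_model *c)
{
	c->flags &= ~CALLOUT_PENDING;
}

/* Models the second half of softclock(): the handler runs, ACTIVE is not cleared. */
static void model_softclock_call(struct callout_model *c)
{
	c->fired++;
}

/* Models callout_stop() as described above: !PENDING means "clear ACTIVE, return 0". */
static int model_stop(struct callout_model *c)
{
	if (!(c->flags & CALLOUT_PENDING)) {
		c->flags &= ~CALLOUT_ACTIVE;
		return 0;
	}
	c->flags &= ~(CALLOUT_ACTIVE | CALLOUT_PENDING);
	return 1;
}

/* Models callout_active(). */
static bool model_active(const struct callout_model *c)
{
	return (c->flags & CALLOUT_ACTIVE) != 0;
}

/* callout_stop() runs between dequeue and call: callout looks idle, then fires. */
bool demo_active_race(void)
{
	struct callout_model c = { 0, 0 };

	model_reset(&c);
	model_softclock_dequeue(&c);
	(void)model_stop(&c);
	bool looked_idle = !model_active(&c);
	assert(c.fired == 0);		/* handler has not run yet */
	model_softclock_call(&c);
	return looked_idle && c.fired == 1;
}

/* After the handler has run, ACTIVE is still set in this model. */
bool demo_stale_active(void)
{
	struct callout_model c = { 0, 0 };

	model_reset(&c);
	model_softclock_dequeue(&c);
	model_softclock_call(&c);
	return c.fired == 1 && model_active(&c);
}

/* A one-slot "pool": user A's stale handle reports user B's callout state. */
bool demo_handle_reuse(void)
{
	struct callout_model slot = { 0, 0 };
	struct callout_model *a = &slot;	/* handle returned to user A */

	model_reset(a);
	model_softclock_dequeue(a);
	model_softclock_call(a);
	slot.flags = 0;				/* slot goes back to the pool */

	struct callout_model *b = &slot;	/* same slot handed to user B */
	model_reset(b);
	return model_active(a);			/* A sees B's ACTIVE flag */
}
```

All three demo_* functions return true in this model: the callout looks
inactive right before its handler runs, it still looks active after the
handler has run, and a stale handle reports the state of somebody else's
timeout.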

I think we should try to make the callout interface usable without races
for the !MPSAFE case (see mail from Eric Jacobs). For the MPSAFE case the
caller should be responsible for this. We should probably also document
the interface better.

Going to think about this...

harti
-- 
harti brandt,
http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
brandt@fokus.fraunhofer.de, harti@freebsd.org