Date: Thu, 10 Jul 2003 10:41:13 +0200 (CEST)
From: Harti Brandt
To: John Baldwin
Message-ID: <20030710103146.R30571@beagle.fokus.fraunhofer.de>
cc: hackers@FreeBSD.org
Subject: RE: Race in kevent

On Wed, 9 Jul 2003, John Baldwin wrote:

JB>On 09-Jul-2003 Harti Brandt wrote:
JB>>
JB>> Hi,
JB>>
JB>> I just had a crash while typing ^C to a program that has a kevent timer
JB>> running. The crash was:
JB>>
JB>> callout_stop
JB>> callout_reset
JB>> filt_timerexpire
JB>> softclock
JB>>
JB>> and callout_stop was accessing freed memory (0xdeadc0e2). After looking
JB>> some time at the filt_timerdetach, callout_stop and softclock I think the
JB>> following happened:
JB>This is becoming a common race unfortunately. :( See the hacks in
JB>msleep() that use TDF_TIMEOUT in cooperation with endtsleep() and
JB>the recent commit to the realtimer callout code for ways to work around
JB>this race.

In both places the thread just sleeps until the timeout has fired (if I
understand this correctly).
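
To make the race and this kind of workaround concrete, here is a rough
user-space model. It is not kernel code: every name in it (model_timer,
model_softclock_begin, model_detach_wait and so on) is made up for
illustration, and softclock is split into two explicit steps so that the bad
interleaving can be replayed without threads. The only thing it assumes is
what the trace above suggests, namely that the handler touches timer memory
which the detach path may already have freed.

```c
/*
 * User-space model only, NOT FreeBSD kernel code. All names are
 * hypothetical. The concurrent softclock() is modelled as two explicit
 * steps so the bad interleaving can be replayed deterministically.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_timer {
    bool pending;   /* queued, handler not yet picked up */
    bool running;   /* picked up by softclock, handler not finished */
    bool freed;     /* owner has released the timer's memory */
    int  fired;     /* times the handler ran on a live timer */
    int  bad_use;   /* times the handler touched a freed timer */
};

/* Step 1 of the model softclock: take the callout off the queue. */
static bool
model_softclock_begin(struct model_timer *t)
{
    if (!t->pending)
        return (false);
    t->pending = false;
    t->running = true;
    return (true);
}

/* Step 2: run the handler, which touches the timer itself. */
static void
model_softclock_finish(struct model_timer *t)
{
    if (t->freed)
        t->bad_use++;   /* stands in for the freed-memory access */
    else
        t->fired++;
    t->running = false;
}

/* Naive detach: cancel if still pending, then free unconditionally. */
static void
model_detach_naive(struct model_timer *t)
{
    t->pending = false;
    t->freed = true;
}

/*
 * Workaround detach: refuse to free while the handler is in flight.
 * Returns false when the caller has to sleep and try again.
 */
static bool
model_detach_wait(struct model_timer *t)
{
    t->pending = false;
    if (t->running)
        return (false);
    t->freed = true;
    return (true);
}

/* Replay: softclock dequeues, the owner detaches, the handler runs. */
int
model_replay(bool use_workaround)
{
    struct model_timer t = { .pending = true };

    model_softclock_begin(&t);
    if (use_workaround) {
        /* "Sleep" until the timeout has fired, then free. */
        while (!model_detach_wait(&t))
            model_softclock_finish(&t);
    } else {
        model_detach_naive(&t);
        model_softclock_finish(&t);
    }
    return (t.bad_use);
}
```

Under these assumptions model_replay(false) reports one access to freed
memory and model_replay(true) reports none. The retry loop plays the role of
the sleeping thread; in a real kernel the waiting would of course be done
with proper sleep/wakeup rather than by driving the handler from the caller.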

While sleeping until the timeout has fired is a possible workaround for
kevent() as well (which only holds Giant as far as I can see), it is by no
means a solution for other callers. While looking through the tree I found
several issues with timeouts that should probably be resolved, or they will
hit us with SMP:

 - The CALLOUT_ACTIVE flag is not maintained correctly. softclock() fails
   to clear this flag after the timeout has fired. callout_stop() clears
   CALLOUT_ACTIVE if it finds the callout not PENDING. This is wrong if the
   callout is just about to be called (in this case it is !PENDING but
   ACTIVE). This makes callout_active() useless.

 - Using callout_active() on a callout_handle. Callouts for
   callout_handles (timeout(9)) are allocated from a common pool, so you
   may end up checking the wrong callout if yours has already fired and
   been reallocated to another user. Handles allocated with timeout(9) can
   only be passed to untimeout(9).

I think we should try to make the callout interface usable without races
for the !MPSAFE case (see mail from Eric Jacobs). For the MPSAFE case the
caller should be responsible for this. And we should probably document the
interface better.

Going to think about this...

harti
-- 
harti brandt,
http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
brandt@fokus.fraunhofer.de, harti@freebsd.org