From owner-freebsd-smp@FreeBSD.ORG Sat Mar 15 03:01:41 2008 Return-Path: Delivered-To: smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B366F106567A; Sat, 15 Mar 2008 03:01:41 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 98B1B8FC16; Sat, 15 Mar 2008 03:01:41 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id ECF0C1A4D82; Fri, 14 Mar 2008 19:41:14 -0700 (PDT) Date: Fri, 14 Mar 2008 19:41:14 -0700 From: Alfred Perlstein To: stable@freebsd.org Message-ID: <20080315024114.GD67856@elvis.mu.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="F8dlzb82+Fcn6AgP" Content-Disposition: inline User-Agent: Mutt/1.4.2.3i Cc: smp@freebsd.org Subject: timeout/untimeout race conditions/crash [patch] X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Mar 2008 03:01:41 -0000 --F8dlzb82+Fcn6AgP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline We think we tracked down a defect in timeout/untimeout in FreeBSD. We have reduced the problem to the following scenario: 2+ cpu system, one cpu is running softclock at the same time another thread is running on another cpu which makes use of timeout/untimeout. CPU 0 is running "softclock" CPU 1 is running "driver" with Giant held. softclock: mtx_lock_spin(&callout_lock) softclock: CACHES the callout structure's fields. softclock: sees that it's a CALLOUT_LOCAL_ALLOC softclock: executes this code: if (c->c_flags & CALLOUT_LOCAL_ALLOC) { c->c_func = NULL; c->c_flags = CALLOUT_LOCAL_ALLOC; SLIST_INSERT_HEAD(&callfree, c, c_links.sle); curr_callout = NULL; } else { NOTE: that c->c_func has been set to NULL and curr_callout is also NULL. softclock: mtx_unlock_spin(&callout_lock) driver: calls untimeout(), the following sequence happens: mtx_lock_spin(&callout_lock); if (handle.callout->c_func == ftn && handle.callout->c_arg == arg) callout_stop(handle.callout); mtx_unlock_spin(&callout_lock); NOTE: untimeout() sees that handle.callout->c_func is not set to the function so it does NOT call callout_stop(9)! driver: free's backing structure for c->c_arg. softclock: executes callout. softclock: likely crashes at this point due to access after free. I have a patch I'm trying out here, but I need feedback on it. The way the patch works is to treat CALLOUT_LOCAL_ALLOC (timeout/untimeout) callouts the same as ~CALLOUT_LOCAL_ALLOC allocs, and moves the freelist manipulation to the end of the callout dispatch. Some light testing seems to have the system work. We are doing some testing in-house to also make sure this works. Please provide feedback. See attached delta. -- - Alfred Perlstein --F8dlzb82+Fcn6AgP Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="kern_timeout.diff" Index: kern_timeout.c =================================================================== RCS file: /cvs/ncvs/src/sys/kern/kern_timeout.c,v retrieving revision 1.97.2.2 diff -u -r1.97.2.2 kern_timeout.c --- kern_timeout.c 26 Sep 2005 19:49:12 -0000 1.97.2.2 +++ kern_timeout.c 15 Mar 2008 02:28:48 -0000 @@ -241,17 +241,8 @@ c_arg = c->c_arg; c_mtx = c->c_mtx; c_flags = c->c_flags; - if (c->c_flags & CALLOUT_LOCAL_ALLOC) { - c->c_func = NULL; - c->c_flags = CALLOUT_LOCAL_ALLOC; - SLIST_INSERT_HEAD(&callfree, c, - c_links.sle); - curr_callout = NULL; - } else { - c->c_flags = - (c->c_flags & ~CALLOUT_PENDING); - curr_callout = c; - } + c->c_flags &= ~CALLOUT_PENDING; + curr_callout = c; curr_cancelled = 0; mtx_unlock_spin(&callout_lock); if (c_mtx != NULL) { @@ -310,6 +301,12 @@ mtx_unlock(c_mtx); mtx_lock_spin(&callout_lock); done_locked: + if (c->c_flags & CALLOUT_LOCAL_ALLOC) { + c->c_func = NULL; + c->c_flags = CALLOUT_LOCAL_ALLOC; + SLIST_INSERT_HEAD(&callfree, c, + c_links.sle); + } curr_callout = NULL; if (wakeup_needed) { /* --F8dlzb82+Fcn6AgP--