From owner-freebsd-bugs@FreeBSD.ORG Wed Mar 3 21:20:03 2010
Resent-Date: Wed, 3 Mar 2010 21:20:03 GMT
Resent-Message-Id: <201003032120.o23LK3t3073962@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Alexander Sack
Message-Id: <201003032114.o23LEj6h082341@www.freebsd.org>
Date: Wed, 3 Mar 2010 21:14:45 GMT
From: Alexander Sack
To:
 freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: kern/144453: bpf(4) can panic due to a race condition on descriptor destruction
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Wed, 03 Mar 2010 21:20:04 -0000

>Number:         144453
>Category:       kern
>Synopsis:       bpf(4) can panic due to a race condition on descriptor destruction
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 03 21:20:03 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Alexander Sack
>Release:        CURRENT, 7.2-amd64
>Organization:
Niksun
>Environment:
SA5000PAL Intel board with 8GB of RAM, em(4) network interface card
>Description:
When an application polls a bpf descriptor, a timeout handler,
bpf_timed_out(), is scheduled via callout_reset(). If a buffer does not
become available within the poll period, bpf_timed_out() fires; it
changes the bpf_d state and wakes up any threads waiting for an event.
When bpf_timed_out() runs, it attempts to acquire the descriptor lock.

Now suppose a process that is in the middle of a poll/select closes the
bpf descriptor (gracefully or otherwise). bpf_dtor() is called. It
acquires the descriptor lock and calls callout_stop() if the bpf state
is BPF_WAITING (i.e. select was called and callout_reset() has finished
scheduling the callout). After calling callout_stop() it releases the
descriptor lock, and this is where a race condition can occur.

If callout_stop() cannot stop bpf_timed_out() from firing (say it has
already fired), then bpf_timed_out() is sitting waiting on the
descriptor lock. When bpf_dtor() drops the lock, bpf_timed_out() is
allowed to continue.
But bpf_dtor() is about to free the descriptor that bpf_timed_out() is
currently modifying. This can lead to a panic.

The attached patch addresses this by checking callout_active() and, if
the callout is still active, calling callout_drain(), which waits until
bpf_timed_out() has finished. This allows bpf_dtor() to safely free the
descriptor during the close operation.
>How-To-Repeat:
Have many processes polling bpf descriptors under heavy load during a
shutdown.
>Fix:
See patch attached. I tested this on my Intel machine by running 200
tcpdump processes, with zerocopy both disabled and enabled (even though
libpcap doesn't poll on the descriptor in zerocopy mode), capturing
gigabit Ethernet traffic at 100% utilization. No panic occurred during
shutdown. We also saw this with our own custom packet capture
application, which is where I discovered and fixed the problem.

Patch attached with submission follows:

? bpf.patch
Index: bpf.c
===================================================================
RCS file: /home/ncvs/src/sys/net/bpf.c,v
retrieving revision 1.219
diff -u -r1.219 bpf.c
--- bpf.c	20 Feb 2010 00:19:21 -0000	1.219
+++ bpf.c	3 Mar 2010 21:04:48 -0000
@@ -614,6 +614,15 @@
 	mac_bpfdesc_destroy(d);
 #endif /* MAC */
 	knlist_destroy(&d->bd_sel.si_note);
+	/*
+	 * If we could not stop the callout above,
+	 * then when we release the descriptor lock,
+	 * there is a race between when bpf_timed_out()
+	 * finishes and descriptor tear down.  Check
+	 * for it and drain.
+	 */
+	if (callout_active(&d->bd_callout))
+		callout_drain(&d->bd_callout);
 	bpf_freed(d);
 	free(d, M_BPF);
 }

>Release-Note:
>Audit-Trail:
>Unformatted:
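
For readers who want to step through the interleaving described in the
report, here is a minimal single-threaded model of it in userland C. It
is only a sketch and not kernel code: struct model, handler_run() and
dtor_is_safe() are hypothetical names made up for this illustration, and
the model always picks the worst-case ordering (the handler resumes
after the free), whereas on a real system the outcome depends on timing.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the race; none of these names exist in bpf.c.
 */
struct model {
	bool handler_blocked;	/* bpf_timed_out() fired, waiting for the lock */
	bool freed;		/* descriptor memory has been released */
	bool use_after_free;	/* handler touched a freed descriptor */
};

/*
 * The blocked handler finally gets the descriptor lock and updates
 * the descriptor.  If the descriptor is already gone, record the bug.
 */
static void
handler_run(struct model *m)
{
	if (!m->handler_blocked)
		return;
	if (m->freed)
		m->use_after_free = true;
	m->handler_blocked = false;
}

/*
 * Model of the tail of bpf_dtor().  'drain' selects the patched
 * behaviour.  Returns true if no use-after-free was observed.
 */
bool
dtor_is_safe(bool handler_already_fired, bool drain)
{
	struct model m = { .handler_blocked = handler_already_fired };

	/*
	 * Lock, callout_stop(), unlock: a handler that has already
	 * fired cannot be cancelled, so handler_blocked is unchanged.
	 */
	if (drain)
		handler_run(&m);	/* callout_drain(): wait for the handler */

	m.freed = true;			/* free(d, M_BPF) */

	/* Worst case without the drain: handler resumes after the free. */
	handler_run(&m);

	return (!m.use_after_free);
}
```

Calling dtor_is_safe(true, false) models the unpatched path when the
callout has already fired, and dtor_is_safe(true, true) models the path
with the drain added. When the handler never fired, both variants behave
the same, which matches the report's point that the problem only shows
up when callout_stop() fails to cancel the callout.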