From owner-cvs-all@FreeBSD.ORG Tue Aug 12 17:23:59 2008 Return-Path: Delivered-To: cvs-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 977441065679; Tue, 12 Aug 2008 17:23:59 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by mx1.freebsd.org (Postfix) with ESMTP id 3D2E98FC3A; Tue, 12 Aug 2008 17:23:59 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m7CHNme0007853 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 13 Aug 2008 03:23:57 +1000 Date: Wed, 13 Aug 2008 03:23:47 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: John Baldwin In-Reply-To: <200808121115.01483.jhb@freebsd.org> Message-ID: <20080813014505.H1310@besplex.bde.org> References: <200808081343.m78DhwYE068477@repoman.freebsd.org> <20080812014937.E21092@besplex.bde.org> <20080812231130.D760@besplex.bde.org> <200808121115.01483.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Ed Schouten , cvs-all@FreeBSD.org, Bruce Evans Subject: Re: cvs commit: src/sys/dev/io iodev.c X-BeenThere: cvs-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the entire tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Aug 2008 17:23:59 -0000 On Tue, 12 Aug 2008, John Baldwin wrote: > On Tuesday 12 August 2008 10:07:43 am Bruce Evans wrote: >> I checked that bpf panics (even under UP) due to the obvious bugs in >> its d_close(): >> >> # Generate lots of network activity using something like: >> sysctl net.inet.icmp.icmplim=0; ping -fq localhost & >> >> # Race to panic eventually: >> while :; do tcpdump -i lo0 & sleep 0.001; revoke /dev/bpf0 >> >> Most or all device drivers have obvious bugs in their d_close(); bpf >> is just a bit easier to understand and more likely to cause a panic >> than most device drivers, since it is simple and frees resources. A >> panic is very likely when si_drv1 is freed, and si_drv1 is only locked >> accidentally. > > I think revoke(2) should EINVAL (or ENOTTY) for non-ttys. That might work, but revoke() is supposed to work for all types of files, and there may other paths to a forced d_close(). umount -f should be one (umount -f should call vflush() with flags FORCECLOSE; then vflush() calls vgonel(). revoke() is essentially vgone()). However, devfs seems to be already missing support for umount -f. > Of course bpf is > broken with revoke, but nobody uses revoke with bpf. What people do do in > the normal course of using bpf is lots of concurrent bpf accesses, and w/o > D_TRACKCLOSE, bpf devices don't get closed. Why not fix the actual bug? Your commit doesn't give enough details on the actual bug, so I will try to guess it: libpcap has to probe for for a free bpf unit, so it does lots of failing opens of bpfN (especially when N == 0) when bpfN is already in use. Failing opens break last closes with which they are concurrent, because the relevant reference count (si_usecount) is increased during the failing open (I think it is the vref() in _fgetvp() that does it). Then when the opens fail, si_usecount is decremented to 1, but devfs_close() is not called again because only 1 real last close is possible (I think -- at least without races with revoke()), so d_close() is never called twice for 1 real least close. Failing opens shouldn't take long, so it is surprising that the race is often lost. Apparently there is some synchronization. This bug probably also affects the probe for a free pty. The bug affects callin/callout serial devices in a non-race fashion. I committed a work-around for it in FreeBSD-1 and still use this, but never committed the work-around to FreeBSD-2+. The workaround is to not count non-returned d_open()s in count_dev() or vcount(). This is only a workaround because it might be correct to count non-returned d_open()s in some cases. One interesting case is the vcount(vp) > 1 check for revoke(). We hold 1 reference to the vp, and non-returned d_open()s (or even to-be-called d_open()s) may hold more, so vcount() may be > 1 for non-open devices. Then we eventually call devfs_close(), and devfs_close() may later be confused into calling d_close() on a non-open device (I think devfs_close() is confused enough iff there is precisely 1 non-returned d_open() later). This is bogus, but we should do something to cancel any partially complete d_open()s if there is >= 1 of them. I once thought that using D_TRACKCLOSE was the right fix for this in serial drivers. Now I know better :-). The bug affects all devices, and has a tricky interaction with revoke() and with open()s and closes() racing each other. vfs should handle it as an easy special case of handling the others. Bruce