From: Danny Braniss
To: David Wolfskill
Cc: hackers@freebsd.org
Date: Wed, 03 Dec 2008 14:59:03 +0200
Subject: Re: NFS (& amd?) dysfunction descending a hierarchy

> On Wed, Dec 03, 2008 at 02:20:32PM +0200, Danny Braniss wrote:
> > ...
> > i'll try to check it here soon, but in the meantime, could you try the same
> > but mounting directly, not via amd, to remove one item from the equation?
> > (I don't know how much amd is involved here, but if you are running on a
> > 64bit host, amd could be swapped out, in which case it tends to really screw
> > things up, which is not your case, but ...)
>
> Sorry; I should have mentioned that the NFS client was running
> RELENG_7_1 as of Monday morning, i386 arch. The amd.conf file specifies
> "plock" for amd(8).
>
> Note that merely telling amd(8) to kick the interval of attempted
> unmounts from 2 minutes to 12 hours appears to avoid the observed
> symptoms, so I'm fairly confident that bypassing amd(8) altogether would
> do so as well.
>
> In looking at the output from ktrace against amd(8), I recall having
> seen that shortly before an observed failure, the (master) amd
> process forks a child to attempt the unmount; the child issues an
> unmount, the return for which is EBUSY (IIRC -- I'm not in a good
> position to check just at the moment), so the child terminates with an
> "interrupted system call".
>
> I'd have thought that since the attempted unmount failed, it wouldn't
> make any difference, but it's right around that point that rm(1) is told
> that a directory entry it found earlier doesn't exist, which rather
> snowballs into the previously-described symptoms.

so it does point to amd, or to something innocent that amd does, as the
trigger for the error.

btw, there are some patches (5, I think) that try to fix some of the amd
problems. I've installed them, and things are quiet/OK most of the time,
but I still get a glitch once in a while. I would love to iron those out,
though.

cheers,
	danny