From owner-freebsd-stable@FreeBSD.ORG Wed Dec 1 00:54:10 2010
Message-ID: <4CF59826.5090305@egr.msu.edu>
Date: Tue, 30 Nov 2010 19:34:46 -0500
From: Adam McDougall
To: John Baldwin
Cc: freebsd-stable@freebsd.org
Subject: Re: Stale NFS file handles on 8.x amd64
References: <4CF44E2E.4070700@egr.msu.edu> <201011300933.18505.jhb@freebsd.org>
In-Reply-To: <201011300933.18505.jhb@freebsd.org>
List-Id: Production branch of FreeBSD source code

On 11/30/10 09:33, John Baldwin wrote:
> On Monday, November 29, 2010 8:06:54 pm Adam McDougall wrote:
>> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
>> minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers
>> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
>> Delivery is via procmail which doesn't touch the dovecot metadata and
>> webmail uses imapd. Client connections to imapd go to random servers
>> and I don't yet have solid means to keep certain users on certain
>> servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran
>> into Stale NFS file handles causing index/uidlist corruption causing
>> inboxes to appear as empty when they were not. In some situations their
>> corrupt index had to be deleted manually. I first suspected dovecot 1.2
>> since it was upgraded at the same time but I downgraded to 1.1 and it's
>> doing the same thing. I don't really have a wealth of details to go on
>> yet and I usually stay quiet until I do, and half the time it is
>> difficult to reproduce myself so I've had to put it in production to get
>> a feel for progress. This only happens a dozen or so times per weekday
>> but I feel the need to start taking bigger steps. I'll probably do what
>> I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x
>> on the remaining servers. A binary search is within possibility if I
>> can reproduce the symptoms often enough even if I have to put a test
>> server in production for a few hours.
>
> There were some changes to allow more concurrency in the NFS client in 8 (and
> 7.2+) that caused ESTALE errors to occur on open(2) more frequently. You can
> try setting 'vfs.lookup_shared=0' to disable the extra concurrency (but at a
> performance cost) as a workaround. The most recent 7.x and 8.x have some
> changes to open(2) to minimize ESTALE errors that I think get it back to the
> same level as when lookup_shared is set to 0.
>

I tried vfs.lookup_shared=0 on two of the three already with no help
(forgot what it was called or I would have mentioned it), and I also
tried vfs.nfs.prime_access_cache=1 on a guess on all three but that
didn't help either. I'll go through the other suggestions and see where
it gets me. Thanks all for the input.
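
For anyone who sees the same ESTALE errors on open(2) from their own code, here is a minimal sketch of retrying the open at the application level. It is only an illustration of the retry idea: it is not the kernel change described above, and it is not something dovecot is known to do. The function name and the retries, delay and opener parameters are all hypothetical, and whether a retry actually helps depends on the state of the NFS client's caches.

```python
import errno
import time


def open_retrying_estale(path, mode="rb", retries=3, delay=0.1, opener=open):
    """Open path, retrying a few times if the call fails with ESTALE.

    retries, delay and opener are hypothetical knobs for this sketch;
    opener exists only so the retry logic can be exercised without NFS.
    """
    for attempt in range(retries + 1):
        try:
            return opener(path, mode)
        except OSError as exc:
            # Only a stale file handle is retried; any other error, or
            # ESTALE on the last allowed attempt, is passed to the caller.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            time.sleep(delay)
```

With the defaults, a call such as open_retrying_estale(some_path) makes up to four attempts in total and re-raises the last ESTALE if none of them succeeds.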