From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Adam McDougall
Subject: Re: Stale NFS file handles on 8.x amd64
Date: Tue, 30 Nov 2010 09:33:18 -0500
Message-Id: <201011300933.18505.jhb@freebsd.org>
In-Reply-To: <4CF44E2E.4070700@egr.msu.edu>

On Monday, November 29, 2010 8:06:54 pm Adam McDougall
wrote:
> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
> minimum of NFS problems, but it got worse with 8.x.  I have 2-4 servers
> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
> Delivery is via procmail which doesn't touch the dovecot metadata and
> webmail uses imapd.  Client connections to imapd go to random servers
> and I don't yet have solid means to keep certain users on certain
> servers.  I upgraded some of the servers to 8.x and dovecot 1.2 and ran
> into Stale NFS file handles causing index/uidlist corruption causing
> inboxes to appear as empty when they were not.  In some situations their
> corrupt index had to be deleted manually.  I first suspected dovecot 1.2
> since it was upgraded at the same time but I downgraded to 1.1 and it's
> doing the same thing.  I don't really have a wealth of details to go on
> yet and I usually stay quiet until I do, and half the time it is
> difficult to reproduce myself so I've had to put it in production to get
> a feel for progress.  This only happens a dozen or so times per weekday
> but I feel the need to start taking bigger steps.  I'll probably do what
> I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x
> on the remaining servers.  A binary search is within possibility if I
> can reproduce the symptoms often enough even if I have to put a test
> server in production for a few hours.

There were some changes to allow more concurrency in the NFS client in 8.x
(and 7.2+) that cause ESTALE errors to occur on open(2) more frequently.
As a workaround, you can try setting 'vfs.lookup_shared=0' to disable the
extra concurrency, at a performance cost.  The most recent 7.x and 8.x have
some changes to open(2) to minimize ESTALE errors; I think they bring the
error rate back to the same level as when lookup_shared is set to 0.

-- 
John Baldwin
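
For illustration only, here is how an application could tolerate an
occasional ESTALE from open(2) on its own side.  This is a minimal sketch
and not the kernel change described above: the helper name
open_retry_estale(), the retry limit, and the 10 ms pause are all
assumptions made up for the example, and it does not handle O_CREAT
(which would need a mode argument).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of any library: call open(2) and retry
 * a bounded number of times when it fails with ESTALE.  Returns a file
 * descriptor on success, or -1 with errno left as set by the last open(2).
 * At least one attempt is always made, even if max_tries is < 1.
 */
static int
open_retry_estale(const char *path, int flags, int max_tries)
{
	for (int attempt = 1; ; attempt++) {
		int fd = open(path, flags);

		/* Stop on success, on any non-ESTALE error, or when out of tries. */
		if (fd >= 0 || errno != ESTALE || attempt >= max_tries)
			return (fd);

		/* Assumed pause (10 ms) before the path is looked up again. */
		usleep(10000);
	}
}
```

A caller would use it in place of a plain open(), for example
open_retry_estale(path, O_RDONLY, 3).  Errors other than ESTALE are
returned immediately, so existing error handling keeps working.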