From: Adam McDougall
To: Rick Macklem
Cc: stable@freebsd.org
Date: Wed, 01 Dec 2010 00:36:04 -0500
Subject: Re: Stale NFS file handles on 8.x amd64
Message-ID: <4CF5DEC4.3070901@egr.msu.edu>

On 11/30/10 08:33, Rick Macklem
wrote:
>> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
>> minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers
>> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
>> Delivery is via procmail, which doesn't touch the dovecot metadata, and
>> webmail uses imapd. Client connections to imapd go to random servers,
>> and I don't yet have a solid way to keep certain users on certain
>> servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran
>> into stale NFS file handles that corrupted the index/uidlist files,
>> making inboxes appear empty when they were not. In some cases a user's
>> corrupt index had to be deleted manually. I first suspected dovecot 1.2,
>> since it was upgraded at the same time, but I downgraded to 1.1 and it's
>> doing the same thing. I don't have a wealth of details to go on yet,
>> and I usually stay quiet until I do, but half the time it is difficult
>> to reproduce myself, so I've had to put it in production to get a feel
>> for progress. This only happens a dozen or so times per weekday, but I
>> feel the need to start taking bigger steps. I'll probably do what I can
>> to get IMAP back on a stable base (7.x?) and also try to debug 8.x on
>> the remaining servers. A binary search is possible if I can reproduce
>> the symptoms often enough, even if I have to put a test server in
>> production for a few hours.
>>
>> Any tips on where we could start looking, or changes I could try, such
>> as sysctls to return to the older behavior? It might be worth noting
>> that I've seen a considerable increase in traffic from my mail servers
>> since the 8.x upgrade, on the order of 5-10x as much traffic to the NFS
>> server. Dovecot tries its hardest to flush the access cache when
>> needed, and that has worked well enough since about 1.0.16 (years ago).
>> It seems like FreeBSD is what regressed in this scenario.
>> Dovecot 2.x is going in a different direction from my situation, and
>> I'm not ready to start testing it immediately if I can avoid it, since
>> it will involve some restructuring.
>>
>> Thanks for any input. For now, the following errors are about all I
>> have to go on:
>>
>> Nov 29 11:07:54 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>> Nov 29 13:19:51 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>> Nov 29 14:35:41 server1 dovecot: IMAP(user2): o_stream_send(/home/user2/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>> Nov 29 15:07:05 server1 dovecot: IMAP(user3): read(mail, uid=128990) failed: Stale NFS file handle
>>
>> Nov 29 11:57:22 server2 dovecot: IMAP(user4): open(/egr/mail/shared/vprgs/dovecot-acl-list) failed: Stale NFS file handle
>> Nov 29 14:04:22 server2 dovecot: IMAP(user5): o_stream_send(/home/user5/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>> Nov 29 14:27:21 server2 dovecot: IMAP(user6): o_stream_send(/home/user6/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>> Nov 29 15:44:38 server2 dovecot: IMAP(user7): open(/egr/mail/shared/decs/dovecot-acl-list) failed: Stale NFS file handle
>> Nov 29 19:04:54 server2 dovecot: IMAP(user8): o_stream_send(/home/user8/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>>
>> Nov 29 06:32:11 server3 dovecot: IMAP(user9): open(/egr/mail/shared/cmsc/dovecot-acl-list) failed: Stale NFS file handle
>> Nov 29 10:03:58 server3 dovecot: IMAP(user10): o_stream_send(/home/user10/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>>
> Others have made good suggestions.
> One more thing you could try is disabling negative name caching by
> setting the mount option "negnametimeo=0". Negative name caching was
> also added to FreeBSD7, but it is a fairly recent change, so your
> FreeBSD7 boxes may not have it. I also think dot-locking, and running
> without statd and lockd (you can mount with the "nolock" option), would
> be worth trying. And, of course, disabling attribute caching is
> mentioned on the web page others cited.
>
> Good luck with it, rick
> ps: Unfortunately, the NFS protocol cannot support POSIX file system
> semantics, so some apps can never run correctly on NFS-mounted volumes.
> NFSv4 comes closer, but it still can't provide full POSIX semantics.
>

I'll give negnametimeo=0 a try on one server starting tonight. I'll be
busy tomorrow and don't want to risk making anything worse than it
already is.

I can't figure out how to disable the attribute cache in FreeBSD.
Neither suggestion seems to be valid, and when I looked into it years
ago I got the impression that you can't, but I'd love to be proven wrong.

I'll try dotlock when I can. Would disabling statd and lockd be the same
as using nolock on all mounts? The vacation binary is the only thing I
can think of that might use locking; I'm not sure how well it would cope
without it, since that is how I discovered I needed lockd in the first
place. Also, if disabling lockd shows an improvement, could that lead to
further investigation, or is it just a workaround? I'm just trying to
understand the possibilities better.

I know ESTALE means the file vanished, but for the files that produced
errors, it is expected that multiple systems will replace the file at
unpredictable times.

Thanks.
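For reference, one application-side mitigation for the read case is to retry by path when ESTALE comes back. When other hosts replace a file by rename()ing a new copy over it, a client still holding the old file handle can get ESTALE; opening the path again repeats the name lookup, which should return the replacement. Below is a minimal Python sketch of that pattern under those assumptions. It is not dovecot's code: the function name read_with_estale_retry, the retry limit of 10 and the injectable `opener` parameter are all hypothetical. It only covers whole-file reads; a failed write such as the o_stream_send errors above cannot simply be replayed this way.

```python
import errno

# Hypothetical retry limit; not taken from dovecot or FreeBSD.
ESTALE_RETRY_COUNT = 10


def read_with_estale_retry(path, opener=open, retries=ESTALE_RETRY_COUNT):
    """Return the contents of `path`, re-opening it by name on ESTALE.

    If another host rename()s a new copy over `path`, a handle we already
    looked up now names a deleted file and the server answers ESTALE.
    Opening `path` again repeats the name lookup, which should find the
    replacement file. Any other error, or ESTALE on the final attempt,
    is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            # `opener` is a hypothetical injection point so the retry
            # logic can be exercised without a real NFS mount.
            with opener(path, "rb") as f:
                return f.read()
        except OSError as e:
            if e.errno != errno.ESTALE or attempt == retries:
                raise
    # Only reached if retries < 0, i.e. no attempt was made.
    return None
```

Usage would be something like read_with_estale_retry("/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist"). Whether the second open really gets a fresh handle still depends on the client's name and attribute caching, which is why cache-related mount options like negnametimeo matter regardless.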