From: Rick Macklem
To: Adam McDougall
Cc: stable@freebsd.org
Date: Tue, 30 Nov 2010 08:33:16 -0500 (EST)
Subject: Re: Stale NFS file handles on 8.x amd64
Message-ID: <1061723738.903502.1291123996095.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4CF44E2E.4070700@egr.msu.edu>

> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
> minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers
> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
> Delivery is via procmail which doesn't touch the dovecot metadata and
> webmail uses imapd. Client connections to imapd go to random servers
> and I don't yet have solid means to keep certain users on certain
> servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran
> into Stale NFS file handles causing index/uidlist corruption causing
> inboxes to appear as empty when they were not. In some situations
> their corrupt index had to be deleted manually. I first suspected
> dovecot 1.2 since it was upgraded at the same time but I downgraded to
> 1.1 and it's doing the same thing. I don't really have a wealth of
> details to go on yet and I usually stay quiet until I do, and half the
> time it is difficult to reproduce myself so I've had to put it in
> production to get a feel for progress. This only happens a dozen or so
> times per weekday but I feel the need to start taking bigger steps.
> I'll probably do what I can to get IMAP back on a stable base (7.x?)
> and also try to debug 8.x on the remaining servers. A binary search is
> within possibility if I can reproduce the symptoms often enough even
> if I have to put a test server in production for a few hours.
>
> Any tips on where we could start looking, or alterations I could try
> making such as sysctls to return to older behavior? It might be worth
> noting that I've seen a considerable increase in traffic from my mail
> servers since the 8.x upgrade timeframe, on the order of 5-10x as much
> traffic to the NFS server. dovecot tries its hardest to flush out the
> access cache when needed and it was working well enough since about
> 1.0.16 (years ago). It seems like FreeBSD is what regressed in this
> scenario. dovecot 2.x is going in a different direction from my
> situation and I'm not ready to start testing that immediately if I can
> avoid it as it will involve some restructuring.
>
> Thanks for any input.
> For now the following errors are about all I have to go on:
>
> Nov 29 11:07:54 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
> Nov 29 13:19:51 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
> Nov 29 14:35:41 server1 dovecot: IMAP(user2): o_stream_send(/home/user2/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
> Nov 29 15:07:05 server1 dovecot: IMAP(user3): read(mail, uid=128990) failed: Stale NFS file handle
>
> Nov 29 11:57:22 server2 dovecot: IMAP(user4): open(/egr/mail/shared/vprgs/dovecot-acl-list) failed: Stale NFS file handle
> Nov 29 14:04:22 server2 dovecot: IMAP(user5): o_stream_send(/home/user5/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
> Nov 29 14:27:21 server2 dovecot: IMAP(user6): o_stream_send(/home/user6/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
> Nov 29 15:44:38 server2 dovecot: IMAP(user7): open(/egr/mail/shared/decs/dovecot-acl-list) failed: Stale NFS file handle
> Nov 29 19:04:54 server2 dovecot: IMAP(user8): o_stream_send(/home/user8/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>
> Nov 29 06:32:11 server3 dovecot: IMAP(user9): open(/egr/mail/shared/cmsc/dovecot-acl-list) failed: Stale NFS file handle
> Nov 29 10:03:58 server3 dovecot: IMAP(user10): o_stream_send(/home/user10/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
>

Others have made good suggestions. One more you could try is disabling
the negative name caching by setting the option "negnametimeo=0". The
addition of negative name caching is also in FreeBSD7, but it is a
fairly recent change, so your FreeBSD7 boxes may not have had it.
I also think "dot-locking" and running without statd and lockd (you can
mount with the "nolock" option) would be worth trying. And, of course,
disabling attribute caching is mentioned on the web page others cited.

Good luck with it, rick

ps: Unfortunately the NFS protocol cannot support full POSIX file
    system semantics, so some apps can never run correctly on NFS
    mounted volumes. NFSv4 comes closer, but it still can't provide
    full POSIX semantics.
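
To judge whether a change such as "negnametimeo=0" actually reduces the errors, it may help to count the stale-handle messages per server and per failing call before and after the change. The following is only a rough sketch: it assumes the log lines look like the samples quoted above (one message per line, unwrapped), and the function name count_stale_errors is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, based only on the sample messages quoted in this thread:
#   "Nov 29 11:07:54 server1 dovecot: IMAP(user1): open(...) failed: Stale NFS file handle"
STALE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+dovecot:\s+IMAP\([^)]*\):\s+"
    r"(?P<op>\w+)\(.*failed:\s+Stale NFS file handle"
)


def count_stale_errors(lines):
    """Return a Counter keyed by (host, failing call) for stale-handle errors.

    Hypothetical helper: lines that do not match the assumed format are
    silently skipped rather than reported.
    """
    counts = Counter()
    for line in lines:
        match = STALE_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("op"))] += 1
    return counts
```

Passing it an open log file, or any iterable of lines, gives a per-server breakdown that can be compared across days with different mount options. Where syslog writes the dovecot messages depends on the local configuration.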