Date: Tue, 30 Nov 2010 19:34:46 -0500
From: Adam McDougall <mcdouga9@egr.msu.edu>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: Stale NFS file handles on 8.x amd64
Message-ID: <4CF59826.5090305@egr.msu.edu>
In-Reply-To: <201011300933.18505.jhb@freebsd.org>
References: <4CF44E2E.4070700@egr.msu.edu> <201011300933.18505.jhb@freebsd.org>
On 11/30/10 09:33, John Baldwin wrote:
> On Monday, November 29, 2010 8:06:54 pm Adam McDougall wrote:
>> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
>> minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers
>> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
>> Delivery is via procmail, which doesn't touch the dovecot metadata, and
>> webmail uses imapd. Client connections to imapd go to random servers
>> and I don't yet have solid means to keep certain users on certain
>> servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran
>> into Stale NFS file handles causing index/uidlist corruption, causing
>> inboxes to appear as empty when they were not. In some situations their
>> corrupt index had to be deleted manually. I first suspected dovecot 1.2
>> since it was upgraded at the same time, but I downgraded to 1.1 and it's
>> doing the same thing. I don't really have a wealth of details to go on
>> yet and I usually stay quiet until I do, and half the time it is
>> difficult to reproduce myself, so I've had to put it in production to get
>> a feel for progress. This only happens a dozen or so times per weekday
>> but I feel the need to start taking bigger steps. I'll probably do what
>> I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x
>> on the remaining servers. A binary search is within possibility if I
>> can reproduce the symptoms often enough, even if I have to put a test
>> server in production for a few hours.
>
> There were some changes to allow more concurrency in the NFS client in 8 (and
> 7.2+) that caused ESTALE errors to occur on open(2) more frequently. You can
> try setting 'vfs.lookup_shared=0' to disable the extra concurrency (but at a
> performance cost) as a workaround. The most recent 7.x and 8.x have some
> changes to open(2) to minimize ESTALE errors that I think get it back to the
> same level as when lookup_shared is set to 0.
>

I tried vfs.lookup_shared=0 on two of the three already with no help (I
forgot what it was called or I would have mentioned it), and I also tried
vfs.nfs.prime_access_cache=1 on a guess on all three, but that didn't help
either. I'll go through the other suggestions and see where it gets me.
Thanks all for the input.
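
For readers following the thread: the ESTALE-on-open(2) behaviour discussed above can also be tolerated at the application level by retrying the open. The sketch below is only an illustration in Python, not what the FreeBSD kernel or dovecot actually does. The helper name open_retrying_estale, the retry count, the delay, and the opener parameter are all assumptions made for this example.

```python
import errno
import os
import time


def open_retrying_estale(path, flags=os.O_RDONLY, retries=3, delay=0.05,
                         opener=os.open):
    """Open path and return a file descriptor, retrying on ESTALE.

    Hypothetical helper: the name, retry count, and delay are illustrative
    values, not taken from the thread or from any real mail server.
    """
    for attempt in range(retries + 1):
        try:
            return opener(path, flags)
        except OSError as exc:
            # Only a stale NFS handle is retried; any other error, or
            # running out of attempts, propagates to the caller.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            time.sleep(delay)
```

A caller would use it in place of os.open, for example fd = open_retrying_estale("/path/on/nfs/file"), and close the descriptor with os.close(fd) when done. Retrying only hides transient stale handles; it does not address index corruption that has already happened.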
