Date:      Tue, 30 Nov 2010 19:34:46 -0500
From:      Adam McDougall <mcdouga9@egr.msu.edu>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Stale NFS file handles on 8.x amd64
Message-ID:  <4CF59826.5090305@egr.msu.edu>
In-Reply-To: <201011300933.18505.jhb@freebsd.org>
References:  <4CF44E2E.4070700@egr.msu.edu> <201011300933.18505.jhb@freebsd.org>


On 11/30/10 09:33, John Baldwin wrote:
> On Monday, November 29, 2010 8:06:54 pm Adam McDougall wrote:
>> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
>> minimum of NFS problems, but it got worse with 8.x.  I have 2-4 servers
>> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
>> Delivery is via procmail, which doesn't touch the dovecot metadata, and
>> webmail uses imapd.  Client connections to imapd go to random servers
>> and I don't yet have solid means to keep certain users on certain
>> servers.  I upgraded some of the servers to 8.x and dovecot 1.2 and ran
>> into "Stale NFS file handle" errors that corrupted the index/uidlist
>> files, making inboxes appear empty when they were not.  In some cases the
>> corrupt index had to be deleted manually.  I first suspected dovecot 1.2
>> since it was upgraded at the same time but I downgraded to 1.1 and it's
>> doing the same thing.  I don't really have a wealth of details to go on
>> yet and I usually stay quiet until I do, and half the time it is
>> difficult to reproduce myself so I've had to put it in production to get
>> a feel for progress.  This only happens a dozen or so times per weekday
>> but I feel the need to start taking bigger steps.  I'll probably do what
>> I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x
>> on the remaining servers.  A binary search is within possibility if I
>> can reproduce the symptoms often enough even if I have to put a test
>> server in production for a few hours.
>
> There were some changes to allow more concurrency in the NFS client in 8 (and
> 7.2+) that caused ESTALE errors to occur on open(2) more frequently.  You can
> try setting 'vfs.lookup_shared=0' to disable the extra concurrency (but at a
> performance cost) as a workaround.  The most recent 7.x and 8.x have some
> changes to open(2) to minimize ESTALE errors that I think get it back to the
> same level as when lookup_shared is set to 0.
>

I had already tried vfs.lookup_shared=0 on two of the three servers and
it did not help (I had forgotten what it was called or I would have
mentioned it).  On a guess I also tried vfs.nfs.prime_access_cache=1 on
all three, but that didn't help either.  I'll go through the other
suggestions and see where it gets me.  Thanks all for the input.
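
For anyone reading this in the archive who hits the same ESTALE errors
on open(2) described above: one application-level workaround is to retry
the open a few times before giving up.  Below is a minimal sketch in
Python.  It is not what dovecot or the FreeBSD NFS client does, and it
may or may not help with this particular problem.  The helper name
open_retry_estale and its retries, delay and opener parameters are
hypothetical, chosen only for illustration.

```python
import errno
import os
import time


def open_retry_estale(path, flags, retries=3, delay=0.05, opener=os.open):
    """Open `path`, retrying when the call fails with ESTALE.

    Hypothetical helper: `retries` is the number of extra attempts after
    the first failure, `delay` is the pause in seconds between attempts,
    and `opener` defaults to os.open but can be swapped out for testing.
    """
    attempt = 0
    while True:
        try:
            # A fresh call repeats the path lookup, which gives the NFS
            # client a chance to obtain a new file handle.
            return opener(path, flags)
        except OSError as exc:
            # Only ESTALE is retried; any other error goes to the caller.
            if exc.errno != errno.ESTALE or attempt >= retries:
                raise
            attempt += 1
            time.sleep(delay)
```

A caller would use it in place of a bare os.open(), for example
fd = open_retry_estale(path, os.O_RDONLY), where path is whatever file
the application needs.  If every attempt fails, the last ESTALE error is
raised unchanged, so existing error handling still sees it.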


