Date:      Tue, 30 Nov 2010 19:34:46 -0500
From:      Adam McDougall <mcdouga9@egr.msu.edu>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Stale NFS file handles on 8.x amd64
Message-ID:  <4CF59826.5090305@egr.msu.edu>
In-Reply-To: <201011300933.18505.jhb@freebsd.org>
References:  <4CF44E2E.4070700@egr.msu.edu> <201011300933.18505.jhb@freebsd.org>


On 11/30/10 09:33, John Baldwin wrote:
> On Monday, November 29, 2010 8:06:54 pm Adam McDougall wrote:
>> I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
>> minimum of NFS problems, but it got worse with 8.x.  I have 2-4 servers
>> (usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
>> Delivery is via procmail which doesn't touch the dovecot metadata and
>> webmail uses imapd.  Client connections to imapd go to random servers
>> and I don't yet have solid means to keep certain users on certain
>> servers.  I upgraded some of the servers to 8.x and dovecot 1.2 and ran
>> into Stale NFS file handles causing index/uidlist corruption causing
>> inboxes to appear as empty when they were not.  In some situations their
>> corrupt index had to be deleted manually.  I first suspected dovecot 1.2
>> since it was upgraded at the same time but I downgraded to 1.1 and it's
>> doing the same thing.  I don't really have a wealth of details to go on
>> yet and I usually stay quiet until I do, and half the time it is
>> difficult to reproduce myself so I've had to put it in production to get
>> a feel for progress.  This only happens a dozen or so times per weekday
>> but I feel the need to start taking bigger steps.  I'll probably do what
>> I can to get IMAP back on a stable base (7.x?) and also try to debug 8.x
>> on the remaining servers.  A binary search is within possibility if I
>> can reproduce the symptoms often enough even if I have to put a test
>> server in production for a few hours.
>
> There were some changes to allow more concurrency in the NFS client in 8 (and
> 7.2+) that caused ESTALE errors to occur on open(2) more frequently.  You can
> try setting 'vfs.lookup_shared=0' to disable the extra concurrency (but at a
> performance cost) as a workaround.  The most recent 7.x and 8.x have some
> changes to open(2) to minimize ESTALE errors that I think get it back to the
> same level as when lookup_shared is set to 0.
>

I had already tried vfs.lookup_shared=0 on two of the three servers and
it did not help (I forgot what it was called or I would have mentioned
it).  I also tried vfs.nfs.prime_access_cache=1 on all three, on a
guess, but that didn't help either.  I'll go through the other
suggestions and see where it gets me.  Thanks all for the input.
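
For reference, the two tunables mentioned in this message would look
roughly like this in /etc/sysctl.conf.  This is only a sketch: the names
and values are the ones from this thread, the file location is the usual
FreeBSD default, and neither setting fixed the stale handles here.

```
# /etc/sysctl.conf fragment (sketch; values taken from this thread)

# Disable shared (concurrent) lookups in the VFS layer.  Suggested as a
# workaround for ESTALE on open(2), at some performance cost.
vfs.lookup_shared=0

# Tried on a guess; it made no difference to the stale handles here.
vfs.nfs.prime_access_cache=1
```

The same name=value pairs can be passed to sysctl(8) to try them on a
running system before making them permanent.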


