FreeBSD Mail Archives

Date:      Mon, 06 Jul 2015 13:52:19 -0500
From:      Graham Allan <allan@physics.umn.edu>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Strange NFS problem implicating nfsuserd?
Message-ID:  <559ACE63.7060409@physics.umn.edu>
In-Reply-To: <1203156989.2786078.1435799642755.JavaMail.zimbra@uoguelph.ca>
References:  <55946FFE.8070402@physics.umn.edu> <972685551.2776991.1435795831472.JavaMail.zimbra@uoguelph.ca> <55948916.4080405@physics.umn.edu> <1203156989.2786078.1435799642755.JavaMail.zimbra@uoguelph.ca>

On 7/1/2015 8:14 PM, Rick Macklem wrote:
> Graham Allan wrote:
>>
>> I was always able to get a failure within 10-60 minutes or so, so having
>> the nfsuserd cache timeout at 600 minutes seems like it should eliminate
>> any intermittent id lookup issues.
>>
> I'll take another look at nfsuserd.c. Maybe it does something stupid like
> getting the length of the argument wrong (trailing blank or null or something
> like that, that doesn't show up when it is printed out). All I can think of
> is a subtle bug in nfsuserd.c when the argument is specified.
>
>> I guess I could try...
>> (1) rpcdebug on the linux client, though I'm not sure which flags to
>> enable to log idmapping issues.
>> (2) watch nfsuserd with truss and look for different behaviors.
>> (3) capture NFS traffic, examine with wireshark
>>
> I'd try #3 if I were you and see if the owner and owner_group names look
> right.
>
> I'll post if I find anything in nfsuserd.c, rick

Thanks for indulging me, Rick. As you might have expected though, it's 
time for me to follow up with my mea culpa that my problem 
identification was entirely wrong. I knew none of it made sense, but 
perhaps it's fate that I need to post something embarrassingly wrong to 
find the true cause :-)

The reason things became stable when I altered the nfsuserd flags is 
that I also stopped our configuration management system on the affected 
systems so they wouldn't get reverted during testing. And of course that 
was doing something else which was responsible.

We've had a lot of workstation movement over the last few months, with 
machines being moved to new buildings and new IP addresses though the 
hostname remains the same. To try to address this, a periodic reload of 
mountd was added - the list of permitted hostnames is in /etc/netgroup, 
and it seems that mountd doesn't pick up on changed DNS values in the 
netgroup without a HUP.

I guess I never thought that reloading mountd could cause i/o 
disruption, but the man page does of course allude to this when 
discussing the "-S" flag. I've used lots of types of unix for a long 
time; I never thought I needed to read the mountd man page! For now I 
simply stopped doing any reloads, but I could probably start using that 
flag instead...
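
For reference, a rough sketch of what using that flag might look like in 
/etc/rc.conf. This is untested here and assumes the stock rc.conf 
variables (mountd_enable, mountd_flags) and that "-r" is the only flag 
currently set; existing flags on a given server may differ. Per the man 
page, -S suspends the nfsd threads while the exports list is reloaded, 
which trades the intermittent access errors for a short delay in RPC 
responses during the reload.

```sh
# /etc/rc.conf fragment (assumed values, adjust to the local setup)
mountd_enable="YES"
# -r is assumed to be the existing setting; -S is the addition
mountd_flags="-r -S"
```

mountd would need a restart to pick up the new flag; after that the 
periodic HUP could presumably be re-enabled.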

Graham

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?559ACE63.7060409>
