Date: Mon, 06 Jul 2015 13:52:19 -0500
From: Graham Allan <allan@physics.umn.edu>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-fs@freebsd.org
Subject: Re: Strange NFS problem implicating nfsuserd?
Message-ID: <559ACE63.7060409@physics.umn.edu>
In-Reply-To: <1203156989.2786078.1435799642755.JavaMail.zimbra@uoguelph.ca>
References: <55946FFE.8070402@physics.umn.edu> <972685551.2776991.1435795831472.JavaMail.zimbra@uoguelph.ca> <55948916.4080405@physics.umn.edu> <1203156989.2786078.1435799642755.JavaMail.zimbra@uoguelph.ca>

On 7/1/2015 8:14 PM, Rick Macklem wrote:
> Graham Allan wrote:
>>
>> I was always able to get a failure within 10-60 minutes or so, so having
>> the nfsuserd cache timeout at 600 minutes seems like it should eliminate
>> any intermittent id lookup issues.
>>
> I'll take another look at nfsuserd.c. Maybe it does something stupid like
> getting the length of the argument wrong (trailing blank or null or something
> like that, that doesn't show up when it is printed out). All I can think of
> is a subtle bug in nfsuserd.c when the argument is specified.
>
>> I guess I could try...
>> (1) rpcdebug on the linux client, though I'm not sure which flags to
>> enable to log idmapping issues.
>> (2) watch nfsuserd with truss and look for different behaviors.
>> (3) capture NFS traffic, examine with wireshark
>>
> I'd try #3 if I were you and see if the owner and owner_group names look
> right.
>
> I'll post if I find anything in nfsuserd.c, rick

Thanks for indulging me, Rick. As you might have expected though, it's
time for me to follow up with my mea culpa: my problem identification
was entirely wrong. I knew none of it made sense, but perhaps it's fate
that I need to post something embarrassingly wrong to find the true
cause :-)

The reason things became stable when I altered the nfsuserd flags is
that I also stopped our configuration management system on the affected
systems, so that the flags wouldn't get reverted during testing. And of
course the configuration management was doing something else, which was
the real culprit.

We've had a lot of workstation movement over the last few months, with
machines being moved to new buildings and new IP addresses, though the
hostnames remain the same. To try to address this, a periodic reload of
mountd was added - the list of permitted hostnames is in /etc/netgroup,
and it seems that mountd doesn't pick up changed DNS values for the
netgroup hosts without a HUP.

I guess I never thought that reloading mountd could cause I/O
disruption, but the man page does of course allude to this when
discussing the "-S" flag. I've used lots of types of Unix for a long
time; I never thought I needed to read the mountd man page!

For now I have simply stopped doing any reloads, but I could probably
start using that flag instead...

Graham
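
For anyone finding this thread later, the "-S" approach mentioned above could look roughly like the following /etc/rc.conf fragment. This is only a sketch under assumptions: "-r" is assumed to be the mountd flag already in use on the server, and it should be replaced by whatever flags are set locally. Per mountd(8), "-S" suspends the nfsd threads while the exports list is being reloaded, which avoids intermittent access errors at the cost of delayed RPC responses during the reload.

```shell
# /etc/rc.conf fragment (sketch, not a tested configuration).
# "-r" is an assumed pre-existing flag; "-S" is the flag discussed above.
mountd_enable="YES"
mountd_flags="-r -S"

# After restarting mountd with the new flags, the periodic reload would
# still send a HUP, for example (shown as a comment, not executed here):
#   service mountd reload
```

With these flags in place, the periodic HUP from configuration management could presumably be left enabled, since clients should see a short delay during the reload and not an I/O error.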