Date:      Sat, 31 Mar 2012 15:28:54 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Josh Beard <josh@signalboxes.net>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: NFS:  rpc.statd/lockd becomes unresponsive
Message-ID:  <2115805571.2042848.1333222134633.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4F764712.2010407@signalboxes.net>

Josh Beard wrote:
> Originally sent to freebsd-net, but I realized this is probably a more
> appropriate list. Sorry!
> 
> 
> Hello,
> 
> We've recently set up a FreeBSD 9.0-RELEASE (x64) system to test as an
> NFS server for "live" network homes for Mac clients (mostly 10.5 and
> 10.6 clients).
> 
> We're a public school district and normally have around 150-200 users
> logged in at a time with network homes. Currently, we're using netatalk
> (AFP) on a Linux box, after migrating from an aging Mac OS X server.
> Unfortunately, netatalk has some serious performance issues under the
> load we're putting on it, and we'd like to migrate to NFS.
> 
> We've tried several Linux distributions and various kernels and we're
> now testing FreeBSD (and tested FreeNAS) with similar setups.
> Unfortunately, they all suffer the same issue.
> 
> As a test, I have a series of scripts to simulate user activity on the
> clients (e.g. opening Word, opening a browser, doing some reads/writes
> with dd, etc). After a while, NFS on the server runs into an issue
> where (as far as I can tell) rpc.statd can't talk to rpc.lockd. Being
> Mac clients, they all get a rather ugly dialog box stating that their
> connection to the server has been lost.
> 
> It's worth mentioning that this server is a KVM 'guest' on a Linux
> server. I'm aware of some I/O issues there, but I don't have a decent
> piece of hardware to really test this on. I allocated 4 CPUs to it and
> 10GB of RAM. I've tested with the virtio net drivers and without.
> Considering I've seen the same symptoms on around 6 Linux distributions,
> with various kernels, FreeNAS, and FreeBSD, I wouldn't be surprised to
> get the same results if I weren't virtualized.
> 
> I haven't really done any tuning on the FreeBSD server; it's fairly
> vanilla.
> 
> We have around 2600 machines throughout our campus, with limited remote
> management capabilities (that's on the big agenda to tackle), so
> changing NFS mount options there would be rather difficult. These are
> LDAP accounts with the NFS mounts in LDAP as well, for what it's worth.
> The clients mount it pretty vanilla (output of 'mount' on a client):
> freenas.dsdk12.schoollocal:/mnt/homes on /net/freenas.dsdk12.schoollocal/mnt/homes (nfs, nodev, nosuid, automounted, nobrowse)
> 
Well, if you look at the mailing list archives, you'll figure out
what I think about the NLM and NSM (i.e., avoid using them if at all
possible).

If multiple clients do not do locking on the same files concurrently
in such a way that they need to see each other's file locks (home directories
are typically accessed by one client at any time), then have the client
mounts use "nolockd" (I think it's "nolock" on Linux and not sure what
it's called on Mac OS X), which makes the file locking happen locally
within the client.
--> Then you can get rid of rpc.lockd and rpc.statd.
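For example (off the top of my head, and assuming you can change the
mount options in the automounter maps you keep in LDAP), the equivalent
manual mounts would look something like this:

  # FreeBSD client: do the file locking locally instead of via NLM/NSM
  mount -t nfs -o nolockd freenas.dsdk12.schoollocal:/mnt/homes /mnt/homes
  # Linux client: same idea, but the option is spelled "nolock"
  mount -t nfs -o nolock freenas.dsdk12.schoollocal:/mnt/homes /mnt/homes

(The /mnt/homes mount point is just for illustration; your automounter
picks the real one. For the Macs, check mount_nfs(8) on the client for
the equivalent option, since I don't remember what it's called there.)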

The only other way to avoid rpc.lockd and rpc.statd is to do all the
mounts as NFSv4. The Linux NFSv4 client is now stable from what I've
seen, and Lion shipped with a client, although I haven't had a chance
to test it. (If your Macs are Snow Leopard, this isn't an option.)
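If you want to try NFSv4, the server side on FreeBSD is roughly the
following (from memory, so double-check exports(5) and rc.conf(5)
before trusting the details):

  # /etc/rc.conf additions on the server
  nfsv4_server_enable="YES"
  nfsuserd_enable="YES"     # needed for NFSv4 name<->id mapping

  # /etc/exports: a V4: line sets the root of the NFSv4 tree
  V4: / -network 172.30.0.0/16

  # Linux client test mount (<server> is a placeholder; the path is
  # relative to the V4: root, so with V4: / it is the full /srv/homes)
  mount -t nfs4 <server>:/srv/homes /mnt/homes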

I can't explain why statd wouldn't be able to talk to lockd, but I will
note that, for these daemons to work, they must have a reliable network
connection with no firewall in the way. rpc.statd uses IP broadcast, and
both daemons typically choose their port numbers dynamically and rely on
rpcbind to tell callers what they are.
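If you do end up keeping lockd/statd and there is any packet filtering
between the clients and the server, you can at least pin them to fixed
ports so the filter rules are predictable. If I recall correctly both
daemons take a -p flag, so something like this in /etc/rc.conf on the
server (the port numbers here are just examples):

  rpc_lockd_enable="YES"
  rpc_lockd_flags="-p 4045"    # fixed port for rpc.lockd
  rpc_statd_enable="YES"
  rpc_statd_flags="-p 4046"    # fixed port for rpc.statd

rpcbind still has to be reachable on port 111 so the clients can look
those ports up.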

Good luck with it, rick

> On the server, my /etc/exports looks like this:
> /srv/homes -alldirs -network 172.30.0.0/16
> 
> This export doesn't have a lot of data - it's 150 small home directories
> of test accounts. No other activity is being done on this server. The
> filesystem is UFS.
> 
> /etc/rc.conf on the server:
> rpcbind_enable="YES"
> nfs_server_enable="YES"
> mountd_flags="-r -l"
> nfsd_enable="YES"
> mountd_enable="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> nfs_server_flags="-t -n 128"
> 
> When this occurs, /var/log/messages starts to fill up with this:
> 
> Mar 30 16:35:18 freefs kernel: Failed to contact local NSM - rpc error 5
> Mar 30 16:35:20 freefs rpc.statd: unmon request from localhost, no matching monitor
> Mar 30 16:35:44 freefs rpc.statd: unmon request from localhost, no matching monitor
> -- repeated a few times every few seconds --
> Mar 30 16:54:50 freefs rpc.statd: Unsolicited notification from host hs00508s4434.dsdk12.schoollocal
> Mar 30 16:55:01 freefs rpc.statd: Unsolicited notification from host hs00520s4539.dsdk12.schoollocal
> Mar 30 16:55:10 freefs rpc.statd: Failed to call rpc.statd client at host localhost
> 
> nfsstat shortly after a failure:
> Rpc Info:
>  TimedOut   Invalid X Replies   Retries  Requests
>         0         0         0         0      1208
> Cache Info:
> Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
>       177       951       226        28         3         6         0         2
> BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
>        49         3        13         5         9         0       148         9
> 
> Server Info:
>   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>    262698    101012   1575347        29   1924761   2172712         0     43792
>    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>     27447         0        21      5596      1691    118073         0   2596146
>     Mknod    Fsstat    Fsinfo  PathConf    Commit
>         0     83638       108       108    183632
> Server Ret-Failed
>                 0
> Server Faults
>             0
> Server Cache Stats:
>    Inprog      Idem  Non-idem    Misses
>         0         0         0   9172982
> Server Write Gathering:
>  WriteOps  WriteRPC   Opsaved
>   2172712   2172712         0
> 
> rpcinfo shortly after a failure:
> program version netid address service owner
> 100000 4 tcp 0.0.0.0.0.111 rpcbind superuser
> 100000 3 tcp 0.0.0.0.0.111 rpcbind superuser
> 100000 2 tcp 0.0.0.0.0.111 rpcbind superuser
> 100000 4 udp 0.0.0.0.0.111 rpcbind superuser
> 100000 3 udp 0.0.0.0.0.111 rpcbind superuser
> 100000 2 udp 0.0.0.0.0.111 rpcbind superuser
> 100000 4 tcp6 ::.0.111 rpcbind superuser
> 100000 3 tcp6 ::.0.111 rpcbind superuser
> 100000 4 udp6 ::.0.111 rpcbind superuser
> 100000 3 udp6 ::.0.111 rpcbind superuser
> 100000 4 local /var/run/rpcbind.sock rpcbind superuser
> 100000 3 local /var/run/rpcbind.sock rpcbind superuser
> 100000 2 local /var/run/rpcbind.sock rpcbind superuser
> 100005 1 udp6 ::.2.119 mountd superuser
> 100005 3 udp6 ::.2.119 mountd superuser
> 100005 1 tcp6 ::.2.119 mountd superuser
> 100005 3 tcp6 ::.2.119 mountd superuser
> 100005 1 udp 0.0.0.0.2.119 mountd superuser
> 100005 3 udp 0.0.0.0.2.119 mountd superuser
> 100005 1 tcp 0.0.0.0.2.119 mountd superuser
> 100005 3 tcp 0.0.0.0.2.119 mountd superuser
> 100024 1 udp6 ::.3.191 status superuser
> 100024 1 tcp6 ::.3.191 status superuser
> 100024 1 udp 0.0.0.0.3.191 status superuser
> 100024 1 tcp 0.0.0.0.3.191 status superuser
> 100003 2 tcp 0.0.0.0.8.1 nfs superuser
> 100003 3 tcp 0.0.0.0.8.1 nfs superuser
> 100003 2 tcp6 ::.8.1 nfs superuser
> 100003 3 tcp6 ::.8.1 nfs superuser
> 100021 0 udp6 ::.3.248 nlockmgr superuser
> 100021 0 tcp6 ::.2.220 nlockmgr superuser
> 100021 0 udp 0.0.0.0.3.202 nlockmgr superuser
> 100021 0 tcp 0.0.0.0.2.255 nlockmgr superuser
> 100021 1 udp6 ::.3.248 nlockmgr superuser
> 100021 1 tcp6 ::.2.220 nlockmgr superuser
> 100021 1 udp 0.0.0.0.3.202 nlockmgr superuser
> 100021 1 tcp 0.0.0.0.2.255 nlockmgr superuser
> 100021 3 udp6 ::.3.248 nlockmgr superuser
> 100021 3 tcp6 ::.2.220 nlockmgr superuser
> 100021 3 udp 0.0.0.0.3.202 nlockmgr superuser
> 100021 3 tcp 0.0.0.0.2.255 nlockmgr superuser
> 100021 4 udp6 ::.3.248 nlockmgr superuser
> 100021 4 tcp6 ::.2.220 nlockmgr superuser
> 100021 4 udp 0.0.0.0.3.202 nlockmgr superuser
> 100021 4 tcp 0.0.0.0.2.255 nlockmgr superuser
> 300019 1 tcp 0.0.0.0.2.185 amd superuser
> 300019 1 udp 0.0.0.0.2.162 amd superuser
> 
> The load can get fairly high during my 'stress' tests, but not *that*
> high. I'm surprised to see these particular symptoms, which affect every
> connected user at the same time; I would expect slowdowns rather than
> the issue I'm seeing.
> 
> Any ideas or nudges in the right direction are most welcome. This is
> severely plaguing us and our students :\
> 
> Thanks,
> Josh
> 
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


