Date: Fri, 17 Jul 2009 23:51:52 -0700
From: Brian DeFreitas <briandef@rescomp.berkeley.edu>
To: freebsd-questions@freebsd.org
Subject: Stability issues after upgrading to 7.1 - NFS related?
Message-ID: <20090718065152.GB60636@hal.rescomp.berkeley.edu>

Hello all,

We recently upgraded an NFS server from 7.0-p6 to 7.1-p6. The following Monday morning, we found the server's networking wedged, along with console error messages that strongly resemble the ones in this post [1]. To try the fixes mentioned there, we upgraded to 7-STABLE. This did not seem to help matters; the NFS server keeps wedging 1-2x a day, requiring soft reboots (via console) at times and hard reboots at others. Heavy NFS load seems to trigger the problem.

Initially, we thought there might be a problem with rpc.statd because we started seeing "RPC: Port mapper failure - RPC : Timed out" messages. All the hosts that timed out were previously-working Linux (CentOS) NFS clients. We have IPsec configured in transport mode between all FreeBSD and Linux NFS clients, but only see the RPC error for CentOS (not RHEL) hosts, and no errors from FreeBSD clients. Before the system wedges completely, `top` reports that most nfsd processes are in the *ipsec state.

These are all the troubleshooting steps we have taken:

- disabled NFS locking on the Linux NFS clients
    - RPC timed out messages still appear
- set up RPC to use static ports for NFS on our CentOS clients (to work better with our firewalls, which needed no such rules before)
    - RPC timed out messages still appear
- added 'rpc_lockd_enable="NO"' to /etc/rc.conf
    - after rebooting, `rpcinfo -p` showed no lock manager running, but the crashes persisted
- added "nooptions NFSLOCKD" to the kernel configuration
    - this only caused things to crash faster (a few minutes after boot, with very little NFS load)

Unfortunately, one of the issues we've run into in debugging this problem is the lack of useful logs and debugging information. Some info we have managed to gather:

- before one reboot, we noticed console messages about mbufs filling up. Running `netstat -m` right before crashes seems to confirm this.

If anyone could provide some insight into what's happening, or help us get more debugging information, it would be very helpful.

[1] http://lists.freebsd.org/pipermail/freebsd-current/2009-May/006434.html

-- 
Brian DeFreitas
Lead Unix Systems Administrator
Network Infrastructure, RSSP-IT
UC Berkeley
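
Since the message reports mbuf exhaustion but few surviving logs, one way to keep a record across a wedge might be to sample `netstat -m` periodically and flush each sample to disk. The following is only a minimal sketch, not something from the original post: the function names `parse_netstat_m` and `log_mbufs`, the log path, and the sampling interval are hypothetical, and the parser assumes counter lines of the form `N/N/N description (field/field/field)`, which is how `netstat -m` typically prints its mbuf counters.

```python
import os
import re
import subprocess
import time

# Assumed line shape: "258/387/645 mbufs in use (current/cache/total)".
# Lines whose values carry a unit suffix (e.g. "645K/...") are skipped.
_COUNTER_LINE = re.compile(r"^(\d+(?:/\d+)*)\s+(.+?)\s+\(([^)]+)\)\s*$")


def parse_netstat_m(text):
    """Return {description: {field: int}} for each slash-separated counter line."""
    stats = {}
    for line in text.splitlines():
        match = _COUNTER_LINE.match(line.strip())
        if not match:
            continue
        values = match.group(1).split("/")
        fields = match.group(3).split("/")
        if len(values) != len(fields):
            continue  # value/label counts disagree; ignore the line
        stats[match.group(2)] = {f: int(v) for f, v in zip(fields, values)}
    return stats


def log_mbufs(path, interval=10):
    """Append timestamped `netstat -m` output to path every `interval` seconds.

    Hypothetical helper: runs until interrupted, and syncs after each sample
    so the last samples are more likely to survive a hard reboot.
    """
    with open(path, "a") as log:
        while True:
            result = subprocess.run(
                ["netstat", "-m"], capture_output=True, text=True
            )
            log.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
            log.write(result.stdout + "\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling something like `log_mbufs("/var/tmp/mbuf.log")` from a console session before applying NFS load would leave a trail of samples, and `parse_netstat_m` could then be run over each sample to see which counter climbs before the wedge.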