Date: Wed, 17 Nov 2010 18:05:36 +0100 (CET)
From: Oliver Fromme <olli@lurza.secnetix.de>
To: freebsd-fs@FreeBSD.ORG
Subject: NFS hangs (7.3)
Message-ID: <201011171705.oAHH5age003849@lurza.secnetix.de>

I've got a problem on a server farm. Every now and then, some NFS
mounts hang. This happens after a few days or after a few weeks.
All processes trying to access files from the hanging mount go to
state "D" and freeze. The only way to resolve the problem is to
reboot the server. "umount -f" also hangs and does not remove the
hanging mount (even though it disappears from the output of the
mount(8) command).

Here's one example from an attempt to run df(1), which also hangs:

ps -uww:
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
root  61930  0.0  0.0  5728  1280  p4- D     5:15PM   0:00.01 /bin/df

ps -lww:
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0 61930     1   0  -4  0  5728  1280 nfs    D     p4-   0:00.01 /bin/df

procstat -kk:
  PID    TID COMM   TDNAME KSTACK
61930 100489 df     -      mi_switch+0x18e sleepq_wait+0x3b _sleep+0x367 acquire+0x7c _lockmgr+0x203 VOP_LOCK1_APV+0x46 _vn_lock+0x83 vget+0xf9 vfs_hash_get+0xf4 nfs_nget+0xa8 nfs_statfs+0x8b __vfs_statfs+0x2b kern_getfsstat+0x2d6 syscall+0x256 Xfast_syscall+0xab

And this is a hanging umount(8) command (I used the fsid syntax,
hoping that it would work better than accessing the mount by its
path, but it doesn't seem to make a difference):

ps -uww:
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
root  62791  0.0  0.0  4640  1272  p4- D     5:18PM   0:00.08 umount -f a5ff000505000000

ps -lww:
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0 62791     1   0  -4  0  4640  1272 vfsloc D     p4-   0:00.08 umount -f a5ff000505000000

procstat -kk:
  PID    TID COMM   TDNAME KSTACK
62791 100239 umount -      mi_switch+0x18e sleepq_wait+0x3b _sleep+0x367 _lockmgr+0x4f3 dounmount+0x474 unmount+0x30a syscall+0x256 Xfast_syscall+0xab

The machine is quite busy. The hangs always seem to occur at night,
when lots of cron jobs are running. The machine has 221 NFS mounts
and 26 nullfs mounts, and it has 26 jails, if that matters.

All NFS shares are mounted from a virtual filer running on a NetApp
filer.
The mounts use the default settings, so they should be v3 over TCP
(this is the default, right?). The only extra option we use is -L,
in order to "fake" locking locally.

The machine is running FreeBSD 7.3-PRERELEASE-20100311 amd64.
Updating is somewhat complicated in that server farm, so I haven't
tried it so far because I'm not sure whether it would help.

Any suggestions or ideas?

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht München, HRA 74606. Management:
secnetix Verwaltungsgesellsch. mbH, commercial register: Registergericht
München, HRB 125758. Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"Software gets slower faster than hardware gets faster."
        -- Niklaus Wirth