Date: Tue, 4 Dec 2012 09:26:10 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: olivier olivier <olivier777a7@gmail.com>
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE
Message-ID: <1769356561.1118634.1354631170432.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CALC5%2B1Ptc=c_hxfc_On9iDN4AC_Xmrfdbc1NgyJH2ZxP6fE0Aw@mail.gmail.com>
Olivier wrote:
> Hi all,
> After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm having
> severe problems with NFS sharing of a ZFS volume. nfsd appears to hang at
> random times (anywhere from once every couple of hours to once every two
> days) while accessing a ZFS volume, and the only way I have found to
> resolve the problem is to reboot. The server console is sometimes still
> responsive during the nfsd hang, and I can read and write files to the
> same ZFS volume while nfsd is hung. I am pasting below the output of
> procstat -kk on nfsd, and details of my pool (nfsstat on the server also
> hangs once the problem has started and does not produce any output). The
> pool is v28 and was created from a bunch of volumes attached over Fibre
> Channel using the mpt driver. My system has a Supermicro board and 4 AMD
> Opteron 6274 CPUs.
>
> I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
> essentially the same configuration, same usage pattern).
>
> I would greatly appreciate any help resolving this problem!
> Thank you,
> Olivier
>
>  PID    TID COMM   TDNAME          KSTACK
> 1511 102751 nfsd   nfsd: master
>   mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0x5ae vop_stdlock+0x39
>   VOP_LOCK1_APV+0x46 _vn_lock+0x47 zfs_fhtovp+0x338 nfsvno_fhtovp+0x87
>   nfsd_fhtovp+0x7a nfsrvd_dorpc+0x9cf nfssvc_program+0x447
>   svc_run_internal+0x687 svc_run+0x8f nfsrvd_nfsd+0x193 nfssvc_nfsd+0x9b
>   sys_nfssvc+0x90 amd64_syscall+0x540 Xfast_syscall+0xf7
> 1511 102752 nfsd   nfsd: service
>   mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0x5ae vop_stdlock+0x39
>   VOP_LOCK1_APV+0x46 _vn_lock+0x47 zfs_fhtovp+0x338 nfsvno_fhtovp+0x87
>   nfsd_fhtovp+0x7a nfsrvd_dorpc+0x9cf nfssvc_program+0x447
>   svc_run_internal+0x687 svc_thread_start+0xb fork_exit+0x11f
>   fork_trampoline+0xe
> 1511 102753 nfsd   nfsd: service
>   mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61
>   zil_commit+0x764 zfs_freebsd_write+0xba0 VOP_WRITE_APV+0xb2
>   nfsvno_write+0x14d nfsrvd_write+0x362 nfsrvd_dorpc+0x3c0
>   nfssvc_program+0x447 svc_run_internal+0x687 svc_thread_start+0xb
>   fork_exit+0x11f fork_trampoline+0xe
> 1511 102754 nfsd   nfsd: service
>   mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61
>   zil_commit+0x3cf zfs_freebsd_fsync+0xdc nfsvno_fsync+0x2f2
>   nfsrvd_commit+0xe7 nfsrvd_dorpc+0x3c0 nfssvc_program+0x447
>   svc_run_internal+0x687 svc_thread_start+0xb fork_exit+0x11f
>   fork_trampoline+0xe
> 1511 102755 nfsd   nfsd: service
>   mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0x5ae vop_stdlock+0x39
>   VOP_LOCK1_APV+0x46 _vn_lock+0x47 zfs_fhtovp+0x338 nfsvno_fhtovp+0x87
>   nfsd_fhtovp+0x7a nfsrvd_dorpc+0x9cf nfssvc_program+0x447
>   svc_run_internal+0x687 svc_thread_start+0xb fork_exit+0x11f
>   fork_trampoline+0xe
> 1511 102756 nfsd   nfsd: service
>   mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x112 zil_commit+0x6d
>   zfs_freebsd_write+0xba0 VOP_WRITE_APV+0xb2 nfsvno_write+0x14d
>   nfsrvd_write+0x362 nfsrvd_dorpc+0x3c0 nfssvc_program+0x447
>   svc_run_internal+0x687 svc_thread_start+0xb fork_exit+0x11f
>   fork_trampoline+0xe

These threads are either waiting for a vnode lock or waiting inside
zil_commit() (at 3 different locations in zil_commit()). A guess would be
that the ZIL hasn't completed a write for some reason, so 3 threads are
waiting for it, while one of them holds a lock on the vnode being written
and the remaining threads are waiting for that vnode lock.
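To illustrate the blocking pattern described above, here is a minimal
userland sketch in C using pthreads. It is only an analogue of the kernel
behaviour, not the actual ZFS/NFS code: the mutex "vnode_lock" stands in
for the vnode lock, the condition variable "zil_done" stands in for the
ZIL write completing, and all of the names and thread counts are invented
for illustration.

    /*
     * Userland analogue of the hang: one thread holds the "vnode lock"
     * while it sleeps waiting for the "ZIL write" to finish; the other
     * threads block trying to take that same lock, so everything looks
     * hung until the condition is signalled.
     */
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t vnode_lock = PTHREAD_MUTEX_INITIALIZER; /* vnode lock stand-in */
    static pthread_mutex_t zil_mtx    = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  zil_done   = PTHREAD_COND_INITIALIZER;  /* "ZIL write finished" */
    static int zil_write_done = 0;

    /* Like the thread sleeping in zil_commit(): holds the vnode lock, waits on the ZIL. */
    static void *
    writer(void *arg)
    {
        pthread_mutex_lock(&vnode_lock);
        printf("writer: holding vnode lock, waiting for ZIL commit\n");
        pthread_mutex_lock(&zil_mtx);
        while (!zil_write_done)                 /* sleeps until zil_done is signalled */
            pthread_cond_wait(&zil_done, &zil_mtx);
        pthread_mutex_unlock(&zil_mtx);
        pthread_mutex_unlock(&vnode_lock);
        return (NULL);
    }

    /* Like the threads stuck in zfs_fhtovp(): they just want the vnode lock. */
    static void *
    fhtovp(void *arg)
    {
        pthread_mutex_lock(&vnode_lock);        /* blocks while writer holds it */
        printf("thread %ld: got vnode lock\n", (long)(intptr_t)arg);
        pthread_mutex_unlock(&vnode_lock);
        return (NULL);
    }

    int
    main(void)
    {
        pthread_t w, t[3];
        long i;

        pthread_create(&w, NULL, writer, NULL);
        sleep(1);                               /* let the writer grab the lock first */
        for (i = 0; i < 3; i++)
            pthread_create(&t[i], NULL, fhtovp, (void *)(intptr_t)i);

        sleep(2);                               /* everything is now "hung" ... */
        pthread_mutex_lock(&zil_mtx);
        zil_write_done = 1;                     /* ... until the "ZIL write" completes */
        pthread_cond_signal(&zil_done);
        pthread_mutex_unlock(&zil_mtx);

        pthread_join(w, NULL);
        for (i = 0; i < 3; i++)
            pthread_join(t[i], NULL);
        return (0);
    }

If the condition variable were never signalled (the analogue of the ZIL
write never completing), all four threads would stay blocked forever,
which is what the procstat -kk output above shows. Build it with
"cc -o sketch sketch.c -lpthread" to watch the pattern play out.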
I am not a ZFS guy, so I cannot help further, except to suggest that you
try and determine what might cause a write to the ZIL to stall.
(Different device, different device driver...)

Good luck with it, rick

>  PID    TID COMM   TDNAME   KSTACK
> 1507 102750 nfsd   -
>   mi_switch+0x186 sleepq_catch_signals+0x2e1 sleepq_wait_sig+0x16
>   _cv_wait_sig+0x12a seltdwait+0xf6 kern_select+0x6ef sys_select+0x5d
>   amd64_syscall+0x540 Xfast_syscall+0xf7
>
>   pool: tank
>  state: ONLINE
> status: The pool is formatted using a legacy on-disk format. The pool can
>         still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on software that does not
>         support feature flags.
>   scan: scrub repaired 0 in 45h37m with 0 errors on Mon Dec  3 03:07:11 2012
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             da19    ONLINE       0     0     0
>             da31    ONLINE       0     0     0
>             da32    ONLINE       0     0     0
>             da33    ONLINE       0     0     0
>             da34    ONLINE       0     0     0
>           raidz1-1  ONLINE       0     0     0
>             da20    ONLINE       0     0     0
>             da36    ONLINE       0     0     0
>             da37    ONLINE       0     0     0
>             da38    ONLINE       0     0     0
>             da39    ONLINE       0     0     0
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"