Date:      Tue, 4 Dec 2012 09:26:10 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        olivier olivier <olivier777a7@gmail.com>
Cc:        freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE
Message-ID:  <1769356561.1118634.1354631170432.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CALC5+1Ptc=c_hxfc_On9iDN4AC_Xmrfdbc1NgyJH2ZxP6fE0Aw@mail.gmail.com>

Olivier wrote:
> Hi all,
> After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm having
> severe problems with NFS sharing of a ZFS volume. nfsd appears to hang at
> random times (between once every couple of hours and once every two days)
> while accessing a ZFS volume, and the only way I have found of resolving
> the problem is to reboot. The server console is sometimes still responsive
> during the nfsd hang, and I can read and write files to the same ZFS
> volume while nfsd is hung. I am pasting below the output of procstat -kk
> on nfsd, and details of my pool (nfsstat on the server hangs once the
> problem has started occurring, and does not produce any output). The pool
> is v28 and was created from a bunch of volumes attached over Fibre Channel
> using the mpt driver. My system has a Supermicro board and 4 AMD Opteron
> 6274 CPUs.
> 
> I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
> essentially the same configuration, same usage pattern).
> 
> I would greatly appreciate any help to resolve this problem!
> Thank you,
> Olivier
> 
> PID TID COMM TDNAME KSTACK
> 1511 102751 nfsd nfsd: master
> mi_switch+0x186
> sleepq_wait+0x42
> __lockmgr_args+0x5ae
> vop_stdlock+0x39
> VOP_LOCK1_APV+0x46
> _vn_lock+0x47
> zfs_fhtovp+0x338
> nfsvno_fhtovp+0x87
> nfsd_fhtovp+0x7a
> nfsrvd_dorpc+0x9cf
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_run+0x8f
> nfsrvd_nfsd+0x193
> nfssvc_nfsd+0x9b
> sys_nfssvc+0x90
> amd64_syscall+0x540
> Xfast_syscall+0xf7
> 1511 102752 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> __lockmgr_args+0x5ae
> vop_stdlock+0x39
> VOP_LOCK1_APV+0x46
> _vn_lock+0x47
> zfs_fhtovp+0x338
> nfsvno_fhtovp+0x87
> nfsd_fhtovp+0x7a
> nfsrvd_dorpc+0x9cf
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 1511 102753 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> _cv_wait+0x112
> zio_wait+0x61
> zil_commit+0x764
> zfs_freebsd_write+0xba0
> VOP_WRITE_APV+0xb2
> nfsvno_write+0x14d
> nfsrvd_write+0x362
> nfsrvd_dorpc+0x3c0
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 1511 102754 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> _cv_wait+0x112
> zio_wait+0x61
> zil_commit+0x3cf
> zfs_freebsd_fsync+0xdc
> nfsvno_fsync+0x2f2
> nfsrvd_commit+0xe7
> nfsrvd_dorpc+0x3c0
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 1511 102755 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> __lockmgr_args+0x5ae
> vop_stdlock+0x39
> VOP_LOCK1_APV+0x46
> _vn_lock+0x47
> zfs_fhtovp+0x338
> nfsvno_fhtovp+0x87
> nfsd_fhtovp+0x7a
> nfsrvd_dorpc+0x9cf
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 1511 102756 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> _cv_wait+0x112
> zil_commit+0x6d
> zfs_freebsd_write+0xba0
> VOP_WRITE_APV+0xb2
> nfsvno_write+0x14d
> nfsrvd_write+0x362
> nfsrvd_dorpc+0x3c0
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 
These threads are either waiting for a vnode lock or waiting inside
zil_commit() { at 3 different locations in zil_commit() }. A guess
would be that the ZIL hasn't completed a write for some reason, so
3 threads are waiting for it while one of them holds a lock on the
vnode being written, and the remaining threads are waiting for that
vnode lock.
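
If you can run DTrace while the hang is in progress, a quick sketch
along these lines would show whether zil_commit() calls are completing
at all (the fbt probe names are an assumption; they can vary between
kernel versions, so check with dtrace -l | grep zil_commit first):

    # Histogram of zil_commit() latency; run it during a hang. A call
    # that never returns will simply show nothing for that thread.
    dtrace -n '
    fbt::zil_commit:entry  { self->ts = timestamp; }
    fbt::zil_commit:return /self->ts/ {
        @lat["zil_commit latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }'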

I am not a ZFS guy, so I cannot help further, except to suggest that
you try to determine what might cause a write to the ZIL to stall.
(Different device, different device driver...)
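
If the stall is below ZFS, watching the pool's disks during a hang
should show it. A rough sketch (the device pattern here is just the
da devices from your zpool status, written as a regex; adjust to
taste):

    # Show per-disk I/O statistics, refreshed continuously. A member
    # disk stuck at high %busy with queued operations but no
    # completions would point at the device/driver (mpt/FC) layer
    # rather than ZFS.
    gstat -f 'da(19|20|3[1-9])'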

Good luck with it, rick

> 
> PID TID COMM TDNAME KSTACK
> 1507 102750 nfsd -
> mi_switch+0x186
> sleepq_catch_signals+0x2e1
> sleepq_wait_sig+0x16
> _cv_wait_sig+0x12a
> seltdwait+0xf6
> kern_select+0x6ef
> sys_select+0x5d
> amd64_syscall+0x540
> Xfast_syscall+0xf7
> 
> 
> pool: tank
> state: ONLINE
> status: The pool is formatted using a legacy on-disk format. The pool can
>         still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on software that does not
>         support feature flags.
> scan: scrub repaired 0 in 45h37m with 0 errors on Mon Dec 3 03:07:11 2012
> config:
> 
> NAME          STATE   READ WRITE CKSUM
> tank          ONLINE     0     0     0
>   raidz1-0    ONLINE     0     0     0
>     da19      ONLINE     0     0     0
>     da31      ONLINE     0     0     0
>     da32      ONLINE     0     0     0
>     da33      ONLINE     0     0     0
>     da34      ONLINE     0     0     0
>   raidz1-1    ONLINE     0     0     0
>     da20      ONLINE     0     0     0
>     da36      ONLINE     0     0     0
>     da37      ONLINE     0     0     0
>     da38      ONLINE     0     0     0
>     da39      ONLINE     0     0     0


