From: Rick Macklem
To: olivier olivier
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Date: Tue, 4 Dec 2012 09:26:10 -0500 (EST)
Subject: Re: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE
Message-ID: <1769356561.1118634.1354631170432.JavaMail.root@erie.cs.uoguelph.ca>

Olivier wrote:
> Hi all
> After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm having
> severe problems with NFS sharing of a ZFS volume.
> nfsd appears to hang at random times (between once every couple hours
> to once every two days) while accessing a ZFS volume, and the only way
> I have found of resolving the problem is to reboot. The server console
> is sometimes still responsive during the nfsd hang, and I can read and
> write files to the same ZFS volume while nfsd is hung. I am pasting
> below the output of procstat -kk on nfsd, and details of my pool
> (nfsstat on the server gets hung when the problem has started
> occurring, and does not produce any output). The pool is v28 and was
> created from a bunch of volumes attached over Fibre Channel using the
> mpt driver. My system has a Supermicro board and 4 AMD Opteron 6274
> CPUs.
>
> I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
> essentially same configuration, same usage pattern).
>
> I would greatly appreciate any help to resolve this problem!
> Thank you
> Olivier
>
> PID TID COMM TDNAME KSTACK
> 1511 102751 nfsd nfsd: master
> mi_switch+0x186
> sleepq_wait+0x42
> __lockmgr_args+0x5ae
> vop_stdlock+0x39
> VOP_LOCK1_APV+0x46
> _vn_lock+0x47
> zfs_fhtovp+0x338
> nfsvno_fhtovp+0x87
> nfsd_fhtovp+0x7a
> nfsrvd_dorpc+0x9cf
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_run+0x8f
> nfsrvd_nfsd+0x193
> nfssvc_nfsd+0x9b
> sys_nfssvc+0x90
> amd64_syscall+0x540
> Xfast_syscall+0xf7
> 1511 102752 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> __lockmgr_args+0x5ae
> vop_stdlock+0x39
> VOP_LOCK1_APV+0x46
> _vn_lock+0x47
> zfs_fhtovp+0x338
> nfsvno_fhtovp+0x87
> nfsd_fhtovp+0x7a
> nfsrvd_dorpc+0x9cf
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 1511 102753 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> _cv_wait+0x112
> zio_wait+0x61
> zil_commit+0x764
> zfs_freebsd_write+0xba0
> VOP_WRITE_APV+0xb2
> nfsvno_write+0x14d
> nfsrvd_write+0x362
> nfsrvd_dorpc+0x3c0
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 1511 102754 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> _cv_wait+0x112
> zio_wait+0x61
> zil_commit+0x3cf
> zfs_freebsd_fsync+0xdc
> nfsvno_fsync+0x2f2
> nfsrvd_commit+0xe7
> nfsrvd_dorpc+0x3c0
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 1511 102755 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> __lockmgr_args+0x5ae
> vop_stdlock+0x39
> VOP_LOCK1_APV+0x46
> _vn_lock+0x47
> zfs_fhtovp+0x338
> nfsvno_fhtovp+0x87
> nfsd_fhtovp+0x7a
> nfsrvd_dorpc+0x9cf
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
> 1511 102756 nfsd nfsd: service
> mi_switch+0x186
> sleepq_wait+0x42
> _cv_wait+0x112
> zil_commit+0x6d
> zfs_freebsd_write+0xba0
> VOP_WRITE_APV+0xb2
> nfsvno_write+0x14d
> nfsrvd_write+0x362
> nfsrvd_dorpc+0x3c0
> nfssvc_program+0x447
> svc_run_internal+0x687
> svc_thread_start+0xb
> fork_exit+0x11f
> fork_trampoline+0xe
>
These threads are either waiting for a vnode lock or waiting inside
zil_commit() (at 3 different locations in zil_commit()).

A guess would be that the ZIL hasn't completed a write for some reason,
so 3 threads are waiting for it while one of them holds a lock on the
vnode being written, and the remaining threads are waiting for that
vnode lock.

I am not a ZFS guy, so I cannot help further, except to suggest that
you try to determine what might cause a write to the ZIL to stall.
(Different device, different device driver...)

Good luck with it, rick

>
> PID TID COMM TDNAME KSTACK
> 1507 102750 nfsd -
> mi_switch+0x186
> sleepq_catch_signals+0x2e1
> sleepq_wait_sig+0x16
> _cv_wait_sig+0x12a
> seltdwait+0xf6
> kern_select+0x6ef
> sys_select+0x5d
> amd64_syscall+0x540
> Xfast_syscall+0xf7
>
>
> pool: tank
> state: ONLINE
> status: The pool is formatted using a legacy on-disk format.
> The pool can still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
> pool will no longer be accessible on software that does not support
> feature flags.
> scan: scrub repaired 0 in 45h37m with 0 errors on Mon Dec 3 03:07:11 2012
> config:
>
> NAME          STATE   READ WRITE CKSUM
> tank          ONLINE     0     0     0
>   raidz1-0    ONLINE     0     0     0
>     da19      ONLINE     0     0     0
>     da31      ONLINE     0     0     0
>     da32      ONLINE     0     0     0
>     da33      ONLINE     0     0     0
>     da34      ONLINE     0     0     0
>   raidz1-1    ONLINE     0     0     0
>     da20      ONLINE     0     0     0
>     da36      ONLINE     0     0     0
>     da37      ONLINE     0     0     0
>     da38      ONLINE     0     0     0
>     da39      ONLINE     0     0     0
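
For anyone triaging a similar hang, the grouping used in the reply (threads
waiting for a vnode lock versus threads inside zil_commit()) can be tallied
mechanically. What follows is only a rough Python sketch, not part of the
original exchange. It assumes procstat -kk output with one thread per line,
as procstat prints it before a mail client rewraps it. The function name
classify_threads and the category labels are made up for illustration, and
the sample stacks are abbreviated from the ones quoted in this thread.

```python
import re
from collections import defaultdict

# A kernel stack frame looks like "zil_commit+0x764".
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+")


def classify_threads(procstat_output):
    """Group thread IDs by where their kernel stack is blocked.

    Assumes one thread per line: PID, TID, names, then stack frames.
    The category labels are illustrative, not procstat terminology.
    """
    groups = defaultdict(list)
    for line in procstat_output.splitlines():
        fields = line.split()
        # Skip the header row and any line not starting with PID and TID.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        funcs = FRAME_RE.findall(line)
        if not funcs:
            continue
        tid = int(fields[1])
        if "zil_commit" in funcs:
            groups["in zil_commit"].append(tid)
        elif "_vn_lock" in funcs:
            groups["waiting for vnode lock"].append(tid)
        else:
            groups["other"].append(tid)
    return dict(groups)


# Abbreviated stacks taken from the output quoted in this thread.
SAMPLE = """\
  PID    TID COMM TDNAME KSTACK
 1511 102751 nfsd nfsd: master mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0x5ae vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 zfs_fhtovp+0x338
 1511 102753 nfsd nfsd: service mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61 zil_commit+0x764 zfs_freebsd_write+0xba0
 1507 102750 nfsd - mi_switch+0x186 sleepq_catch_signals+0x2e1 sleepq_wait_sig+0x16 _cv_wait_sig+0x12a seltdwait+0xf6 kern_select+0x6ef
"""

if __name__ == "__main__":
    for label, tids in classify_threads(SAMPLE).items():
        print(label, tids)
```

Matching on function names rather than offsets is deliberate: the offsets
(zil_commit+0x764, +0x3cf, +0x6d) differ per call site and per kernel build,
while the function names are what distinguish the two kinds of waiters.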