From: Luke Marsden
To: Pawel Jakub Dawidek
Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk, Martin Matuska
Subject: Re: ZFS hang in production on 8.2-RELEASE
Date: Thu, 08 Sep 2011 20:59:03 +0100
Organization: Hybrid Logic
Message-ID: <1315511943.11352.30.camel@pow>
In-Reply-To: <20110908195405.GB1667@garage.freebsd.pl>
References: <1314646728.7898.44.camel@pow> <4E5BFC6F.5080507@FreeBSD.org>
 <1314655349.7898.53.camel@pow> <1315502388.11352.15.camel@pow>
 <20110908195405.GB1667@garage.freebsd.pl>
List-Id: Filesystems

On Thu, 2011-09-08 at 21:54 +0200, Pawel Jakub Dawidek wrote:
> On Thu, Sep 08, 2011 at 06:19:48PM +0100, Luke Marsden wrote:
> > On Mon, 2011-08-29 at 23:02 +0100, Luke Marsden wrote:
> > > On Mon, 2011-08-29 at 22:54 +0200, Martin Matuska wrote:
> > > > No, I think this is more likely fixed by pjd's bugfix in r224791 (MFC'ed
> > > > to stable/8 as r225100).
> > > >
> > > > The corresponding patch is:
> > > > http://people.freebsd.org/~pjd/patches/zfsdev_state_lock.patch
> > > >
> > >
> > > Great, thanks! Will this patch apply to ZFS v15? We can't upgrade to
> > > v28 yet.
> > >
> >
> > We just got another hang in production, same procstat -kk on zfs umount
> > -f as before:
> >
> > 52739 100186 zfs - mi_switch+0x176
> > sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4
> > dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b
> > Xfast_syscall+0xe2
> >
> > Please advise whether the zfsdev_state_lock patch will apply to
> > 8.2-RELEASE and whether it is wise to attempt to apply it?
>
> The zfsdev_state_lock patch won't help you, it fixes totally unrelated
> problem. Could you send backtraces of all processes?
>

Thanks for getting back to me. As before:

FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar 2
08:29:52 CET 2011 root@www4:/usr/obj/usr/src/sys/GENERIC amd64

There are 9 'zfs rename' processes and 1 'zfs umount -f' process hung.

Here is the procstat for the 'zfs umount -f':

13451 104337 zfs - mi_switch+0x176
sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4
dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b
Xfast_syscall+0xe2

And the 'zfs rename's all look the same:

20361 101049 zfs - mi_switch+0x176
sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a
kern_rmdirat+0xa4 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2

An 'ls' on a directory which contains most of the system's ZFS
mount-points (/hcfs) also hangs:

30073 101466 gnuls - mi_switch+0x176
sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8
namei+0x53a vn_open_cred+0x3ac kern_openat+0x181 syscallenter+0x1e5
syscall+0x4b Xfast_syscall+0xe2

If I truss the 'ls' it hangs on the stat syscall:

stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0 (0x0)

There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp
-prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or
-path /var/db/portsnap -prune -or -print' running which is also hung:

2650 101674 find - mi_switch+0x176
sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8
namei+0x53a vn_open_cred+0x3ac kern_openat+0x181 syscallenter+0x1e5
syscall+0x4b Xfast_syscall+0xe2

However, I/O to the presently mounted filesystems continues to work
(even on parts of filesystems which are unlikely to be cached), and
'zfs list' showing all the filesystems (3,500 filesystems with ~100
snapshots per filesystem) also works.

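The traces above fall into two groups: one thread asleep inside
zfsvfs_teardown, and the rest asleep in __lockmgr_args waiting for a
vnode lock. With many hung processes it can help to do that grouping
mechanically. The following Python is only a rough sketch of one way to
do it: the helper names (parse_line, group_by_waiter, SLEEP_FRAMES) are
made up for illustration, the set of frames treated as generic sleep
machinery is an assumption, and the sample stacks are shortened copies
of the ones in this report.

```python
import re
from collections import defaultdict

# Frames that only say a thread is asleep, not what it is waiting for.
# Which frames belong in this set is an assumption made for illustration.
SLEEP_FRAMES = {"mi_switch", "sleepq_wait", "sleepq_timedwait",
                "_sleep", "_cv_wait", "_cv_timedwait"}

# A kernel stack frame looks like "name+0xOFFSET". Mail wrapping sometimes
# leaves a space before the "+", so optional whitespace is tolerated.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\+0x[0-9a-f]+")


def parse_line(line):
    """Return (pid, tid, comm, frames) for one procstat -kk row, else None."""
    fields = line.split()
    if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
        return None  # column header, blank line, or wrapped continuation
    return int(fields[0]), int(fields[1]), fields[2], FRAME_RE.findall(line)


def group_by_waiter(text):
    """Group (pid, comm) pairs by the first frame that is not sleep machinery."""
    groups = defaultdict(list)
    for line in text.splitlines():
        row = parse_line(line)
        if row is None:
            continue
        pid, _tid, comm, frames = row
        key = next((f for f in frames if f not in SLEEP_FRAMES), "unknown")
        groups[key].append((pid, comm))
    return dict(groups)


# Shortened stacks taken from this report, one thread per line.
SAMPLE = """\
PID TID COMM TDNAME KSTACK
13451 104337 zfs - mi_switch+0x176 sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4
20361 101049 zfs - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV +0x46
30073 101466 gnuls - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV +0x46
2650 101674 find - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV +0x46
"""

for waiter, procs in group_by_waiter(SAMPLE).items():
    print(waiter, procs)
```

The script expects each thread's stack on a single line, so wrapped
mail output would need to be re-joined first. It only summarises where
threads are sleeping; it says nothing about which thread holds the lock
the others are waiting for.
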
Any activity on the structure of the ZFS hierarchy *under the hcfs
filesystem* hangs, such as a 'zfs create hpool/hcfs/test':

70868 101874 zfs - mi_switch+0x176
sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39
VOP_LOCK1_APV+0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a
kern_mkdirat+0xce syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2

BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same
pool, but not rooted on hpool/hcfs) does not hang, and succeeds
normally.

procstat -kk on the zfskern process gives:

  PID    TID COMM             TDNAME           KSTACK
    5 100045 zfskern          arc_reclaim_thre mi_switch+0x176
sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9
fork_exit+0x118 fork_trampoline+0xe
    5 100046 zfskern          l2arc_feed_threa mi_switch+0x176
sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce
fork_exit+0x118 fork_trampoline+0xe
    5 100098 zfskern          txg_thread_enter mi_switch+0x176
sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79
txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
    5 100099 zfskern          txg_thread_enter mi_switch+0x176
sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c
txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Mobile: +447791750420 (UK) / +1-415-449-1165 (US)