From owner-freebsd-fs@freebsd.org Tue Dec 15 21:32:48 2015
Date: Tue, 15 Dec 2015 16:32:39 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Steven Hartland
Cc: Bengt Ahlgren, freebsd-fs@freebsd.org
Message-ID: <865572400.133527790.1450215159693.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <56702A9F.90702@multiplay.co.uk>
References: <567022FB.1010508@multiplay.co.uk> <56702A9F.90702@multiplay.co.uk>
Subject: Re: ZFS hang in zfs_freebsd_rename

I'm not a ZFS guy, but I vaguely recall that renaming of snapshots can
(or at least could, I don't know if it has been fixed) cause hung threads
due to lock ordering issues. So, if by any chance you are renaming
snapshots, you might want to avoid doing that.

rick
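For reference, the snapshot rename Rick mentions is an ordinary "zfs rename"
of a snapshot name. A minimal sketch of the operation, with a hypothetical
dataset name (only the pool name "p2" is taken from this thread):

    # rename a snapshot -- the operation reported to trigger
    # lock-ordering hangs on some releases
    zfs snapshot p2/somefs@before
    zfs rename p2/somefs@before p2/somefs@2015-12-15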
----- Original Message -----
> There have been quite a few reported issues with this; some at least have
> been fixed, but as with anything the only way to be sure is to test it.
>
> On 15/12/2015 14:52, Bengt Ahlgren wrote:
> > Yes, that is on the todo list...
> >
> > So this is likely fixed then in 10.x?
> >
> > Bengt
> >
> > Steven Hartland writes:
> >
> >> Not a surprise in 9.x unfortunately, try upgrading to 10.x
> >>
> >> On 15/12/2015 12:51, Bengt Ahlgren wrote:
> >>> We have a server running 9.3-REL which currently has two quite large
> >>> zfs pools:
> >>>
> >>> NAME   SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
> >>> p1    18.1T  10.7T  7.38T   59%  1.00x  ONLINE  -
> >>> p2    43.5T  29.1T  14.4T   66%  1.00x  ONLINE  -
> >>>
> >>> It has been running without any issues for some time now. Once, just
> >>> now, processes are getting stuck and impossible to kill on accessing a
> >>> particular directory in the p2 pool. That pool is a 2x6 disk raidz2.
> >>>
> >>> One process is stuck in zfs_freebsd_rename, and other processes
> >>> accessing that particular directory also get stuck. The system is now
> >>> almost completely idle.
> >>>
> >>> Output from kgdb on the running system for that first process:
> >>>
> >>> Thread 651 (Thread 102157):
> >>> #0  sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920,
> >>>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1904
> >>> #1  0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0)
> >>>     at /usr/src/sys/kern/kern_synch.c:485
> >>> #2  0xffffffff809308e2 in sleepq_wait (wchan=0xfffffe0135b60488, pri=96)
> >>>     at /usr/src/sys/kern/subr_sleepqueue.c:618
> >>> #3  0xffffffff808cf922 in __lockmgr_args (lk=0xfffffe0135b60488,
> >>>     flags=524544, ilk=0xfffffe0135b604b8, wmesg=<value optimized out>,
> >>>     pri=<value optimized out>, timo=<value optimized out>,
> >>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
> >>>     at /usr/src/sys/kern/kern_lock.c:221
> >>> #4  0xffffffff80977369 in vop_stdlock (ap=<value optimized out>)
> >>>     at lockmgr.h:97
> >>> #5  0xffffffff80dd4a04 in VOP_LOCK1_APV (vop=0xffffffff813e8160,
> >>>     a=0xffffffa07f935520) at vnode_if.c:2052
> >>> #6  0xffffffff80998c17 in _vn_lock (vp=0xfffffe0135b603f0, flags=524288,
> >>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
> >>>     at vnode_if.h:859
> >>> #7  0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1)
> >>>     at /usr/src/sys/kern/vfs_subr.c:2337
> >>> #8  0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
> >>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3609
> >>> #9  0xffffffff81ac8c72 in zfs_freebsd_rename (ap=<value optimized out>)
> >>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4039
> >>> #10 0xffffffff80dd4f04 in VOP_RENAME_APV (vop=0xffffffff81b47d40,
> >>>     a=0xffffffa07f9358e0) at vnode_if.c:1522
> >>> #11 0xffffffff80996bbd in kern_renameat (td=<value optimized out>,
> >>>     oldfd=<value optimized out>, old=<value optimized out>, newfd=-100,
> >>>     new=0x1826a9af00 <Error reading address 0x1826a9af00: Bad address>,
> >>>     pathseg=<value optimized out>) at vnode_if.h:636
> >>> #12 0xffffffff80cd228a in amd64_syscall (td=0xfffffe0b14059920, traced=0)
> >>>     at subr_syscall.c:135
> >>> #13 0xffffffff80cbc907 in Xfast_syscall ()
> >>>     at /usr/src/sys/amd64/amd64/exception.S:396
> >>> #14 0x0000000800cc1acc in ?? ()
> >>> Previous frame inner to this frame (corrupt stack?)
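The trace above is kgdb output taken against the live kernel. Roughly, a
per-thread backtrace like this can be gathered as follows on 9.x (the kernel
path is the usual default and may differ on a given system; thread 651 is
the stuck process from the trace above):

    # attach kgdb read-only to the running kernel
    kgdb /boot/kernel/kernel /dev/mem
    (kgdb) info threads             # list kernel threads
    (kgdb) thread 651               # switch to the stuck thread
    (kgdb) bt                       # backtrace for that thread
    (kgdb) thread apply all bt      # or backtraces for every thread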
> >>>
> >>> Full procstat -kk -a and kgdb "thread apply all bt" can be found here:
> >>>
> >>> https://www.sics.se/~bengta/ZFS-hang/
> >>>
> >>> I don't know how to produce "alltrace in ddb" as the instructions in
> >>> the wiki say. It runs the GENERIC kernel, so perhaps it isn't possible?
> >>>
> >>> I checked "camcontrol tags" for all the disks in the pool - all have
> >>> zeroes for dev_active, devq_queued and held.
> >>>
> >>> Is there anything else I can check while the machine is up? I do,
> >>> however, need to restart it pretty soon.
> >>>
> >>> Bengt
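On the "alltrace in ddb" question: it depends on whether the running kernel
has a DDB backend compiled in, which can be checked from userland before
trying. A minimal sketch, assuming console access (entering ddb stops the
machine until it is resumed, so do this from the console, not over ssh):

    # list the debugger backends the kernel was built with
    sysctl debug.kdb.available
    # if "ddb" is listed, drop into the in-kernel debugger
    sysctl debug.kdb.enter=1
    db> alltrace        # stack traces for every thread
    db> c               # continue, resuming the system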