From owner-freebsd-fs@freebsd.org Tue Dec 15 15:34:54 2015
From: krad <kraduk@gmail.com>
To: Bengt Ahlgren
Cc: Steven Hartland, FreeBSD FS
Date: Tue, 15 Dec 2015 15:34:51 +0000
Subject: Re: ZFS hang in zfs_freebsd_rename

If your situation allows it, go to stable, as there have been lots of
fixes since 10.2. It may be worth reviewing them to see if they are
relevant.

On 15 December 2015 at 15:01, Bengt Ahlgren wrote:

> OK, thanks for the advice!
>
> Bengt
>
> Steven Hartland writes:
>
> > There have been quite a few reported issues with this, some of which
> > at least have been fixed, but as with anything the only way to be
> > sure is to test it.
> >
> > On 15/12/2015 14:52, Bengt Ahlgren wrote:
> >> Yes, that is on the todo list...
> >>
> >> So this is likely fixed then in 10.x?
> >>
> >> Bengt
> >>
> >> Steven Hartland writes:
> >>
> >>> Not a surprise in 9.x, unfortunately; try upgrading to 10.x.
> >>>
> >>> On 15/12/2015 12:51, Bengt Ahlgren wrote:
> >>>> We have a server running 9.3-REL which currently has two quite
> >>>> large zfs pools:
> >>>>
> >>>> NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> >>>> p1     18.1T  10.7T  7.38T    59%  1.00x  ONLINE  -
> >>>> p2     43.5T  29.1T  14.4T    66%  1.00x  ONLINE  -
> >>>>
> >>>> It has been running without any issues for some time now. Just
> >>>> now, though, processes are getting stuck, and are impossible to
> >>>> kill, when accessing a particular directory in the p2 pool. That
> >>>> pool is a 2x6 disk raidz2.
> >>>>
> >>>> One process is stuck in zfs_freebsd_rename, and other processes
> >>>> accessing that particular directory also get stuck.
> >>>> The system is now almost completely idle.
> >>>>
> >>>> Output from kgdb on the running system for that first process:
> >>>>
> >>>> Thread 651 (Thread 102157):
> >>>> #0  sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920,
> >>>>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1904
> >>>> #1  0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0)
> >>>>     at /usr/src/sys/kern/kern_synch.c:485
> >>>> #2  0xffffffff809308e2 in sleepq_wait (wchan=0xfffffe0135b60488, pri=96)
> >>>>     at /usr/src/sys/kern/subr_sleepqueue.c:618
> >>>> #3  0xffffffff808cf922 in __lockmgr_args (lk=0xfffffe0135b60488,
> >>>>     flags=524544, ilk=0xfffffe0135b604b8, wmesg=<value optimized out>,
> >>>>     pri=<value optimized out>, timo=<value optimized out>,
> >>>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
> >>>>     at /usr/src/sys/kern/kern_lock.c:221
> >>>> #4  0xffffffff80977369 in vop_stdlock (ap=<value optimized out>)
> >>>>     at lockmgr.h:97
> >>>> #5  0xffffffff80dd4a04 in VOP_LOCK1_APV (vop=0xffffffff813e8160,
> >>>>     a=0xffffffa07f935520) at vnode_if.c:2052
> >>>> #6  0xffffffff80998c17 in _vn_lock (vp=0xfffffe0135b603f0, flags=524288,
> >>>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
> >>>>     at vnode_if.h:859
> >>>> #7  0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1)
> >>>>     at /usr/src/sys/kern/vfs_subr.c:2337
> >>>> #8  0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
> >>>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3609
> >>>> #9  0xffffffff81ac8c72 in zfs_freebsd_rename (ap=<value optimized out>)
> >>>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4039
> >>>> #10 0xffffffff80dd4f04 in VOP_RENAME_APV (vop=0xffffffff81b47d40,
> >>>>     a=0xffffffa07f9358e0) at vnode_if.c:1522
> >>>> #11 0xffffffff80996bbd in kern_renameat (td=<value optimized out>,
> >>>>     oldfd=<value optimized out>, old=<value optimized out>, newfd=-100,
> >>>>     new=0x1826a9af00 <Error reading address 0x1826a9af00: Bad address>,
> >>>>     pathseg=<value optimized out>) at vnode_if.h:636
> >>>> #12 0xffffffff80cd228a in amd64_syscall (td=0xfffffe0b14059920, traced=0)
> >>>>     at subr_syscall.c:135
> >>>> #13 0xffffffff80cbc907 in Xfast_syscall ()
> >>>>     at /usr/src/sys/amd64/amd64/exception.S:396
> >>>> #14 0x0000000800cc1acc in ?? ()
> >>>> Previous frame inner to this frame (corrupt stack?)
> >>>>
> >>>> Full procstat -kk -a and kgdb "thread apply all bt" can be found
> >>>> here:
> >>>>
> >>>> https://www.sics.se/~bengta/ZFS-hang/
> >>>>
> >>>> I don't know how to produce "alltrace in ddb" as the instructions
> >>>> in the wiki say. It runs the GENERIC kernel, so perhaps it isn't
> >>>> possible?
> >>>>
> >>>> I checked "camcontrol tags" for all the disks in the pool - all
> >>>> have zeroes for dev_active, devq_queued and held.
> >>>>
> >>>> Is there anything else I can check while the machine is up? I
> >>>> however need to restart it pretty soon.
> >>>>
> >>>> Bengt
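
The stable-branch fixes krad mentions can be surveyed from the svn
history of the ZFS code in stable/10. A minimal sketch, not from the
thread, assuming svn access to the FreeBSD base repository and using
July 2015 as a rough stand-in for the 10.2 branch point:

    # list ZFS changes merged to stable/10 since roughly the 10.2 branch
    svn log -r '{2015-07-01}:HEAD' \
        https://svn.freebsd.org/base/stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs | less

As for producing "alltrace in ddb" on a GENERIC kernel: if the running
kernel was built with the DDB backend, the trace can be taken from the
console without building a custom kernel. Again a minimal, illustrative
sketch; note the machine is stopped while sitting in the debugger, so
this must be done from the physical or serial console, not over ssh:

    # check which debugger backends the running kernel includes
    sysctl debug.kdb.available

    # break into the in-kernel debugger
    sysctl debug.kdb.enter=1

    # then, at the db> prompt:
    #   alltrace        print stack traces for every thread
    #   continue        resume the system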