From owner-freebsd-fs@freebsd.org Tue Dec 15 15:34:54 2015
From: krad <kraduk@gmail.com>
To: Bengt Ahlgren
Cc: Steven Hartland, FreeBSD FS
Date: Tue, 15 Dec 2015 15:34:51 +0000
Subject: Re: ZFS hang in zfs_freebsd_rename

If your situation allows it, go to stable, as there have been lots of
fixes since 10.2. It may be worth reviewing them to see if they are
relevant.

On 15 December 2015 at 15:01, Bengt Ahlgren wrote:

> OK, thanks for the advice!
>
> Bengt
>
> Steven Hartland writes:
>
> > There have been quite a few reported issues with this, some of which
> > at least have been fixed, but as with anything the only way to be
> > sure is to test it.
> >
> > On 15/12/2015 14:52, Bengt Ahlgren wrote:
> >> Yes, that is on the todo list...
> >>
> >> So this is likely fixed then in 10.x?
> >>
> >> Bengt
> >>
> >> Steven Hartland writes:
> >>
> >>> Not a surprise in 9.x, unfortunately; try upgrading to 10.x.
> >>>
> >>> On 15/12/2015 12:51, Bengt Ahlgren wrote:
> >>>> We have a server running 9.3-REL which currently has two quite
> >>>> large zfs pools:
> >>>>
> >>>> NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> >>>> p1     18.1T  10.7T  7.38T    59%  1.00x  ONLINE  -
> >>>> p2     43.5T  29.1T  14.4T    66%  1.00x  ONLINE  -
> >>>>
> >>>> It has been running without any issues for some time now. Just
> >>>> now, though, processes are getting stuck, and are impossible to
> >>>> kill, when accessing a particular directory in the p2 pool. That
> >>>> pool is a 2x6 disk raidz2.
> >>>>
> >>>> One process is stuck in zfs_freebsd_rename, and other processes
> >>>> accessing that particular directory also get stuck.
> >>>> The system is now almost completely idle.
> >>>>
> >>>> Output from kgdb on the running system for that first process:
> >>>>
> >>>> Thread 651 (Thread 102157):
> >>>> #0  sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920,
> >>>>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1904
> >>>> #1  0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0)
> >>>>     at /usr/src/sys/kern/kern_synch.c:485
> >>>> #2  0xffffffff809308e2 in sleepq_wait (wchan=0xfffffe0135b60488, pri=96)
> >>>>     at /usr/src/sys/kern/subr_sleepqueue.c:618
> >>>> #3  0xffffffff808cf922 in __lockmgr_args (lk=0xfffffe0135b60488,
> >>>>     flags=524544, ilk=0xfffffe0135b604b8, wmesg=<value optimized out>,
> >>>>     pri=<value optimized out>, timo=<value optimized out>,
> >>>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
> >>>>     at /usr/src/sys/kern/kern_lock.c:221
> >>>> #4  0xffffffff80977369 in vop_stdlock (ap=<value optimized out>)
> >>>>     at lockmgr.h:97
> >>>> #5  0xffffffff80dd4a04 in VOP_LOCK1_APV (vop=0xffffffff813e8160,
> >>>>     a=0xffffffa07f935520) at vnode_if.c:2052
> >>>> #6  0xffffffff80998c17 in _vn_lock (vp=0xfffffe0135b603f0, flags=524288,
> >>>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
> >>>>     at vnode_if.h:859
> >>>> #7  0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1)
> >>>>     at /usr/src/sys/kern/vfs_subr.c:2337
> >>>> #8  0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
> >>>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3609
> >>>> #9  0xffffffff81ac8c72 in zfs_freebsd_rename (ap=<value optimized out>)
> >>>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4039
> >>>> #10 0xffffffff80dd4f04 in VOP_RENAME_APV (vop=0xffffffff81b47d40,
> >>>>     a=0xffffffa07f9358e0) at vnode_if.c:1522
> >>>> #11 0xffffffff80996bbd in kern_renameat (td=<value optimized out>,
> >>>>     oldfd=<value optimized out>, old=<value optimized out>, newfd=-100,
> >>>>     new=0x1826a9af00 <Error reading address 0x1826a9af00: Bad address>,
> >>>>     pathseg=<value optimized out>) at vnode_if.h:636
> >>>> #12 0xffffffff80cd228a in amd64_syscall (td=0xfffffe0b14059920, traced=0)
> >>>>     at subr_syscall.c:135
> >>>> #13 0xffffffff80cbc907 in Xfast_syscall ()
> >>>>     at /usr/src/sys/amd64/amd64/exception.S:396
> >>>> #14 0x0000000800cc1acc in ?? ()
> >>>> Previous frame inner to this frame (corrupt stack?)
> >>>>
> >>>> Full procstat -kk -a and kgdb "thread apply all bt" can be found
> >>>> here:
> >>>>
> >>>> https://www.sics.se/~bengta/ZFS-hang/
> >>>>
> >>>> I don't know how to produce "alltrace in ddb" as the instructions
> >>>> in the wiki say. It runs the GENERIC kernel, so perhaps it isn't
> >>>> possible?
> >>>>
> >>>> I checked "camcontrol tags" for all the disks in the pool - all
> >>>> have zeroes for dev_active, devq_queued and held.
> >>>>
> >>>> Is there anything else I can check while the machine is up? I
> >>>> however need to restart it pretty soon.
> >>>>
> >>>> Bengt
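
The stable-branch fixes krad mentions can be surveyed from the svn
history of the ZFS code in stable/10. A minimal sketch, not from the
thread, assuming svn access to the FreeBSD base repository and using
July 2015 as a rough stand-in for the 10.2 branch point:

    # list ZFS changes merged to stable/10 since roughly the 10.2 branch
    svn log -r '{2015-07-01}:HEAD' \
        https://svn.freebsd.org/base/stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs | less

As for producing "alltrace in ddb" on a GENERIC kernel: if the running
kernel was built with the DDB backend, the trace can be taken from the
console without building a custom kernel. Again a minimal, illustrative
sketch; note the machine is stopped while sitting in the debugger, so
this must be done from the physical or serial console, not over ssh:

    # check which debugger backends the running kernel includes
    sysctl debug.kdb.available

    # break into the in-kernel debugger
    sysctl debug.kdb.enter=1

    # then, at the db> prompt:
    #   alltrace        print stack traces for every thread
    #   continue        resume the system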