From owner-freebsd-fs@freebsd.org Tue Dec 15 22:06:54 2015
From: Bengt Ahlgren <bengta@sics.se>
To: Rick Macklem
Cc: Steven Hartland, freebsd-fs@freebsd.org
Subject: Re: ZFS hang in zfs_freebsd_rename
In-Reply-To: <865572400.133527790.1450215159693.JavaMail.zimbra@uoguelph.ca>
Date: Tue, 15 Dec 2015 23:06:13 +0100

The pool has a few snapshots, but no renaming of them took place any
time recently.  This was renaming of a file.
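One way to double-check that no snapshot rename happened is the pool's own log: zpool history records every administrative operation, including renames. A small sketch (the pool name "p2" comes from this thread; the exact log format of a rename line is an assumption, so treat the pattern as a starting point):

```shell
# Sketch: filter "zpool history" output for snapshot renames.  A snapshot
# rename is logged as "zfs rename <fs>@<old> <fs>@<new>", so an "@" in a
# rename line is the giveaway.  Intended use: zpool history p2 | find_snap_renames
find_snap_renames() {
    grep -E 'zfs rename [^ ]+@'
}

# Example with canned history lines (dates and dataset names are made up):
printf '%s\n' \
    '2015-11-03.10:00:00 zfs snapshot p2/data@weekly' \
    '2015-11-04.09:12:31 zfs rename p2/data@weekly p2/data@w45' \
    '2015-12-01.08:00:00 zfs create p2/scratch' |
    find_snap_renames
```

On the live system this would print nothing if, as stated above, no snapshot was renamed.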
Bengt

Rick Macklem writes:

> I'm not a ZFS guy, but I vaguely recall that renaming of snapshots
> can (or at least could, I don't know if it has been fixed) cause
> hung threads due to lock ordering issues.
>
> So, if by any chance you are renaming snapshots, you might want to
> avoid doing that.
>
> rick
>
> ----- Original Message -----
>> There have been quite a few reported issues with this; some at least have
>> been fixed, but as with anything the only way to be sure is to test it.
>>
>> On 15/12/2015 14:52, Bengt Ahlgren wrote:
>> > Yes, that is on the todo list...
>> >
>> > So this is likely fixed then in 10.x?
>> >
>> > Bengt
>> >
>> > Steven Hartland writes:
>> >
>> >> Not a surprise in 9.x unfortunately, try upgrading to 10.x
>> >>
>> >> On 15/12/2015 12:51, Bengt Ahlgren wrote:
>> >>> We have a server running 9.3-REL which currently has two quite large zfs
>> >>> pools:
>> >>>
>> >>> NAME   SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
>> >>> p1    18.1T  10.7T  7.38T   59%  1.00x  ONLINE  -
>> >>> p2    43.5T  29.1T  14.4T   66%  1.00x  ONLINE  -
>> >>>
>> >>> It has been running without any issues for some time now.  Once, just
>> >>> now, processes are getting stuck and impossible to kill on accessing a
>> >>> particular directory in the p2 pool.  That pool is a 2x6 disk raidz2.
>> >>>
>> >>> One process is stuck in zfs_freebsd_rename, and other processes
>> >>> accessing that particular directory also get stuck.  The system is now
>> >>> almost completely idle.
>> >>>
>> >>> Output from kgdb on the running system for that first process:
>> >>>
>> >>> Thread 651 (Thread 102157):
>> >>> #0  sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920,
>> >>>     flags=<value optimized out>)
>> >>>     at /usr/src/sys/kern/sched_ule.c:1904
>> >>> #1  0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0)
>> >>>     at /usr/src/sys/kern/kern_synch.c:485
>> >>> #2  0xffffffff809308e2 in sleepq_wait (wchan=0xfffffe0135b60488, pri=96)
>> >>>     at /usr/src/sys/kern/subr_sleepqueue.c:618
>> >>> #3  0xffffffff808cf922 in __lockmgr_args (lk=0xfffffe0135b60488,
>> >>>     flags=524544, ilk=0xfffffe0135b604b8, wmesg=<value optimized out>,
>> >>>     pri=<value optimized out>, timo=<value optimized out>,
>> >>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
>> >>>     at /usr/src/sys/kern/kern_lock.c:221
>> >>> #4  0xffffffff80977369 in vop_stdlock (ap=<value optimized out>)
>> >>>     at lockmgr.h:97
>> >>> #5  0xffffffff80dd4a04 in VOP_LOCK1_APV (vop=0xffffffff813e8160,
>> >>>     a=0xffffffa07f935520) at vnode_if.c:2052
>> >>> #6  0xffffffff80998c17 in _vn_lock (vp=0xfffffe0135b603f0, flags=524288,
>> >>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
>> >>>     at vnode_if.h:859
>> >>> #7  0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1)
>> >>>     at /usr/src/sys/kern/vfs_subr.c:2337
>> >>> #8  0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
>> >>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3609
>> >>> #9  0xffffffff81ac8c72 in zfs_freebsd_rename (ap=<value optimized out>)
>> >>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4039
>> >>> #10 0xffffffff80dd4f04 in VOP_RENAME_APV (vop=0xffffffff81b47d40,
>> >>>     a=0xffffffa07f9358e0) at vnode_if.c:1522
>> >>> #11 0xffffffff80996bbd in kern_renameat (td=<value optimized out>,
>> >>>     oldfd=<value optimized out>, old=<value optimized out>, newfd=-100,
>> >>>     new=0x1826a9af00 <Error reading address 0x1826a9af00: Bad address>,
>> >>>     pathseg=<value optimized out>) at vnode_if.h:636
>> >>> #12 0xffffffff80cd228a in amd64_syscall (td=0xfffffe0b14059920, traced=0)
>> >>>     at subr_syscall.c:135
>> >>> #13 0xffffffff80cbc907 in Xfast_syscall ()
>> >>>     at /usr/src/sys/amd64/amd64/exception.S:396
>> >>> ---Type <return> to continue, or q <return> to quit---
>> >>> #14 0x0000000800cc1acc in ?? ()
>> >>> Previous frame inner to this frame (corrupt stack?)
>> >>>
>> >>> Full procstat -kk -a and kgdb "thread apply all bt" can be found here:
>> >>>
>> >>> https://www.sics.se/~bengta/ZFS-hang/
>> >>>
>> >>> I don't know how to produce "alltrace in ddb" as the instructions in the
>> >>> wiki say.  It runs the GENERIC kernel, so perhaps it isn't possible?
>> >>>
>> >>> I checked "camcontrol tags" for all the disks in the pool - all have
>> >>> zeroes for dev_active, devq_queued and held.
>> >>>
>> >>> Is there anything else I can check while the machine is up?  I however
>> >>> need to restart it pretty soon.
>> >>>
>> >>> Bengt
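Since the full procstat -kk -a dump is linked above, one more check that works offline is to pull out the threads sleeping in the vnode-lock path, i.e. the threads piled up behind the stuck rename. A rough sketch (the function name and the sample stack lines below are made up for illustration; only the symbol names come from the backtrace in this thread):

```shell
# Sketch: filter a saved "procstat -kk -a" dump for threads whose kernel
# stack shows the vnode lock wait or the rename itself.
# Intended use: filter_blocked < procstat-kk-a.txt
filter_blocked() {
    grep -E 'lockmgr|zfs_freebsd_rename'
}

# Example with two canned stack lines (PIDs, commands and offsets invented):
printf '%s\n' \
    ' 1234 100500 tar  -  mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0x902' \
    ' 5678 100600 sshd -  mi_switch+0x186 _cv_wait_sig+0x16 seltdwait+0x110' |
    filter_blocked
```

Every thread this prints should be waiting on the same vnode as the rename thread above; if any of them holds a lock the rename needs, that pair is the deadlock.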