From: Steven Hartland
Date: Tue, 15 Dec 2015 14:58:39 +0000
Subject: Re: ZFS hang in zfs_freebsd_rename
To: Bengt Ahlgren
Cc: freebsd-fs@freebsd.org
Message-ID: <56702A9F.90702@multiplay.co.uk>
References: <567022FB.1010508@multiplay.co.uk>

There have been quite a few reported issues with this. At least some
have been fixed, but as with anything, the only way to be sure is to
test it.

On 15/12/2015 14:52, Bengt Ahlgren wrote:
> Yes, that is on the todo list...
>
> So this is likely fixed then in 10.x?
>
> Bengt
>
> Steven Hartland writes:
>
>> Not a surprise in 9.x unfortunately, try upgrading to 10.x
>>
>> On 15/12/2015 12:51, Bengt Ahlgren wrote:
>>> We have a server running 9.3-REL which currently has two quite large zfs
>>> pools:
>>>
>>> NAME   SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
>>> p1     18.1T  10.7T  7.38T  59%  1.00x  ONLINE  -
>>> p2     43.5T  29.1T  14.4T  66%  1.00x  ONLINE  -
>>>
>>> It has been running without any issues for some time now.
>>> Just now, for the first time, processes are getting stuck and are
>>> impossible to kill when accessing a particular directory in the p2
>>> pool. That pool is a 2x6-disk raidz2.
>>>
>>> One process is stuck in zfs_freebsd_rename, and other processes
>>> accessing that particular directory also get stuck. The system is now
>>> almost completely idle.
>>>
>>> Output from kgdb on the running system for that first process:
>>>
>>> Thread 651 (Thread 102157):
>>> #0  sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920, flags=)
>>>     at /usr/src/sys/kern/sched_ule.c:1904
>>> #1  0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
>>> #2  0xffffffff809308e2 in sleepq_wait (wchan=0xfffffe0135b60488,
>>>     pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:618
>>> #3  0xffffffff808cf922 in __lockmgr_args (lk=0xfffffe0135b60488, flags=524544, ilk=0xfffffe0135b604b8,
>>>     wmesg=, pri=, timo=,
>>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337) at /usr/src/sys/kern/kern_lock.c:221
>>> #4  0xffffffff80977369 in vop_stdlock (ap=) at lockmgr.h:97
>>> #5  0xffffffff80dd4a04 in VOP_LOCK1_APV (vop=0xffffffff813e8160, a=0xffffffa07f935520) at vnode_if.c:2052
>>> #6  0xffffffff80998c17 in _vn_lock (vp=0xfffffe0135b603f0, flags=524288,
>>>     file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337) at vnode_if.h:859
>>> #7  0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1) at /usr/src/sys/kern/vfs_subr.c:2337
>>> #8  0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
>>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3609
>>> #9  0xffffffff81ac8c72 in zfs_freebsd_rename (ap=)
>>>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4039
>>> #10 0xffffffff80dd4f04 in VOP_RENAME_APV (vop=0xffffffff81b47d40, a=0xffffffa07f9358e0) at vnode_if.c:1522
>>> #11 0xffffffff80996bbd in kern_renameat (td=, oldfd=,
>>>     old=, newfd=-100, new=0x1826a9af00 ,
>>>     pathseg=) at vnode_if.h:636
>>> #12 0xffffffff80cd228a in amd64_syscall (td=0xfffffe0b14059920, traced=0) at subr_syscall.c:135
>>> #13 0xffffffff80cbc907 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
>>> ---Type to continue, or q to quit---
>>> #14 0x0000000800cc1acc in ?? ()
>>> Previous frame inner to this frame (corrupt stack?)
>>>
>>> Full procstat -kk -a and kgdb "thread apply all bt" can be found here:
>>>
>>> https://www.sics.se/~bengta/ZFS-hang/
>>>
>>> I don't know how to produce "alltrace in ddb" as the instructions in the
>>> wiki say. It runs the GENERIC kernel, so perhaps it isn't possible?
>>>
>>> I checked "camcontrol tags" for all the disks in the pool - all have
>>> zeroes for dev_active, devq_queued and held.
>>>
>>> Is there anything else I can check while the machine is up? However, I
>>> need to restart it pretty soon.
>>>
>>> Bengt
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
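
For anyone sifting a saved "procstat -kk -a" dump for other threads
blocked in the same place, a minimal sketch follows. It assumes the
usual layout of that output (PID and TID in the first two columns,
kernel stack frames printed as name+0xoffset); the function name
stuck_threads and the sample rows (PIDs, command names, offsets) are
made up for illustration and are not taken from the dump linked above.

```python
# Sketch: list threads whose kernel stack contains a given function.
# SAMPLE is invented data shaped like "procstat -kk -a" output.

SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
 1234 102157 mv               -                mi_switch+0x186 sleepq_wait+0x42 __lockmgr_args+0x9d2 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vputx+0x241 zfs_rename_unlock+0x45 zfs_freebsd_rename+0x1012 VOP_RENAME_APV+0x46 kern_renameat+0x41d amd64_syscall+0x5ea Xfast_syscall+0xf7
 2345 100200 sshd             -                mi_switch+0x186 sleepq_catch_signals+0x2e4 sleepq_wait_sig+0x16 _cv_wait_sig+0x12e seltdwait+0x110 kern_select+0x8fa sys_select+0x54 amd64_syscall+0x5ea Xfast_syscall+0xf7
"""


def stuck_threads(lines, func):
    """Yield (pid, tid, frames) for each thread whose stack includes func."""
    for line in lines:
        fields = line.split()
        # Skip the header row, blank lines and anything without numeric PID/TID.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        # Frames are the tokens of the form name+0xoffset; keep only the name.
        frames = [f.split("+0x")[0] for f in fields[2:] if "+0x" in f]
        if func in frames:
            yield int(fields[0]), int(fields[1]), frames


if __name__ == "__main__":
    for pid, tid, frames in stuck_threads(SAMPLE.splitlines(), "zfs_freebsd_rename"):
        print(pid, tid, " <- ".join(frames[:4]))
```

To use it on a real dump, replace SAMPLE.splitlines() with the lines of
the saved file. Matching on whole frame names rather than substrings
keeps zfs_rename_unlock from matching a search for zfs_rename.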