From owner-freebsd-current@FreeBSD.ORG Mon Sep 8 08:04:47 2014 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 60752486 for ; Mon, 8 Sep 2014 08:04:47 +0000 (UTC) Received: from mail-lb0-x22e.google.com (mail-lb0-x22e.google.com [IPv6:2a00:1450:4010:c04::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D72ED15DE for ; Mon, 8 Sep 2014 08:04:46 +0000 (UTC) Received: by mail-lb0-f174.google.com with SMTP id n15so6736110lbi.33 for ; Mon, 08 Sep 2014 01:04:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:message-id:date:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=COrXza2iuPPmZS+HrzsA47eb2jFfvhEuNuQ88HLEZvk=; b=DEQ/ZFQ8S1LKZ6gvT+f2i1e6Aj1llrufztiSO/UI2RZzDY+6mELvnPrxSp9jY7NGCx M594rIQV4CUmuLMkvZwHeKO4ZFPP9BKnktTwPQAsMcbHlEQDiVoqpbK90uyWCXivhIgs ZCU/MgO5dRoCG+76V0JpX7jmXBqrbUskYTb3vk4h5WUdz94GOR5j6PCQDgS5Leglwfzw /MB9+6PP3hxjN3qviEqux9HnRS8KpMH4SoqHZt1KrSLP7d16WrIX8LLIofhydMMD+Dod kdDsL00T2Y+/lLAi1Uu3p3JgMks6Ao3wdwCwDyNs53tSo5QKrwukr2Jhwwec6XZZWTFp 4+5Q== X-Received: by 10.112.163.103 with SMTP id yh7mr26078300lbb.73.1410163484678; Mon, 08 Sep 2014 01:04:44 -0700 (PDT) Received: from mavbook.mavhome.dp.ua ([134.249.139.101]) by mx.google.com with ESMTPSA id k9sm3316491lbj.4.2014.09.08.01.04.43 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 08 Sep 2014 01:04:43 -0700 (PDT) Sender: Alexander Motin From: Alexander Motin X-Google-Original-From: Alexander Motin Message-ID: <540D631A.8090700@ixsystems.com> Date: Mon, 08 Sep 2014 11:04:42 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: d@delphij.net, Fabian Keil , freebsd-current@freebsd.org Subject: Re: ZFS-related panic: "possible" spa->spa_errlog_lock deadlock References: <492dbacb.5942cc9b@fabiankeil.de> <540C66AC.8070809@delphij.net> <4fa875ba.3cc970d7@fabiankeil.de> <540C8039.7010309@delphij.net> In-Reply-To: <540C8039.7010309@delphij.net> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Sep 2014 08:04:47 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 07.09.2014 18:56, Xin Li wrote: > On 9/7/14 11:23 PM, Fabian Keil wrote: >> Xin Li wrote: > >>> On 9/7/14 9:02 PM, Fabian Keil wrote: >>>> Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got >>>> the following panic yesterday: >>>> >>>> [...] Unread portion of the kernel message buffer: [6880] >>>> panic: deadlkres: possible deadlock detected for >>>> 0xfffff80015289490, blocked for 1800503 ticks >>> >>> Any chance to get all backtraces (e.g. thread apply all bt >>> full 16)? I think a different thread that held the lock have >>> been blocked, probably related to your disconnected vdev. > >> Output of "thread apply all bt full 16" is available at: >> http://www.fabiankeil.de/tmp/freebsd/kgdb-output-spa_errlog_lock-deadlock.txt > >> A lot of the backtraces prematurely end with "Cannot access >> memory at address", therefore I also added "thread apply all bt" >> output. > >> Apparently there are at least two additional threads blocking >> below spa_get_stats(): > >> Thread 1182 (Thread 101989): #0 sched_switch >> (td=0xfffff800628cc490, newtd=, >> flags=) at >> /usr/src/sys/kern/sched_ule.c:1932 #1 0xffffffff805a23c1 in >> mi_switch (flags=260, newtd=0x0) at >> /usr/src/sys/kern/kern_synch.c:493 #2 0xffffffff805e4bca in >> sleepq_wait (wchan=0x0, pri=0) at >> /usr/src/sys/kern/subr_sleepqueue.c:631 #3 0xffffffff80539f10 >> in _cv_wait (cvp=0xfffff80025534a50, lock=0xfffff80025534a30) at >> /usr/src/sys/kern/kern_condvar.c:139 #4 0xffffffff811721db in >> zio_wait (zio=) at >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1442 >> #5 0xffffffff81102111 in dbuf_read (db=, >> zio=, flags=) at >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:649 >> #6 0xffffffff81108e6d in dmu_buf_hold (os=> out>, object=, offset=> out>, tag=0x0, dbp=0xfffffe00955c6648, flags=> out>) at >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:172 >> #7 0xffffffff81163986 in zap_lockdir (os=0xfffff8002b7ab000, >> obj=92, tx=0x0, lti=RW_READER, fatreader=1, adding=0, >> zapp=) at >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:467 > >> > > #8 0xffffffff811644ad in zap_count (os=0x0, zapobj=0, > count=0xfffffe00955c66d8) at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:712 >> > #9 0xffffffff8114a6dc in spa_get_errlog_size >> (spa=0xfffff800062ed000) at >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_errlog.c:149 > >> > > ---Type to continue, or q to quit--- >> #10 0xffffffff8113f549 in spa_get_stats (name=0xfffffe0044cac000 >> "spaceloop", config=0xfffffe00955c68e8, >> altroot=0xfffffe0044cac430 "", buflen=2048) at >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:3287 >> #11 0xffffffff81189a45 in zfs_ioc_pool_stats >> (zc=0xfffffe0044cac000) at >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:1656 > >> > > #12 0xffffffff81187290 in zfsdev_ioctl (dev=, > zcmd=, arg=, flag= optimized out>, td=) >> at >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6136 > >> > > #13 0xffffffff80464a55 in devfs_ioctl_f (fp=0xfffff80038bd00a0, > com=3222821381, data=0xfffff800067b80a0, cred= out>, td=0xfffff800628cc490) at > /usr/src/sys/fs/devfs/devfs_vnops.c:757 >> #14 0xffffffff805f3c3d in kern_ioctl (td=0xfffff800628cc490, >> fd=, com=0) at file.h:311 #15 >> 0xffffffff805f381c in sys_ioctl (td=0xfffff800628cc490, >> uap=0xfffffe00955c6b80) at /usr/src/sys/kern/sys_generic.c:702 >> #16 0xffffffff8085c2db in amd64_syscall (td=0xfffff800628cc490, >> traced=0) at subr_syscall.c:133 #17 0xffffffff8083f90b in >> Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:390 #18 >> 0x00000008019fc3da in ?? () Previous frame inner to this frame >> (corrupt stack?) > > Yes, thread 1182 owned the lock and is waiting for the zio be > done. Other threads that wanted the lock would have to wait. > > I don't have much clue why the system entered this state, however, > as the operations should have errored out (the GELI device is gone > on 21:44:56 based on your log, which suggests all references were > closed) instead of waiting. > > Adding mav@ as he may have some idea. Some time ago I experimented with ZFS behavior in situation of disappearing disks. I fixed several most easily reproducible scenarios, but finally I've got to conclusion that ZFS in many places written in a way that simply does not expect errors. In such cases it just stucks, waiting for disk to reappear and I/O to complete. In some situations after disk reconnect it can now be recovered with `zpool clear ...`, but still not all, since sometimes code may stuck holding some lock required for recovery. - -- Alexander Motin -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQF8BAEBCgBmBQJUDWMaXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRFOThDRjNDNEU2OUNDM0NEMEU1NzlENTU4 MzE4QzM5NTVCQUIyMjdGAAoJEIMYw5VbqyJ/0CcIAJ6QegWMb1EOVqB3dJiUB8if LKanU3dx4WGhofJPiPtsW6Aa8UO+4NevS+zjApXpHWge+QiWiN4NKaMMYxLgk5Zs a6rv6L0m7i/jlKbAQKj+ewUSOzX2XF66ERhGfoWfmUpjZIJqKV4nPnJksg6iEEUN RgDQdxvc6CxKvczUjmMBDdMIWDPUZ1aOoNarRmhNA2g0SnN4dKjQVPpD5zdOL8LI 3yN2ReUvXj1vSAY6Drmsr3MjdY8+EdpKgFY8VZj7wiu+KVK5YVHx5Xbq0M+LJ/yj YsT1IB4KgoFfSJNzfm/78bvpXvFzGlbrm8RsqYoV00Y3YYd2ybes04Bb6yg+Qd0= =wxCp -----END PGP SIGNATURE-----