From owner-freebsd-current@FreeBSD.ORG Sun Sep 7 15:56:44 2014 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B44B2B4C for ; Sun, 7 Sep 2014 15:56:44 +0000 (UTC) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 98FC915A1 for ; Sun, 7 Sep 2014 15:56:44 +0000 (UTC) Received: from delphij-macbook.local (unknown [10.64.64.51]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id B3D08FE3D; Sun, 7 Sep 2014 08:56:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1410105403; x=1410119803; bh=kAMVrVRaCqGnwB6jFlvjehir4sgi4JLbxbIph3pMX+c=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=QujuUClYIEzRQX4Q9s1uZsuyTvl1e6H2mWEziZBNCVSbIoR+mzcJjX5sKnqzzfe/e 5Tj+guyZfMYVRRV9XOMCs3YdrYGriu5gVAW/jCaavnBYACwIrLr9pSQhn4t6IMYztU pUrGItCBI/eQdYP11wXxeNAoULeRJs5LVQ+U//nU= Message-ID: <540C8039.7010309@delphij.net> Date: Sun, 07 Sep 2014 23:56:41 +0800 From: Xin Li Reply-To: d@delphij.net Organization: The FreeBSD Project MIME-Version: 1.0 To: Fabian Keil , freebsd-current@freebsd.org Subject: Re: ZFS-related panic: "possible" spa->spa_errlog_lock deadlock References: <492dbacb.5942cc9b@fabiankeil.de> <540C66AC.8070809@delphij.net> <4fa875ba.3cc970d7@fabiankeil.de> In-Reply-To: <4fa875ba.3cc970d7@fabiankeil.de> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Cc: Alexander Motin X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Sep 2014 15:56:44 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 9/7/14 11:23 PM, Fabian Keil wrote: > Xin Li wrote: > >> On 9/7/14 9:02 PM, Fabian Keil wrote: >>> Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got >>> the following panic yesterday: >>> >>> [...] Unread portion of the kernel message buffer: [6880] >>> panic: deadlkres: possible deadlock detected for >>> 0xfffff80015289490, blocked for 1800503 ticks >> >> Any chance to get all backtraces (e.g. thread apply all bt full >> 16)? I think a different thread that held the lock have been >> blocked, probably related to your disconnected vdev. > > Output of "thread apply all bt full 16" is available at: > http://www.fabiankeil.de/tmp/freebsd/kgdb-output-spa_errlog_lock-deadlock.txt > > A lot of the backtraces prematurely end with "Cannot access memory > at address", therefore I also added "thread apply all bt" output. > > Apparently there are at least two additional threads blocking below > spa_get_stats(): > > Thread 1182 (Thread 101989): #0 sched_switch > (td=0xfffff800628cc490, newtd=, flags= optimized out>) at /usr/src/sys/kern/sched_ule.c:1932 #1 > 0xffffffff805a23c1 in mi_switch (flags=260, newtd=0x0) at > /usr/src/sys/kern/kern_synch.c:493 #2 0xffffffff805e4bca in > sleepq_wait (wchan=0x0, pri=0) at > /usr/src/sys/kern/subr_sleepqueue.c:631 #3 0xffffffff80539f10 in > _cv_wait (cvp=0xfffff80025534a50, lock=0xfffff80025534a30) at > /usr/src/sys/kern/kern_condvar.c:139 #4 0xffffffff811721db in > zio_wait (zio=) at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1442 > #5 0xffffffff81102111 in dbuf_read (db=, > zio=, flags=) at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:649 > #6 0xffffffff81108e6d in dmu_buf_hold (os=, > object=, offset=, > tag=0x0, dbp=0xfffffe00955c6648, flags=) at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:172 > #7 0xffffffff81163986 in zap_lockdir (os=0xfffff8002b7ab000, > obj=92, tx=0x0, lti=RW_READER, fatreader=1, adding=0, zapp= optimized out>) at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:467 > > #8 0xffffffff811644ad in zap_count (os=0x0, zapobj=0, count=0xfffffe00955c66d8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:712 > #9 0xffffffff8114a6dc in spa_get_errlog_size > (spa=0xfffff800062ed000) at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_errlog.c:149 > > - ---Type to continue, or q to quit--- > #10 0xffffffff8113f549 in spa_get_stats (name=0xfffffe0044cac000 > "spaceloop", config=0xfffffe00955c68e8, altroot=0xfffffe0044cac430 > "", buflen=2048) at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:3287 > #11 0xffffffff81189a45 in zfs_ioc_pool_stats > (zc=0xfffffe0044cac000) at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:1656 > > #12 0xffffffff81187290 in zfsdev_ioctl (dev=, zcmd=, arg=, flag=, td=) > at > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6136 > > #13 0xffffffff80464a55 in devfs_ioctl_f (fp=0xfffff80038bd00a0, com=3222821381, data=0xfffff800067b80a0, cred=, td=0xfffff800628cc490) at /usr/src/sys/fs/devfs/devfs_vnops.c:757 > #14 0xffffffff805f3c3d in kern_ioctl (td=0xfffff800628cc490, > fd=, com=0) at file.h:311 #15 > 0xffffffff805f381c in sys_ioctl (td=0xfffff800628cc490, > uap=0xfffffe00955c6b80) at /usr/src/sys/kern/sys_generic.c:702 #16 > 0xffffffff8085c2db in amd64_syscall (td=0xfffff800628cc490, > traced=0) at subr_syscall.c:133 #17 0xffffffff8083f90b in > Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:390 #18 > 0x00000008019fc3da in ?? () Previous frame inner to this frame > (corrupt stack?) Yes, thread 1182 owned the lock and is waiting for the zio be done. Other threads that wanted the lock would have to wait. I don't have much clue why the system entered this state, however, as the operations should have errored out (the GELI device is gone on 21:44:56 based on your log, which suggests all references were closed) instead of waiting. Adding mav@ as he may have some idea. Cheers, -----BEGIN PGP SIGNATURE----- iQIcBAEBCgAGBQJUDIA5AAoJEJW2GBstM+nsdsgP/RT4nBT4mvOtpF2IEL7VFexJ OjipGsWIDmG9kc8CEEN+qh3Q4+prJO3IwHGTUPa0Vu13jRZ3T/uoZj/ncVAqnCyW s+SeEBjVVhZs/B08LqT2A8fZI9jVdqvFVWEL02z3ibWggoCnP60oag1OyvkNGqQU apWtXkjTnrFmOEcbB95viD8QEhfUzlQgBbeRuK8ADtK/jQNEl6A8xdE5HCT2DIN4 icIwoj9eXqyLB0/aGVIFycRID4hWAZaqPehXVtGhCdnlGut7itdufuXtfmTCEDWs z3vkGjTCJ3qsyKSxl/2ABGRgAH/lpR6J/N2yZOSNMRTt9PbjN1iLppu2IfD33OdY QlQrI2HhbwvoZmYe4f4B/1/8qzKag77hzYG2J2aN0OGn45zkThOwoQt854zm7OHq f3O3pysxUInTnIJrdBa53cT0arhQRtYn1xL5CYyvK4Nto74ht6g/AuBJbBWzWM/B t2fKuZmpGt32lf+vHjWS0O2VWdkc6I6s5rVZJsSLzAMfc1rWePIcokPdUk9RucyX qRrNDpMW53APWhPKGpkK+Utyx/OLKhO62d7iiYrGhVX0FU+trBV/6QdwPdkaM8Lx 6v51hDK6ZjF/K5ZCYFwQOPcSaSHiMIpQBQr0mFh2s/05p3/f7jNAthTx7npEIfYZ PZ7DoMna8cwC4vtfoR1t =Jxfg -----END PGP SIGNATURE-----