From owner-freebsd-current@freebsd.org Thu Oct 29 12:38:39 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D4230A124EF for ; Thu, 29 Oct 2015 12:38:39 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [80.67.31.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6A1351E57 for ; Thu, 29 Oct 2015 12:38:38 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from [78.35.160.31] (helo=fabiankeil.de) by smtprelay01.ispgateway.de with esmtpsa (TLSv1.2:AES128-GCM-SHA256:128) (Exim 4.84) (envelope-from ) id 1ZrmRw-0003W8-LL; Thu, 29 Oct 2015 13:37:00 +0100 Date: Thu, 29 Oct 2015 13:35:41 +0100 From: Fabian Keil To: freebsd-current@freebsd.org Cc: "Steven Hartland" , Xin Li , "Alexander Motin" Subject: Re: ZFS-related panic: "possible" spa->spa_errlog_lock deadlock Message-ID: <20151029133541.2bb55af9@fabiankeil.de> In-Reply-To: <20151028135821.0d375ec5@fabiankeil.de> References: <492dbacb.5942cc9b@fabiankeil.de> <540C66AC.8070809@delphij.net> <4fa875ba.3cc970d7@fabiankeil.de> <540C8039.7010309@delphij.net> <20151028135821.0d375ec5@fabiankeil.de> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; boundary="Sig_/4TEXJLc2m1/KqUySJtNSQbe"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Oct 2015 12:38:40 -0000 --Sig_/4TEXJLc2m1/KqUySJtNSQbe Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Fabian Keil wrote: > Xin Li wrote: >=20 > > On 9/7/14 11:23 PM, Fabian Keil wrote: =20 > > > Xin Li wrote: > > > =20 > > >> On 9/7/14 9:02 PM, Fabian Keil wrote: =20 > > >>> Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got > > >>> the following panic yesterday: > > >>>=20 > > >>> [...] Unread portion of the kernel message buffer: [6880] > > >>> panic: deadlkres: possible deadlock detected for > > >>> 0xfffff80015289490, blocked for 1800503 ticks =20 > > >>=20 > > >> Any chance to get all backtraces (e.g. thread apply all bt full > > >> 16)? I think a different thread that held the lock have been > > >> blocked, probably related to your disconnected vdev. =20 > > >=20 > > > Output of "thread apply all bt full 16" is available at:=20 > > > http://www.fabiankeil.de/tmp/freebsd/kgdb-output-spa_errlog_lock-dead= lock.txt > > > > > > A lot of the backtraces prematurely end with "Cannot access memory > > > at address", therefore I also added "thread apply all bt" output. > > >=20 > > > Apparently there are at least two additional threads blocking below > > > spa_get_stats(): =20 > [...] > > Yes, thread 1182 owned the lock and is waiting for the zio be done. > > Other threads that wanted the lock would have to wait. > >=20 > > I don't have much clue why the system entered this state, however, as > > the operations should have errored out (the GELI device is gone on > > 21:44:56 based on your log, which suggests all references were closed) > > instead of waiting. =20 > I finally found the time to analyse the problem which seems > to be that spa_sync() requires at least one writeable vdev to > complete, but holds the lock(s) required to remove or bring back > vdevs. >=20 > Letting spa_sync() drop the lock and wait for at least one vdev > to become writeable again seems to make the problem unreproducible > for me, but probably merely shrinks the race window and thus is not > a complete solution. >=20 > For details see: > https://www.fabiankeil.de/sourcecode/electrobsd/ZFS-Optionally-let-spa_sy= nc-wait-for-writable-vdev.diff Patch updated to unbreak the userspace build and to note that the deadlock can still be reproduced with an artifical test case like: Shell 1: mdconfig -u 0 -f /dpool/scratch/test-vdev.img zpool create test /dev/md0 while sleep 1; do mdconfig -d -u 0 -o force && mdconfig -f /dpool/scratch/test-vdev.img && zpool clear test; done Shell 2: # Cause writes to the pool from another shell, for example # by creating datasets. Log excerpt (from test begin to deadlock): Oct 29 12:34:28 kendra ZFS: vdev state changed, pool_guid=3D160393537382368= 08887 vdev_guid=3D3080051161477470469 [...] Oct 29 12:46:57 kendra ZFS: vdev state changed, pool_guid=3D160393537382368= 08887 vdev_guid=3D3080051161477470469 Oct 29 12:46:59 kendra ZFS: vdev is removed, pool_guid=3D160393537382368088= 87 vdev_guid=3D3080051161477470469 Oct 29 12:46:59 kendra ZFS: vdev state changed, pool_guid=3D160393537382368= 08887 vdev_guid=3D3080051161477470469 Oct 29 12:47:00 kendra kernel: g_dev_taste: make_dev_p() failed (gp->name= =3Dmd0, error=3D17) With the deadman enabled, this will also cause: panic: I/O to pool 'test' appears to be hung on vdev guid 3080051161477470= 469 at '/dev/md0'. cpuid =3D 0 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01136a= f870 vpanic() at vpanic+0x182/frame 0xfffffe01136af8f0 panic() at panic+0x43/frame 0xfffffe01136af950 vdev_deadman() at vdev_deadman+0x127/frame 0xfffffe01136af9a0 vdev_deadman() at vdev_deadman+0x40/frame 0xfffffe01136af9f0 spa_deadman() at spa_deadman+0x86/frame 0xfffffe01136afa20 softclock_call_cc() at softclock_call_cc+0x1a3/frame 0xfffffe01136afaf0 softclock() at softclock+0x94/frame 0xfffffe01136afb20 intr_event_execute_handlers() at intr_event_execute_handlers+0x1b6/frame 0= xfffffe01136afb60 ithread_loop() at ithread_loop+0xa6/frame 0xfffffe01136afbb0 fork_exit() at fork_exit+0x9c/frame 0xfffffe01136afbf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01136afbf0 With test's txg_sync_thread being the offender: (kgdb) tid 101874 [Switching to thread 819 (Thread 101874)]#0 sched_switch (td=3D0xfffff800= 513649a0, newtd=3D, flags=3D) at = /usr/src/sys/kern/sched_ule.c:1969 1969 cpuid =3D PCPU_GET(cpuid); (kgdb) where #0 sched_switch (td=3D0xfffff800513649a0, newtd=3D, = flags=3D) at /usr/src/sys/kern/sched_ule.c:1969 #1 0xffffffff805a3a18 in mi_switch (flags=3D260, newtd=3D0x0) at /usr/src= /sys/kern/kern_synch.c:470 #2 0xffffffff805ea15a in sleepq_wait (wchan=3D0x0, pri=3D0) at /usr/src/s= ys/kern/subr_sleepqueue.c:631 #3 0xffffffff80530509 in _cv_wait (cvp=3D0xfffff8002678ea98, lock=3D0xfff= ff8002678ea78) at /usr/src/sys/kern/kern_condvar.c:139 #4 0xffffffff81930bbb in zio_wait (zio=3D) at /usr/s= rc/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1535 #5 0xffffffff818e4871 in dsl_pool_sync (dp=3D0xfffff80047dfd000, txg=3D76= ) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:540 #6 0xffffffff81903653 in spa_sync (spa=3D0xfffff8009dfe2000, txg=3D76) at= /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:6734 #7 0xffffffff8190ccfa in txg_sync_thread (arg=3D0xfffff80047dfd000) at /u= sr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:517 #8 0xffffffff80556edc in fork_exit (callout=3D0xffffffff8190c970 , arg=3D0xfffff80047dfd000, frame=3D0xfffffe011c27bc00) at /usr/sr= c/sys/kern/kern_fork.c:1011 #9 0xffffffff8085b91e in fork_trampoline () at /usr/src/sys/amd64/amd64/e= xception.S:609 #10 0x0000000000000000 in ?? () (kgdb) f 6 #6 0xffffffff81903653 in spa_sync (spa=3D0xfffff8009dfe2000, txg=3D76) at= /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:6734 (kgdb) p spa->spa_name $3 =3D 0xfffff8009dfe2000 "test" Fabian --Sig_/4TEXJLc2m1/KqUySJtNSQbe Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEARECAAYFAlYyEp4ACgkQBYqIVf93VJ0sxgCfYP3pBHRJnWy1IZIxJiNERYDN NoAAn2SGahuiiTpA4II8Q3kF+1815Pvc =83oi -----END PGP SIGNATURE----- --Sig_/4TEXJLc2m1/KqUySJtNSQbe--