Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 7 Sep 2014 19:00:33 +0200
From:      Fabian Keil <freebsd-listen@fabiankeil.de>
To:        <freebsd-current@freebsd.org>
Subject:   Re: ZFS-related panic: "possible" spa->spa_errlog_lock deadlock
Message-ID:  <69f90aa2.5d2989ae@fabiankeil.de>
In-Reply-To: <540C8039.7010309@delphij.net>
References:  <492dbacb.5942cc9b@fabiankeil.de> <540C66AC.8070809@delphij.net> <4fa875ba.3cc970d7@fabiankeil.de> <540C8039.7010309@delphij.net>

next in thread | previous in thread | raw e-mail | index | archive | help
--Sig_/A5pQFSMRW5iUxcWMFYlZxJV
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

Xin Li <delphij@delphij.net> wrote:

> On 9/7/14 11:23 PM, Fabian Keil wrote:
> > Xin Li <delphij@delphij.net> wrote:
> >=20
> >> On 9/7/14 9:02 PM, Fabian Keil wrote:
> >>> Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got
> >>> the following panic yesterday:
> >>>=20
> >>> [...] Unread portion of the kernel message buffer: [6880]
> >>> panic: deadlkres: possible deadlock detected for
> >>> 0xfffff80015289490, blocked for 1800503 ticks
> >>=20
> >> Any chance to get all backtraces (e.g. thread apply all bt full
> >> 16)? I think a different thread that held the lock have been
> >> blocked, probably related to your disconnected vdev.
> >=20
> > Output of "thread apply all bt full 16" is available at:=20
> > http://www.fabiankeil.de/tmp/freebsd/kgdb-output-spa_errlog_lock-deadlo=
ck.txt
> >
> >  A lot of the backtraces prematurely end with "Cannot access memory
> > at address", therefore I also added "thread apply all bt" output.
> >=20
> > Apparently there are at least two additional threads blocking below
> > spa_get_stats():
[...]
> Yes, thread 1182 owned the lock and is waiting for the zio be done.
> Other threads that wanted the lock would have to wait.
>=20
> I don't have much clue why the system entered this state, however, as
> the operations should have errored out (the GELI device is gone on
> 21:44:56 based on your log, which suggests all references were closed)
> instead of waiting.

It occurred to me that I have relevant auth.log entries as well:

Sep  6 20:54:31 r500 sudo:       fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/geli attach -j - /dev/label/spaceloop
Sep  6 20:54:44 r500 sudo:       fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/geli attach -j - /dev/label/takems
Sep  6 20:54:51 r500 sudo:       fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zpool import -d /dev/label takems
Sep  6 20:55:30 r500 sudo:       fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs send -i @2014-08-13_23:10 tank/home/fk@2014-0=
9-06_19:56
Sep  6 20:55:30 r500 sudo:       fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs receive -v -u -F spaceloop/backup/r500/tank/h=
ome/fk
Sep  6 20:55:46 r500 sudo:       fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs send -i @2014-08-13_23:10 tank/home/fk@2014-0=
9-06_19:56
Sep  6 20:55:46 r500 sudo:       fk : TTY=3Dpts/5 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs receive -v -u -F spaceloop/backup/r500/tank/h=
ome/fk
[...]
Sep  6 21:28:47 r500 sudo:       fk : TTY=3Dpts/6 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zpool status spaceloop
Sep  6 21:29:43 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zpool export takems
Sep  6 21:29:46 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/geli detach label/takems.eli
Sep  6 21:30:08 r500 sudo:       fk : TTY=3Dpts/10 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zpool clear spaceloop
Sep  6 21:44:16 r500 sudo:       fk : TTY=3Dpts/11 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/usr/sbin/usbconfig
Sep  6 21:44:56 r500 sudo:       fk : TTY=3Dpts/11 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/usr/sbin/usbconfig -d 1.3 reset
Sep  6 21:46:26 r500 sudo:       fk : TTY=3Dpts/1 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/usr/sbin/usbconfig
Sep  6 22:03:33 r500 login: login on ttyv0 as fk

IIRC, I tried the USB reset because the "zfs receive ... spaceloop/*",
"zpool status spaceloop" and "zpool clear spaceloop" processes (and some
that weren't called with sudo and thus weren't logged) got stuck and while
it caused the kernel to close label/spaceloop.eli as intended, it did not
noticeable affect the deadlocked processes.

I don't remember exactly why the same ZFS stream was sent twice, but the mo=
st
likely explanation seems to be that I aborted the operation before it was d=
one
and it's conceivable that this left the system in a state that caused the s=
econd
attempt to get stuck.

Fabian

--Sig_/A5pQFSMRW5iUxcWMFYlZxJV
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlQMjzEACgkQBYqIVf93VJ3JhQCfYxpbckkuu4k0LhEBrdFEz+oe
qVEAoJ+WeRJfPjfnVreyC/aMiUAzKNZg
=tiZ7
-----END PGP SIGNATURE-----

--Sig_/A5pQFSMRW5iUxcWMFYlZxJV--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?69f90aa2.5d2989ae>