From owner-freebsd-current@FreeBSD.ORG  Sun Sep  7 13:02:31 2014
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 60C1C3DD
 for <freebsd-current@freebsd.org>; Sun,  7 Sep 2014 13:02:31 +0000 (UTC)
Received: from smtprelay06.ispgateway.de (smtprelay06.ispgateway.de
 [80.67.31.102])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B2C141D64
 for <freebsd-current@freebsd.org>; Sun,  7 Sep 2014 13:02:29 +0000 (UTC)
Received: from [87.79.252.204] (helo=fabiankeil.de)
 by smtprelay06.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128)
 (Exim 4.68) (envelope-from <freebsd-listen@fabiankeil.de>)
 id 1XQc6k-0003XZ-Vt
 for freebsd-current@freebsd.org; Sun, 07 Sep 2014 15:02:19 +0200
Date: Sun, 7 Sep 2014 15:02:19 +0200
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: FreeBSD Current <freebsd-current@freebsd.org>
Subject: ZFS-related panic: "possible" spa->spa_errlog_lock deadlock
Message-ID: <492dbacb.5942cc9b@fabiankeil.de>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 boundary="Sig_/RF=29s9FFtRhTIQX5d+=r1I"; protocol="application/pgp-signature"
X-Df-Sender: Nzc1MDY3
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current/>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Sep 2014 13:02:31 -0000

--Sig_/RF=29s9FFtRhTIQX5d+=r1I
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got the following
panic yesterday:

[...]
Unread portion of the kernel message buffer:
[6880] panic: deadlkres: possible deadlock detected for 0xfffff80015289490,=
 blocked for 1800503 ticks
[6880]=20
[6880] cpuid =3D 1
[6880] KDB: stack backtrace:
[6880] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe=
009432e9e0
[6880] kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe009432ea90
[6880] panic() at panic+0x1c1/frame 0xfffffe009432eb50
[6880] deadlkres() at deadlkres+0x588/frame 0xfffffe009432ebb0
[6880] fork_exit() at fork_exit+0x9a/frame 0xfffffe009432ebf0
[6880] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009432ebf0
[6880] --- trap 0, rip =3D 0, rsp =3D 0xfffffe009432ecb0, rbp =3D 0 ---
[6880] KDB: enter: panic
[6880] Uptime: 1h54m40s
[6880] Dumping 354 out of 1973 MB:..5%..14%..23%..32%..41%..55%..64%..73%..=
82%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
[...]
Loaded symbols for /boot/kernel/profile.ko.symbols
#0  doadump (textdump=3D1) at pcpu.h:219
219	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) where
#0  doadump (textdump=3D1) at pcpu.h:219
#1  0xffffffff80597bad in kern_reboot (howto=3D260) at /usr/src/sys/kern/ke=
rn_shutdown.c:447
#2  0xffffffff80598100 in panic (fmt=3D<value optimized out>) at /usr/src/s=
ys/kern/kern_shutdown.c:746
#3  0xffffffff80539b98 in deadlkres () at /usr/src/sys/kern/kern_clock.c:240
#4  0xffffffff8055e8da in fork_exit (callout=3D0xffffffff80539610 <deadlkre=
s>, arg=3D0x0, frame=3D0xfffffe009432ec00) at /usr/src/sys/kern/kern_fork.c=
:977
#5  0xffffffff8083fb5e in fork_trampoline () at /usr/src/sys/amd64/amd64/ex=
ception.S:605
#6  0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb) set $td=3D(struct thread *)0xfffff80015289490
(kgdb) tid $td->td_tid
[Switching to thread 1184 (Thread 101428)]#0  sched_switch (td=3D0xfffff800=
15289490, newtd=3D<value optimized out>, flags=3D<value optimized out>) at =
/usr/src/sys/kern/sched_ule.c:1932
1932			cpuid =3D PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=3D0xfffff80015289490, newtd=3D<value optimized out>, f=
lags=3D<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
#1  0xffffffff805a23c1 in mi_switch (flags=3D260, newtd=3D0x0) at /usr/src/=
sys/kern/kern_synch.c:493
#2  0xffffffff805e4bca in sleepq_wait (wchan=3D0x0, pri=3D0) at /usr/src/sy=
s/kern/subr_sleepqueue.c:631
#3  0xffffffff805a12b2 in _sx_xlock_hard (sx=3D0xfffff800062ed820, tid=3D18=
446735277971510416, opts=3D<value optimized out>, file=3D0x0, line=3D0) at =
/usr/src/sys/kern/kern_sx.c:676
#4  0xffffffff805a0add in _sx_xlock (sx=3D0x0, opts=3D0, file=3D<value opti=
mized out>, line=3D0) at sx.h:154
#5  0xffffffff8114a691 in spa_get_errlog_size (spa=3D0xfffff800062ed000) at=
 /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_errlog.c:142
#6  0xffffffff8113f549 in spa_get_stats (name=3D0xfffffe0006126000 "spacelo=
op", config=3D0xfffffe00950e58e8, altroot=3D0xfffffe0006126430 "", buflen=
=3D2048)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:3287
#7  0xffffffff81189a45 in zfs_ioc_pool_stats (zc=3D0xfffffe0006126000) at /=
usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:1656
#8  0xffffffff81187290 in zfsdev_ioctl (dev=3D<value optimized out>, zcmd=
=3D<value optimized out>, arg=3D<value optimized out>, flag=3D<value optimi=
zed out>, td=3D<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:=
6136
#9  0xffffffff80464a55 in devfs_ioctl_f (fp=3D0xfffff80017107f50, com=3D322=
2821381, data=3D0xfffff8004fb4b740, cred=3D<value optimized out>, td=3D0xff=
fff80015289490) at /usr/src/sys/fs/devfs/devfs_vnops.c:757
#10 0xffffffff805f3c3d in kern_ioctl (td=3D0xfffff80015289490, fd=3D<value =
optimized out>, com=3D0) at file.h:311
#11 0xffffffff805f381c in sys_ioctl (td=3D0xfffff80015289490, uap=3D0xfffff=
e00950e5b80) at /usr/src/sys/kern/sys_generic.c:702
#12 0xffffffff8085c2db in amd64_syscall (td=3D0xfffff80015289490, traced=3D=
0) at subr_syscall.c:133
#13 0xffffffff8083f90b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exce=
ption.S:390
#14 0x00000008019fc3da in ?? ()
(kgdb) f 3
#3  0xffffffff805a12b2 in _sx_xlock_hard (sx=3D0xfffff800062ed820, tid=3D18=
446735277971510416, opts=3D<value optimized out>, file=3D0x0, line=3D0) at =
/usr/src/sys/kern/kern_sx.c:676
676				sleepq_wait(&sx->lock_object, 0);
(kgdb) p sx->lock_object
$14 =3D {lo_name =3D 0xffffffff81202163 "spa->spa_errlog_lock", lo_flags =
=3D 40960000, lo_data =3D 0, lo_witness =3D 0x0}

This happened several minutes after a couple of zpool processes
stopped responding while accessing information about the following
pool:

fk@r500 ~ $zpool status -v spaceloop
  pool: spaceloop
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 2.98M in 16h27m with 0 errors on Sun Sep  7 12:25:36=
 2014
config:

	NAME                   STATE     READ WRITE CKSUM
	spaceloop              ONLINE       0     0     0
	  label/spaceloop.eli  ONLINE       0     0    26

errors: No known data errors

Dump time was 21:58:48, the vdev disappeared a while before that,
therefore I believe the "possible" deadlock is a real one:

Sep  6 21:18:09 r500 kernel: [4441] GEOM_ELI: g_eli_write_done() failed (er=
ror=3D5) label/spaceloop.eli[WRITE(offset=3D2235700736, length=3D16384)]
Sep  6 21:18:09 r500 kernel: [4441] GEOM_ELI: g_eli_read_done() failed (err=
or=3D5) label/spaceloop.eli[READ(offset=3D4008976384, length=3D8192)]
Sep  6 21:18:09 r500 kernel: [4441] GEOM_ELI: g_eli_read_done() failed (err=
or=3D5) label/spaceloop.eli[READ(offset=3D4009238528, length=3D8192)]
Sep  6 21:18:09 r500 kernel: [4441] GEOM_ELI: g_eli_read_done() failed (err=
or=3D5) label/spaceloop.eli[READ(offset=3D270336, length=3D8192)]
Sep  6 21:18:27 r500 kernel: [4459] (da1:umass-sim1:1:0:0): WRITE(10). CDB:=
 2a 00 00 60 65 0c 00 00 40 00=20
Sep  6 21:18:27 r500 kernel: [4459] (da1:umass-sim1:1:0:0): CAM status: CCB=
 request completed with an error
Sep  6 21:18:27 r500 kernel: [4459] (da1:umass-sim1:1:0:0): Retrying command
Sep  6 21:18:45 r500 kernel: [4477] (da1:umass-sim1:1:0:0): WRITE(10). CDB:=
 2a 00 00 60 65 0c 00 00 40 00=20
Sep  6 21:18:45 r500 kernel: [4477] (da1:umass-sim1:1:0:0): CAM status: CCB=
 request completed with an error
Sep  6 21:18:45 r500 kernel: [4477] (da1:umass-sim1:1:0:0): Retrying command
Sep  6 21:19:03 r500 kernel: [4495] (da1:umass-sim1:1:0:0): WRITE(10). CDB:=
 2a 00 00 60 65 0c 00 00 40 00=20
Sep  6 21:19:03 r500 kernel: [4495] (da1:umass-sim1:1:0:0): CAM status: CCB=
 request completed with an error
Sep  6 21:19:03 r500 kernel: [4495] (da1:umass-sim1:1:0:0): Retrying command
Sep  6 21:19:21 r500 kernel: [4514] (da1:umass-sim1:1:0:0): WRITE(10). CDB:=
 2a 00 00 60 65 0c 00 00 40 00=20
Sep  6 21:19:21 r500 kernel: [4514] (da1:umass-sim1:1:0:0): CAM status: CCB=
 request completed with an error
Sep  6 21:19:21 r500 kernel: [4514] (da1:umass-sim1:1:0:0): Retrying command
Sep  6 21:19:40 r500 kernel: [4532] (da1:umass-sim1:1:0:0): WRITE(10). CDB:=
 2a 00 00 60 65 0c 00 00 40 00=20
Sep  6 21:19:40 r500 kernel: [4532] (da1:umass-sim1:1:0:0): CAM status: CCB=
 request completed with an error
Sep  6 21:19:40 r500 kernel: [4532] (da1:umass-sim1:1:0:0): Error 5, Retrie=
s exhausted
Sep  6 21:19:40 r500 kernel: [4532] GEOM_ELI: g_eli_write_done() failed (er=
ror=3D5) label/spaceloop.eli[WRITE(offset=3D3234469888, length=3D32768)]
[...]
Sep  6 21:29:46 r500 kernel: [5138] GEOM_ELI: Device label/takems.eli destr=
oyed.
Sep  6 21:29:52 r500 kernel: [5144] ugen0.3: <SMI Corporation> at usbus0 (d=
isconnected)
Sep  6 21:29:52 r500 kernel: [5144] umass2: at uhub0, port 1, addr 3 (disco=
nnected)
Sep  6 21:29:52 r500 kernel: [5144] da2 at umass-sim2 bus 2 scbus4 target 0=
 lun 0
Sep  6 21:29:52 r500 kernel: [5144] da2: <takeMS USB Mini 1100> s/n AA04012=
900007508 detached
Sep  6 21:29:52 r500 kernel: [5144] pass4 at umass-sim2 bus 2 scbus4 target=
 0 lun 0
Sep  6 21:29:52 r500 kernel: [5144] pass4: <takeMS USB Mini 1100> s/n AA040=
12900007508 detached
Sep  6 21:29:52 r500 kernel: [5144] (pass4:umass-sim2:2:0:0): Periph destro=
yed
Sep  6 21:29:52 r500 kernel: [5144] (da2:umass-sim2:2:0:0): Periph destroyed
Sep  6 21:44:56 r500 kernel: [6049] umass1: at uhub1, port 1, addr 3 (disco=
nnected)
Sep  6 21:44:56 r500 kernel: [6049] da1 at umass-sim1 bus 1 scbus3 target 0=
 lun 0
Sep  6 21:44:56 r500 kernel: [6049] da1: <General USB Flash Disk 1.0> s/n 0=
000000000002511 detached
Sep  6 21:44:56 r500 kernel: [6049] pass3 at umass-sim1 bus 1 scbus3 target=
 0 lun 0
Sep  6 21:44:56 r500 kernel: [6049] pass3: <General USB Flash Disk 1.0> s/n=
 0000000000002511 detached
Sep  6 21:44:56 r500 kernel: [6049] (pass3:umass-sim1:1:0:0): Periph destro=
yed
Sep  6 21:44:56 r500 kernel: [6049] GEOM_ELI: Device label/spaceloop.eli de=
stroyed.
Sep  6 21:44:56 r500 kernel: [6049] GEOM_ELI: Detached label/spaceloop.eli =
on last close.
Sep  6 21:44:56 r500 kernel: [6049] (da1:umass-sim1:1:0:0): Periph destroyed
Sep  6 21:44:57 r500 ZFS: vdev is removed, pool_guid=3D13593515477493551420=
 vdev_guid=3D2311923824790456307
Sep  6 21:44:57 r500 kernel: [6049] umass1: <General USB Flash Disk, class =
0/0, rev 2.00/1.00, addr 3> on usbus1
Sep  6 21:44:57 r500 kernel: [6049] umass1:  SCSI over Bulk-Only; quirks =
=3D 0x4101
Sep  6 21:44:57 r500 kernel: [6049] umass1:3:1: Attached to scbus3
Sep  6 21:46:10 r500 kernel: [6122] (probe0:umass-sim1:1:0:0): INQUIRY. CDB=
: 12 00 00 00 24 00=20
Sep  6 21:46:10 r500 kernel: [6122] (probe0:umass-sim1:1:0:0): CAM status: =
CCB request completed with an error
Sep  6 21:46:10 r500 kernel: [6122] (probe0:umass-sim1:1:0:0): Retrying com=
mand
Sep  6 21:46:45 r500 kernel: [6157] ugen1.3: <General> at usbus1 (disconnec=
ted)
Sep  6 21:46:45 r500 kernel: [6157] umass1: at uhub1, port 1, addr 3 (disco=
nnected)
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): INQUIRY. CDB=
: 12 00 00 00 24 00=20
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): CAM status: =
CCB request completed with an error
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): Retrying com=
mand
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): INQUIRY. CDB=
: 12 00 00 00 24 00=20
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): CAM status: =
CCB request completed with an error
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): Retrying com=
mand
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): INQUIRY. CDB=
: 12 00 00 00 24 00=20
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): CAM status: =
CCB request completed with an error
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): Retrying com=
mand
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): INQUIRY. CDB=
: 12 00 00 00 24 00=20
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): CAM status: =
CCB request completed with an error
Sep  6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): Error 5, Ret=
ries exhausted

Two other pools, one using USB as well, were unaffected until
the panic was triggered.

A pleasant surprise was that the system entered ddb from Xorg
and dumping on /dev/ada0s1b worked despite /dev/ada0s1b.eli being
configured as swap device. I wasn't aware that this was working now.

Fabian

--Sig_/RF=29s9FFtRhTIQX5d+=r1I
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlQMV1sACgkQBYqIVf93VJ30SwCggvna+u5WCWCr2ysP3JaM0sP6
xdEAoIPZ7NS8C3Pua3cW1cFYAG1IQY/Z
=2tkT
-----END PGP SIGNATURE-----

--Sig_/RF=29s9FFtRhTIQX5d+=r1I--