Date: Mon, 7 Mar 2011 20:06:34 +0100 From: Fabian Keil <freebsd-listen@fabiankeil.de> To: freebsd-fs@FreeBSD.org Cc: Pawel Jakub Dawidek <pjd@FreeBSD.org> Subject: ZFSv28: log_sysevent: type 19 is not implemented (was: HEADS UP: ZFSv28 is in!) Message-ID: <20110307200634.3c0f92df@r500.local> In-Reply-To: <20110228192129.119cac0c@r500.local> References: <20110227202957.GD1992@garage.freebsd.pl> <20110228192129.119cac0c@r500.local>
next in thread | previous in thread | raw e-mail | index | archive | help
--Sig_/oOazOkkVEQVl61drUilc=as
Content-Type: multipart/mixed; boundary="MP_/L/L31YZZ9emswjliN6_sjlJ"
--MP_/L/L31YZZ9emswjliN6_sjlJ
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Fabian Keil <freebsd-listen@fabiankeil.de> wrote:
> Pawel Jakub Dawidek <pjd@FreeBSD.org> wrote:
>=20
> > I just committed ZFSv28 to HEAD.
> Anyway, the things I tested so far (zfs/zpool upgrade,
> delegation, send, receive, snapshot) worked fine.
After unintentionally unplugging an USB disk with this
zpool on it:
| fk@r500 ~ $zpool status toshiba
| pool: toshiba
| state: ONLINE
| scan: none requested
| config:
|
| NAME STATE READ WRITE CKSUM
| toshiba ONLINE 0 0 0
| label/toshiba.eli ONLINE 0 0 0
|
| errors: No known data errors
the system became sluggish and /var/log/messages got spammed
with error messages:
Mar 6 21:33:10 r500 kernel: ugen7.2: <TOSHIBA> at usbus7 (disconnected)
Mar 6 21:33:10 r500 kernel: umass1: at uhub7, port 1, addr 2 (disconnected)
Mar 6 21:33:10 r500 kernel: (pass3:umass-sim1:1:0:0): lost device
Mar 6 21:33:10 r500 kernel: (pass3:umass-sim1:1:0:0): removing device entry
Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): lost device
Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): Synchronize cache fail=
ed, status =3D=3D 0xa, scsi status =3D=3D 0x0
Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): removing device entry
Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
Mar 6 21:33:10 r500 last message repeated 50 times
Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is not im
Mar 6 21:33:10 r500 kernel: plemented
Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
Mar 6 21:33:10 r500 last message repeated 238 times
Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is
Mar 6 21:33:10 r500 kernel: not implemented
Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is not impleme
Mar 6 21:33:10 r500 kernel: nted
Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
Mar 6 21:33:10 r500 last message repeated 87 times
Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is not implemented
Mar 6 21:33:10 r500 kernel:=20
Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
Mar 6 21:33:10 r500 last message repeated 47 times
Mar 6 21:33:11 r500 kernel: type 19 is not implemented
Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implemented
Mar 6 21:33:11 r500 last message repeated 1527 times
Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is not implemented
Mar 6 21:33:11 r500 kernel:=20
Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implemented
fk@r500 ~ $zcat /var/log/messages.0.bz2 | grep -c "type 19 is not implement=
ed"
34101
At the time of the unplugging, a video was read from the pool.
When trying to export the pool, zpool export hung.
After rebooting the system, the message was shown two
more times between the creation of the provider for the
main zpool and the swap device:
Mar 6 21:43:20 r500 kernel: GEOM_ELI: Device ada0s1d.eli created.
Mar 6 21:43:20 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128
Mar 6 21:43:20 r500 kernel: GEOM_ELI: Crypto: software
Mar 6 21:43:20 r500 kernel: Trying to mount root from ufs:/dev/ada0s1a [rw=
]...
Mar 6 21:43:20 r500 kernel: WARNING: / was not properly dismounted
Mar 6 21:43:20 r500 kernel: start_init: trying /sbin/init
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 =
0 0 0 0 0 0 0 0=20
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E=
rror
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit=
ion
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc=
:3a,1 (Medium not present - tray closed)
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
Mar 6 21:43:20 r500 kernel: log_sysevent: type 19 is not implemented
Mar 6 21:43:20 r500 kernel: log_sysevent: type 19 is not implemented
Mar 6 21:43:20 r500 kernel: GEOM_ELI: Device ada0s1b.eli created.
Mar 6 21:43:20 r500 kernel: GEOM_ELI: Encryption: AES-XTS 256
Mar 6 21:43:20 r500 kernel: GEOM_ELI: Crypto: software
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 =
0 0 0 0 0 0 0 0=20
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E=
rror
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit=
ion
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc=
:3a,1 (Medium not present - tray closed)
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 =
0 0 0 0 0 0 0 0=20
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E=
rror
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit=
ion
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc=
:3a,1 (Medium not present - tray closed)
Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
Mar 6 21:43:20 r500 kernel: lo1: bpf attached
Mar 6 21:43:20 r500 kernel: wlan0: bpf attached
Mar 6 21:43:20 r500 kernel: wlan0: bpf attached
Mar 6 21:43:20 r500 kernel: wlan0: Ethernet address: 00:[...]
Mar 6 21:43:20 r500 kernel: firmware: 'iwn5000fw' version 0: 353240 bytes =
loaded at 0xffffffff814120b0
Mar 6 21:43:20 r500 kernel: firmware: 'iwn5000fw' version 0: 353240 bytes =
loaded at 0xffffffff814120b0
Mar 6 21:43:20 r500 kernel: bge0: Disabling fastboot
Mar 6 21:43:20 r500 kernel: bge0: Disabling fastboot
Mar 6 21:43:20 r500 savecore: /dev/ada0s1b: Operation not permitted
Mar 6 21:43:21 r500 named[2219]: starting BIND 9.6.3 -t /var/named -u bind
Mar 6 21:43:21 r500 named[2219]: built with '--prefix=3D/usr' '--infodir=
=3D/usr/share/info' '--mandir=3D/usr/share/man' '--enable-threads' '--enabl=
e-getifaddrs' '--disable-linux-caps' '--with-openssl=3D/usr' '--with-random=
dev=3D/dev/random' '--without-idn' '--without-libxml2'
Mar 6 21:43:21 r500 named[2219]: command channel listening on 127.0.0.1#953
Mar 6 21:43:21 r500 named[2219]: command channel listening on ::1#953
Mar 6 21:43:21 r500 named[2219]: the working directory is not writable
Mar 6 21:43:21 r500 named[2219]: running
Mar 6 21:43:53 r500 wpa_supplicant[512]: WPA: Group rekeying completed wit=
h 00:[...] [GTK=3DTKIP]
Mar 6 21:44:14 r500 syslogd: exiting on signal 15
As the boot process got stuck with no additional messages
printed, I rebooted into single-user mode, exported the
faulted pool and finished the boot process. The system
came back normally and the pool could be imported without
issues
fk@r500 ~ $grep zfs /boot/loader.conf | grep -v "^ *#"
zfs_load=3D"YES"
I used the attached patch to stop the log spam, but the
main issue seems to be reproducible. The top output after
detaching the pool:
last pid: 4985; load averages: 10.47, 3.96, 2.02 up 0+02:01:49 19:20=
:49
552 processes: 12 running, 518 sleeping, 22 waiting
CPU: 1.2% user, 0.0% nice, 98.8% system, 0.0% interrupt, 0.0% idle
Mem: 267M Active, 95M Inact, 859M Wired, 2032K Cache, 7872K Buf, 692M Free
Swap: 2048M Total, 2048M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAN=
D =
=20
2 root 1 -8 - 0K 16K CPU1 1 1:26 96.97% g_event
0 root 387 -8 0 0K 6192K - 0 2:11 80.66% kernel
1702 root 1 76 0 6276K 796K RUN 0 0:15 11.96% devd
11 root 2 155 ki31 0K 32K RUN 0 26:46 0.00% idle
3395 fk 1 22 0 465M 329M select 0 7:27 0.00% Xorg
3398 fk 1 20 0 96120K 9704K RUN 0 0:49 0.00% e16
12 root 22 -84 - 0K 352K WAIT 1 0:37 0.00% intr
26 root 1 20 - 0K 16K geli:w 0 0:31 0.00% g_eli[=
0] ada0s1d
27 root 1 22 - 0K 16K geli:w 1 0:29 0.00% g_eli[=
1] ada0s1d
The stuck zpool's stack:
fk@r500 ~ $sudo procstat -kk $(pgrep zpool)
PID TID COMM TDNAME KSTACK =
=20
5087 100490 zpool initial thread mi_switch+0x174 sleepq_wait+=
0x42 __lockmgr_args+0x7a3 vop_stdlock+0x39 VOP_LOCK1_APV+0x52 _vn_lock+0x47=
vflush+0x125 zfs_umount+0x9f dounmount+0x31e unmount+0x38b syscallenter+0x=
331 syscall+0x4b Xfast_syscall+0xdd=20
When I re-attached the disk, g_event kept eating cpu.
I'm using a script to attach and import pools on USB devices and as
it currently doesn't handle faulted pools, it tried to import the
already faulted pool, which resulted in a zpool core dump.
Mar 7 19:27:25 r500 sudo: fk : TTY=3Dttyv0 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/geli attach -j - -k /home/fk/geli-keys/toshiba.ke=
y /dev/label/toshiba
Mar 7 19:27:33 r500 sudo: fk : TTY=3Dttyv0 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zpool import toshiba
Mar 7 19:27:33 r500 kernel: GEOM_ELI: Device label/toshiba.eli created.
Mar 7 19:27:33 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128
Mar 7 19:27:33 r500 kernel: GEOM_ELI: Crypto: software
Mar 7 19:27:33 r500 kernel: pid 5206 (zpool), uid 0: exited on signal 11 (=
core dumped)
fk@r500 ~ $gdb /sbin/zpool zpool.core=20
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
[...]
Loaded symbols for /libexec/ld-elf.so.1
#0 0x000000080085ec7d in avl_insert () from /lib/libavl.so.2
[New Thread 802807400 (LWP 100524/initial thread)]
(gdb) where
#0 0x000000080085ec7d in avl_insert () from /lib/libavl.so.2
#1 0x000000080085ed5e in avl_add () from /lib/libavl.so.2
#2 0x0000000801ae9105 in zpool_find_import_cached () from /lib/libzfs.so.2
#3 0x0000000000409deb in zpool_do_import ()
#4 0x0000000000406fa9 in main ()
This time rebooting into single-user mode to export the faulted
pool wasn't necessary, but I doubt that the patch had anything
to do with it.
The second time I was using today's CURRENT instead of yesterday's,
but I don't think there were any ZFS-related commits.
Fabian
--MP_/L/L31YZZ9emswjliN6_sjlJ
Content-Type: text/x-patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
filename=0001-In-log_sysevent-only-warn-about-the-unsupported-type.patch
=46rom 9689a1342d30bb0408afd694b4146cf0d84b61c3 Mon Sep 17 00:00:00 2001
From: Fabian Keil <fk@fabiankeil.de>
Date: Mon, 7 Mar 2011 12:59:03 +0100
Subject: [PATCH] In log_sysevent(), only warn about the unsupported type if=
the type is different than the last unsupported one.
---
.../compat/opensolaris/kern/opensolaris_sysevent.c | 13 +++++++++++--
1 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c b/sys/=
cddl/compat/opensolaris/kern/opensolaris_sysevent.c
index ad9ba00..bc697ff 100644
--- a/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c
+++ b/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c
@@ -286,9 +286,18 @@ log_sysevent(sysevent_t *evp, int flag, sysevent_id_t =
*eid)
break;
}
default:
- printf("%s: type %d is not implemented\n", __func__,
- nvpair_type(elem));
+ {
+ static int last_unsupported_type;
+ int unsupported_type =3D nvpair_type(elem);
+
+ if (last_unsupported_type !=3D unsupported_type)
+ {
+ printf("%s: type %d is not implemented\n",
+ __func__, unsupported_type);
+ last_unsupported_type =3D unsupported_type;
+ }
break;
+ }
}
}
=20
--=20
1.7.4.1
--MP_/L/L31YZZ9emswjliN6_sjlJ--
--Sig_/oOazOkkVEQVl61drUilc=as
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (FreeBSD)
iEYEARECAAYFAk11LL0ACgkQBYqIVf93VJ0ZWwCfd33ylh/Wa1UEAEzq1DLAeVV3
chEAn1EbtLniRqrV0A/UijEnl6EAZ9TL
=vRYt
-----END PGP SIGNATURE-----
--Sig_/oOazOkkVEQVl61drUilc=as--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110307200634.3c0f92df>
