Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 7 Mar 2011 20:06:34 +0100
From:      Fabian Keil <freebsd-listen@fabiankeil.de>
To:        freebsd-fs@FreeBSD.org
Cc:        Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject:   ZFSv28: log_sysevent: type 19 is not implemented (was: HEADS UP: ZFSv28 is in!)
Message-ID:  <20110307200634.3c0f92df@r500.local>
In-Reply-To: <20110228192129.119cac0c@r500.local>
References:  <20110227202957.GD1992@garage.freebsd.pl> <20110228192129.119cac0c@r500.local>

next in thread | previous in thread | raw e-mail | index | archive | help
--Sig_/oOazOkkVEQVl61drUilc=as
Content-Type: multipart/mixed; boundary="MP_/L/L31YZZ9emswjliN6_sjlJ"

--MP_/L/L31YZZ9emswjliN6_sjlJ
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Fabian Keil <freebsd-listen@fabiankeil.de> wrote:

> Pawel Jakub Dawidek <pjd@FreeBSD.org> wrote:
>=20
> > I just committed ZFSv28 to HEAD.

> Anyway, the things I tested so far (zfs/zpool upgrade,
> delegation, send, receive, snapshot) worked fine.

After unintentionally unplugging an USB disk with this
zpool on it:

| fk@r500 ~ $zpool status toshiba
|   pool: toshiba
|  state: ONLINE
|  scan: none requested
| config:
|
|         NAME                 STATE     READ WRITE CKSUM
|         toshiba              ONLINE       0     0     0
|           label/toshiba.eli  ONLINE       0     0     0
|
| errors: No known data errors

the system became sluggish and /var/log/messages got spammed
with error messages:

Mar  6 21:33:10 r500 kernel: ugen7.2: <TOSHIBA> at usbus7 (disconnected)
Mar  6 21:33:10 r500 kernel: umass1: at uhub7, port 1, addr 2 (disconnected)
Mar  6 21:33:10 r500 kernel: (pass3:umass-sim1:1:0:0): lost device
Mar  6 21:33:10 r500 kernel: (pass3:umass-sim1:1:0:0): removing device entry
Mar  6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): lost device
Mar  6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): Synchronize cache fail=
ed, status =3D=3D 0xa, scsi status =3D=3D 0x0
Mar  6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): removing device entry
Mar  6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
Mar  6 21:33:10 r500 last message repeated 50 times
Mar  6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is not im
Mar  6 21:33:10 r500 kernel: plemented
Mar  6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
Mar  6 21:33:10 r500 last message repeated 238 times
Mar  6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is
Mar  6 21:33:10 r500 kernel: not implemented
Mar  6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is not impleme
Mar  6 21:33:10 r500 kernel: nted
Mar  6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
Mar  6 21:33:10 r500 last message repeated 87 times
Mar  6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is not implemented
Mar  6 21:33:10 r500 kernel:=20
Mar  6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented
Mar  6 21:33:10 r500 last message repeated 47 times
Mar  6 21:33:11 r500 kernel: type 19 is not implemented
Mar  6 21:33:11 r500 kernel: log_sysevent: type 19 is not implemented
Mar  6 21:33:11 r500 last message repeated 1527 times
Mar  6 21:33:11 r500 kernel: log_sysevent: type 19 is not implementedlog_sy=
sevent: type 19 is not implemented
Mar  6 21:33:11 r500 kernel:=20
Mar  6 21:33:11 r500 kernel: log_sysevent: type 19 is not implemented

fk@r500 ~ $zcat /var/log/messages.0.bz2 | grep -c "type 19 is not implement=
ed"
34101

At the time of the unplugging, a video was read from the pool.

When trying to export the pool, zpool export hung.

After rebooting the system, the message was shown two
more times between the creation of the provider for the
main zpool and the swap device:

Mar  6 21:43:20 r500 kernel: GEOM_ELI: Device ada0s1d.eli created.
Mar  6 21:43:20 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128
Mar  6 21:43:20 r500 kernel: GEOM_ELI:     Crypto: software
Mar  6 21:43:20 r500 kernel: Trying to mount root from ufs:/dev/ada0s1a [rw=
]...
Mar  6 21:43:20 r500 kernel: WARNING: / was not properly dismounted
Mar  6 21:43:20 r500 kernel: start_init: trying /sbin/init
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 =
0 0 0 0 0 0 0 0=20
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E=
rror
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit=
ion
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc=
:3a,1 (Medium not present - tray closed)
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
Mar  6 21:43:20 r500 kernel: log_sysevent: type 19 is not implemented
Mar  6 21:43:20 r500 kernel: log_sysevent: type 19 is not implemented
Mar  6 21:43:20 r500 kernel: GEOM_ELI: Device ada0s1b.eli created.
Mar  6 21:43:20 r500 kernel: GEOM_ELI: Encryption: AES-XTS 256
Mar  6 21:43:20 r500 kernel: GEOM_ELI:     Crypto: software
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 =
0 0 0 0 0 0 0 0=20
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E=
rror
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit=
ion
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc=
:3a,1 (Medium not present - tray closed)
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 =
0 0 0 0 0 0 0 0=20
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E=
rror
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit=
ion
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc=
:3a,1 (Medium not present - tray closed)
Mar  6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
Mar  6 21:43:20 r500 kernel: lo1: bpf attached
Mar  6 21:43:20 r500 kernel: wlan0: bpf attached
Mar  6 21:43:20 r500 kernel: wlan0: bpf attached
Mar  6 21:43:20 r500 kernel: wlan0: Ethernet address: 00:[...]
Mar  6 21:43:20 r500 kernel: firmware: 'iwn5000fw' version 0: 353240 bytes =
loaded at 0xffffffff814120b0
Mar  6 21:43:20 r500 kernel: firmware: 'iwn5000fw' version 0: 353240 bytes =
loaded at 0xffffffff814120b0
Mar  6 21:43:20 r500 kernel: bge0: Disabling fastboot
Mar  6 21:43:20 r500 kernel: bge0: Disabling fastboot
Mar  6 21:43:20 r500 savecore: /dev/ada0s1b: Operation not permitted
Mar  6 21:43:21 r500 named[2219]: starting BIND 9.6.3 -t /var/named -u bind
Mar  6 21:43:21 r500 named[2219]: built with '--prefix=3D/usr' '--infodir=
=3D/usr/share/info' '--mandir=3D/usr/share/man' '--enable-threads' '--enabl=
e-getifaddrs' '--disable-linux-caps' '--with-openssl=3D/usr' '--with-random=
dev=3D/dev/random' '--without-idn' '--without-libxml2'
Mar  6 21:43:21 r500 named[2219]: command channel listening on 127.0.0.1#953
Mar  6 21:43:21 r500 named[2219]: command channel listening on ::1#953
Mar  6 21:43:21 r500 named[2219]: the working directory is not writable
Mar  6 21:43:21 r500 named[2219]: running
Mar  6 21:43:53 r500 wpa_supplicant[512]: WPA: Group rekeying completed wit=
h 00:[...] [GTK=3DTKIP]
Mar  6 21:44:14 r500 syslogd: exiting on signal 15

As the boot process got stuck with no additional messages
printed, I rebooted into single-user mode, exported the
faulted pool and finished the boot process. The system
came back normally and the pool could be imported without
issues

fk@r500 ~ $grep zfs /boot/loader.conf | grep -v "^ *#"
zfs_load=3D"YES"

I used the attached patch to stop the log spam, but the
main issue seems to be reproducible. The top output after
detaching the pool:

last pid:  4985;  load averages: 10.47,  3.96,  2.02   up 0+02:01:49  19:20=
:49
552 processes: 12 running, 518 sleeping, 22 waiting
CPU:  1.2% user,  0.0% nice, 98.8% system,  0.0% interrupt,  0.0% idle
Mem: 267M Active, 95M Inact, 859M Wired, 2032K Cache, 7872K Buf, 692M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAN=
D                                                                          =
                   =20
    2 root          1  -8    -     0K    16K CPU1    1   1:26 96.97% g_event
    0 root        387  -8    0     0K  6192K -       0   2:11 80.66% kernel
 1702 root          1  76    0  6276K   796K RUN     0   0:15 11.96% devd
   11 root          2 155 ki31     0K    32K RUN     0  26:46  0.00% idle
 3395 fk            1  22    0   465M   329M select  0   7:27  0.00% Xorg
 3398 fk            1  20    0 96120K  9704K RUN     0   0:49  0.00% e16
   12 root         22 -84    -     0K   352K WAIT    1   0:37  0.00% intr
   26 root          1  20    -     0K    16K geli:w  0   0:31  0.00% g_eli[=
0] ada0s1d
   27 root          1  22    -     0K    16K geli:w  1   0:29  0.00% g_eli[=
1] ada0s1d

The stuck zpool's stack:

fk@r500 ~ $sudo procstat -kk $(pgrep zpool)
  PID    TID COMM             TDNAME           KSTACK                      =
=20
 5087 100490 zpool            initial thread   mi_switch+0x174 sleepq_wait+=
0x42 __lockmgr_args+0x7a3 vop_stdlock+0x39 VOP_LOCK1_APV+0x52 _vn_lock+0x47=
 vflush+0x125 zfs_umount+0x9f dounmount+0x31e unmount+0x38b syscallenter+0x=
331 syscall+0x4b Xfast_syscall+0xdd=20

When I re-attached the disk, g_event kept eating cpu.

I'm using a script to attach and import pools on USB devices and as
it currently doesn't handle faulted pools, it tried to import the
already faulted pool, which resulted in a zpool core dump.

Mar  7 19:27:25 r500 sudo:       fk : TTY=3Dttyv0 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/geli attach -j - -k /home/fk/geli-keys/toshiba.ke=
y /dev/label/toshiba
Mar  7 19:27:33 r500 sudo:       fk : TTY=3Dttyv0 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zpool import toshiba
Mar  7 19:27:33 r500 kernel: GEOM_ELI: Device label/toshiba.eli created.
Mar  7 19:27:33 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128
Mar  7 19:27:33 r500 kernel: GEOM_ELI:     Crypto: software
Mar  7 19:27:33 r500 kernel: pid 5206 (zpool), uid 0: exited on signal 11 (=
core dumped)

fk@r500 ~ $gdb /sbin/zpool zpool.core=20
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
[...]
Loaded symbols for /libexec/ld-elf.so.1
#0  0x000000080085ec7d in avl_insert () from /lib/libavl.so.2
[New Thread 802807400 (LWP 100524/initial thread)]
(gdb) where
#0  0x000000080085ec7d in avl_insert () from /lib/libavl.so.2
#1  0x000000080085ed5e in avl_add () from /lib/libavl.so.2
#2  0x0000000801ae9105 in zpool_find_import_cached () from /lib/libzfs.so.2
#3  0x0000000000409deb in zpool_do_import ()
#4  0x0000000000406fa9 in main ()

This time rebooting into single-user mode to export the faulted
pool wasn't necessary, but I doubt that the patch had anything
to do with it.

The second time I was using today's CURRENT instead of yesterday's,
but I don't think there were any ZFS-related commits.

Fabian

--MP_/L/L31YZZ9emswjliN6_sjlJ
Content-Type: text/x-patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename=0001-In-log_sysevent-only-warn-about-the-unsupported-type.patch

=46rom 9689a1342d30bb0408afd694b4146cf0d84b61c3 Mon Sep 17 00:00:00 2001
From: Fabian Keil <fk@fabiankeil.de>
Date: Mon, 7 Mar 2011 12:59:03 +0100
Subject: [PATCH] In log_sysevent(), only warn about the unsupported type if=
 the type is different than the last unsupported one.

---
 .../compat/opensolaris/kern/opensolaris_sysevent.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c b/sys/=
cddl/compat/opensolaris/kern/opensolaris_sysevent.c
index ad9ba00..bc697ff 100644
--- a/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c
+++ b/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c
@@ -286,9 +286,18 @@ log_sysevent(sysevent_t *evp, int flag, sysevent_id_t =
*eid)
 			break;
 		    }
 		default:
-			printf("%s: type %d is not implemented\n", __func__,
-			    nvpair_type(elem));
+		    {
+			static int last_unsupported_type;
+			int unsupported_type =3D nvpair_type(elem);
+
+			if (last_unsupported_type !=3D unsupported_type)
+			{
+			    printf("%s: type %d is not implemented\n",
+				__func__, unsupported_type);
+			    last_unsupported_type =3D unsupported_type;
+			}
 			break;
+		    }
 		}
 	}
=20
--=20
1.7.4.1


--MP_/L/L31YZZ9emswjliN6_sjlJ--

--Sig_/oOazOkkVEQVl61drUilc=as
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (FreeBSD)

iEYEARECAAYFAk11LL0ACgkQBYqIVf93VJ0ZWwCfd33ylh/Wa1UEAEzq1DLAeVV3
chEAn1EbtLniRqrV0A/UijEnl6EAZ9TL
=vRYt
-----END PGP SIGNATURE-----

--Sig_/oOazOkkVEQVl61drUilc=as--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110307200634.3c0f92df>