From owner-freebsd-fs@FreeBSD.ORG Mon Mar 7 19:13:07 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1797C1065673; Mon, 7 Mar 2011 19:13:07 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay06.ispgateway.de (smtprelay06.ispgateway.de [80.67.31.102]) by mx1.freebsd.org (Postfix) with ESMTP id 8094E8FC17; Mon, 7 Mar 2011 19:13:06 +0000 (UTC) Received: from [78.34.189.182] (helo=r500.local) by smtprelay06.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1Pwfrc-0004bi-Hv; Mon, 07 Mar 2011 20:13:04 +0100 Date: Mon, 7 Mar 2011 20:06:34 +0100 From: Fabian Keil To: freebsd-fs@FreeBSD.org Message-ID: <20110307200634.3c0f92df@r500.local> In-Reply-To: <20110228192129.119cac0c@r500.local> References: <20110227202957.GD1992@garage.freebsd.pl> <20110228192129.119cac0c@r500.local> X-Mailer: Claws Mail 3.7.8 (GTK+ 2.22.1; amd64-portbld-freebsd9.0) X-PGP-KEY-URL: http://www.fabiankeil.de/gpg-keys/freebsd-listen-2008-08-18.asc Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/oOazOkkVEQVl61drUilc=as"; protocol="application/pgp-signature" X-Df-Sender: 775067 Cc: Pawel Jakub Dawidek Subject: ZFSv28: log_sysevent: type 19 is not implemented (was: HEADS UP: ZFSv28 is in!) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2011 19:13:07 -0000 --Sig_/oOazOkkVEQVl61drUilc=as Content-Type: multipart/mixed; boundary="MP_/L/L31YZZ9emswjliN6_sjlJ" --MP_/L/L31YZZ9emswjliN6_sjlJ Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Fabian Keil wrote: > Pawel Jakub Dawidek wrote: >=20 > > I just committed ZFSv28 to HEAD. > Anyway, the things I tested so far (zfs/zpool upgrade, > delegation, send, receive, snapshot) worked fine. After unintentionally unplugging an USB disk with this zpool on it: | fk@r500 ~ $zpool status toshiba | pool: toshiba | state: ONLINE | scan: none requested | config: | | NAME STATE READ WRITE CKSUM | toshiba ONLINE 0 0 0 | label/toshiba.eli ONLINE 0 0 0 | | errors: No known data errors the system became sluggish and /var/log/messages got spammed with error messages: Mar 6 21:33:10 r500 kernel: ugen7.2: at usbus7 (disconnected) Mar 6 21:33:10 r500 kernel: umass1: at uhub7, port 1, addr 2 (disconnected) Mar 6 21:33:10 r500 kernel: (pass3:umass-sim1:1:0:0): lost device Mar 6 21:33:10 r500 kernel: (pass3:umass-sim1:1:0:0): removing device entry Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): lost device Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): Synchronize cache fail= ed, status =3D=3D 0xa, scsi status =3D=3D 0x0 Mar 6 21:33:10 r500 kernel: (da1:umass-sim1:1:0:0): removing device entry Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented Mar 6 21:33:10 r500 last message repeated 50 times Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy= sevent: type 19 is not im Mar 6 21:33:10 r500 kernel: plemented Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented Mar 6 21:33:10 r500 last message repeated 238 times Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy= sevent: type 19 is Mar 6 21:33:10 r500 kernel: not implemented Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy= sevent: type 19 is not impleme Mar 6 21:33:10 r500 kernel: nted Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented Mar 6 21:33:10 r500 last message repeated 87 times Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implementedlog_sy= sevent: type 19 is not implemented Mar 6 21:33:10 r500 kernel:=20 Mar 6 21:33:10 r500 kernel: log_sysevent: type 19 is not implemented Mar 6 21:33:10 r500 last message repeated 47 times Mar 6 21:33:11 r500 kernel: type 19 is not implemented Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implemented Mar 6 21:33:11 r500 last message repeated 1527 times Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implementedlog_sy= sevent: type 19 is not implemented Mar 6 21:33:11 r500 kernel:=20 Mar 6 21:33:11 r500 kernel: log_sysevent: type 19 is not implemented fk@r500 ~ $zcat /var/log/messages.0.bz2 | grep -c "type 19 is not implement= ed" 34101 At the time of the unplugging, a video was read from the pool. When trying to export the pool, zpool export hung. After rebooting the system, the message was shown two more times between the creation of the provider for the main zpool and the swap device: Mar 6 21:43:20 r500 kernel: GEOM_ELI: Device ada0s1d.eli created. Mar 6 21:43:20 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128 Mar 6 21:43:20 r500 kernel: GEOM_ELI: Crypto: software Mar 6 21:43:20 r500 kernel: Trying to mount root from ufs:/dev/ada0s1a [rw= ]... Mar 6 21:43:20 r500 kernel: WARNING: / was not properly dismounted Mar 6 21:43:20 r500 kernel: start_init: trying /sbin/init Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 = 0 0 0 0 0 0 0 0=20 Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E= rror Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit= ion Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc= :3a,1 (Medium not present - tray closed) Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error Mar 6 21:43:20 r500 kernel: log_sysevent: type 19 is not implemented Mar 6 21:43:20 r500 kernel: log_sysevent: type 19 is not implemented Mar 6 21:43:20 r500 kernel: GEOM_ELI: Device ada0s1b.eli created. Mar 6 21:43:20 r500 kernel: GEOM_ELI: Encryption: AES-XTS 256 Mar 6 21:43:20 r500 kernel: GEOM_ELI: Crypto: software Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 = 0 0 0 0 0 0 0 0=20 Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E= rror Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit= ion Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc= :3a,1 (Medium not present - tray closed) Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 = 0 0 0 0 0 0 0 0=20 Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E= rror Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit= ion Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc= :3a,1 (Medium not present - tray closed) Mar 6 21:43:20 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error Mar 6 21:43:20 r500 kernel: lo1: bpf attached Mar 6 21:43:20 r500 kernel: wlan0: bpf attached Mar 6 21:43:20 r500 kernel: wlan0: bpf attached Mar 6 21:43:20 r500 kernel: wlan0: Ethernet address: 00:[...] Mar 6 21:43:20 r500 kernel: firmware: 'iwn5000fw' version 0: 353240 bytes = loaded at 0xffffffff814120b0 Mar 6 21:43:20 r500 kernel: firmware: 'iwn5000fw' version 0: 353240 bytes = loaded at 0xffffffff814120b0 Mar 6 21:43:20 r500 kernel: bge0: Disabling fastboot Mar 6 21:43:20 r500 kernel: bge0: Disabling fastboot Mar 6 21:43:20 r500 savecore: /dev/ada0s1b: Operation not permitted Mar 6 21:43:21 r500 named[2219]: starting BIND 9.6.3 -t /var/named -u bind Mar 6 21:43:21 r500 named[2219]: built with '--prefix=3D/usr' '--infodir= =3D/usr/share/info' '--mandir=3D/usr/share/man' '--enable-threads' '--enabl= e-getifaddrs' '--disable-linux-caps' '--with-openssl=3D/usr' '--with-random= dev=3D/dev/random' '--without-idn' '--without-libxml2' Mar 6 21:43:21 r500 named[2219]: command channel listening on 127.0.0.1#953 Mar 6 21:43:21 r500 named[2219]: command channel listening on ::1#953 Mar 6 21:43:21 r500 named[2219]: the working directory is not writable Mar 6 21:43:21 r500 named[2219]: running Mar 6 21:43:53 r500 wpa_supplicant[512]: WPA: Group rekeying completed wit= h 00:[...] [GTK=3DTKIP] Mar 6 21:44:14 r500 syslogd: exiting on signal 15 As the boot process got stuck with no additional messages printed, I rebooted into single-user mode, exported the faulted pool and finished the boot process. The system came back normally and the pool could be imported without issues fk@r500 ~ $grep zfs /boot/loader.conf | grep -v "^ *#" zfs_load=3D"YES" I used the attached patch to stop the log spam, but the main issue seems to be reproducible. The top output after detaching the pool: last pid: 4985; load averages: 10.47, 3.96, 2.02 up 0+02:01:49 19:20= :49 552 processes: 12 running, 518 sleeping, 22 waiting CPU: 1.2% user, 0.0% nice, 98.8% system, 0.0% interrupt, 0.0% idle Mem: 267M Active, 95M Inact, 859M Wired, 2032K Cache, 7872K Buf, 692M Free Swap: 2048M Total, 2048M Free PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAN= D = =20 2 root 1 -8 - 0K 16K CPU1 1 1:26 96.97% g_event 0 root 387 -8 0 0K 6192K - 0 2:11 80.66% kernel 1702 root 1 76 0 6276K 796K RUN 0 0:15 11.96% devd 11 root 2 155 ki31 0K 32K RUN 0 26:46 0.00% idle 3395 fk 1 22 0 465M 329M select 0 7:27 0.00% Xorg 3398 fk 1 20 0 96120K 9704K RUN 0 0:49 0.00% e16 12 root 22 -84 - 0K 352K WAIT 1 0:37 0.00% intr 26 root 1 20 - 0K 16K geli:w 0 0:31 0.00% g_eli[= 0] ada0s1d 27 root 1 22 - 0K 16K geli:w 1 0:29 0.00% g_eli[= 1] ada0s1d The stuck zpool's stack: fk@r500 ~ $sudo procstat -kk $(pgrep zpool) PID TID COMM TDNAME KSTACK = =20 5087 100490 zpool initial thread mi_switch+0x174 sleepq_wait+= 0x42 __lockmgr_args+0x7a3 vop_stdlock+0x39 VOP_LOCK1_APV+0x52 _vn_lock+0x47= vflush+0x125 zfs_umount+0x9f dounmount+0x31e unmount+0x38b syscallenter+0x= 331 syscall+0x4b Xfast_syscall+0xdd=20 When I re-attached the disk, g_event kept eating cpu. I'm using a script to attach and import pools on USB devices and as it currently doesn't handle faulted pools, it tried to import the already faulted pool, which resulted in a zpool core dump. Mar 7 19:27:25 r500 sudo: fk : TTY=3Dttyv0 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/geli attach -j - -k /home/fk/geli-keys/toshiba.ke= y /dev/label/toshiba Mar 7 19:27:33 r500 sudo: fk : TTY=3Dttyv0 ; PWD=3D/home/fk ; USER= =3Droot ; COMMAND=3D/sbin/zpool import toshiba Mar 7 19:27:33 r500 kernel: GEOM_ELI: Device label/toshiba.eli created. Mar 7 19:27:33 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128 Mar 7 19:27:33 r500 kernel: GEOM_ELI: Crypto: software Mar 7 19:27:33 r500 kernel: pid 5206 (zpool), uid 0: exited on signal 11 (= core dumped) fk@r500 ~ $gdb /sbin/zpool zpool.core=20 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. [...] Loaded symbols for /libexec/ld-elf.so.1 #0 0x000000080085ec7d in avl_insert () from /lib/libavl.so.2 [New Thread 802807400 (LWP 100524/initial thread)] (gdb) where #0 0x000000080085ec7d in avl_insert () from /lib/libavl.so.2 #1 0x000000080085ed5e in avl_add () from /lib/libavl.so.2 #2 0x0000000801ae9105 in zpool_find_import_cached () from /lib/libzfs.so.2 #3 0x0000000000409deb in zpool_do_import () #4 0x0000000000406fa9 in main () This time rebooting into single-user mode to export the faulted pool wasn't necessary, but I doubt that the patch had anything to do with it. The second time I was using today's CURRENT instead of yesterday's, but I don't think there were any ZFS-related commits. Fabian --MP_/L/L31YZZ9emswjliN6_sjlJ Content-Type: text/x-patch Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename=0001-In-log_sysevent-only-warn-about-the-unsupported-type.patch =46rom 9689a1342d30bb0408afd694b4146cf0d84b61c3 Mon Sep 17 00:00:00 2001 From: Fabian Keil Date: Mon, 7 Mar 2011 12:59:03 +0100 Subject: [PATCH] In log_sysevent(), only warn about the unsupported type if= the type is different than the last unsupported one. --- .../compat/opensolaris/kern/opensolaris_sysevent.c | 13 +++++++++++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c b/sys/= cddl/compat/opensolaris/kern/opensolaris_sysevent.c index ad9ba00..bc697ff 100644 --- a/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_sysevent.c @@ -286,9 +286,18 @@ log_sysevent(sysevent_t *evp, int flag, sysevent_id_t = *eid) break; } default: - printf("%s: type %d is not implemented\n", __func__, - nvpair_type(elem)); + { + static int last_unsupported_type; + int unsupported_type =3D nvpair_type(elem); + + if (last_unsupported_type !=3D unsupported_type) + { + printf("%s: type %d is not implemented\n", + __func__, unsupported_type); + last_unsupported_type =3D unsupported_type; + } break; + } } } =20 --=20 1.7.4.1 --MP_/L/L31YZZ9emswjliN6_sjlJ-- --Sig_/oOazOkkVEQVl61drUilc=as Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (FreeBSD) iEYEARECAAYFAk11LL0ACgkQBYqIVf93VJ0ZWwCfd33ylh/Wa1UEAEzq1DLAeVV3 chEAn1EbtLniRqrV0A/UijEnl6EAZ9TL =vRYt -----END PGP SIGNATURE----- --Sig_/oOazOkkVEQVl61drUilc=as--