Date: Sun, 31 Oct 2010 10:42:09 +0100 From: Fabian Keil <freebsd-listen@fabiankeil.de> To: freebsd-fs@freebsd.org Subject: Re: ZFS inresponsive. Message-ID: <20101031104209.771cc47e@r500.local> In-Reply-To: <BE2EF896-4990-4A75-9DD4-B0534DFE0B71@pean.org> References: <BE2EF896-4990-4A75-9DD4-B0534DFE0B71@pean.org>
next in thread | previous in thread | raw e-mail | index | archive | help
[-- Attachment #1 --] Peter Ankerstål <peter@pean.org> wrote: > I accidentally aborted a zfs send mid-tranfer. > > Now the receiveing side cant do anything with the affected filesystem. both zfs list and zfs destroy hangs. > How can I fix this? I can list other filesystems without a problem but a general list or any command concerning > the one the got broken in transer hangs the zfs command. I occasionally see this, too, when sending snapshots to zpools on USB disks using geli. If the device gets lost in transfer, general zfs and zpool commands hang: Oct 18 17:09:28 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/geli attach -k /home/fk/geli-keys/poppstar.key /dev/label/poppstar Oct 18 17:09:34 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zpool import poppstar Oct 18 17:09:34 r500 kernel: GEOM_ELI: Device label/poppstar.eli created. Oct 18 17:09:34 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128 Oct 18 17:09:34 r500 kernel: GEOM_ELI: Crypto: software Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status Error Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condition Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc:3a,1 (Medium not present - tray closed) Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error Oct 18 17:09:58 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs create poppstar/dvds/sliders Oct 18 17:10:32 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs send wde2/dvds/sliders/season-1-2@2009-12-05 Oct 18 17:10:32 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs receive -v poppstar/dvds/sliders/season-1-2 Oct 18 17:38:06 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs receive -v poppstar/dvds/sliders/season-3 Oct 18 17:38:06 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs send wde2/dvds/sliders/season-3@2009-03-28 Oct 18 17:39:47 r500 kernel: acpi_lid0: Lid closed Oct 18 18:13:19 r500 kernel: acpi_lid0: Lid opened Oct 18 18:22:56 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs send wde2/dvds/sliders/season-4@2009-12-05 Oct 18 18:22:56 r500 sudo: fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs receive -v poppstar/dvds/sliders/season-4 Oct 18 18:33:05 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected) Oct 18 18:33:05 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected) Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): SCSI status error Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Requesting SCSI sense data Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): AutoSense failed Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Error 5, Unretryable error Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645020672, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645151744, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645282816, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645413888, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645544960, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645676032, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645807104, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362645938176, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362646069248, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=5). label/poppstar.eli[WRITE(offset=362644889600, length=131072)] Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): lost device Oct 18 18:33:05 r500 kernel: (pass4:umass-sim2:2:0:0): lost device Oct 18 18:33:05 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Error 6, Unretryable error Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0):G EOM_ELIInvalidating pack: Oct 18 18:33:05 r500 kernel: g_eli_read_done() failed label/poppstar.eli[READ(offset=2000398327808, length=8192)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/poppstar.eli[READ(offset=270336, length=8192)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/poppstar.eli[READ(offset=2000398589952, length=8192)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646200320, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646331392, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646593536, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646724608, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646462464, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646855680, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362646986752, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362647117824, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362647248896, length=131072)] Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=6). label/poppstar.eli[WRITE(offset=362647379968, length=131072)] Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=poppstar path=/dev/label/poppstar.eli offset=2000398327808 size=8192 error=6 Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=poppstar path=/dev/label/poppstar.eli offset=270336 size=8192 error=6 Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=poppstar path=/dev/label/poppstar.eli offset=2000398589952 size=8192 error=6 Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=poppstar error=6 Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0 Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=poppstar error=6 Oct 18 18:33:05 r500 last message repeated 151 times Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=poppstar error=28 Oct 18 18:33:06 r500 last message repeated 161 times Oct 18 18:33:06 r500 root: ZFS: vdev I/O failure, zpool=poppstar path= offset= size= error= Oct 18 18:33:06 r500 root: Unknown USB device: vendor 0x13fd product 0x1240 bus uhub3 Oct 18 18:33:06 r500 kernel: ugen3.3: <Generic> at usbus3 Oct 18 18:33:06 r500 kernel: umass2: <Generic External, class 0/0, rev 2.00/2.10, addr 3> on usbus3 Oct 18 18:33:06 r500 kernel: umass2: SCSI over Bulk-Only; quirks = 0x0000 Oct 18 18:33:08 r500 kernel: umass2:4:2:-1: Attached to scbus4 Oct 18 18:33:08 r500 kernel: pass4 at umass-sim2 bus 2 scbus4 target 0 lun 0 Oct 18 18:33:08 r500 kernel: pass4: <Generic External 2.10> Fixed Direct Access SCSI-4 device Oct 18 18:33:08 r500 kernel: pass4: Serial Number 395857304652454620202020 Oct 18 18:33:08 r500 kernel: pass4: 40.000MB/s transfers Oct 18 18:33:08 r500 kernel: GEOM: new disk da2 Oct 18 18:33:08 r500 kernel: da2 at umass-sim2 bus 2 scbus4 target 0 lun 0 Oct 18 18:33:08 r500 kernel: da2: <Generic External 2.10> Fixed Direct Access SCSI-4 device Oct 18 18:33:08 r500 kernel: da2: Serial Number 395857304652454620202020 Oct 18 18:33:08 r500 kernel: da2: 40.000MB/s transfers Oct 18 18:33:08 r500 kernel: da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) Oct 18 18:34:02 r500 sudo: fk : TTY=pts/14 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zpool list Oct 18 18:34:02 r500 root: ZFS: zpool I/O failure, zpool=poppstar error=6 Oct 18 18:34:02 r500 root: ZFS: vdev I/O failure, zpool=poppstar path= offset= size= error= Oct 18 18:36:34 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected) Oct 18 18:36:34 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected) Oct 18 18:36:34 r500 kernel: (da2:umass-sim2:2:0:0): lost device Oct 18 18:36:34 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry Oct 18 18:36:34 r500 kernel: (pass4:umass-sim2:2:0:0): lost device Oct 18 18:36:34 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry Oct 18 18:36:44 r500 root: Unknown USB device: vendor 0x13fd product 0x1240 bus uhub3 Oct 18 18:36:44 r500 kernel: ugen3.3: <Generic> at usbus3 Oct 18 18:36:44 r500 kernel: umass2: <Generic External, class 0/0, rev 2.00/2.10, addr 3> on usbus3 Oct 18 18:36:44 r500 kernel: umass2: SCSI over Bulk-Only; quirks = 0x0000 Oct 18 18:36:46 r500 kernel: umass2:4:2:-1: Attached to scbus4 Oct 18 18:36:46 r500 kernel: pass4 at umass-sim2 bus 2 scbus4 target 0 lun 0 Oct 18 18:36:46 r500 kernel: pass4: <Generic External 2.10> Fixed Direct Access SCSI-4 device Oct 18 18:36:46 r500 kernel: pass4: Serial Number 395857304652454620202020 Oct 18 18:36:46 r500 kernel: pass4: 40.000MB/s transfers Oct 18 18:36:46 r500 kernel: GEOM: new disk da2 Oct 18 18:36:46 r500 kernel: da2 at umass-sim2 bus 2 scbus4 target 0 lun 0 Oct 18 18:36:46 r500 kernel: da2: <Generic External 2.10> Fixed Direct Access SCSI-4 device Oct 18 18:36:46 r500 kernel: da2: Serial Number 395857304652454620202020 Oct 18 18:36:46 r500 kernel: da2: 40.000MB/s transfers Oct 18 18:36:46 r500 kernel: da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) Oct 18 18:36:46 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected) Oct 18 18:36:46 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected) Oct 18 18:36:46 r500 kernel: ugen7.3: <Western Digital> at usbus7 (disconnected) Oct 18 18:36:46 r500 kernel: umass1: at uhub7, port 1, addr 3 (disconnected) Oct 18 18:36:46 r500 kernel: (da2:umass-sim2:2:0:0): lost device Oct 18 18:36:46 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry Oct 18 18:36:46 r500 kernel: (pass4:umass-sim2:2:0:0): lost device Oct 18 18:36:46 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): lost device Oct 18 18:36:46 r500 kernel: (pass3:umass-sim1:1:0:0): lost device Oct 18 18:36:46 r500 kernel: (pass3:umass-sim1:1:0:0): removing device entry Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0 Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): removing device entry Oct 18 18:36:46 r500 kernel: ugen7.2: <Western Digital> at usbus7 (disconnected) Oct 18 18:36:46 r500 kernel: umass0: at uhub7, port 2, addr 2 (disconnected) Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): lost device Oct 18 18:36:46 r500 kernel: (pass2:umass-sim0:0:0:0): lost device Oct 18 18:36:46 r500 kernel: (pass2:umass-sim0:0:0:0): removing device entry Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0 Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): removing device entry Oct 18 18:36:54 r500 root: ZFS: zpool I/O failure, zpool=wde2 error=28 Oct 18 18:36:54 r500 last message repeated 3 times Oct 18 18:36:54 r500 root: ZFS: vdev I/O failure, zpool=wde2 path= offset= size= error= Oct 18 18:36:54 r500 root: ZFS: zpool I/O failure, zpool=wde3 error=28 Oct 18 18:36:54 r500 last message repeated 3 times Oct 18 18:36:54 r500 root: ZFS: vdev I/O failure, zpool=wde3 path= offset= size= error= fk@r500 ~ $ps waux | grep \ [z] root 5501 0.0 0.1 13364 1288 9 I+ 6:22PM 0:00.01 sudo zfs receive -v poppstar/dvds/sliders/season-4 root 5502 0.0 0.1 18944 1312 9 D+ 6:22PM 0:21.73 zfs receive -v poppstar/dvds/sliders/season-4 fk 4711 0.0 0.1 19980 1364 11 D+ 5:10PM 0:01.03 zpool iostat poppstar 1 root 5947 0.0 0.1 13364 1728 14 I+ 6:34PM 0:00.01 sudo zpool list root 5948 0.0 0.1 19980 1760 14 D+ 6:34PM 0:00.01 zpool list fk 5990 0.0 0.1 19980 1744 16 D+ 6:36PM 0:00.01 zpool list -H -o name fk@r500 ~ $sudo procstat -kk `pgrep zfs` PID TID COMM TDNAME KSTACK 5502 100519 zfs - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x7c dmu_tx_assign+0x16c dmu_recv_stream+0x125f zfs_ioc_recv+0x282 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2 fk@r500 ~ $sudo procstat -kk `pgrep zpool` PID TID COMM TDNAME KSTACK 5990 100507 zpool - mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x3c2 _sx_xlock+0x6e spa_all_configs+0xbf zfs_ioc_pool_configs+0x29 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2 5948 100292 zpool - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dnode_hold_impl+0xf0 dmu_buf_hold+0x34 zap_lockdir+0x52 zap_cursor_retrieve+0x194 spa_prop_get+0x26f zfs_ioc_pool_get_props+0xb2 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2 4711 100269 zpool - mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x3c2 _sx_xlock+0x6e spa_open_common+0x76 spa_get_stats+0x42 zfs_ioc_pool_stats+0x2c zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2 At the time I've been using: FreeBSD r500.local 9.0-CURRENT FreeBSD 9.0-CURRENT #193 r+3be3a96: Sun Oct 17 12:51:52 CEST 2010 fk@r500.local:/usr/obj/usr/src/sys/ZOEY amd64 with ZFS pool version 15, but I saw the problem with earlier FreeBSD and ZFS versions, too. Fabian [-- Attachment #2 --] -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (FreeBSD) iEYEARECAAYFAkzNOfgACgkQBYqIVf93VJ0KfwCcDApuTT8QkxyMXt6HLEMOeslR kVAAoINh2rLwsOyXgcP+5uH1vjvZv3pN =ue8o -----END PGP SIGNATURE-----
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20101031104209.771cc47e>
