Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 31 Oct 2010 10:42:09 +0100
From:      Fabian Keil <freebsd-listen@fabiankeil.de>
To:        freebsd-fs@freebsd.org
Subject:   Re: ZFS inresponsive.
Message-ID:  <20101031104209.771cc47e@r500.local>
In-Reply-To: <BE2EF896-4990-4A75-9DD4-B0534DFE0B71@pean.org>
References:  <BE2EF896-4990-4A75-9DD4-B0534DFE0B71@pean.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--Sig_//nUatHWA_bdzcjmJ_IYD4LJ
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Peter Ankerst=E5l <peter@pean.org> wrote:

> I accidentally aborted a zfs send mid-tranfer.
>=20
> Now the receiveing side cant do anything with the affected filesystem. bo=
th zfs list and zfs destroy hangs.
> How can I fix this? I can list other filesystems without a problem but a =
general list or any command concerning
> the one the got broken in transer hangs the zfs command.

I occasionally see this, too, when sending snapshots to zpools on
USB disks using geli. If the device gets lost in transfer, general
zfs and zpool commands hang:

Oct 18 17:09:28 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/geli attach -k /home/fk/geli-keys/poppstar.key /d=
ev/label/poppstar
Oct 18 17:09:34 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zpool import poppstar
Oct 18 17:09:34 r500 kernel: GEOM_ELI: Device label/poppstar.eli created.
Oct 18 17:09:34 r500 kernel: GEOM_ELI: Encryption: AES-CBC 128
Oct 18 17:09:34 r500 kernel: GEOM_ELI:     Crypto: software
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): Requesting SCSI sense data
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status error
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 =
0 0 0 0 0 0 0 0=20
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): CAM status: SCSI Status E=
rror
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI status: Check Condit=
ion
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc=
:3a,1 (Medium not present - tray closed)
Oct 18 17:09:35 r500 kernel: (cd0:ahcich1:0:0:0): Error 6, Unretryable error
Oct 18 17:09:58 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs create poppstar/dvds/sliders
Oct 18 17:10:32 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs send wde2/dvds/sliders/season-1-2@2009-12-05
Oct 18 17:10:32 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs receive -v poppstar/dvds/sliders/season-1-2
Oct 18 17:38:06 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs receive -v poppstar/dvds/sliders/season-3
Oct 18 17:38:06 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs send wde2/dvds/sliders/season-3@2009-03-28
Oct 18 17:39:47 r500 kernel: acpi_lid0: Lid closed
Oct 18 18:13:19 r500 kernel: acpi_lid0: Lid opened
Oct 18 18:22:56 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs send wde2/dvds/sliders/season-4@2009-12-05
Oct 18 18:22:56 r500 sudo:       fk : TTY=3Dpts/9 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zfs receive -v poppstar/dvds/sliders/season-4
Oct 18 18:33:05 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected)
Oct 18 18:33:05 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected)
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): SCSI status error
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Requesting SCSI sense =
data
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): AutoSense failed
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Error 5, Unretryable e=
rror
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362645020672, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362645151744, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362645282816, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362645413888, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362645544960, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362645676032, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362645807104, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362645938176, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362646069248, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D5). label/poppstar.eli[WRITE(offset=3D362644889600, length=3D131072)]
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Selection timeout
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Retrying command
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): lost device
Oct 18 18:33:05 r500 kernel: (pass4:umass-sim2:2:0:0): lost device
Oct 18 18:33:05 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Error 6, Unretryable e=
rror
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0):G EOM_ELIInvalidating p=
ack:=20
Oct 18 18:33:05 r500 kernel: g_eli_read_done() failed label/poppstar.eli[RE=
AD(offset=3D2000398327808, length=3D8192)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/popps=
tar.eli[READ(offset=3D270336, length=3D8192)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: g_eli_read_done() failed label/popps=
tar.eli[READ(offset=3D2000398589952, length=3D8192)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362646200320, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362646331392, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362646593536, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362646724608, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362646462464, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362646855680, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362646986752, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362647117824, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362647248896, length=3D131072)]
Oct 18 18:33:05 r500 kernel: GEOM_ELI: Crypto WRITE request failed (error=
=3D6). label/poppstar.eli[WRITE(offset=3D362647379968, length=3D131072)]
Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=3Dpoppstar path=3D/=
dev/label/poppstar.eli offset=3D2000398327808 size=3D8192 error=3D6
Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=3Dpoppstar path=3D/=
dev/label/poppstar.eli offset=3D270336 size=3D8192 error=3D6
Oct 18 18:33:05 r500 root: ZFS: vdev I/O failure, zpool=3Dpoppstar path=3D/=
dev/label/poppstar.eli offset=3D2000398589952 size=3D8192 error=3D6
Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=3Dpoppstar error=
=3D6
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): Synchronize cache fail=
ed, status =3D=3D 0xa, scsi status =3D=3D 0x0
Oct 18 18:33:05 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry
Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=3Dpoppstar error=
=3D6
Oct 18 18:33:05 r500 last message repeated 151 times
Oct 18 18:33:05 r500 root: ZFS: zpool I/O failure, zpool=3Dpoppstar error=
=3D28
Oct 18 18:33:06 r500 last message repeated 161 times
Oct 18 18:33:06 r500 root: ZFS: vdev I/O failure, zpool=3Dpoppstar path=3D =
offset=3D size=3D error=3D
Oct 18 18:33:06 r500 root: Unknown USB device: vendor 0x13fd product 0x1240=
 bus uhub3
Oct 18 18:33:06 r500 kernel: ugen3.3: <Generic> at usbus3
Oct 18 18:33:06 r500 kernel: umass2: <Generic External, class 0/0, rev 2.00=
/2.10, addr 3> on usbus3
Oct 18 18:33:06 r500 kernel: umass2:  SCSI over Bulk-Only; quirks =3D 0x0000
Oct 18 18:33:08 r500 kernel: umass2:4:2:-1: Attached to scbus4
Oct 18 18:33:08 r500 kernel: pass4 at umass-sim2 bus 2 scbus4 target 0 lun 0
Oct 18 18:33:08 r500 kernel: pass4: <Generic External 2.10> Fixed Direct Ac=
cess SCSI-4 device=20
Oct 18 18:33:08 r500 kernel: pass4: Serial Number 395857304652454620202020
Oct 18 18:33:08 r500 kernel: pass4: 40.000MB/s transfers
Oct 18 18:33:08 r500 kernel: GEOM: new disk da2
Oct 18 18:33:08 r500 kernel: da2 at umass-sim2 bus 2 scbus4 target 0 lun 0
Oct 18 18:33:08 r500 kernel: da2: <Generic External 2.10> Fixed Direct Acce=
ss SCSI-4 device=20
Oct 18 18:33:08 r500 kernel: da2: Serial Number 395857304652454620202020
Oct 18 18:33:08 r500 kernel: da2: 40.000MB/s transfers
Oct 18 18:33:08 r500 kernel: da2: 1907729MB (3907029168 512 byte sectors: 2=
55H 63S/T 243201C)
Oct 18 18:34:02 r500 sudo:       fk : TTY=3Dpts/14 ; PWD=3D/home/fk ; USER=
=3Droot ; COMMAND=3D/sbin/zpool list
Oct 18 18:34:02 r500 root: ZFS: zpool I/O failure, zpool=3Dpoppstar error=
=3D6
Oct 18 18:34:02 r500 root: ZFS: vdev I/O failure, zpool=3Dpoppstar path=3D =
offset=3D size=3D error=3D
Oct 18 18:36:34 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected)
Oct 18 18:36:34 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected)
Oct 18 18:36:34 r500 kernel: (da2:umass-sim2:2:0:0): lost device
Oct 18 18:36:34 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry
Oct 18 18:36:34 r500 kernel: (pass4:umass-sim2:2:0:0): lost device
Oct 18 18:36:34 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry
Oct 18 18:36:44 r500 root: Unknown USB device: vendor 0x13fd product 0x1240=
 bus uhub3
Oct 18 18:36:44 r500 kernel: ugen3.3: <Generic> at usbus3
Oct 18 18:36:44 r500 kernel: umass2: <Generic External, class 0/0, rev 2.00=
/2.10, addr 3> on usbus3
Oct 18 18:36:44 r500 kernel: umass2:  SCSI over Bulk-Only; quirks =3D 0x0000
Oct 18 18:36:46 r500 kernel: umass2:4:2:-1: Attached to scbus4
Oct 18 18:36:46 r500 kernel: pass4 at umass-sim2 bus 2 scbus4 target 0 lun 0
Oct 18 18:36:46 r500 kernel: pass4: <Generic External 2.10> Fixed Direct Ac=
cess SCSI-4 device=20
Oct 18 18:36:46 r500 kernel: pass4: Serial Number 395857304652454620202020
Oct 18 18:36:46 r500 kernel: pass4: 40.000MB/s transfers
Oct 18 18:36:46 r500 kernel: GEOM: new disk da2
Oct 18 18:36:46 r500 kernel: da2 at umass-sim2 bus 2 scbus4 target 0 lun 0
Oct 18 18:36:46 r500 kernel: da2: <Generic External 2.10> Fixed Direct Acce=
ss SCSI-4 device=20
Oct 18 18:36:46 r500 kernel: da2: Serial Number 395857304652454620202020
Oct 18 18:36:46 r500 kernel: da2: 40.000MB/s transfers
Oct 18 18:36:46 r500 kernel: da2: 1907729MB (3907029168 512 byte sectors: 2=
55H 63S/T 243201C)
Oct 18 18:36:46 r500 kernel: ugen3.3: <Generic> at usbus3 (disconnected)
Oct 18 18:36:46 r500 kernel: umass2: at uhub3, port 1, addr 3 (disconnected)
Oct 18 18:36:46 r500 kernel: ugen7.3: <Western Digital> at usbus7 (disconne=
cted)
Oct 18 18:36:46 r500 kernel: umass1: at uhub7, port 1, addr 3 (disconnected)
Oct 18 18:36:46 r500 kernel: (da2:umass-sim2:2:0:0): lost device
Oct 18 18:36:46 r500 kernel: (da2:umass-sim2:2:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: (pass4:umass-sim2:2:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass4:umass-sim2:2:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass3:umass-sim1:1:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass3:umass-sim1:1:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): Synchronize cache fail=
ed, status =3D=3D 0xa, scsi status =3D=3D 0x0
Oct 18 18:36:46 r500 kernel: (da1:umass-sim1:1:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: ugen7.2: <Western Digital> at usbus7 (disconne=
cted)
Oct 18 18:36:46 r500 kernel: umass0: at uhub7, port 2, addr 2 (disconnected)
Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass2:umass-sim0:0:0:0): lost device
Oct 18 18:36:46 r500 kernel: (pass2:umass-sim0:0:0:0): removing device entry
Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): Synchronize cache fail=
ed, status =3D=3D 0xa, scsi status =3D=3D 0x0
Oct 18 18:36:46 r500 kernel: (da0:umass-sim0:0:0:0): removing device entry
Oct 18 18:36:54 r500 root: ZFS: zpool I/O failure, zpool=3Dwde2 error=3D28
Oct 18 18:36:54 r500 last message repeated 3 times
Oct 18 18:36:54 r500 root: ZFS: vdev I/O failure, zpool=3Dwde2 path=3D offs=
et=3D size=3D error=3D
Oct 18 18:36:54 r500 root: ZFS: zpool I/O failure, zpool=3Dwde3 error=3D28
Oct 18 18:36:54 r500 last message repeated 3 times
Oct 18 18:36:54 r500 root: ZFS: vdev I/O failure, zpool=3Dwde3 path=3D offs=
et=3D size=3D error=3D

fk@r500 ~ $ps waux | grep \ [z]
root     5501   0.0  0.1  13364   1288   9  I+    6:22PM   0:00.01 sudo zfs=
 receive -v poppstar/dvds/sliders/season-4
root     5502   0.0  0.1  18944   1312   9  D+    6:22PM   0:21.73 zfs rece=
ive -v poppstar/dvds/sliders/season-4
fk       4711   0.0  0.1  19980   1364  11  D+    5:10PM   0:01.03 zpool io=
stat poppstar 1
root     5947   0.0  0.1  13364   1728  14  I+    6:34PM   0:00.01 sudo zpo=
ol list
root     5948   0.0  0.1  19980   1760  14  D+    6:34PM   0:00.01 zpool li=
st
fk       5990   0.0  0.1  19980   1744  16  D+    6:36PM   0:00.01 zpool li=
st -H -o name

fk@r500 ~ $sudo procstat -kk `pgrep zfs`
  PID    TID COMM             TDNAME           KSTACK                      =
=20
 5502 100519 zfs              -                mi_switch+0x176 sleepq_wait+=
0x42 _cv_wait+0x129 txg_wait_synced+0x7c dmu_tx_assign+0x16c dmu_recv_strea=
m+0x125f zfs_ioc_recv+0x282 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl=
+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2=20

fk@r500 ~ $sudo procstat -kk `pgrep zpool`
  PID    TID COMM             TDNAME           KSTACK                      =
=20
 5990 100507 zpool            -                mi_switch+0x176 sleepq_wait+=
0x42 _sx_xlock_hard+0x3c2 _sx_xlock+0x6e spa_all_configs+0xbf zfs_ioc_pool_=
configs+0x29 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0x=
fd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2=20
 5948 100292 zpool            -                mi_switch+0x176 sleepq_wait+=
0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dnode_hold_impl+0xf0 dmu_=
buf_hold+0x34 zap_lockdir+0x52 zap_cursor_retrieve+0x194 spa_prop_get+0x26f=
 zfs_ioc_pool_get_props+0xb2 zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioct=
l+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2=20
 4711 100269 zpool            -                mi_switch+0x176 sleepq_wait+=
0x42 _sx_xlock_hard+0x3c2 _sx_xlock+0x6e spa_open_common+0x76 spa_get_stats=
+0x42 zfs_ioc_pool_stats+0x2c zfsdev_ioctl+0x8f devfs_ioctl_f+0x7b kern_ioc=
tl+0x102 ioctl+0xfd syscallenter+0x331 syscall+0x4b Xfast_syscall+0xe2=20

At the time I've been using:
FreeBSD r500.local 9.0-CURRENT FreeBSD 9.0-CURRENT #193 r+3be3a96: Sun Oct =
17 12:51:52 CEST 2010     fk@r500.local:/usr/obj/usr/src/sys/ZOEY  amd64
with ZFS pool version 15, but I saw the problem with earlier
FreeBSD and ZFS versions, too.

Fabian

--Sig_//nUatHWA_bdzcjmJ_IYD4LJ
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)

iEYEARECAAYFAkzNOfgACgkQBYqIVf93VJ0KfwCcDApuTT8QkxyMXt6HLEMOeslR
kVAAoINh2rLwsOyXgcP+5uH1vjvZv3pN
=ue8o
-----END PGP SIGNATURE-----

--Sig_//nUatHWA_bdzcjmJ_IYD4LJ--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20101031104209.771cc47e>