Date: Thu, 08 Jul 2010 19:58:08 +0100
From: Karl Pielorz <kpielorz_lst@tdx.co.uk>
To: freebsd-geom@freebsd.org
Subject: FreeBSD 7.3-Stable / GEOM issue with ZFS attach/replace & zvol's...
Message-ID: <6F0C8FABB57A0A91965413C1@Octa64>

Hi All,

I posted a few days ago in -fs and -hackers, but never got any reply.
I've done some digging around now, and been able to reproduce the
problem below on another machine (by sending my ZFS zvols and snapshots
to it).

I'm running 7.3-STABLE on amd64, with 10GB of RAM and 2 * dual-core
Opteron 285s.

In a nutshell: a zpool attach/replace (or similar) on my system results
in GEOM iterating through all the 'drives' on the system (which is
apparently normal). When it encounters some of my ZFS volume snapshots
(which are GELI encrypted) it appears to 'hang', and the zpool
attach/replace never completes.

Remove the snapshot it hangs on, and it hangs on another. Remove all the
snapshots/volumes, and the zpool command completes without issue.

At the moment this is stopping me from replacing a failing drive which
is part of a zpool mirror set :(

e.g. with GEOM debugging turned on, I get:

  host# zpool attach vol ad34 ad40

  [GEOM complains the guid for ad40 doesn't match what it wants - and
  then starts iterating through all the disk devices one after another.
  The guid mismatch appears 'normal' - i.e. it always happens - even on
  working systems]

  Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), 1, 0, 0)
  Jul 5 19:42:50 host kernel: open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xffffff000e1fd000(zvol/vol2/zfs_backups/scanned)
  Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), -1, 0, 0)
  Jul 5 19:42:50 host kernel: open delta:[r-1w0e0] old:[r1w0e0] provider:[r1w0e0] 0xffffff000e1fd000(zvol/vol2/zfs_backups/scanned)
  Jul 5 19:42:50 host kernel: g_detach(0xffffff0035015380)
  Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol/scanned@1237495449), 1, 0, 0)
  Jul 5 19:42:50 host kernel: open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xffffff000e60b300(zvol/vol/scanned@1237495449)
  **** ZFS [hangs here] - as does anything that subsequently touches ZFS ***

'ps axl' at that point shows:

  0 2250 2004 0 -8 0 14460 2044 g_wait D+ p0 0:00.01 zpool attach vol ad34 ad40

So it appears to be hung in 'g_wait'.

If I then reboot, and do:

  zfs destroy vol/scanned@1237495449

then try the attach again, it hangs on another snapshot of 'vol/scanned'
(e.g. 'vol/scanned@1274617895') next time round.

If I destroy all of them:

  zfs destroy -r vol/scanned

the attach completes without issue.

All those snapshots can be dd'd from without issue (or mounted when
attached via GELI etc.). None of the snapshots or GELI volumes are
mounted when I do the attach/replace.

zpool status, and an ls of '/dev/zvol/vol', are below.

It *looks* like GEOM is seeing something it doesn't like, and hanging?
The system has worked fine for coming up to a year with ZFS. I have
replaced/attached drives in the past, but that was under 7.2-Stable.

Is there any additional GEOM debugging I can enable? (Or any possible
workarounds, i.e. something I can do to get GEOM to ignore the zvols?)

-Karl

zpool status:

  pool: vol
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        vol         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad28    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad30    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad32    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad34    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad20    ONLINE       0     0     0
            ad36    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad22    ONLINE       0     0     0
            ad38    ONLINE       0     0     0
        spares
          ad42      AVAIL

(ad40 is also spare - but not linked to any pools)

ls /dev/zvol/vol

crw-r-----  1 root  operator  0, 162 Jul  5 19:55 scanned
crw-r-----  1 root  operator  0, 172 Jul  5 19:55 scanned@1237495449
crw-r-----  1 root  operator  0, 164 Jul  5 19:55 scanned@1238970339
crw-r-----  1 root  operator  0, 167 Jul  5 19:55 scanned@1239143782
crw-r-----  1 root  operator  0, 165 Jul  5 19:55 scanned@1244575946
crw-r-----  1 root  operator  0, 163 Jul  5 19:55 scanned@1247670305
crw-r-----  1 root  operator  0, 168 Jul  5 19:55 scanned@1251063149
crw-r-----  1 root  operator  0, 166 Jul  5 19:55 scanned@1256072040
crw-r-----  1 root  operator  0, 169 Jul  5 19:55 scanned@1259364830
crw-r-----  1 root  operator  0, 170 Jul  5 19:55 scanned@1267226353
crw-r-----  1 root  operator  0, 171 Jul  5 19:55 scanned@1274617895
crw-r-----  1 root  operator  0, 195 Jul  5 19:55 scanned@1278362753
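
For longer traces than the excerpt in this message, the provider that GEOM was tasting when things stopped can be picked out mechanically. The following is only a rough sketch in Python, not part of the original report: it assumes the g_access(...) line format shown in the log above, and the function name open_providers and the SAMPLE text are made up for illustration. It sums the read/write/exclusive deltas per provider and reports any provider whose deltas do not return to zero, which is at best a hint about where the hang occurred.

```python
import re
from collections import defaultdict

# Matches kernel log lines of the form seen in the message, e.g.
#   ... kernel: g_access(0xffffff0035015380(zvol/vol/scanned@1237495449), 1, 0, 0)
# Assumption: provider names never contain a ')' character.
G_ACCESS = re.compile(
    r"g_access\(0x[0-9a-f]+\(([^)]+)\), (-?\d+), (-?\d+), (-?\d+)\)"
)


def open_providers(log_text):
    """Return {provider: (r, w, e)} for providers whose summed deltas are non-zero."""
    counts = defaultdict(lambda: [0, 0, 0])
    for match in G_ACCESS.finditer(log_text):
        name = match.group(1)
        for i in range(3):
            counts[name][i] += int(match.group(i + 2))
    return {name: tuple(c) for name, c in counts.items() if any(c)}


# Illustrative input: the three g_access lines quoted in the message.
SAMPLE = """\
Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), 1, 0, 0)
Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), -1, 0, 0)
Jul 5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol/scanned@1237495449), 1, 0, 0)
"""

if __name__ == "__main__":
    for name, (r, w, e) in open_providers(SAMPLE).items():
        print(f"{name}: r{r} w{w} e{e} requested but never released")
```

Run against the excerpt above, this flags only zvol/vol/scanned@1237495449, matching the point where the attach was seen to hang.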