Date: Mon, 24 Oct 2011 15:13:54 GMT
From: Peter Maloney <peter.maloney@brockmann-consult.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: amd64/161968: renaming snapshot with -r including a zvol snapshot causes total ZFS freeze/lockup
Message-ID: <201110241513.p9OFDshs043546@red.freebsd.org>
Resent-Message-ID: <201110241520.p9OFK0iW064501@freefall.freebsd.org>
>Number:         161968
>Category:       amd64
>Synopsis:       renaming snapshot with -r including a zvol snapshot causes total ZFS freeze/lockup
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-amd64
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Oct 24 15:20:00 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Peter Maloney
>Release:        8.2-STABLE FreeBSD 8.2-STABLE #0: Tue Sep 27 16:27:57 CEST 2011 root@bcnastest2.bc.local:/usr/obj/usr/src/sys/GENERIC amd64
>Organization:
Brockmann Consult
>Environment:
FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 15:06:03 CEST 2011 root@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC amd64
>Description:
Renaming a snapshot with -r, when the recursive snapshot includes a zvol snapshot, causes a total ZFS freeze/lockup/deadlock. Once it is locked up, any command that uses "zfs", "zpool" or "sysctl -a" freezes, and so do NFS exports. "shutdown -r" does not restart the system; it only gets as far as reporting that all disks are synced.

Pressing CTRL+T after a hung zfs or zpool command shows the state "spa_namespace_lock"; after a hung "sysctl -a" it shows "g_waitfor_event".

Most of the time a simple "zfs rename" does not cause a lockup. However, on one system, renaming one specific snapshot always causes a lockup, and on every other 8-STABLE system I have, the script under How-To-Repeat always causes a lockup after a few loops.

My FreeBSD 8-STABLE was installed as 8.2-RELEASE plus the mps driver, and then updated with cvsup using this cvsupfile (comments removed):

*default host=cvsup.de.FreeBSD.org
*default base=/var/db
*default prefix=/usr
*default release=cvs tag=RELENG_8
*default delete use-rel-suffix
*default date=2011.09.27.00.00.00
*default compress
src-all

The same freeze occurs with the date changed to today, Oct. 24th.

# zpool get all big
NAME  PROPERTY       VALUE                 SOURCE
big   size           39.8G                 -
big   capacity       24%                   -
big   altroot        -                     default
big   health         ONLINE                -
big   guid           14576708073682355899  default
big   version        28                    default
big   bootfs         -                     default
big   delegation     on                    default
big   autoreplace    on                    local
big   cachefile      -                     default
big   failmode       continue              local
big   listsnapshots  on                    local
big   autoexpand     off                   default
big   dedupditto     0                     default
big   dedupratio     1.00x                 -
big   free           30.1G                 -
big   allocated      9.64G                 -
big   readonly       off                   -

# zfs get all big
NAME  PROPERTY              VALUE                  SOURCE
big   type                  filesystem             -
big   creation              Thu Jul 21 11:48 2011  -
big   used                  4.80G                  -
big   available             14.7G                  -
big   referenced            4.80G                  -
big   compressratio         1.00x                  -
big   mounted               yes                    -
big   quota                 none                   default
big   reservation           none                   default
big   recordsize            128K                   default
big   mountpoint            /big                   default
big   sharenfs              off                    default
big   checksum              on                     default
big   compression           off                    default
big   atime                 on                     default
big   devices               on                     default
big   exec                  on                     default
big   setuid                on                     default
big   readonly              off                    default
big   jailed                off                    default
big   snapdir               visible                local
big   aclmode               discard                default
big   aclinherit            restricted             default
big   canmount              on                     default
big   xattr                 off                    temporary
big   copies                1                      default
big   version               4                      -
big   utf8only              off                    -
big   normalization         none                   -
big   casesensitivity       sensitive              -
big   vscan                 off                    default
big   nbmand                off                    default
big   sharesmb              off                    default
big   refquota              none                   default
big   refreservation        none                   default
big   primarycache          all                    default
big   secondarycache        all                    default
big   usedbysnapshots       0                      -
big   usedbydataset         4.80G                  -
big   usedbychildren        6.70M                  -
big   usedbyrefreservation  0                      -
big   logbias               latency                default
big   dedup                 off                    default
big   mlslabel              -
big   sync                  standard               default
big   refcompressratio      1.00x                  -

# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
big                         4.80G  14.7G  4.80G  /big
big@testcrashsnap4              0      -  4.80G  -
zroot                       5.64G   109G   894M  legacy
zroot/tmp                   2.14M   109G  2.14M  /tmp
zroot/usr                   4.72G   109G  2.45G  /usr
zroot/usr/home              53.5K   109G  53.5K  /usr/home
zroot/usr/obj                922M   109G   922M  /usr/objtmp
zroot/usr/ports             1.07G   109G   941M  /usr/ports
zroot/usr/ports/distfiles    150M   109G   150M  /usr/ports/distfiles
zroot/usr/ports/packages      21K   109G    21K  /usr/ports/packages
zroot/usr/src                314M   109G   314M  /usr/src
zroot/var                   17.6M   109G   904K  /var
zroot/var/crash             22.5K   109G  22.5K  /var/crash
zroot/var/db                16.2M   109G  15.1M  /var/db
zroot/var/db/pkg            1.10M   109G  1.10M  /var/db/pkg
zroot/var/empty               21K   109G    21K  /var/empty
zroot/var/log                272K   109G   272K  /var/log
zroot/var/mail                48K   109G    48K  /var/mail
zroot/var/run                 50K   109G    50K  /var/run
zroot/var/tmp                 23K   109G    23K  /var/tmp

# cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"

/etc/sysctl.conf contains nothing but comments.

On a virtual machine running 8.2-RELEASE (not 8-STABLE), I have not found a way to reproduce the problem. I also tested the latest sources fetched with cvsup today, and the system freezes the same way. All my ZFS systems are amd64.

I was hoping to use a zvol for iSCSI together with snapshots, so simply avoiding snapshots on zvols is not an acceptable workaround.
>How-To-Repeat:
Prerequisite: a system running 8.2-STABLE (more specifically, built with "*default date=2011.09.27.00.00.00" in the cvsupfile).

(1) Create a zpool.

[root@bcnastest2 ~]# zpool status big
  pool: big
 state: ONLINE
  scan: none requested
config:

        NAME            STATE     READ WRITE CKSUM
        big             ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            ad8         ONLINE       0     0     0
            ad10        ONLINE       0     0     0
            ad12        ONLINE       0     0     0
            ad16        ONLINE       0     0     0
        cache
          gpt/cache0    ONLINE       0     0     0

errors: No known data errors

(2) Create a zvol in the above zpool.
[root@bcnastest2 ~]# zfs create -V 100m big/testzvol

(3) Run this script as root. It was written for bash but also runs under sh. Make sure to set the dataset variable first.

#-------begin script-------
# Pool (or parent dataset) that contains the zvol.
dataset=big
count=0
while true; do
    echo Snapshot
    # Clean up leftovers, then take a recursive snapshot (includes the zvol).
    zfs destroy -r ${dataset}@testcrashsnap >/dev/null 2>&1
    zfs snapshot -r ${dataset}@testcrashsnap || break
    current=""
    for next in 1 2 3 4 5; do
        echo Renaming from ${current} to ${next}
        zfs destroy -r ${dataset}@testcrashsnap${next} >/dev/null 2>&1
        # Recursive rename of all snapshots, including the zvol's.
        zfs rename -r ${dataset}@testcrashsnap${current} ${dataset}@testcrashsnap${next} || break
        current=${next}
    done
    echo Destroy
    zfs destroy -r ${dataset}@testcrashsnap${current} || break
    # POSIX arithmetic instead of bash's "let count++", so sh prints the count too.
    count=$((count + 1))
    echo $count
done
#-------end script-------

Result: after an arbitrary number of loops, the output stops. Below is the output, including the results of pressing CTRL+C, CTRL+Z and CTRL+T. The script was run on a Friday; the last CTRL+T line was taken on the following Monday.

============================================
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
1
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
2
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
3
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
^C
load: 1.32 cmd: zfs 2363 [tx->tx_sync_done_cv)] 5.56r 0.00u 0.00s 0% 1696k
load: 1.32 cmd: zfs 2363 [tx->tx_sync_done_cv)] 6.07r 0.00u 0.00s 0% 1696k
load: 1.32 cmd: zfs 2363 [tx->tx_sync_done_cv)] 6.26r 0.00u 0.00s 0% 1696k
load: 1.46 cmd: zfs 2363 [tx->tx_sync_done_cv)] 13.42r 0.00u 0.00s 0% 1696k
^C^C^C
load: 1.89 cmd: zfs 2363 [tx->tx_sync_done_cv)] 36.59r 0.00u 0.00s 0% 1696k
^C^D
load: 0.01 cmd: zfs 2363 [tx->tx_sync_done_cv)] 230096.99r 0.00u 0.00s 0% 1696k
============================================
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
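The sequence of zfs(8) commands that one iteration of the How-To-Repeat loop issues can be listed without touching a pool. The sketch below is a dry run under stated assumptions: it shadows zfs with a shell function that only prints its arguments, and the helper name dry_run_iteration is hypothetical. It mirrors the loop body of the reproduction script; it does not reproduce or avoid the lockup.

```sh
#!/bin/sh
# Dry run of ONE iteration of the How-To-Repeat loop.
# Assumption: the real zfs(8) is shadowed by this shell function, which
# only prints the command line, so no pool or zvol is touched.
zfs() { echo "zfs $*"; }

# dry_run_iteration is a hypothetical helper; $1 is the pool/dataset name.
dry_run_iteration() {
    dataset=$1
    # Cleanup calls are silenced, as in the original script.
    zfs destroy -r "${dataset}@testcrashsnap" >/dev/null 2>&1
    zfs snapshot -r "${dataset}@testcrashsnap" || return 1
    current=""
    for next in 1 2 3 4 5; do
        zfs destroy -r "${dataset}@testcrashsnap${next}" >/dev/null 2>&1
        # -r also renames the snapshot of any child zvol (e.g. big/testzvol).
        zfs rename -r "${dataset}@testcrashsnap${current}" \
            "${dataset}@testcrashsnap${next}" || break
        current=${next}
    done
    zfs destroy -r "${dataset}@testcrashsnap${current}"
}

# Dataset name from the first argument, defaulting to "big" as in the report.
dry_run_iteration "${1:-big}"
```

With the default dataset this prints seven lines: one recursive snapshot, five recursive renames (testcrashsnap to testcrashsnap1, then 1 to 2, up to 4 to 5), and a final recursive destroy of testcrashsnap5. The silenced cleanup destroys print nothing.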