Date: Mon, 22 Aug 2011 12:02:11 +0100
From: Luke Marsden <luke-lists@hybrid-logic.co.uk>
To: freebsd-fs@freebsd.org
Subject: Re: kern/157728: [zfs] zfs (v28) incremental receive may leave behind temporary clones
Message-ID: <1314010931.3477.138.camel@pow>
In-Reply-To: <201108221015.p7MAFHpi048670@freefall.freebsd.org>
References: <201108221015.p7MAFHpi048670@freefall.freebsd.org>
On Mon, 2011-08-22 at 10:15 +0000, mm@FreeBSD.org wrote:
> Synopsis: [zfs] zfs (v28) incremental receive may leave behind temporary clones
>
> State-Changed-From-To: open->closed
> State-Changed-By: mm
> State-Changed-When: Mon Aug 22 10:15:16 UTC 2011
> State-Changed-Why:
> Resolved. Thanks!

Brilliant, thanks for fixing this!

Do you have any thoughts about what might have caused the other issue I reported, the deadlock? From my email of the 15th July (mfsbsd-se-8.2-zfsv28-amd64 19.06.2011):

The biggest issue was a DEADLOCK which occurs quite reliably with a given sequence of events in short succession, on a chroot filesystem with many snapshots, a MySQL socket and nullfs mounts inside it:

1. Force-unmount the nullfs mounts which are mounted on top of it.
2. Close the MySQL socket in /tmp.
3. Force-unmount the actual filesystem (even if there are open FDs).
4. 'zfs rename' the filesystem into our 'trash' filesystem (which I understand consists of a clone, promote and destroy).

The entire ZFS subsystem then hangs on any new I/O. Here is a procstat of the zfs rename process, which hangs after the force unmount:

25674 100871 zfs initial thread mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85 dsl_sync_task_group_wait+0x128 dsl_sync_task_do+0x54 dsl_dir_rename+0x8f dsl_dataset_rename+0x272 zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2

Unfortunately it's not easy to reproduce; it only seems to happen in an environment which is under load, with a lot of datasets and a lot of zfs operations happening concurrently on other datasets. I spent two days trying to reproduce it in self-contained test environments but had no luck, so I'm now reporting it anyway.

--
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Mobile: +1-415-449-1165 (US) / +447791750420 (UK)
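The four-step sequence in the report could be scripted roughly as follows when trying to reproduce the hang. This is a minimal dry-run sketch in Python, not the reporter's actual tooling: the function names, mount paths and dataset names are hypothetical, and step 2 is represented by a caller-supplied command, since the message does not say how the MySQL socket was closed. Commands are only printed unless `dry_run=False` is passed.

```python
import shlex
import subprocess
from typing import List, Sequence


def repro_commands(
    nullfs_mounts: Sequence[str],
    fs_mountpoint: str,
    dataset: str,
    trash_parent: str,
    close_socket_cmd: Sequence[str],
) -> List[List[str]]:
    """Build the argv lists for the reported sequence (hypothetical helper)."""
    cmds: List[List[str]] = []
    # Step 1: force-unmount the nullfs mounts layered on top of the
    # filesystem, longest (deepest) path first so nested mounts go first.
    for mnt in sorted(nullfs_mounts, key=len, reverse=True):
        cmds.append(["umount", "-f", mnt])
    # Step 2: close the MySQL socket; the actual command is caller-supplied
    # because the report does not specify it.
    cmds.append(list(close_socket_cmd))
    # Step 3: force-unmount the ZFS filesystem itself, even with open FDs.
    cmds.append(["umount", "-f", fs_mountpoint])
    # Step 4: rename the dataset under the (hypothetical) trash parent,
    # keeping its last path component as the new leaf name.
    leaf = dataset.rsplit("/", 1)[-1]
    cmds.append(["zfs", "rename", dataset, f"{trash_parent}/{leaf}"])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Because the report says the hang only appears under load with many concurrent zfs operations on other datasets, running this sequence on an idle test box would probably not be enough on its own; it only captures the ordering of the steps.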