From: Luke Marsden
To: freebsd-fs@freebsd.org
Cc: tech@hybrid-logic.co.uk
Date: Fri, 15 Jul 2011 13:30:49 +0100
Message-ID: <1310733049.26698.69.camel@behemoth>
Subject: Experiences with ZFS v28 - including deadlock

Hi all,

Having just quite extensively tested the v28 patchset contained within
http://mfsbsd.vx.sk/iso/mfsbsd-se-8.2-zfsv28-amd64.iso (updated
19.06.2011) I wanted to share my experiences in the hope that the issues
I encountered can be fixed before 8.3 ;-)

The biggest issue was a DEADLOCK which occurs quite reliably with a
given sequence of events in short succession, on a chroot filesystem
with many snapshots and a MySQL socket and nullfs mounts inside it:

     1. Force unmount the nullfs mounts which are mounted on top of it
     2. Close the MySQL socket in /tmp
     3. Force unmount the actual filesystem (even if there are open FDs)
     4. 'zfs rename' the filesystem into our 'trash' filesystem (which I
        understand consists of a clone, promote and destroy)

The entire ZFS subsystem then hangs on any new I/O.

Here is a procstat of the zfs rename process which hangs after the force
unmount:

    25674 100871 zfs  initial thread  mi_switch+0x176
    sleepq_wait+0x42 _cv_wait+0x129 txg_wait_synced+0x85
    dsl_sync_task_group_wait+0x128 dsl_sync_task_do+0x54
    dsl_dir_rename+0x8f dsl_dataset_rename+0x272 zfsdev_ioctl+0xe6
    devfs_ioctl_f+0x7b kern_ioctl+0x102 ioctl+0xfd syscallenter+0x1e5
    syscall+0x4b Xfast_syscall+0xe2

Unfortunately it's not easy to reproduce; it only seems to happen in an
environment which is under load, with a lot of datasets and a lot of zfs
operations happening concurrently on other datasets. I spent two days
trying to reproduce it in self-contained test environments but had no
luck, so I'm now reporting it anyway.

There were two other issues which came up:

     1. http://www.freebsd.org/cgi/query-pr.cgi?pr=157728 - we worked
        around this with a semaphore on 'zfs list' and 'zfs recv' so
        they never ran simultaneously.
     2. After an incremental receive, v28 seems to like to mount the
        filesystem even if it was unmounted at the start of the receive.
        (Notably, on previous versions of ZFS, this only happened for
        non-incremental receives where the filesystem was being created
        by the receive -- incremental receives correctly left the
        filesystem in the mount state it started in). This plays very
        badly when the filesystem then gets modified before we can force
        unmount it (which we do immediately), because in this case the
        next receive operation will fail with "filesystem has
        modifications" - which we handle, but it's expensive to do so on
        every incremental receive.

I had a conversation with jhell on IRC about this and he had this to
say:

        its happened twice before with ZFS
        basically a lock being held and never free'd
        something there is happening between the snapshots and datasets
        though.
        seems that it for some reason is able to destroy the dataset
        before it destroys all the snapshots properly then tries to do
        the renaming of the snapshots and leads to a lock not being
        free()'d or similar

Maybe this can offer a hint for someone to go looking in the right
direction to solve this?

Thank you for working on ZFS in FreeBSD! v15 is working very well for
us.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Mobile: +447791750420

www.hybrid-cluster.com - Cloud web hosting platform
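
For readers wanting to try the PR 157728 workaround mentioned above
(never letting 'zfs list' and 'zfs recv' run at the same time), here is
a minimal sketch of one way such serialization could look. It is an
assumption, not the setup described in this message: it uses an advisory
flock(2) lock shared by every caller, and the lock path and the helper
names (LOCK_PATH, exclusive, run_serialized) are hypothetical.

```python
import fcntl
import subprocess
from contextlib import contextmanager

# Hypothetical lock file; every caller must agree on the same path.
LOCK_PATH = "/tmp/zfs-list-recv.lock"


@contextmanager
def exclusive(lock_path=LOCK_PATH):
    """Hold an exclusive advisory lock on lock_path inside the with-block."""
    # Append mode creates the file if needed without truncating it.
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another holder exists
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def run_serialized(argv, lock_path=LOCK_PATH):
    """Run argv while holding the lock and return its exit status."""
    with exclusive(lock_path):
        return subprocess.call(argv)
```

With this in place, a call such as
run_serialized(["zfs", "list", "-H", "-o", "name"]) in one process and a
'zfs recv' wrapped the same way in another would take turns rather than
overlap. The lock is advisory, so it only helps if every 'zfs list' and
'zfs recv' invocation goes through the wrapper.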