Date: Fri, 20 May 2011 08:33:59 +0100
From: Luke Marsden <luke-lists@hybrid-logic.co.uk>
To: Borja Marcos <borjam@sarenet.es>
Cc: Charles Sprickman <spork@bway.net>, stable@FreeBSD.org, Andriy Gapon <avg@FreeBSD.org>, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject: Re: 8.1R possible zfs snapshot livelock?
Message-ID: <1305876839.13971.5.camel@pow>
In-Reply-To: <FCE5F082-A3BF-4A21-B2E3-FEF3EA715F2C@sarenet.es>
References: <alpine.OSX.2.00.1105170120510.1983@hotlap.nat.fasttrackmonkey.com> <20110517073029.GA44359@icarus.home.lan> <4DD25264.8040305@FreeBSD.org> <20110517112952.GA48610@icarus.home.lan> <FCE5F082-A3BF-4A21-B2E3-FEF3EA715F2C@sarenet.es>
On Wed, 2011-05-18 at 14:05 +0200, Borja Marcos wrote:
> On May 17, 2011, at 1:29 PM, Jeremy Chadwick wrote:
>
> > * ZFS send | ssh zfs recv results in ZFS subsystem hanging;
> >   8.1-RELEASE; February 2011:
> >   http://lists.freebsd.org/pipermail/freebsd-fs/2011-February/010602.html
>
> I found a reproducible deadlock condition actually. If you keep some
> I/O activity on a dataset on which you are receiving a ZFS incremental
> snapshot at the same time, it can deadlock.
>
> Imagine this situation: Two servers, A and B. A dataset on server A is
> replicated at regular intervals to B, so that you keep a reasonably up
> to date copy.
>
> Something like:
>
> (Running on server A):
>
> zfs snapshot thepool/thedataset@thistime
> zfs send -Ri thepool/thedataset@previoustime thepool/thedataset@thistime | ssh serverB zfs receive -d thepool
>
> It works, but I suffered a deadlock when one of the periodic "daily"
> scripts was running. Doing some tests, I saw that ZFS can deadlock if
> you do a zfs receive onto a dataset which has some read activity.
> Disabling atime didn't help either.
>
> But if you make sure *not* to access the replicated dataset it works,
> I haven't seen it failing otherwise.
>
> If you wish to reproduce it, try creating a dataset for /usr/obj,
> running make buildworld on it, replicating at, say, 30 or 60 second
> intervals, and keep several scripts (or rsync) reading the target
> dataset files and just copying them to another place in the usual,
> "classic" way. (example: tar cf - . | ( cd /destination && tar xf - ))
>

Is there a PR for this? I'd like to see it addressed, since read-only
I/O on a dataset which is being updated by `zfs recv` is an important
part of what we plan to do with ZFS on FreeBSD.

--
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Phone: +447791750420
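
For reference, the replication round quoted above can be sketched as a small shell script. This is only a dry-run sketch: it prints the two commands rather than running them, the pool, dataset and host names are the placeholders from the quoted example, and the function name replication_commands is made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch of one incremental replication round (server A -> server B).
# All names below are placeholders from the quoted example, not real systems.
POOL=thepool
DATASET=thedataset
REMOTE=serverB

# Print the commands for one round.
# $1 = name of the previous snapshot, $2 = name of the new snapshot.
replication_commands() {
    prev=$1
    cur=$2
    # Take the new snapshot on the sending side.
    echo "zfs snapshot ${POOL}/${DATASET}@${cur}"
    # Send the increment between the two snapshots and receive it remotely.
    echo "zfs send -Ri ${POOL}/${DATASET}@${prev} ${POOL}/${DATASET}@${cur} | ssh ${REMOTE} zfs receive -d ${POOL}"
}

# Show what a single round would execute.
replication_commands previoustime thistime
```

To reproduce the reported hang, the printed commands would be run at 30 or 60 second intervals while something on server B keeps reading the target dataset, as described in the quoted text.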