From: Borja Marcos
Date: Wed, 18 May 2011 14:05:06 +0200
To: Jeremy Chadwick
Cc: Charles Sprickman, stable@FreeBSD.org, Andriy Gapon
Subject: Re: 8.1R possible zfs snapshot livelock?

On May 17, 2011, at 1:29 PM, Jeremy Chadwick wrote:

> * ZFS send | ssh zfs recv results in ZFS subsystem hanging; 8.1-RELEASE;
>   February 2011:
>   http://lists.freebsd.org/pipermail/freebsd-fs/2011-February/010602.html

I found a reproducible deadlock condition, actually. If you keep some I/O
activity on a dataset on which you are receiving a ZFS incremental
snapshot at the same time, it can deadlock.

Imagine this situation: two servers, A and B.
A dataset on server A is replicated at regular intervals to B, so that you
keep a reasonably up-to-date copy. Something like:

(Running on server A):

zfs snapshot thepool/thedataset@thistime
zfs send -Ri thepool/thedataset@previoustime thepool/thedataset@thistime | ssh serverB zfs receive -d thepool

It works, but I suffered a deadlock when one of the periodic "daily"
scripts was running. Doing some tests, I saw that ZFS can deadlock if
you do a zfs receive onto a dataset which has some read activity.
Disabling atime didn't help either.

But if you make sure *not* to access the replicated dataset, it works; I
haven't seen it fail otherwise.

If you wish to reproduce it, try creating a dataset for /usr/obj,
running make buildworld on it, replicating at, say, 30- or 60-second
intervals, and keeping several scripts (or rsync) reading the target
dataset files and just copying them to another place in the usual,
"classic" way (example: tar cf - . | ( cd /destination && tar xf - )).

Borja
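
For anyone trying to reproduce this, the replication step described above could be wrapped roughly as follows. This is only a sketch under assumptions: the function names, the POOL/DATASET/REMOTE variables and the DRY_RUN switch are made up for illustration, and the snapshot names are whatever your own scheme produces. By default it only prints the commands, so they can be checked before anything is run.

```shell
#!/bin/sh
# Sketch of one replication round from server A to server B.
# Hypothetical names: POOL, DATASET, REMOTE, DRY_RUN and all function names.

POOL="thepool"
DATASET="thepool/thedataset"
REMOTE="serverB"

# Print the command that takes a new snapshot named $1.
snapshot_cmd() {
    printf 'zfs snapshot %s@%s\n' "$DATASET" "$1"
}

# Print the incremental send/receive pipeline from snapshot $1 to snapshot $2.
replicate_cmd() {
    printf 'zfs send -R -i %s@%s %s@%s | ssh %s zfs receive -d %s\n' \
        "$DATASET" "$1" "$DATASET" "$2" "$REMOTE" "$POOL"
}

# One round: snapshot, then send the increment.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
replicate_once() {
    prev="$1"
    cur="$2"
    for cmd in "$(snapshot_cmd "$cur")" "$(replicate_cmd "$prev" "$cur")"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf '%s\n' "$cmd"
        else
            sh -c "$cmd" || return 1
        fi
    done
}

# Example: replicate_once previoustime thistime
```

Calling replicate_once every 30 or 60 seconds with DRY_RUN=0 on server A, while something on server B keeps reading the target dataset, should correspond to the scenario described above.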