From: Borja Marcos
To: Borja Marcos
Cc: freebsd-stable@freebsd.org
Date: Tue, 29 Sep 2009 10:43:41 +0200
Subject: Re: 8.0RC1, ZFS: deadlock
Message-Id: <6C7DE346-65C5-4130-86B8-56A60A1DAC28@sarenet.es>
In-Reply-To: <089F63A7-574B-4646-97C7-D82B226CD4CF@sarenet.es>

On Sep 29, 2009, at 10:29 AM, Borja Marcos wrote:

> Hello,
>
> I have observed a deadlock condition when using ZFS. We make heavy
> use of zfs send/zfs receive to keep a replica of a dataset on a
> remote machine. It can be done at one-minute intervals. Maybe this
> is a somewhat atypical use of ZFS, but it seems to be a great way to
> keep filesystem replicas once this is sorted out.
>
> How to reproduce:
>
> Set up two systems. A dataset with heavy I/O activity is replicated
> from the first to the second one. I've used a dataset containing
> /usr/obj while I did a make buildworld.
>
> Replicate the dataset from the first machine to the second one using
> an incremental send:
>
>   zfs send -i pool/dataset@Nminus1 pool/dataset@N | ssh destination zfs receive -d pool
>
> When there is read activity on the second system, that is, read
> access to the replicated dataset while zfs receive is updating it,
> there can be a deadlock. We discovered this while testing a server
> that will hopefully soon be in production, with 8 GB RAM. A Bacula
> backup agent was running and ZFS deadlocked.

Sorry, I forgot to explain what was happening on the second system (the
one receiving the incremental snapshots) when the deadlock occurred. It
was just running an endless loop, copying the contents of /usr/obj to
another dataset, in order to keep the read activity going. That's how
it deadlocked. On the original test system an rsync did the same trick.

Borja
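
For anyone trying to reproduce this, the setup described above could be sketched roughly as follows. This is only a sketch under assumptions, not the exact scripts used: the numeric snapshot names, the RUN dry-run switch, and the copy target path passed to read_loop are all hypothetical; only the zfs send/receive pipeline and the pool/dataset and destination placeholders come from the message.

```shell
#!/bin/sh
# Sketch of the reproduction setup. Hypothetical: numeric snapshot names,
# the RUN switch, and any paths passed to read_loop.

DATASET="${DATASET:-pool/dataset}"   # source dataset (placeholder from the post)
DEST="${DEST:-destination}"          # receiving host (placeholder from the post)
POOL="${POOL:-pool}"                 # target pool on the receiver
RUN="${RUN:-echo}"                   # "echo" only prints commands; RUN=eval executes them

# Print the incremental send/receive pipeline from snapshot $1 to snapshot $2.
replicate_cmd() {
    printf 'zfs send -i %s@%s %s@%s | ssh %s zfs receive -d %s\n' \
        "$DATASET" "$1" "$DATASET" "$2" "$DEST" "$POOL"
}

# One replication step: take snapshot $2, then send the delta since $1.
replicate_once() {
    $RUN "zfs snapshot $DATASET@$2"
    $RUN "$(replicate_cmd "$1" "$2")"
}

# Sender side: replicate once per minute, starting after snapshot number $1,
# which must already exist on both machines.
replicate_loop() {
    n=$1
    while :; do
        next=$((n + 1))
        replicate_once "$n" "$next"
        n=$next
        sleep 60
    done
}

# Receiver side: keep read activity going by copying directory $1 into $2
# in an endless loop (for example the replicated /usr/obj into another dataset).
read_loop() {
    while :; do
        $RUN "cp -Rp $1/. $2/"
    done
}
```

With the default RUN=echo nothing is executed; `replicate_once 1 2` just prints the snapshot command and the pipeline. On real test machines one would run `RUN=eval` with `replicate_loop` on the sender and `read_loop` on the receiver at the same time.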