From: Willem Jan Withagen
To: Borja Marcos, fs@freebsd.org
Date: Thu, 14 Oct 2010 11:31:53 +0200
Subject: Re: ZFS freeze/livelock

On
2010-10-14 11:12, Borja Marcos wrote:
>
> On Oct 13, 2010, at 6:16 PM, Willem Jan Withagen wrote:
>
>> On 2010-10-13 13:08, Borja Marcos wrote:
>
>> Well, I think what I did more or less fits your description.
>>
>> But thus far it has not happened.
>> And I'm (very slowly) redoing some of these steps, with all debugging settings enabled in the kernel.
>
> Sometimes it isn't easy to reproduce, but I found a way. Whenever a new version of FreeBSD comes out, I run the following test:
>
> Start two machines, A and B.
> Update them (via make buildworld, etc.).
> Set up automatic replication of a dataset from A to B at 30-second or 1-minute intervals.
> The chosen dataset is the one that contains /usr/src and /usr/obj.
> Run a loop of "make buildworld && make clean" on A.
> And on B I run a couple of tasks that simply keep copying the contents of the destination dataset (for example, pool/srcobj) to a different place in a loop, for instance using "tar", so that I keep heavy I/O activity going.
>
> So far I can reproduce the phenomenon in less than 20 minutes.

Mmmm, I'm not really pounding my B-system that hard...

But you haven't found a way to see which lock is the actual culprit? Let alone determine how all the contenders actually got there?

Perhaps you should file a PR with the above means of reproducing it, just for history's sake. Or perhaps you already did?

Meanwhile, my B-system has already been busy for 3 days receiving a 225G volume. :( It makes progress, but really, really slowly.

--WjW
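For anyone wanting to try the reproduction recipe quoted above, the replication half (snapshot the dataset on A, then push it to B every 30 seconds) could be scripted roughly as below. This is a minimal sketch under stated assumptions, not Borja's actual setup: the dataset name `pool/srcobj` comes from the thread, but the host name `hostB`, the `repl-NNNNNN` snapshot naming and the helper functions are hypothetical. It assumes passwordless ssh from A to B with enough privilege to run `zfs recv`, and it relies only on the standard `zfs snapshot`, `zfs send -i` and `zfs recv -F` subcommands.

```python
import subprocess
import time

# Hypothetical settings: adjust to your own pool/dataset and receiving host.
DATASET = "pool/srcobj"
REMOTE = "hostB"
INTERVAL_SECONDS = 30


def snapshot_name(dataset: str, seq: int) -> str:
    """Return a snapshot name such as 'pool/srcobj@repl-000001' (naming is made up)."""
    return f"{dataset}@repl-{seq:06d}"


def send_command(dataset: str, seq: int) -> list[str]:
    """Build the `zfs send` argv: a full stream first, incremental (-i) afterwards."""
    cur = snapshot_name(dataset, seq)
    if seq == 0:
        return ["zfs", "send", cur]
    prev = snapshot_name(dataset, seq - 1)
    return ["zfs", "send", "-i", prev, cur]


def recv_command(remote: str, dataset: str) -> list[str]:
    """Build the argv that runs `zfs recv -F` on the remote host over ssh.

    -F rolls the destination back to its latest snapshot before receiving,
    which matters because B is busy reading that dataset with tar.
    """
    return ["ssh", remote, "zfs", "recv", "-F", dataset]


def replicate_once(dataset: str, remote: str, seq: int) -> None:
    """Take a snapshot on A and pipe its stream into `zfs recv` on B."""
    subprocess.run(["zfs", "snapshot", snapshot_name(dataset, seq)], check=True)
    sender = subprocess.Popen(send_command(dataset, seq), stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_command(remote, dataset), stdin=sender.stdout)
    sender.stdout.close()  # so the sender gets SIGPIPE if the receiver exits early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0 or recv_rc != 0:
        raise RuntimeError(f"replication step {seq} failed (send={send_rc}, recv={recv_rc})")


def main() -> None:
    seq = 0
    while True:
        replicate_once(DATASET, REMOTE, seq)
        seq += 1
        time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```

The `make buildworld && make clean` loop on A and the `tar` copy loops on B would run separately alongside this script. Note that the sketch never prunes old snapshots, so they accumulate on both sides for as long as the test runs.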