From: Borja Marcos
To: Willem Jan Withagen
Cc: fs@freebsd.org
Subject: Re: ZFS freeze/livelock
Date: Wed, 13 Oct 2010 13:08:56 +0200
Message-Id: <98AF4752-7881-4C50-8A59-243F1AD55318@sarenet.es>
In-Reply-To: <4CB1DD0F.6000209@digiware.nl>
References: <4CB1DD0F.6000209@digiware.nl>

On Oct 10, 2010, at 5:34 PM, Willem Jan Withagen wrote:

> Hi,
>
> Just had my FreeBSD freeze on me with what I would think is sort of a
> livelock....
>
> While I was receiving zfs snapshots on my data pool.
>
> Top and systat just kept running,
> but anything getting near a shell (and perhaps disk-io) ended up in:
>
> root@zfs.digiware.nl# gpart create -s gpt da6
> load: 0.00 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 26.12r 0.00u 0.00s 0% 2480k
> load: 0.10 cmd: csh 12393 [zfsvfs->z_teardown_inactive_lock] 96.01r 0.00u 0.00s 0% 2480k
>
> Trying to execute shutdown -r now had no effect whatsoever.
> Neither did the three-finger salute.
> (Well at least not in 60 sec I was willing to wait.)
>
> Only way out of this situation was hard-reset. And I do have to admit
> I like ZFS for the speed it recovers after unexpected reboot.
>
> Too bad there was no alt-ctrl-backspace escape to debugger compiled in.
> I'll do that with the next kernel, just in case.

There is a deadlock, unsolved as far as I know, that can occur when you
receive a snapshot while something is reading the target dataset.

I found it in a redundant server configuration. I replicate some
datasets periodically with an incremental send-receive. It works
perfectly, but it can deadlock if a process is reading the destination
dataset on the secondary server. That can easily happen if, for example,
one of the nightly periodic tasks is running.

Were you doing a similar thing? Or are you sure there was no read
activity on the destination dataset?

Borja.