From: Stefan Bethke
Date: Tue, 9 Mar 2010 13:43:58 +0100
To: Pawel Jakub Dawidek
Cc: FreeBSD Stable
Subject: Re: Many processes stuck in zfs

On 09.03.2010 at 13:29, Pawel Jakub Dawidek wrote:
> On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
>> Over the past couple of months, I've more or less regularly observed
>> machines having more and more processes stuck in the zfs wchan. The
>> processes never recover from that, and trying to reboot only gets the
>> entire system stuck, without any console messages. I can enter the
>> debugger, and I have saved a couple of dumps.
>>
>> The situation seems to be triggered by receiving snapshots (zfs
>> receive) from the sister machine. Both machines synchronize their
>> active ZFS filesystems to each other using zfs send and zfs receive.
>> It appears it's the receiving side that is causing trouble.
>>
>> Both machines run 8-stable from mid-February, with a single-disk ZFS
>> pool, with ARC limited to 512M, and with prefetch and ZIL disabled via
>> loader.conf.
>>
>> What should I be looking at to diagnose this further?
>
> What kind of hardware do you have there? There is a 3-way deadlock I
> have a fix for, which would be hard to trigger on single- or dual-core
> machines.

FreeBSD lokschuppen.zs64.net 8.0-STABLE FreeBSD 8.0-STABLE #24: Sat Feb 13 11:20:03 UTC 2010 root@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT amd64

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-STABLE #24: Sat Feb 13 11:20:03 UTC 2010
    root@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz (2666.65-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff
  Features2=0x8e39d
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4081422336 (3892 MB)

> Feel free to try the fix:
>
> http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

I'll give it a shot on one of the two boxes.

Stefan

-- 
Stefan Bethke     Fon +49 151 14070811
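For reference, the loader.conf setup described in the quoted report (ARC capped at 512M, prefetch and ZIL disabled) would presumably look something like the fragment below on 8-STABLE. The actual file is not quoted in the thread, so this is a sketch under that assumption; the tunable names are the ones used for ZFS on FreeBSD 8.x, and vfs.zfs.zil_disable in particular was later dropped in favour of the per-dataset sync property.

```
# /boot/loader.conf (sketch; the report does not quote the real file)
vfs.zfs.arc_max="512M"          # cap the ARC at 512 MB
vfs.zfs.prefetch_disable="1"    # turn off ZFS file-level prefetch
vfs.zfs.zil_disable="1"         # disable the ZIL (8.x-era tunable)
```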