Date: Sat, 27 Apr 2013 09:50:02 GMT
Message-Id: <201304270950.r3R9o25Z036901@freefall.freebsd.org>
From: Martin Birgmeier
To: freebsd-fs@FreeBSD.org
Subject: Re: kern/177536: [zfs] zfs livelock (deadlock) with high write-to-disk load

The following reply was made to PR kern/177536; it has been noted by GNATS.

From: Martin Birgmeier
To: bug-followup@FreeBSD.org, Andriy Gapon
Date: Sat, 27 Apr 2013 11:40:16 +0200

So it happened again... Same system (9.1.0 release), except that the kernel has been recompiled with options DDB, KDB, and STACK.

I ran procstat -kk -a (twice). The output can be found at http://members.aon.at/xyzzy/procstat.-kk.-a.1.gz and http://members.aon.at/xyzzy/procstat.-kk.-a.2.gz, respectively.

I also started kgdb in script(1) and executed "thread apply all bt" in it.
The output can be found at http://members.aon.at/xyzzy/kgdb.thread.apply.all.bt.gz.

More info on the "test case":

- As described in the initial report, / is a UFS file system on a GPT partition on one of 6 SATA disks. The zpool "hal.1" uses one (other) GPT partition on each of these disks.
- VirtualBox is run by a user whose home directory is on one of the ZFS file systems.
- First, a large write load to another ZFS file system in the same zpool was started (a 160 GB copy from a remote machine).
- Then, 3 VBoxHeadless instances were started.
  ==> livelock on ZFS
- procstat was run twice, then script + kgdb.
- The output was copied to another machine.
- The hung machine was shut down (via "shutdown -p").
  ==> "some processes would not die"
  ==> "syncing disks" runs until it shows all zeros, then the system just sits there with continuous disk activity (obviously from ZFS); shutdown does not proceed any further.
- Hard reset.
- On reboot: UFS file system check (no errors); ZFS starts fine and seems mostly unaffected (except, of course, that the 160 GB copy is truncated).

An analysis would be appreciated, as would a hint on whether I should switch to stable/9 instead.

Regards,
Martin
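Since procstat -kk -a was run twice during the hang, one way to read the two snapshots is to list the threads whose kernel stack is identical in both, i.e. threads that made no visible progress between the runs. The sketch below is a hypothetical helper (the script name stuck_threads.py and its functions are illustrative, not part of FreeBSD). It assumes the usual procstat -kk layout: a header line, then one line per thread starting with the numeric PID and TID, followed by COMM, TDNAME and the KSTACK frames.

```python
# stuck_threads.py: hypothetical helper, not part of FreeBSD.
# Compares two `procstat -kk -a` snapshots and prints the threads whose
# COMM/TDNAME/KSTACK text is identical in both.
import gzip
import sys


def parse_snapshot(lines):
    """Map (pid, tid) -> whitespace-normalized rest of the line.

    Assumes a header line starting with "PID", then one line per thread
    whose first two fields are the numeric PID and TID.
    """
    threads = {}
    for line in lines:
        fields = line.split(None, 2)
        # Skip the header and any line that does not start with PID and TID.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        # COMM and TDNAME may contain spaces, so the remainder is kept whole;
        # collapsing runs of blanks makes column padding irrelevant.
        threads[(int(fields[0]), int(fields[1]))] = " ".join(fields[2].split())
    return threads


def load(path):
    """Read a snapshot, handling gzip-compressed files like the ones above."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return parse_snapshot(f)


def unchanged_threads(first, second):
    """Return sorted (pid, tid, text) for threads identical in both snapshots."""
    return sorted(
        (pid, tid, text)
        for (pid, tid), text in first.items()
        if second.get((pid, tid)) == text
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    for pid, tid, text in unchanged_threads(load(sys.argv[1]), load(sys.argv[2])):
        print(f"{pid:>6} {tid:>7}  {text}")
```

Usage, assuming the two files have been downloaded: python3 stuck_threads.py procstat.-kk.-a.1.gz procstat.-kk.-a.2.gz. An unchanged stack is only a hint: a thread that is legitimately idle and sleeping in the same place in both runs is reported too, so the list still has to be read against the kgdb backtraces.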