From: Dennis Glatting
To: Volodymyr Kostyrko
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS hanging
Date: Fri, 13 Jul 2012 06:47:18 -0700

On Fri, 2012-07-13 at 12:19 +0300, Volodymyr Kostyrko wrote:
> Dennis Glatting wrote:
> > I have a ZFS array of disks where the system simply stops as if forever
> > blocked by some IO mutex. This happens often and the following is the
> > output of top:
>
> Try switching to clang. Some time ago I was hit by different error -
> some process hangs indefinitely and can't be killed.
> After building system with clang I obtained a core dump at first
> reboot and research turned out that there was some broken directory
> entry in file system. Recreating damaged zfs filesystem (leaving all
> other pool intact) solved my problem completely.
>

I am using clang except on my CVS mirrors. On the mirrors I found that
a mirror cannot update from itself, although other hosts can update
from it. Somewhere in that M3/assembly muck something crashes in the
process. The only way around the problem is to compile the /OS/ using
GCC.

On the system in question (iirc) I rebuilt the pool yesterday -- I'm in
the process of updating parts across my systems. I also wanted to fool
around with different ZFS architectures. This morning, after a night
with a load average of 42 on a 32-core system writing 4TB of data, it
is still alive and kicking, but it's early in the run.