From: Dennis Glatting
To: Andriy Gapon
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS hang (system #2)
Date: Sat, 20 Oct 2012 16:37:39 -0700
Message-ID: <1350776259.86715.83.camel@btw.pki2.com>

On Sun, 2012-10-21 at 01:17 +0300, Andriy Gapon wrote:
> on 20/10/2012 23:31 Dennis Glatting said the following:
> > The following is from a second system working on a 7TB file (started this
> > morning), which also hung.
> > However, an important difference is that this system's CPU is slightly
> > overclocked from 3.6GHz to 4.0GHz; previously, running without the
> > overclock made no difference -- it still hung.
> >
> > This system has a Gigabyte GA-990FXA-UD7 board.
>
> To me this again looks like an issue with a stuck zio/bio, and not a deadlock.
>
> > bd3# /mnt/camcontrol tags da7 -v (** OS - RAID1 **)
> > (pass7:mps1:0:0:0): dev_openings 215
> > (pass7:mps1:0:0:0): dev_active 40
> > (pass7:mps1:0:0:0): devq_openings 215
> > (pass7:mps1:0:0:0): devq_queued 0
> > (pass7:mps1:0:0:0): held 0
> > (pass7:mps1:0:0:0): mintags 2
> > (pass7:mps1:0:0:0): maxtags 255
>
> Of all the disks this one looks the most suspicious, of course.
>
> Do you have the zio/bio debug patch there and usable kgdb?
>

src/sys/dev/mps/mps.c has the following tunables:

    /* Grab the unit-instance variables */
    snprintf(tmpstr, sizeof(tmpstr), "dev.mps.%d.debug_level",
        device_get_unit(sc->mps_dev));
    TUNABLE_INT_FETCH(tmpstr, &sc->mps_debug);

    snprintf(tmpstr, sizeof(tmpstr), "dev.mps.%d.disable_msix",
        device_get_unit(sc->mps_dev));
    TUNABLE_INT_FETCH(tmpstr, &sc->disable_msix);

    snprintf(tmpstr, sizeof(tmpstr), "dev.mps.%d.disable_msi",
        device_get_unit(sc->mps_dev));
    TUNABLE_INT_FETCH(tmpstr, &sc->disable_msi);

    snprintf(tmpstr, sizeof(tmpstr), "dev.mps.%d.max_chains",
        device_get_unit(sc->mps_dev));
    TUNABLE_INT_FETCH(tmpstr, &sc->max_chains);

Their values after boot are:

dev.mps.0.debug_level: 4
dev.mps.0.disable_msix: 0
dev.mps.0.disable_msi: 0
dev.mps.0.firmware_version: 14.00.00.00
dev.mps.0.driver_version: 14.00.00.01-fbsd
dev.mps.0.io_cmds_active: 0
dev.mps.0.io_cmds_highwater: 60
dev.mps.0.chain_free: 2048
dev.mps.0.chain_free_lowwater: 2047
dev.mps.0.max_chains: 2048
dev.mps.0.chain_alloc_fail: 0

Is there any value in tweaking these?
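
For context on what tweaking would involve: TUNABLE_INT_FETCH reads its value from the kernel environment, which is normally populated from /boot/loader.conf at boot. Below is a minimal sketch in plain C of how the per-unit names shown above are composed, and what a matching loader.conf line would look like. The helper names mps_tunable_name and mps_loader_line are hypothetical (they are not in mps.c), and any values passed in are placeholders, not recommendations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mps.c): build the per-unit name in the
 * same "dev.mps.%d.<leaf>" form the driver passes to TUNABLE_INT_FETCH,
 * e.g. "dev.mps.0.max_chains".
 * Returns 0 on success, -1 if the name did not fit in buf.
 */
int
mps_tunable_name(char *buf, size_t len, int unit, const char *leaf)
{
	int n;

	assert(buf != NULL && leaf != NULL);
	n = snprintf(buf, len, "dev.mps.%d.%s", unit, leaf);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: format one loader.conf style line, name="value".
 * Returns 0 on success, -1 if either the name or the line was truncated.
 */
int
mps_loader_line(char *buf, size_t len, int unit, const char *leaf, int value)
{
	char name[80];
	int n;

	if (mps_tunable_name(name, sizeof(name), unit, leaf) != 0)
		return -1;
	n = snprintf(buf, len, "%s=\"%d\"", name, value);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, mps_loader_line(buf, sizeof(buf), 0, "max_chains", 2048) fills buf with dev.mps.0.max_chains="2048". Only the four names fetched in the snippet above (debug_level, disable_msix, disable_msi, max_chains) are read this way; I have not checked how the remaining entries in the list are set.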