From owner-freebsd-fs@FreeBSD.ORG  Tue Apr 26 13:49:07 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3A6C9106566B
	for <freebsd-fs@freebsd.org>; Tue, 26 Apr 2011 13:49:07 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from QMTA11.westchester.pa.mail.comcast.net
	(qmta11.westchester.pa.mail.comcast.net [76.96.59.211])
	by mx1.freebsd.org (Postfix) with ESMTP id D8CD78FC0C
	for <freebsd-fs@freebsd.org>; Tue, 26 Apr 2011 13:49:05 +0000 (UTC)
Received: from omta20.westchester.pa.mail.comcast.net ([76.96.62.71])
	by QMTA11.westchester.pa.mail.comcast.net with comcast
	id cDlg1g00A1YDfWL5BDp6Mw; Tue, 26 Apr 2011 13:49:06 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta20.westchester.pa.mail.comcast.net with comcast
	id cDp41g01Q1t3BNj3gDp5l7; Tue, 26 Apr 2011 13:49:06 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 2879A9B418; Tue, 26 Apr 2011 06:49:03 -0700 (PDT)
Date: Tue, 26 Apr 2011 06:49:03 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Conall O'Brien <conall@conall.net>
Message-ID: <20110426134903.GA62578@icarus.home.lan>
References: <BANLkTinYp674E=96PhMaR0+Uy9e9B6boVA@mail.gmail.com>
	<BANLkTimQ4FWnC12O3cDtptJR+vA2PcNqYA@mail.gmail.com>
	<BANLkTikbPsf1d3p687RDVsaL_FO0KgKbfA@mail.gmail.com>
	<BANLkTi=Jban2q6h0HEpEMhWrfr56k1O_Jw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <BANLkTi=Jban2q6h0HEpEMhWrfr56k1O_Jw@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: Problems Terminating zpool scrub...
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 26 Apr 2011 13:49:07 -0000

On Tue, Apr 26, 2011 at 02:25:00PM +0100, Conall O'Brien wrote:
> On 26 April 2011 13:15, ambrosehuang ambrose <ambrosehua@gmail.com> wrote:
> > Could you post your PR number?I was curious about the driver used by
> > West Digital Disk, cause I use
> > the WR10EARS?
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=156647
> 
> I chalked it up to the SATA controller, since only 2 of my 5 identical
> WD20EARS disks were reporting DMA issues.
> 
> >
> > 2011/4/25 Conall O'Brien <conall@conall.net>
> >>
> >> On 15 April 2011 15:59, Conall O'Brien <conall@conall.net> wrote:
> >> > Hello,
> >> >
> >> >
> >> > I've got a NAS box running 8-STABLEW [1] which I'm running with 5x
> >> > Western Digital 2TB disks.
> >> >
> >> >
> >> > One of the disks was having DMA issues as reported in dmesg, so I
> >> > began the usual zfs workflow of "zpool offline pool dev", physically
> >> > removing it and tried to "zpool replace pool dev" but my attempts to
> >> > do so fail, actually the zpool command keeps ending up in
> >> > uninterruptable wait (the D state). Before resorting to replacing the
> >> > disk, a zpool scrub was in progress. Now, I can't kill it using "zpool
> >> > scrub -s pool", it too ends up in the D state.
> >> >
> >> >
> >> > Is there another way than "zpool scrub -s pool" to terminate a scrub
> >> > process, so I can proceed with the disk replacement. I care more about
> >> > resilvering my pool before getting around to scrubbing it.
> >> >
> >> >
> >> > Thanks!
> >> >
> >> >
> >> > [1] For completeness, uname -a reports FreeBSD galvatron.taku.ie
> >> > 8.2-STABLE FreeBSD 8.2-STABLE #1: Sat Mar 19 13:18:46 UTC 2011
> >> > root@galvatron.taku.ie:/usr/src/obj/usr/src/sys/GALVATRON amd64
> >>
> >> I worked out the problem. There's a regression in one of the drivers
> >> between the kernel I was running and my previous kernel:
> >>
> >> FreeBSD galvatron.taku.ie 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0:
> >> Wed Dec 29 04:00:27 UTC 2010
> >> root@galvatron.taku.ie:/usr/src/obj/usr/src/sys/GALVATRON amd64
> >>
> >>
> >> I'll file a PR to get it fixed.

The PR is extremely terse and of sub-par quality.  There isn't any actual
evidence that the problem is a driver regression.  What needs to be
provided in the PR:

- Relevant dmesg output (pertaining to ataX and adX devices and anything
  else seen around that time; stuff from /var/log/messages might be more
  useful since it contains timestamps)
- Full dmesg seen during a fresh reboot
- vmstat -i
- atacontrol cap ataX (for each ataX channel.  You can XXX out the
  serial number if desired)
- smartctl -a /dev/adX (for each disk, be sure to label which disk
  is associated with what data.  You can XXX out the serial number if
  desired)
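
The serial-number masking mentioned above can be scripted rather than done
by hand.  This is only a minimal sketch: the function name redact_serial
is made up for this example, and it assumes the serial appears on a line
beginning with "Serial Number:" (smartctl) or "serial number" (assumed
wording for atacontrol output).

```shell
#!/bin/sh
# Hypothetical helper: mask the serial number in tool output read from stdin.
# Assumption: the serial sits on a line starting with "Serial Number:" or
# "serial number", followed by whitespace and the value.
redact_serial() {
    sed -E 's/^([Ss]erial [Nn]umber:?[[:space:]]+).*$/\1XXXXXXXX/'
}
```

Usage would look like "smartctl -a /dev/ad4 | redact_serial > ad4.smart.txt",
where ad4 is just an example device name; keeping one output file per disk
also takes care of labelling which data belongs to which disk.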

What really needs to be shown are the actual errors themselves, in
sequential order and with timestamps.  "DMA errors" is too vague; my
guess would be READ_DMA48, but I cannot assume that.
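
One way to pull the timestamped errors out of the system log, in order, is
a simple grep.  A minimal sketch, assuming the usual syslog layout where
kernel messages carry a "kernel: " prefix followed by the device name; the
function name ata_log_lines is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print syslog lines for ataX/adX devices, in file order.
# $1 = path to the syslog file (for example /var/log/messages).
# Assumption: kernel messages look like "<timestamp> <host> kernel: ad4: ...".
ata_log_lines() {
    grep -E 'kernel: (ata|ad)[0-9]+:' "$1"
}
```

For example, "ata_log_lines /var/log/messages > ata-errors.txt" would give
a file that can be attached to the PR as-is.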

Next:

I'm not sure if your system supports it, but can you run the controller
in AHCI mode (BIOS setting) and load ahci.ko instead (ahci_load="yes" in
/boot/loader.conf; your disks will change to /dev/adaX)?  If so, this
would allow you to narrow down whether or not the issue is truly a
driver problem.  You should try this *before* attempting the below.
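
The loader.conf change can be made without risk of adding the line twice.
A minimal sketch; the function name enable_ahci is made up here, and the
only setting it writes is the ahci_load line quoted above:

```shell
#!/bin/sh
# Hypothetical helper: add ahci_load="yes" to a loader.conf file, but only
# if no ahci_load line is already present.
# $1 = path to the file (defaults to /boot/loader.conf).
enable_ahci() {
    conf=${1:-/boot/loader.conf}
    if ! grep -q '^ahci_load=' "$conf" 2>/dev/null; then
        printf 'ahci_load="yes"\n' >> "$conf"
    fi
}
```

Run it as root with no argument to edit /boot/loader.conf, then switch the
controller to AHCI mode in the BIOS and reboot.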

Next:

Try updating your source to something newer than March 19th.  There have
been ata(4) changes since then that might pertain to your issue.  If the
same issue happens on a present-day build of RELENG_8, then we can start
narrowing it down to commits made between, roughly, late December 2010
and mid-March 2011.  Since you follow RELENG_8, that means going through
the commit history.  src/sys/dev/ata is what's relevant here, as well as
the chipsets/ directory under that.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/
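
Narrowing a regression down to a date range is effectively a bisection:
build the source as of a date halfway between the last known-good and first
known-bad kernels, test, and repeat on the half that still contains the
change.  A minimal sketch of picking the next date to test; the function
name next_candidate and the list of candidate dates are hypothetical, and
the caller supplies the dates already sorted:

```shell
#!/bin/sh
# Hypothetical helper: given candidate checkout dates sorted oldest to
# newest, print the middle one (the lower middle when the count is even).
next_candidate() {
    shift $(( ($# - 1) / 2 ))
    printf '%s\n' "$1"
}
```

If the kernel built from the printed date shows the problem, drop the later
dates from the list and repeat; if it does not, drop that date and the
earlier ones instead.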

Let's get this figured out before other users start correlating their
problems with whatever this is.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |