Date: Tue, 26 Apr 2011 06:49:03 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Conall O'Brien <conall@conall.net>
Cc: freebsd-fs@freebsd.org
Subject: Re: Problems Terminating zpool scrub...
Message-ID: <20110426134903.GA62578@icarus.home.lan>
In-Reply-To: <BANLkTi=Jban2q6h0HEpEMhWrfr56k1O_Jw@mail.gmail.com>
References: <BANLkTinYp674E=96PhMaR0%2BUy9e9B6boVA@mail.gmail.com>
 <BANLkTimQ4FWnC12O3cDtptJR%2BvA2PcNqYA@mail.gmail.com>
 <BANLkTikbPsf1d3p687RDVsaL_FO0KgKbfA@mail.gmail.com>
 <BANLkTi=Jban2q6h0HEpEMhWrfr56k1O_Jw@mail.gmail.com>
On Tue, Apr 26, 2011 at 02:25:00PM +0100, Conall O'Brien wrote:
> On 26 April 2011 13:15, ambrosehuang ambrose <ambrosehua@gmail.com> wrote:
> > Could you post your PR number? I was curious about the driver used by
> > Western Digital disks, because I use
> > the WR10EARS?
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=156647
>
> I chalked it up to the SATA controller, since only 2 of my 5 identical
> WD20EARS disks were reporting DMA issues.
>
> >
> > 2011/4/25 Conall O'Brien <conall@conall.net>
> >>
> >> On 15 April 2011 15:59, Conall O'Brien <conall@conall.net> wrote:
> >> > Hello,
> >> >
> >> > I've got a NAS box running 8-STABLE [1] which I'm running with 5x
> >> > Western Digital 2TB disks.
> >> >
> >> > One of the disks was having DMA issues as reported in dmesg, so I
> >> > began the usual zfs workflow of "zpool offline pool dev", physically
> >> > removing it and tried to "zpool replace pool dev" but my attempts to
> >> > do so fail, actually the zpool command keeps ending up in
> >> > uninterruptible wait (the D state). Before resorting to replacing the
> >> > disk, a zpool scrub was in progress. Now, I can't kill it using "zpool
> >> > scrub -s pool", it too ends up in the D state.
> >> >
> >> > Is there another way than "zpool scrub -s pool" to terminate a scrub
> >> > process, so I can proceed with the disk replacement? I care more about
> >> > resilvering my pool before getting around to scrubbing it.
> >> >
> >> > Thanks!
> >> >
> >> > [1] For completeness, uname -a reports FreeBSD galvatron.taku.ie
> >> > 8.2-STABLE FreeBSD 8.2-STABLE #1: Sat Mar 19 13:18:46 UTC 2011
> >> > root@galvatron.taku.ie:/usr/src/obj/usr/src/sys/GALVATRON  amd64
> >>
> >> I worked out the problem.
> >> There's a regression in one of the drivers
> >> between the kernel I was running and my previous kernel:
> >>
> >> FreeBSD galvatron.taku.ie 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0:
> >> Wed Dec 29 04:00:27 UTC 2010
> >> root@galvatron.taku.ie:/usr/src/obj/usr/src/sys/GALVATRON  amd64
> >>
> >> I'll file a PR to get it fixed.

The PR is extremely terse and of sub-par quality.  There isn't any actual
evidence that the problem is a driver regression.

What needs to be provided in the PR:

- Relevant dmesg output (pertaining to ataX and adX devices and anything
  else seen around that time; entries from /var/log/messages might be
  more useful since they contain timestamps)

- Full dmesg output from a fresh reboot

- vmstat -i

- atacontrol cap adX (for each disk.  You can XXX out the serial number
  if desired)

- smartctl -a /dev/adX (for each disk; be sure to label which disk is
  associated with what data.  You can XXX out the serial number if
  desired)

What really needs to be shown are the actual errors themselves, in
sequential order and with timestamps.  "DMA errors" is too vague; I want
to assume READ_DMA48, but I cannot assume that.

Next: I'm not sure if your system supports it, but can you run the
controller in AHCI mode (BIOS setting) and load ahci.ko instead
(ahci_load="yes" in /boot/loader.conf; your disks will change to
/dev/adaX)?  If so, this would let you determine whether or not the issue
is truly a driver problem.  You should try this *before* attempting the
next step.

Next: Try updating your source to something newer than March 19th.
There have been ata(4) changes since then that might pertain to your
issue.

If the same issue happens on a present-day build of RELENG_8, then we can
start trying to narrow it down to commits made between, roughly, late
December 2010 and mid-March 2011.  Since you follow RELENG_8, you will
need to follow commits.  src/sys/dev/ata is what's relevant here, as well
as the chipsets/ directory under that.
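Narrowing the regression down to commits between the Dec 29, 2010 kernel (working) and the Mar 19, 2011 kernel (failing) is, in practice, a search over an ordered list of revisions. The sketch below shows one common way to run that search, a binary search (bisection); the thread itself does not prescribe this method. It assumes the candidate revisions are in chronological order, that the oldest is known good and the newest known bad, and that the regression persists in every revision after it appears. `first_bad_revision` and `is_bad` are hypothetical names; `is_bad` stands in for the manual cycle of building a kernel at that revision, booting it, and checking for the DMA errors.

```python
from typing import Callable, Sequence, TypeVar

Rev = TypeVar("Rev")


def first_bad_revision(revisions: Sequence[Rev], is_bad: Callable[[Rev], bool]) -> Rev:
    """Return the earliest revision for which is_bad() is True.

    Assumptions (hypothetical helper, not part of any FreeBSD tool):
    - revisions are in chronological order;
    - revisions[0] is known good (e.g. the Dec 29, 2010 kernel);
    - revisions[-1] is known bad (e.g. the Mar 19, 2011 kernel);
    - once the regression appears, every later revision is also bad.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one known-good and one known-bad revision")

    # Invariant: revisions[lo] is good and revisions[hi] is bad.
    lo, hi = 0, len(revisions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):  # one kernel build + reboot + test per call
            hi = mid
        else:
            lo = mid
    return revisions[hi]


if __name__ == "__main__":
    # Simulated revision numbers; in practice these would be the RELENG_8
    # commits touching src/sys/dev/ata between the two kernel dates.
    revs = list(range(100, 120))
    culprit = 113
    print(first_bad_revision(revs, lambda r: r >= culprit))  # prints 113
```

Because every `is_bad` call costs a kernel build and a reboot, the logarithmic number of tests is the point of bisecting: around 50 candidate commits need about 6 test kernels rather than up to 50.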
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/

Let's get this figured out before other users start correlating their
problems with whatever this is.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |