From: Conall O'Brien
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Date: Tue, 26 Apr 2011 18:39:58 +0100
Subject: Re: Problems Terminating zpool scrub...
In-Reply-To: <20110426134903.GA62578@icarus.home.lan>

On 26 April 2011 14:49, Jeremy Chadwick wrote:
> On Tue, Apr 26, 2011 at 02:25:00PM +0100, Conall O'Brien wrote:
>> On 26 April 2011 13:15, ambrosehuang ambrose wrote:
>> > Could you post your PR number? I was curious about the driver used by
>> > Western Digital Disk, cause I use
>> > the WR10EARS?
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=156647
>>
>> I chalked it up to the SATA controller, since only 2 of my 5 identical
>> WD20EARS disks were reporting DMA issues.
>>
>> >
>> > 2011/4/25 Conall O'Brien
>> >>
>> >> On 15 April 2011 15:59, Conall O'Brien wrote:
>> >> > Hello,
>> >> >
>> >> >
>> >> > I've got a NAS box running 8-STABLE [1] which I'm running with 5x
>> >> > Western Digital 2TB disks.
>> >> >
>> >> >
>> >> > One of the disks was having DMA issues as reported in dmesg, so I
>> >> > began the usual zfs workflow of "zpool offline pool dev", physically
>> >> > removing it and tried to "zpool replace pool dev" but my attempts to
>> >> > do so fail, actually the zpool command keeps ending up in
>> >> > uninterruptible wait (the D state). Before resorting to replacing the
>> >> > disk, a zpool scrub was in progress. Now, I can't kill it using "zpool
>> >> > scrub -s pool", it too ends up in the D state.
>> >> >
>> >> >
>> >> > Is there another way than "zpool scrub -s pool" to terminate a scrub
>> >> > process, so I can proceed with the disk replacement? I care more about
>> >> > resilvering my pool before getting around to scrubbing it.
>> >> >
>> >> >
>> >> > Thanks!
>> >> >
>> >> >
>> >> > [1] For completeness, uname -a reports FreeBSD galvatron.taku.ie
>> >> > 8.2-STABLE FreeBSD 8.2-STABLE #1: Sat Mar 19 13:18:46 UTC 2011
>> >> > root@galvatron.taku.ie:/usr/src/obj/usr/src/sys/GALVATRON  amd64
>> >>
>> >> I worked out the problem. There's a regression in one of the drivers
>> >> between the kernel I was running and my previous kernel:
>> >>
>> >> FreeBSD galvatron.taku.ie 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0:
>> >> Wed Dec 29 04:00:27 UTC 2010
>> >> root@galvatron.taku.ie:/usr/src/obj/usr/src/sys/GALVATRON  amd64
>> >>
>> >>
>> >> I'll file a PR to get it fixed.
>
> The PR is extremely terse/sub-par quality.  There isn't actual evidence
> of the problem being a driver regression.  What needs to be provided in
> the PR:

Yeah, I wasn't sure what specifics would be needed, but I wanted to
open a PR and go from there. It was the first time I've run into a
kernel-related issue; PRs for bugs in the ports collection are so much
easier to describe.

> - Relevant dmesg output (pertaining to ataX and adX devices and anything
>   else seen around that time; stuff from /var/adm/messages might be more
>   useful since it contains timestamps)
> - Full dmesg seen during a fresh reboot
> - vmstat -i
> - atacontrol cap ataX (for each ataX channel.  You can XXX out the
>   serial number if desired)
> - smartctl -a /dev/adX (for each disk, be sure to label which disk
>   is associated with what data.  You can XXX out the serial number if
>   desired)
>
> What really needs to be shown are the actual errors themselves, and in
> sequential order / with timestamps.  "DMA errors" is too vague; I want
> to assume READ_DMA48 but I cannot assume that.

Now that my RAID array is healthy again, I'm happy to reboot into my
suspect kernel and collect better diagnostic reports.

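For reference, the collection steps listed above could be wrapped in a
small script along these lines. This is only a sketch under stated
assumptions: the CHANNELS and DISKS defaults are placeholder device
names, not the real ones on this box; the helper names diag_commands
and collect are made up for illustration; and the command forms
(atacontrol cap ataX, smartctl -a /dev/adX) are copied as written in
the request above. By default it only prints the commands it would
run; setting RUN=1 executes them.

```shell
#!/bin/sh
# Sketch: list (or, with RUN=1, execute) the diagnostics requested for the PR.
# CHANNELS and DISKS are placeholders; set them to the real ataX/adX names.
CHANNELS="${CHANNELS:-ata0 ata1}"
DISKS="${DISKS:-ad0 ad1}"

# Print one diagnostic command per line, in the order requested.
diag_commands() {
    echo "dmesg"
    echo "vmstat -i"
    for ch in $CHANNELS; do
        echo "atacontrol cap $ch"
    done
    for d in $DISKS; do
        echo "smartctl -a /dev/$d"
    done
}

# Emit a "### <command>" header per command so each output block is
# labelled with the channel or disk it belongs to; run it only if RUN=1.
collect() {
    diag_commands | while IFS= read -r cmd; do
        echo "### $cmd"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd </dev/null 2>&1
        fi
    done
}

collect
```

Something like "RUN=1 CHANNELS='ata2 ata3' DISKS='ad4 ad6' sh diag.sh >
diag.txt" (file names hypothetical) would then give one labelled file
to attach, after XXXing out serial numbers if desired.
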
> Next:
>
> I'm not sure if your system supports it, but can you run the controller
> in AHCI mode (BIOS setting) and load ahci.ko instead (ahci_load="yes" in
> /boot/loader.conf, your disks will change to /dev/adaX)?  If so, this
> would allow you to narrow down whether or not the issue is truly a
> driver problem.  You should try this *before* attempting the below.

I actually intended to convert my disks over to AHCI anyway, to make
hot swapping easier. I assume I can do a "zpool import" to get my ZFS
pool to work using the new devices.

> Try updating your source to something newer than March 19th.  There have
> been ata(4) changes since then that might pertain to your issue.  If the
> same issue happens on a present-day build of RELENG_8 then we can start
> by trying to narrow it down to commits between, roughly, late December
> 2010 to mid-March 2011.  Since you follow RELENG_8, you will need to
> follow commits.  src/sys/dev/ata is what's relevant here, as well as the
> chipsets/ directory under that.

Agreed, I probably shouldn't have left it so long between kernel
rebuilds. I guess I was hoping there weren't too many changes related
to my SATA controller, but that naively assumes the problem is the
SATA controller driver.

> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/
>
> Let's get this figured out before other users start correlating their
> problems with whatever this is.

Agreed.

--
Conall O'Brien