From owner-freebsd-fs@FreeBSD.ORG Mon Apr 15 06:24:43 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EC179899 for ; Mon, 15 Apr 2013 06:24:43 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from moh2-ve2.go2.pl (moh2-ve2.go2.pl [193.17.41.200]) by mx1.freebsd.org (Postfix) with ESMTP id 682846B7 for ; Mon, 15 Apr 2013 06:24:43 +0000 (UTC) Received: from moh2-ve2.go2.pl (unknown [10.0.0.200]) by moh2-ve2.go2.pl (Postfix) with ESMTP id CA82FB010E1 for ; Mon, 15 Apr 2013 08:24:26 +0200 (CEST) Received: from unknown (unknown [10.0.0.108]) by moh2-ve2.go2.pl (Postfix) with SMTP for ; Mon, 15 Apr 2013 08:24:26 +0200 (CEST) Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id IbXflU; Mon, 15 Apr 2013 08:24:26 +0200 Message-ID: <516B9D19.2030909@o2.pl> Date: Mon, 15 Apr 2013 08:24:25 +0200 From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130407 Thunderbird/17.0.5 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: A failed drive causes system to hang References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan> <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan> <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan> <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan> <20130414195211.GA39201@icarus.home.lan> <516B1315.8060408@o2.pl> <20130414212440.GA40325@icarus.home.lan> In-Reply-To: <20130414212440.GA40325@icarus.home.lan> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-O2-Trust: 1, 35 X-O2-SPF: neutral Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Apr 2013 06:24:44 -0000 On 14/04/2013 23:24, Jeremy Chadwick wrote: > On Sun, Apr 14, 2013 at 10:35:33PM +0200, Radio m?odych bandytw wrote: >> On 14/04/2013 21:52, Jeremy Chadwick wrote: >>> {snipping lots for brevity} >>> >>> On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote: >>>> On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote: >>>>> Sorry. I thought just the error was important. So here you are: >>>>> dmesg.boot: >>>>> http://pastebin.com/LFXPusMX >>>> >>>> Thank you. Please read everything I have written below before doing >>>> anything. >>>> >>>> Based on this output, we can see the following: >>>> >>>> * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700 >>>> controller: >>>> >>>> ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0 >>>> >>>> * The system has 3 disks attached to this controller: >>>> >>>> ada0 at ahcich0 bus 0 scbus2 target 0 lun 0 >>>> ada0: ATA-8 SATA 2.x device >>>> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >>>> ada0: Command Queueing enabled >>>> ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >>>> ada1 at ata0 bus 0 scbus6 target 0 lun 0 >>>> ada1: ATA-8 SATA 2.x device >>>> ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >>>> ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C) >>>> ada2 at ata0 bus 0 scbus6 target 1 lun 0 >>>> ada2: ATA-8 SATA 2.x device >>>> ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) >>>> ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C) >>>> >>>> Let's talk about ada0 and ada1 first. >>> >>> Hold up a minute -- I just noticed some key information here (see what >>> happens with big conflated threads?), and it sheds some light on my >>> concerns with AHCI vs. classic ata(4): >>> >>> ada0 -- attached to ahcich0 >>> ada1 -- attached to ata0 (presumably a "master" drive) >>> ada2 -- attached to ata0 (presumably a "slave" drive) >>> >>> This is extremely confusing, because ata0 is a classic ATA controller (I >>> can even tell from the classic ISA I/O port ranges): >>> >>> atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 >>> ata0: at channel 0 on atapci1 >>> ata1: at channel 1 on atapci1 >>> >>> Yet the WD15EARS and ST3640323AS drives are physically SATA drives. >>> >>> Are you using SATA-to-IDE adapters on these two drives? >> No. >>> >>> If not, this seems to indicate the motherboard and/or SATA controller >>> is actually only binding 1 disk to AHCI, while the others are bound to >>> the same controller operating in (possibly) "SATA Enhanced" mode. >>> >>> This would be the first I've ever seen of this (a controller operating >>> in both modes simultaneously), but I have a lot more experience with >>> Intel SATA controllers than I do AMD. >>> >>> I don't know why a system would do this, unless all of this can be >>> controlled via the BIOS somehow. What a mess. >>> >> I looked into BIOS and it can be controlled. 6 ports are divided into 2 >> triples and I can switch mode of each triple independently. One drive is >> connected to one and two to the other. >> Looks like there's a bug because both triples are set to ATA. >> I left them like that for now. > > What exact motherboard model is this? I'd like to review the manual. > >> Anyway, I got the hang again, so I can provide dmesg. I was not at the >> computer when it happened, so there's only the last screen though... >> pastebin.com/bjYtzPgs > > Thank you. Sadly the log snippet doesn't have timestamps but this is > what transpired: > > The log snippet you showed indicates the following: > > * An NCQ-based write CDB (WRITE_FPDMA_QUEUED) was issued to the > ada0 drive attached to channel ahcich0 of controller ahci0, and > the disk or controller did not respond within 30 seconds (I'm > assuming PC-BSD did not change kern.cam.ada.default_timeout from > the default of 30 seconds) Checked: default. > > * The same request was resubmit to the controller (CAM will try > submission of a CDB up to 5 times (i.e. 4 retries), which is > controlled with kern.cam.ada.retry_count). > Checked: default. > * The AHCI controller (rather the specific channel of the AHCI > controller) also reported that the underlying disk/device was > not responding (re: "Timeout on slot X port X"). I see no > SERR condition. > > * An ATA_IDENTIFY CDB was issued to the ada0 drive attached to > channel ahcich0 of controller ahci0, and this also timed out > after 30 seconds. My gut feeling is that this system is > running smartd(8); it's possible the kernel itself could submit > the CDB to the drive, but in this condition/state I don't know > why it'd do that. Nope, smartd doesn't run. > > * Rinse lather repeat. > > To me, at first glance, this looks like the ada0 disk is going > catatonic. The controller itself seems to be responding fine, just that > the disk attached to ahcich0 is locking up hard. I see no sign of an > AHCI reset ("AHCI reset..." message) either. > > So why does your system "hang" (meaning why can't you log in, why do > applications stop working, etc.) when this happens? Simple: > > You're using ZFS for your root filesystem, as shown here: > > Trying to mount root from zfs:tank1/ROOT/default []... > > Your ZFS pool called tank1 consists of a raidz1 pool of 3 devices > (more specifically partitions): ada0, ada1, and something that is > missing. Recap: > > http://pastebin.com/D3Av7x9X > > The pool is already degraded, and as you know, raidz1 can only suffer up > to loss of one vdev (in this case a disk) before ZFS will begin behaving > based on what the pool's "failmode" property is. > > In effect, when this happens, you're down to only 1 disk: ada1, and > that's not sufficient. So ZFS does exactly what it should (with > failmode=wait, the default): it waits indefinitely, hoping that things > recover. > > Because this is your root filesystem, as well as tons of other > filesystems (including /usr, /var, /var/log, and so on): > > http://pastebin.com/4sT37VqZ > > ...any I/O submit to filesystems part of pool tank1 will indefinitely > block/wait until things recover. Except they don't recover (and that > isn't the fault of ZFS). > > I imagine if you let the system sit for roughly 5*30 seconds (see above > for how I calculated that), you would eventually see a message on the > console that looks something like this: > > (ada0:ahcich0:0:0:0): lost device > (ada0:ahcich0:0:0:0): removing device entry I don't think so. I'm nearly sure it took me longer than that to write the errors down alone. And when I discovered the system lockup it was in such state already. The next time I can take precise time measurements. > > So, the crux of your problem is: > > 1. Your disks fall off the bus for reasons unknown at this time -- I'm > still waiting on smartctl -x output for each of your disks. Your disks > themselves may actually be okay (I need to review the output to > determine that) and the issue may be with SATA cabling, a faulty PSU, or > a completely broken SATA controller or motherboard (bad traces, etc.). > I am not going to help diagnose those problems, because the only > reliable method is to start replacing each part, piece by piece, and see > if the issue goes away. > > 2. Your array is already degraded/broken yet you don't care to fix it. > If the array was in decent shape and ada0 fell off the bus, things would > still work because ada1 and ada2 would be functioning (re: raidz1). > > If you were using UFS instead of ZFS for your root filesystem, you would > still have the same issue, just that the system would kernel panic. You > can induce that behaviour with ZFS as well using failmode=panic. > > There isn't much more for me to say. Everything is behaving how it's > designed, from what I can tell. > > When you lose your root filesystem, you really can't expect the system > to be in some "magical usable state". > The disk is out of the array because I didn't put it back after RMA. I RMA'd it because it used to cause precisely this kind of lockups on a non-degraded array. And I've seen it in the installer running from an entirely different device too. I guess that after reproducing the issue and taking time measurements, I should put the RMA'd drive back. I expect the problem to keep happening. -- Twoje radio