From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 15 06:24:43 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id EC179899
 for <freebsd-fs@freebsd.org>; Mon, 15 Apr 2013 06:24:43 +0000 (UTC)
 (envelope-from radiomlodychbandytow@o2.pl)
Received: from moh2-ve2.go2.pl (moh2-ve2.go2.pl [193.17.41.200])
 by mx1.freebsd.org (Postfix) with ESMTP id 682846B7
 for <freebsd-fs@freebsd.org>; Mon, 15 Apr 2013 06:24:43 +0000 (UTC)
Received: from moh2-ve2.go2.pl (unknown [10.0.0.200])
 by moh2-ve2.go2.pl (Postfix) with ESMTP id CA82FB010E1
 for <freebsd-fs@freebsd.org>; Mon, 15 Apr 2013 08:24:26 +0200 (CEST)
Received: from unknown (unknown [10.0.0.108])
 by moh2-ve2.go2.pl (Postfix) with SMTP
 for <freebsd-fs@freebsd.org>; Mon, 15 Apr 2013 08:24:26 +0200 (CEST)
Received: from unknown [93.175.66.185] by poczta.o2.pl with ESMTP id IbXflU;
 Mon, 15 Apr 2013 08:24:26 +0200
Message-ID: <516B9D19.2030909@o2.pl>
Date: Mon, 15 Apr 2013 08:24:25 +0200
From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= <radiomlodychbandytow@o2.pl>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130407 Thunderbird/17.0.5
MIME-Version: 1.0
To: Jeremy Chadwick <jdc@koitsu.org>
Subject: Re: A failed drive causes system to hang
References: <51672164.1090908@o2.pl> <20130411212408.GA60159@icarus.home.lan>
 <5168821F.5020502@o2.pl> <20130412220350.GA82467@icarus.home.lan>
 <51688BA6.1000507@o2.pl> <20130413000731.GA84309@icarus.home.lan>
 <516A8646.4000101@o2.pl> <20130414192830.GA38338@icarus.home.lan>
 <20130414195211.GA39201@icarus.home.lan> <516B1315.8060408@o2.pl>
 <20130414212440.GA40325@icarus.home.lan>
In-Reply-To: <20130414212440.GA40325@icarus.home.lan>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-O2-Trust: 1, 35
X-O2-SPF: neutral
Cc: freebsd-fs@freebsd.org, support@lists.pcbsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Apr 2013 06:24:44 -0000

On 14/04/2013 23:24, Jeremy Chadwick wrote:
> On Sun, Apr 14, 2013 at 10:35:33PM +0200, Radio m?odych bandytw wrote:
>> On 14/04/2013 21:52, Jeremy Chadwick wrote:
>>> {snipping lots for brevity}
>>>
>>> On Sun, Apr 14, 2013 at 12:28:30PM -0700, Jeremy Chadwick wrote:
>>>> On Sun, Apr 14, 2013 at 12:34:46PM +0200, Radio m?odych bandytw wrote:
>>>>> Sorry. I thought just the error was important. So here you are:
>>>>> dmesg.boot:
>>>>> http://pastebin.com/LFXPusMX
>>>>
>>>> Thank you.  Please read everything I have written below before doing
>>>> anything.
>>>>
>>>> Based on this output, we can see the following:
>>>>
>>>> * AHCI is actively in use, and is a slowly-becoming-infamous ATI IXP700
>>>>   controller:
>>>>
>>>>   ahci0: <ATI IXP700 AHCI SATA controller> port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xf9fffc00-0xf9ffffff irq 19 at device 17.0 on pci0
>>>>
>>>> * The system has 3 disks attached to this controller:
>>>>
>>>>   ada0 at ahcich0 bus 0 scbus2 target 0 lun 0
>>>>   ada0: <WDC WD15EARS-22MVWB0 51.0AB51> ATA-8 SATA 2.x device
>>>>   ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>>>>   ada0: Command Queueing enabled
>>>>   ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
>>>>   ada1 at ata0 bus 0 scbus6 target 0 lun 0
>>>>   ada1: <WDC WD15EARS-22MVWB0 51.0AB51> ATA-8 SATA 2.x device
>>>>   ada1: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
>>>>   ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
>>>>   ada2 at ata0 bus 0 scbus6 target 1 lun 0
>>>>   ada2: <ST3640323AS SD13> ATA-8 SATA 2.x device
>>>>   ada2: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
>>>>   ada2: 610480MB (1250263728 512 byte sectors: 16H 63S/T 16383C)
>>>>
>>>> Let's talk about ada0 and ada1 first.
>>>
>>> Hold up a minute -- I just noticed some key information here (see what
>>> happens with big conflated threads?), and it sheds some light on my
>>> concerns with AHCI vs. classic ata(4):
>>>
>>> ada0 -- attached to ahcich0
>>> ada1 -- attached to ata0 (presumably a "master" drive)
>>> ada2 -- attached to ata0 (presumably a "slave" drive)
>>>
>>> This is extremely confusing, because ata0 is a classic ATA controller (I
>>> can even tell from the classic ISA I/O port ranges):
>>>
>>> atapci1: <ATI IXP700/800 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0
>>> ata0: <ATA channel> at channel 0 on atapci1
>>> ata1: <ATA channel> at channel 1 on atapci1
>>>
>>> Yet the WD15EARS and ST3640323AS drives are physically SATA drives.
>>>
>>> Are you using SATA-to-IDE adapters on these two drives?
>> No.
>>>
>>> If not, this seems to indicate the motherboard and/or SATA controller
>>> is actually only binding 1 disk to AHCI, while the others are bound to
>>> the same controller operating in (possibly) "SATA Enhanced" mode.
>>>
>>> This would be the first I've ever seen of this (a controller operating
>>> in both modes simultaneously), but I have a lot more experience with
>>> Intel SATA controllers than I do AMD.
>>>
>>> I don't know why a system would do this, unless all of this can be
>>> controlled via the BIOS somehow.  What a mess.
>>>
>> I looked into BIOS and it can be controlled. 6 ports are divided into 2
>> triples and I can switch mode of each triple independently. One drive is
>> connected to one and two to the other.
>> Looks like there's a bug because both triples are set to ATA.
>> I left them like that for now.
> 
> What exact motherboard model is this?  I'd like to review the manual.
> 
>> Anyway, I got the hang again, so I can provide dmesg. I was not at the
>> computer when it happened, so there's only the last screen though...
>> pastebin.com/bjYtzPgs
> 
> Thank you.  Sadly the log snippet doesn't have timestamps but this is
> what transpired:
> 
> The log snippet you showed indicates the following:
> 
> * An NCQ-based write CDB (WRITE_FPDMA_QUEUED) was issued to the
>   ada0 drive attached to channel ahcich0 of controller ahci0, and
>   the disk or controller did not respond within 30 seconds (I'm
>   assuming PC-BSD did not change kern.cam.ada.default_timeout from
>   the default of 30 seconds)
Checked: default.
> 
> * The same request was resubmit to the controller (CAM will try
>   submission of a CDB up to 5 times (i.e. 4 retries), which is
>   controlled with kern.cam.ada.retry_count).
> 
Checked: default.
> * The AHCI controller (rather the specific channel of the AHCI
>   controller) also reported that the underlying disk/device was
>   not responding (re: "Timeout on slot X port X").  I see no
>   SERR condition.
> 
> * An ATA_IDENTIFY CDB was issued to the ada0 drive attached to
>   channel ahcich0 of controller ahci0, and this also timed out
>   after 30 seconds.  My gut feeling is that this system is
>   running smartd(8); it's possible the kernel itself could submit
>   the CDB to the drive, but in this condition/state I don't know
>   why it'd do that.
Nope, smartd doesn't run.
> 
> * Rinse lather repeat.
> 
> To me, at first glance, this looks like the ada0 disk is going
> catatonic.  The controller itself seems to be responding fine, just that
> the disk attached to ahcich0 is locking up hard.  I see no sign of an
> AHCI reset ("AHCI reset..." message) either.
> 
> So why does your system "hang" (meaning why can't you log in, why do
> applications stop working, etc.) when this happens?  Simple:
> 
> You're using ZFS for your root filesystem, as shown here:
> 
> Trying to mount root from zfs:tank1/ROOT/default []...
> 
> Your ZFS pool called tank1 consists of a raidz1 pool of 3 devices
> (more specifically partitions): ada0, ada1, and something that is
> missing.  Recap:
> 
> http://pastebin.com/D3Av7x9X
> 
> The pool is already degraded, and as you know, raidz1 can only suffer up
> to loss of one vdev (in this case a disk) before ZFS will begin behaving
> based on what the pool's "failmode" property is.
> 
> In effect, when this happens, you're down to only 1 disk: ada1, and
> that's not sufficient.  So ZFS does exactly what it should (with
> failmode=wait, the default): it waits indefinitely, hoping that things
> recover.
> 
> Because this is your root filesystem, as well as tons of other
> filesystems (including /usr, /var, /var/log, and so on):
> 
> http://pastebin.com/4sT37VqZ
> 
> ...any I/O submit to filesystems part of pool tank1 will indefinitely
> block/wait until things recover.  Except they don't recover (and that
> isn't the fault of ZFS).
> 
> I imagine if you let the system sit for roughly 5*30 seconds (see above
> for how I calculated that), you would eventually see a message on the
> console that looks something like this:
> 
> (ada0:ahcich0:0:0:0): lost device
> (ada0:ahcich0:0:0:0): removing device entry
I don't think so. I'm nearly sure it took me longer than that to write
the errors down alone. And when I discovered the system lockup it was in
such state already. The next time I can take precise time measurements.
> 
> So, the crux of your problem is:
> 
> 1. Your disks fall off the bus for reasons unknown at this time -- I'm
> still waiting on smartctl -x output for each of your disks.  Your disks
> themselves may actually be okay (I need to review the output to
> determine that) and the issue may be with SATA cabling, a faulty PSU, or
> a completely broken SATA controller or motherboard (bad traces, etc.).
> I am not going to help diagnose those problems, because the only
> reliable method is to start replacing each part, piece by piece, and see
> if the issue goes away.
> 
> 2. Your array is already degraded/broken yet you don't care to fix it.
> If the array was in decent shape and ada0 fell off the bus, things would
> still work because ada1 and ada2 would be functioning (re: raidz1).
> 
> If you were using UFS instead of ZFS for your root filesystem, you would
> still have the same issue, just that the system would kernel panic.  You
> can induce that behaviour with ZFS as well using failmode=panic.
> 
> There isn't much more for me to say.  Everything is behaving how it's
> designed, from what I can tell.
> 
> When you lose your root filesystem, you really can't expect the system
> to be in some "magical usable state".
> 
The disk is out of the array because I didn't put it back after RMA.
I RMA'd it because it used to cause precisely this kind of lockups on a
non-degraded array. And I've seen it in the installer running from an
entirely different device too.
I guess that after reproducing the issue and taking time measurements, I
should put the RMA'd drive back. I expect the problem to keep happening.
-- 
Twoje radio