From owner-freebsd-stable@FreeBSD.ORG  Sat May 15 20:39:17 2010
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EB60B106564A
	for <freebsd-stable@freebsd.org>; Sat, 15 May 2010 20:39:17 +0000 (UTC)
	(envelope-from pieter@os3.nl)
Received: from mail.thelostparadise.com (router.thelostparadise.com
	[IPv6:2a02:898:0:30::30:1])
	by mx1.freebsd.org (Postfix) with ESMTP id 469E98FC0A
	for <freebsd-stable@freebsd.org>; Sat, 15 May 2010 20:39:17 +0000 (UTC)
Received: by mail.thelostparadise.com (Postfix, from userid 127)
	id 25E2C73061; Sat, 15 May 2010 22:39:16 +0200 (CEST)
Received: from localhost by mail.thelostparadise.com (Postfix) with ESMTP id
	854B573038; Sat, 15 May 2010 22:39:15 +0200 (CEST)
Message-ID: <4BEF066F.3090703@os3.nl>
Date: Sat, 15 May 2010 22:39:11 +0200
From: Pieter de Boer <pieter@os3.nl>
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4BED8B89.6010901@os3.nl> <20100514195346.GA8977@icarus.home.lan>
	<4BEDBC08.2040002@os3.nl> <20100514224236.GA11680@icarus.home.lan>
	<4BEE476B.6020407@os3.nl> <20100515162624.GA39585@icarus.home.lan>
In-Reply-To: <20100515162624.GA39585@icarus.home.lan>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org
Subject: Re: Read / write timeouts on SATA disks connected to ICH9
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 May 2010 20:39:18 -0000

Hi,

<SNIP: disk without errors timing out>
> That could be caused by a multitude of other known things.  For
> example, some Western Digital "Green" drives (including the
> Enterprise class ones) are known to perform head parking/offloading
> excessively, which could result in the drive spending more time doing
> that than actually serving overall I/O requests.  There are some
> other reports of Samsung Spinpoint drives experiencing other issues
> (I've since forgotten and would have to dig up the threads).

> If you could provide full SMART stats for that drive, it might help.
Attached is the SMART output of both disks I replaced about a month ago.
It appears I replaced perfectly fine drives with the current disks, which
do have errors ;(  One of the old disks is in a USB enclosure now, so it
shows up as 'da0'.

<SNIP: enabling TLER>
> Yes, it's a DOS-based utility (like most firmware upgraders these
> days). I can provide it if you'd like.  I've been meaning to spend
> some time trying to reverse-engineer the binary to figure out what
> ATA commands it sends to the disk to toggle/adjust the feature (so
> that one could do it in real-time rather than have to boot into DOS).
> 
I'd like to try that tool. Since the old WD disks are now lying around
at home, I have some time to get a DOS boot working to try it out. A
FreeBSD implementation of the WD tool, and possibly of other brands'
tools, would be really useful indeed.

>> At a certain point in time I had read errors from specific LBAs on
>>  ad4. Using dd I was able to pinpoint those to single sectors.

> This isn't very effective (dd will read large chunks/amounts of data 
> (read: multiple LBAs) from the underlying disk at once, rather than
> the disk itself performing a per-LBA test).  My opinion is that the
> "dd method" should only be used on drives which don't support
> selective LBA scanning via SMART.
Will dd read multiple LBAs even when using 'bs=512'? The process I used
was to read with bs=8192 first, then zoom in on the LBAs mentioned in
the dmesg errors with bs=512 to find the actual failing LBA.
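
For reference, the zooming-in step can be sketched roughly as below. This
is only a minimal sketch, not the exact commands I ran: scan_sectors and
coarse_to_lba are made-up helper names, it assumes 512-byte sectors, and
the numbers in the usage comment are placeholders.

```shell
# scan_sectors DEVICE FIRST_LBA COUNT
# Reads COUNT sectors one at a time and prints each LBA dd could not read.
# (hypothetical helper; assumes 512-byte sectors)
scan_sectors() {
    dev=$1
    lba=$2
    end=$(( $2 + $3 ))
    while [ "$lba" -lt "$end" ]; do
        # bs=512 count=1 skip=LBA requests a single 512-byte block at that offset;
        # dd exits non-zero on a read error because conv=noerror is not set.
        if ! dd if="$dev" of=/dev/null bs=512 count=1 skip="$lba" 2>/dev/null; then
            echo "unreadable LBA: $lba"
        fi
        lba=$(( lba + 1 ))
    done
}

# coarse_to_lba BLOCK
# A bs=8192 block N covers LBAs N*16 .. N*16+15 (8192 / 512 = 16),
# so this prints the first LBA of that block.
coarse_to_lba() {
    echo $(( $1 * 16 ))
}

# Example (placeholder numbers): if the bs=8192 read failed at block 1000,
# check its 16 sectors individually:
#   scan_sectors /dev/ad4 "$(coarse_to_lba 1000)" 16
```

This only shows what dd itself requests; whether the drive reads more
than one sector internally for such a request is a separate question.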

A selective scan on ad4 did not reveal any errors today: it 'completed 
without error'. On ad6 it's a whole lot slower; at the time of writing 
it's at 2/3.

> All HD vendors have their own quirks/ordeals right now.  You
> basically just have to go with one who works well for you, then if
> things start going downhill, switch to another.  None of them are
> perfect.
I figured as much. What irritates me, though, is that I've had consistent 
problems with 4 disks in this specific system, but no such issues with 
any disk in the other systems I've had. I generally replace disks when I 
grow out of them, not because they break down.

> What this indicates to me is that if a disk falls off the bus on an
> ICH9 controller in Enhanced (non-AHCI) mode, FreeBSD starts seeing an
> absurd number of interrupts generated from the ICH9.  My guess is
> FreeBSD isn't doing something correctly with the controller when this
> happens; maybe certain commands aren't being sent back to the
> controller or handling of certain events is being done improperly
> when it comes to ICH9 (or possibly earlier ICH revisions too).  This
> should be *very* easy to reproduce.

Unfortunately I'm not really in a position to help reproduce this or 
test possible fixes; downtime is currently very unwelcome. Although 
one of the previous disks indeed fell off the bus entirely (I couldn't 
get it back with atacontrol either), that hasn't happened again so far. 
I only see timeouts (and, a few days ago, read errors on ad4), which 
gmirror doesn't like. I guess those aren't that simple to reproduce 
(apart from on my system ;).

> If you see any of your disks on the ICH9 controller fall off the bus
> or report ATA errors (doesn't matter what kind), please make note of
> the timestamp (should be in the kernel log), and ASAP run "smartctl
> -a" on the disk.  You should compare attributes before and after the
> event.
> You might also want to consider using smartd, which can log SMART 
> attribute changes on its own.  Note that you might have to tune the 
> arguments in smartd.conf to ignore some attributes which fluctuate 
> naturally (such as drive temperature and seek error rate).

I've configured smartd to poll both disks every 5 minutes. I -think- the 
issues happen specifically under load: the periodic scripts of the host 
and its 4 jails appear to trigger them sometimes. At that time I'm 
normally trying to get some sleep, so smartd will have to do for now, 
although I'll run a "smartctl -a" ASAP anyway.
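
Something along these lines could capture and compare the "smartctl -a"
output around an event. It is only a sketch: smart_snapshot and
smart_attr_diff are hypothetical helper names, /var/tmp as snapshot
directory is an assumption, and the grep pattern assumes the usual
attribute table layout in which each row starts with the attribute ID.

```shell
# smart_snapshot DEVICE DIR
# Saves "smartctl -a" output to DIR/<device>-<timestamp>.txt and prints the path.
# (hypothetical helper)
smart_snapshot() {
    out="$2/$(basename "$1")-$(date +%Y%m%d-%H%M%S).txt"
    smartctl -a "$1" > "$out"
    echo "$out"
}

# smart_attr_diff BEFORE AFTER
# Shows only the attribute-table rows that differ between two snapshots.
# Returns diff's exit status (1 when something changed).
smart_attr_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # Keep rows of the form "<ID> <Attribute_Name> ...", drop everything else.
    grep -E '^ *[0-9]+ [A-Za-z0-9_-]+ ' "$1" > "$a"
    grep -E '^ *[0-9]+ [A-Za-z0-9_-]+ ' "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Example:
#   before=$(smart_snapshot /dev/ad4 /var/tmp)
#   ... a timeout shows up in the kernel log ...
#   after=$(smart_snapshot /dev/ad4 /var/tmp)
#   smart_attr_diff "$before" "$after"
```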

-- 
Pieter