From owner-freebsd-scsi@FreeBSD.ORG  Fri Aug  7 19:42:39 2009
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0DDFA1065673;
	Fri,  7 Aug 2009 19:42:39 +0000 (UTC)
	(envelope-from morganw@chemikals.org)
Received: from warped.bluecherry.net (unknown [IPv6:2001:440:eeee:fffb::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 7C9FC8FC31;
	Fri,  7 Aug 2009 19:42:38 +0000 (UTC)
Received: from volatile.chemikals.org (adsl-67-112-194.shv.bellsouth.net
	[98.67.112.194])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by warped.bluecherry.net (Postfix) with ESMTPSA id 94BCD8E48207;
	Fri,  7 Aug 2009 14:42:36 -0500 (CDT)
Received: from localhost (morganw@localhost [127.0.0.1])
	by volatile.chemikals.org (8.14.3/8.14.3) with ESMTP id n77JgNbq042879; 
	Fri, 7 Aug 2009 14:42:23 -0500 (CDT)
	(envelope-from morganw@chemikals.org)
Date: Fri, 7 Aug 2009 14:42:22 -0500 (CDT)
From: Wes Morgan <morganw@chemikals.org>
To: Artem Belevich <artemb@gmail.com>
In-Reply-To: <ed91d4a80908071106l3951f384r3fa845eda2fcb0d3@mail.gmail.com>
Message-ID: <alpine.BSF.2.00.0908071438540.95674@ibyngvyr.purzvxnyf.bet>
References: <ed91d4a80908071106l3951f384r3fa845eda2fcb0d3@mail.gmail.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-scsi@freebsd.org, freebsd-current@freebsd.org
Subject: Re: mpt errors - UNIT ATTENTION asc:29,0
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Aug 2009 19:42:39 -0000

On Fri, 7 Aug 2009, Artem Belevich wrote:

> Hi,
>
> I'm running 8.0-BETA2 on an Asus p5BV/SAS with a built-in LSI1068
> controller with 8 SATA ports. Six of the ports are hooked up to 1TB WD
> Green drives. The drives are used as a single raidz2 ZFS pool:
>
> 	NAME        STATE     READ WRITE CKSUM
> 	z2          ONLINE       0     0     0
> 	  raidz2    ONLINE       0     0     0
> 	    da1     ONLINE       0     0     0
> 	    da0     ONLINE       0     0     0
> 	    da2     ONLINE       0     0     0
> 	    da3     ONLINE       0     0     0
> 	    da4     ONLINE       0     0     0
> 	    da5     ONLINE       0     0     0
>
> I'm running a simple stress test that copies a 10GB file until it fills
> the volume and then runs "zpool scrub" on it.
>
> dd if=/dev/urandom of=/z2/f.0 bs=1m count=10240
> for f in {1..350}; do echo $f; cp /z2/f.$((f-1)) /z2/f.$f; done
> zpool scrub z2
>
> What concerns me is that I'm periodically getting error messages from
> the mpt driver. They usually start a few hours after the start of the
> script, and by the end of it they are happening every few minutes,
> seemingly randomly, on all six drives.
>
> Aug  7 10:25:32 buz kernel: mpt0: mpt_cam_event: 0x16
> Aug  7 10:25:32 buz kernel: mpt0: mpt_cam_event: 0x16
> Aug  7 10:25:32 buz kernel: (da4:mpt0:0:4:0): READ(10). CDB: 28 0 46 32 97 c0 0 0 80 0
> Aug  7 10:25:32 buz kernel: (da4:mpt0:0:4:0): CAM Status: SCSI Status Error
> Aug  7 10:25:32 buz kernel: (da4:mpt0:0:4:0): SCSI Status: Check Condition
> Aug  7 10:25:32 buz kernel: (da4:mpt0:0:4:0): UNIT ATTENTION asc:29,0
> Aug  7 10:25:32 buz kernel: (da4:mpt0:0:4:0): Power on, reset, or bus device reset occurred
> Aug  7 10:25:32 buz kernel: (da4:mpt0:0:4:0): Retrying Command (per Sense Data)
>
> ZFS scrub does not seem to report any issues so far - no checksum or
> read/write errors. WD's hard drive diagnostics tools didn't find any
> issues with the drives either.
>
> Could somebody shed some light on why such errors would happen? Is it
> some sort of hardware issue? A driver bug? A compatibility issue
> between the controller and the drives? A system configuration issue
> (some sysctl/tunable needs tweaking, perhaps)?

I have that same board with eight 500GB drives in a raidz2. I used to
run them on a SATA backplane, and I would see those timeouts fairly
regularly when moving lots of data around. To eliminate the cable mess I
switched to an SAS backplane with fanout cables, and since then I have
not seen the timeouts.
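
One rough way to check whether the resets cluster on particular drives
(which might point at a cable or backplane slot) is to tally the UNIT
ATTENTION lines per device. This is only a minimal sketch: it assumes the
kernel messages keep the (daN:mptM:...) prefix shown in the quoted log,
and the function name count_unit_attention is made up for illustration.

```shell
# Tally "UNIT ATTENTION" reports per da device from kernel log lines on stdin.
# Assumption: lines carry the "(daN:mptM:bus:target:lun)" prefix seen above.
count_unit_attention() {
    grep 'UNIT ATTENTION' |
        sed -n 's/.*(\(da[0-9][0-9]*\):mpt[0-9][0-9]*:.*/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption; adjust for your syslog setup):
#   count_unit_attention < /var/log/messages
```

If the counts come out roughly even across all drives, that would fit a
problem shared by the controller or backplane rather than a single disk.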