Date: Fri, 7 Aug 2009 14:42:22 -0500 (CDT)
From: Wes Morgan
To: Artem Belevich
Cc: freebsd-scsi@freebsd.org, freebsd-current@freebsd.org
Subject: Re: mpt errors - UNIT ATTENTION asc:29,0

On Fri, 7 Aug 2009, Artem Belevich wrote:

> Hi,
>
> I'm running 8.0-BETA2 on Asus p5BV/SAS with built-in LSI1068
> controller with 8 SATA ports. 6 of the ports are hooked up to 1TB WD Green
> drives.
> The drives are used as a single raidz2 ZFS pool:
>
>         NAME        STATE     READ WRITE CKSUM
>         z2          ONLINE       0     0     0
>           raidz2    ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da0     ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>
> I'm running a simple stress test that copies a 10GB file until it fills
> the volume and then runs "zpool scrub" on it.
>
> dd if=/dev/urandom of=/z2/f.0 bs=1m count=10240
> for f in {1..350}; do echo $f; cp f.$[$f-1] f.$f; done;
> zpool scrub z2
>
> What concerns me is that I'm periodically getting error messages from
> the MPT driver. They usually start a few hours after the start of the
> script, and by the end of it they are happening every few minutes,
> seemingly randomly, on all six drives.
>
> Aug 7 10:25:32 buz kernel: mpt0: mpt_cam_event: 0x16
> Aug 7 10:25:32 buz kernel: mpt0: mpt_cam_event: 0x16
> Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): READ(10). CDB: 28 0 46 32 97 c0 0 0 80 0
> Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): CAM Status: SCSI Status Error
> Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): SCSI Status: Check Condition
> Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): UNIT ATTENTION asc:29,0
> Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): Power on, reset, or bus device reset occurred
> Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): Retrying Command (per Sense Data)
>
> ZFS scrub does not seem to report any issues so far - no checksum or
> read/write errors. WD's hard drive diagnostic tools didn't find any
> issues with the drives either.
>
> Could somebody shed some light on why such errors would happen? Is that
> some sort of hardware issue? Driver bug? Issue with compatibility
> between controller and the drives? System configuration issue (some
> sysctl/tunable needs tweaking, perhaps)?

I have that same board with 8 500GB drives in a raidz2. I used to use a
SATA backplane, and I would see those timeouts fairly regularly when
moving lots of data around.
To eliminate the cable mess I switched to an SAS backplane with fanout cables and since then I have not seen the timeouts.
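
One way to check whether these resets cluster on particular drives or are spread across all of them is to tally the UNIT ATTENTION lines per device. The following is only a minimal sketch: it assumes the kernel messages have exactly the format quoted above, and the function name count_unit_attentions is made up for illustration.

```python
import re
from collections import Counter

# Matches the CAM peripheral prefix followed by the sense line, e.g.
# "(da4:mpt0:0:4:0): UNIT ATTENTION asc:29,0". The pattern is an assumption
# based on the log excerpt quoted in this thread.
UNIT_ATTENTION = re.compile(
    r"\((da\d+):mpt\d+:\d+:\d+:\d+\): UNIT ATTENTION asc:29,0"
)


def count_unit_attentions(lines):
    """Return a Counter mapping device name (e.g. 'da4') to event count."""
    counts = Counter()
    for line in lines:
        match = UNIT_ATTENTION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the quoted log excerpt.
sample = [
    "Aug 7 10:25:32 buz kernel: mpt0: mpt_cam_event: 0x16",
    "Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): SCSI Status: Check Condition",
    "Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): UNIT ATTENTION asc:29,0",
    "Aug 7 10:25:32 buz kernel: (da4:mpt0:0:4:0): Retrying Command (per Sense Data)",
]

for device, count in sorted(count_unit_attentions(sample).items()):
    print(device, count)
```

To run it against a real log, pass an open file instead of the sample list, for example open("/var/log/messages") if that is where the kernel messages end up on the system in question.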