From owner-freebsd-scsi  Wed Sep  8  4:14:43 1999
Delivered-To: freebsd-scsi@freebsd.org
Received: from arjun.niksun.com (gw.niksun.com [206.20.52.122])
	by hub.freebsd.org (Postfix) with ESMTP id 6063E15B88
	for <freebsd-scsi@freebsd.org>; Wed,  8 Sep 1999 04:14:29 -0700 (PDT)
	(envelope-from ath@niksun.com)
Received: from stiegl.niksun.com (stiegl.niksun.com [10.0.0.44])
	by arjun.niksun.com (8.8.8/8.8.8) with ESMTP id HAA15711;
	Wed, 8 Sep 1999 07:12:43 -0400 (EDT)
Received: (from ath@localhost)
	by stiegl.niksun.com (8.9.2/8.8.7) id HAA68752;
	Wed, 8 Sep 1999 07:12:43 -0400 (EDT)
	(envelope-from ath)
To: Andrew Gallatin <gallatin@cs.duke.edu>
Subject: Re: data corruption when using aic7890
Cc: freebsd-scsi@freebsd.org
References: <14293.26481.521753.519004@grasshopper.cs.duke.edu>
From: Andrew Heybey <ath@niksun.com>
Date: 08 Sep 1999 07:12:42 -0400
In-Reply-To: Andrew Gallatin's message of "Tue,  7 Sep 1999 16:00:38 -0400 (EDT)"
Message-ID: <85g10pbqs5.fsf@stiegl.niksun.com>
Lines: 49
X-Mailer: Gnus v5.5/XEmacs 20.4 - "Emerald"
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Andrew Gallatin <gallatin@cs.duke.edu> writes:

> Hi,
> 
> I have a bunch of ASUS P2B-LS motherboards with on-board AIC7890 U2
> controllers.  I'm running a kernel with rev 1.20 of
> src/sys/pci/ahc_pci.c (i.e., after the CACHETHEN fix).
> 
> When I run a local data-integrity checking program, I'm seeing
> occasional data corruption on Seagate ST39140W drives connected to the
> on-board U2 controller.  This program writes a known pattern of data
> into a variable size (512MB in this case) file on disk & reads it back
> over & over again.  If it encounters corruption, it reports just the
> first word where corruption was found & then skips to the next page,
> so it's hard to tell how complete the corruption is.  We see things
> like this:
> 
> ##error 0 page 8228 expected [0x030241d8] saw [0x07c5b1d8]
> ##error 1 page 9718 expected [0x035f61f0] saw [0x072081f0]
> ##error 2 page 15719 expected [0x03d671c8] saw [0x016441c8]
> 
> The last 3 hex digits (the low 12 bits) are the offset into the page.
> Since they match the expected values, at least part of the data is
> correct.  It seems that the corruption only occurs after the first 400
> or so bytes of data in a page.  It seems to be happening fairly
> infrequently (about once every 500GB of data or so).
> 
> Most importantly, it seems to be happening only on drives connected
> to the on-board U2 interfaces, so my first guess would be that we can
> rule out anything but a driver or hardware problem.  E.g., this machine
> has 2 more ST39140W drives connected to an NCR 53c875 & I've never
> seen any corruption on them.  Ditto for the IDE disk connected to
> the on-board IDE controller.
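One quick way to check the offset claim is to split each reported word at bit 12. This is a minimal sketch that assumes 4096-byte pages, so the in-page byte offset occupies the low 12 bits (three hex digits) of each 32-bit word. The helper names page_offset and upper_tag are hypothetical, and what the upper 20 bits encode depends on the test program, which is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096-byte pages, so the byte offset within a page fits in
 * the low 12 bits (three hex digits) of each 32-bit pattern word. */
#define PAGE_OFFSET_BITS 12u
#define PAGE_OFFSET_MASK ((1u << PAGE_OFFSET_BITS) - 1u)

static_assert(PAGE_OFFSET_MASK == 0xfffu, "offset field is 12 bits wide");

/* In-page byte offset carried in the low 12 bits of a pattern word. */
static unsigned page_offset(uint32_t word)
{
    return (unsigned)(word & PAGE_OFFSET_MASK);
}

/* The remaining upper 20 bits; their meaning depends on the test program. */
static uint32_t upper_tag(uint32_t word)
{
    return word >> PAGE_OFFSET_BITS;
}
```

Applied to the three errors quoted above, the expected and observed offsets agree in every case: 0x1d8 = 472, 0x1f0 = 496 and 0x1c8 = 456 bytes, all just past the 400-byte mark mentioned in the report. Only the upper 20 bits differ.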

This sounds vaguely similar to kern/10243, except that I always saw
corruption at the *end* of a page.  How much data is corrupt?  Is the
bad data recognizable as being from elsewhere in the file?
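Both questions can be answered by comparing the whole page instead of stopping at the first bad word, which is what the quoted checker does. This is a minimal sketch. It assumes the expected file contents can be regenerated in memory as an array of 32-bit words (image) with nwords words per page. measure_page and find_source_page are hypothetical helpers, not part of any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How much of one page differs from the expected pattern. */
struct page_damage {
    size_t bad_words;   /* number of mismatched 32-bit words */
    size_t first_bad;   /* index of the first mismatched word */
    size_t last_bad;    /* index of the last mismatched word */
};

/* Compare a page word by word, counting every mismatch. */
static struct page_damage
measure_page(const uint32_t *expected, const uint32_t *seen, size_t nwords)
{
    struct page_damage d = { 0, 0, 0 };
    for (size_t i = 0; i < nwords; i++) {
        if (expected[i] != seen[i]) {
            if (d.bad_words == 0)
                d.first_bad = i;
            d.last_bad = i;
            d.bad_words++;
        }
    }
    return d;
}

/* Search the expected image for another page holding `word` at the same
 * word index; returns that page number, or -1 if no page matches. */
static long
find_source_page(const uint32_t *image, size_t npages, size_t nwords,
                 size_t index, uint32_t word, size_t skip_page)
{
    assert(index < nwords);
    for (size_t p = 0; p < npages; p++) {
        if (p != skip_page && image[p * nwords + index] == word)
            return (long)p;
    }
    return -1;
}
```

A bad word that matches the same position on another page would point toward a misdirected transfer rather than random bit damage. That inference only holds if the test pattern is unique per page.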

Try fiddling with the PCI bus latency setting in the BIOS (increasing
it).  However, the only sure solution that I found to my problem was
to put the disks on the regular Ultra connector and live with
40MB/s.
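For reference, the latency timer is the byte at offset 0x0D of the standard PCI configuration header. Its value is counted in PCI clocks and bounds how long a bus master may keep the bus for a burst once another device requests it. This is a minimal sketch that decodes the timer from the 32-bit register at offset 0x0C (read, for example, with pciconf(8)) and converts it to nanoseconds. The function names and the 33 MHz bus clock are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Type-0 PCI configuration header, 32-bit register at offset 0x0C:
 *   bits  7:0  Cache Line Size (in 32-bit words)
 *   bits 15:8  Latency Timer   (in PCI clocks)
 *   bits 23:16 Header Type
 *   bits 31:24 BIST */
static unsigned latency_timer(uint32_t reg0c)
{
    return (unsigned)((reg0c >> 8) & 0xffu);
}

static unsigned cache_line_size(uint32_t reg0c)
{
    return (unsigned)(reg0c & 0xffu);
}

/* Approximate length of the latency window in nanoseconds for a bus
 * clocked at clock_mhz (assumed 33 for a standard PCI slot). */
static unsigned latency_ns(unsigned clocks, unsigned clock_mhz)
{
    assert(clock_mhz > 0);
    return clocks * 1000u / clock_mhz;
}
```

For example, a register value of 0x00004010 decodes to a latency timer of 0x40 (64 clocks, roughly 1.9 microseconds at 33 MHz) and a cache line size of 16 words. Raising the BIOS setting increases that window.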

I have seen the problem with both IBM DRVS drives and Seagate
ST39102LW.  I have also seen the problem on the P2B-LS and a
Supermicro motherboard with on-board AIC7890.

andrew

