Date:      Wed,  8 Sep 1999 14:47:35 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        Andrew Heybey <ath@niksun.com>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: data corruption when using aic7890
Message-ID:  <14294.42593.953415.402280@grasshopper.cs.duke.edu>
In-Reply-To: <85g10pbqs5.fsf@stiegl.niksun.com>
References:  <14293.26481.521753.519004@grasshopper.cs.duke.edu> <85g10pbqs5.fsf@stiegl.niksun.com>


Andrew Heybey writes:
 > Andrew Gallatin <gallatin@cs.duke.edu> writes:
 > > 
 > > ##error 0 page 8228 expected [0x030241d8] saw [0x07c5b1d8]
 > > ##error 1 page 9718 expected [0x035f61f0] saw [0x072081f0]
 > > ##error 2 page 15719 expected [0x03d671c8] saw [0x016441c8]
 > > 
 > > The last 3 hex digits are the offset into the page.  Since they are
 > > non-zero, at least part of the data is correct.  It seems that the
 > > corruption only occurs after the first 400 or so bytes of data in a
 > > page.  It seems to be happening fairly infrequently (about once every
 > > 500GB of data or so).
 > > 
 > > Most importantly, it seems to be happening only on drives connected
 > > to the on-board U2 interfaces, so my first guess would be that we can
 > > rule out anything but a driver or hardware problem.  E.g., this machine
 > > has 2 more ST39140W drives connected to an ncr 53c875 & I've never
 > > seen any corruption on them.  Ditto for the IDE disk connected to
 > > the on-board IDE controller.
 > 
 > This sounds vaguely similar to kern/10243, except that I always saw
 > corruption at the *end* of a page.  How much data is corrupt?  Is the
 > bad data recognizable as being from elsewhere in the file?

Well, at least the first 1/2 k of the page are corruption free...
If your suggestion doesn't help, I'll modify my tool so as to reveal
more information about the corrupt data (or just switch to yours..)
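As a rough idea of what "more information" could look like, here is a
minimal sketch that pulls apart the error lines quoted above.  It assumes
the line format is exactly as shown, and that the low 3 hex digits are the
in-page offset (i.e. 4096-byte pages); the function and field names are
made up for illustration and are not taken from the actual tool.

```python
import re

# Assumed line format, copied from the three error lines quoted above.
ERROR_RE = re.compile(
    r"##error (\d+) page (\d+) "
    r"expected \[0x([0-9a-f]{8})\] saw \[0x([0-9a-f]{8})\]"
)

OFFSET_MASK = 0xFFF  # low 3 hex digits; assumes 4096-byte pages


def parse_error(line):
    """Split one error line into its fields, or return None if no match."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    expected = int(m.group(3), 16)
    saw = int(m.group(4), 16)
    return {
        "index": int(m.group(1)),
        "page": int(m.group(2)),
        # Byte offset within the page where the first bad word was found.
        "offset": expected & OFFSET_MASK,
        # True if the bad word carries the same in-page offset, which
        # would suggest it is valid data from some other page.
        "offset_matches": (expected & OFFSET_MASK) == (saw & OFFSET_MASK),
        # Upper bits of the bad word, for comparison across errors.
        "saw_upper": saw >> 12,
    }
```

For the three lines above this gives offsets of 472, 496 and 456 bytes,
and in each case the bad word has the same low 3 hex digits as the
expected one.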

With regard to your comment about network interrupts, the problem
*does* seem to get worse when we're using our Myrinet gigabit network
cards.  I'm trying to leave them out of the equation for this test
though.  I managed to reproduce the corruption with essentially no
network traffic at all; just 2 other disk controllers (the 53c875 &
the on-board PIIX4) were in contention for the bus.

 > Try fiddling with the PCI bus latency setting in the bios (increasing
 > it).  However, the only sure solution that I found to my problem was
 > to put the disks on the regular Ultra connector and live with
 > 40MB/s.

This must be what Mike Smith was talking about when Matt Dillon ran
into corruption due to the CACHETHEN problem.  I've used pciconf
to set the latency to 64 (it was at 32), and I'm currently re-running
my test at the new setting.
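For a sense of scale, here is a small sketch of what those two values
mean in time.  It assumes the value is the standard PCI latency timer,
which counts PCI bus clocks, and that the bus runs at the conventional
33 MHz; neither assumption has been checked against this particular
board, and the function name is just illustrative.

```python
PCI_CLOCK_HZ = 33_000_000  # assumed: conventional 33 MHz PCI bus


def latency_timer_ns(clocks, clock_hz=PCI_CLOCK_HZ):
    """Convert a latency timer value (in PCI clocks) to nanoseconds."""
    return clocks * 1e9 / clock_hz


for clocks in (32, 64):
    print(f"{clocks} clocks is about {latency_timer_ns(clocks):.0f} ns")
```

So going from 32 to 64 roughly doubles how long a bus master may hold
the bus once another device is waiting, from about 1 us to about 2 us
under these assumptions.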

Have you tried any BIOS upgrades?

Thanks,

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590






