Date: Wed, 8 Sep 1999 14:47:35 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: Andrew Heybey <ath@niksun.com>
Cc: freebsd-scsi@freebsd.org
Subject: Re: data corruption when using aic7890
Message-ID: <14294.42593.953415.402280@grasshopper.cs.duke.edu>
In-Reply-To: <85g10pbqs5.fsf@stiegl.niksun.com>
References: <14293.26481.521753.519004@grasshopper.cs.duke.edu> <85g10pbqs5.fsf@stiegl.niksun.com>

Andrew Heybey writes:
> Andrew Gallatin <gallatin@cs.duke.edu> writes:
> >
> > ##error 0 page 8228 expected [0x030241d8] saw [0x07c5b1d8]
> > ##error 1 page 9718 expected [0x035f61f0] saw [0x072081f0]
> > ##error 2 page 15719 expected [0x03d671c8] saw [0x016441c8]
> >
> > The last 3 hex digits are the offset into the page. Since they are
> > non-zero, at least part of the data is correct. It seems that the
> > corruption only occurs after the first 400 or so bytes of data in a
> > page. It seems to be happening fairly infrequently (about every
> > 500GB of data or so).
> >
> > Most importantly, it seems to be happening only on drives connected
> > to the on-board U2 interfaces, so my first guess would be that we can
> > rule out anything but a driver or hardware problem. E.g., this machine
> > has 2 more ST39140W drives connected to an ncr 53c875 & I've never
> > seen any corruption on them. Ditto for the IDE disk connected to
> > the on-board IDE controller.
>
> This sounds vaguely similar to kern/10243, except that I always saw
> corruption at the *end* of a page. How much data is corrupt? Is the
> bad data recognizable as being from elsewhere in the file?

Well, at least the first 1/2 k of the page is corruption free... If
your suggestion doesn't help, I'll modify my tool so as to reveal more
information about the corrupt data (or just switch to yours..)

With regards to your comment about network interrupts, the problem
*does* seem to get worse when we're using our Myrinet gigabit network
cards. I'm trying to leave them out of the equation for this test,
though. I managed to achieve my corruption with essentially no network
traffic at all; just 2 other disk controllers were in contention for
the bus (53c875 & the on-board PIIX4).

> Try fiddling with the PCI bus latency setting in the BIOS (increasing
> it). However, the only sure solution that I found to my problem was
> to put the disks on the regular Ultra connector and live with
> 40MB/s.

This must be what Mike Smith was talking about when Matt Dillon ran
into corruption due to the CACHETHEN problem. I've used pciconf to
set the latency to 64 (it was at 32), and I'm currently re-running my
test at the new setting.

Have you tried any BIOS upgrades?

Thanks,

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
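
For anyone reading the ##error lines quoted above, here is a rough
sketch of how the in-page offset can be pulled out of each report.
It assumes a 4096-byte page (so the offset is the low 12 bits of the
expected value) and assumes the report format is exactly as quoted;
the names PAGE_SIZE, LINE_RE and decode() are made up for illustration
and are not part of the actual test tool.

```python
import re

# Assumption: 4 KB pages, so the in-page offset is the low 12 bits.
PAGE_SIZE = 4096

# Assumption: report lines look exactly like the ones quoted in the message.
LINE_RE = re.compile(
    r"##error\s+(\d+)\s+page\s+(\d+)\s+"
    r"expected\s+\[0x([0-9a-fA-F]+)\]\s+saw\s+\[0x([0-9a-fA-F]+)\]"
)


def decode(line):
    """Return page number, in-page offset, and whether the offset bits survived."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    page = int(m.group(2))
    expected = int(m.group(3), 16)
    saw = int(m.group(4), 16)
    mask = PAGE_SIZE - 1
    return {
        "page": page,
        "offset": expected & mask,
        # True when the low 12 bits of the corrupt word equal the expected ones
        "offset_matches": (expected & mask) == (saw & mask),
    }


REPORT = [
    "##error 0 page 8228 expected [0x030241d8] saw [0x07c5b1d8]",
    "##error 1 page 9718 expected [0x035f61f0] saw [0x072081f0]",
    "##error 2 page 15719 expected [0x03d671c8] saw [0x016441c8]",
]

for line in REPORT:
    print(decode(line))
```

On the three lines above, this gives offsets of 472, 496 and 456 bytes,
which is consistent with the corruption starting somewhere past the
first 400 or so bytes of the page.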