Date:      Sun, 14 Jul 1996 16:23:13 +0200 (MET DST)
From:      grog@lemis.de (Greg Lehey)
To:        jhs@freebsd.org
Cc:        scsi@freebsd.org
Subject:   Re: 8 * 0xFF bytes at intermittent multiples of 0x1000
Message-ID:  <199607141423.QAA22112@allegro.lemis.de>
In-Reply-To: <199606121518.RAA06093@vector.jhs.no_domain> from "Julian H. Stacey" at Jun 12, 96 05:18:52 pm

In early June 1996, Julian H. Stacey wrote:
>
> To scsi@freebsd.org
> Cc Adaptec 1542A SCSI Adapter People, Julian Elischer.
>
> [	I last posted to +1542A owners + bugs@ ,
> 	but scsi@ now seems more appropriate than bugs@.
> 	I & some other 1542A people are most probably not on scsi@ list,
> 	so please be careful if trimming CC line.
> ]
>
> I (Julian Stacey <jhs@freebsd.org>) did a load more hardware changes & tests,
> including swapping my Adaptec 1542A for a 1542B, & swapping sd0 & sd1,
> & eventually deduced it was not my 1542A that was mis-behaving,
> 	(returning 8 * 0xFF bytes at intermittent multiples of 0x1000),
> but was one of 2 HP 97548S SCSI 1 633MB disks.
>
> Either the disk is faulty, or maybe the scsi code might not be
> allowing for some strange sequence, or some such.
>
> __HOWEVER__
> We can't dismiss it as an isolated equipment fault, as
> 	- tomppa@fidata.fi detects similar data corruptions,
> 	- scott@relay.forest.com seems to be having similar problems,
> 	  but with a 1542B,
> 	- perhaps other people are suffering similar corruption
> 	  without realising it.
>
> Partial Conclusion:
> 	1542A people can `relax',  to the extent that 1542B seems to be
> 	able to trigger the fault too (I don't have a 1542C or 2940 etc)

I've just run into this same problem, but I can't confirm your
findings.  I'm putting together a machine out of old junk parts.
Currently it has a 486/66 with 16 MB and two full-height 5¼"
drives:

(aha0:0:0): "CDC 94161-9 6226" type 0 fixed SCSI 1
sd0(aha0:0:0): Direct-Access 148MB (304605 512 byte sectors)
(aha0:1:0): "CDC 94171-9 5836" type 0 fixed SCSI 1
sd1(aha0:1:0): Direct-Access 308MB (631017 512 byte sectors)

Although these drives both claim to be CDC, the second one has a
Seagate label on it.

I installed 2.1-RELEASE on the machine from CD-ROM, and immediately
after booting lots of programs SIGSEGVed.  I compared them with the
original and found almost exactly the same symptoms you describe:
here's the result of comparing /usr/bin at a later time:

/usr/bin/cu bin/cu differ: char 40961, line 131
/usr/bin/uucp bin/uucp differ: char 32769, line 97
/usr/bin/uupick bin/uupick differ: char 32769, line 102
/usr/bin/uustat bin/uustat differ: char 32769, line 111
/usr/bin/as bin/as differ: char 81921, line 185
/usr/bin/awk bin/awk differ: char 32769, line 83
/usr/bin/bc bin/bc differ: char 32769, line 134
/usr/bin/cvs bin/cvs differ: char 212993, line 725
/usr/bin/gdb bin/gdb differ: char 475137, line 5209
/usr/bin/grep bin/grep differ: char 32771, line 107
/usr/bin/egrep bin/egrep differ: char 32771, line 107
/usr/bin/fgrep bin/fgrep differ: char 32771, line 107
(many more)

It's interesting to note how many come immediately after the first 32
KB.  In the cases I looked at, a number of bytes had been replaced by
0xff; the total size of the executable didn't change.  In most other
cases, too, the corruption was at or immediately after the beginning
of a memory page.
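
To make the page alignment easier to see, here is a minimal sketch (not
part of the original test procedure) that converts some of the cmp
offsets listed above into a page number and an offset within the page.
It assumes a 4 KB (0x1000) page and relies on cmp counting bytes from 1;
the name `locate` is made up for illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KB pages, as on i386

# First differing byte as reported by cmp (1-based), copied from the
# listing above.
cmp_offsets = {
    "cu": 40961,
    "uucp": 32769,
    "as": 81921,
    "cvs": 212993,
    "gdb": 475137,
    "grep": 32771,
}


def locate(char_offset, page_size=PAGE_SIZE):
    """Return (page number, byte offset within page) for a 1-based cmp offset."""
    return divmod(char_offset - 1, page_size)


if __name__ == "__main__":
    for name, offset in cmp_offsets.items():
        page, within = locate(offset)
        print(f"{name}: page {page}, offset {within} within page")
```

Every offset in this sample lands on the first byte of a page except
grep, which lands two bytes in.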

Another point: I've only seen this corruption on the second disk.
Considering that they're almost identical, that's interesting.  I
don't know how to explain it, except that maybe it's a coincidence.

The big difference from your experience is that I replaced the 1542A
with a 1542B, and the problems completely disappeared.  Let's look at
the other responders:

>> Date: Tue, 11 Jun 1996 16:56:50 -0400
>> From: Scott Kelly <scott@relay.forest.com>
>> To: jhs@freebsd.org
>> Subject: Adaptec 1542A Users (from 12 Apr 1996)
>>
>>
>> I seem to be having similar problems, but with a 1542B... Do you know if there
>> has been a driver update since April?

Are you sure that these are exactly the same problems?  What other
hardware are you running?

> For reference, I'll append parts of my <jhs> last mail:
>> Tomi Vainio <tomppa@fidata.fi>
>> Has confirmed he sees the same Adaptec 1542A SCSI adapter bug that I do.
>>
>> > I connected sd1 to my 1542A and here are results:
>> >
>> > 1. No problems if testblock is only one that generates disk activity.
>> > 2. I launched couple find processes to sd0 and at same time I
>> >    run testblock. Testblock failed only 1/10 of test runs.
>> > 3. I copied files with cp to sd1 when running testblock on
>> >    sd1. Testblock failed on every time.

Yes, I had a vague feeling that it was related to the amount of disk
activity.


>> So it looks like a generic bug in FreeBSD code:
>> 	With a 1542A (& not a 1542B, which seems OK),
>> 	In simultaneous multiple task write mode to sd1 (or 2 or 3 or 4),
>> 	At random multiples of 0x1000 bytes,
>> 	The first 8 bytes of a block get forced to 0xFF.
>> (Of course it may well be that FreeBSD code is not `in error' but merely
>> doesn't allow for some wart in the 1542A, that's fixed in the 1542B,
>> but whatever, we need a fix).
>
> As above in this mail, I think I'm wrong there, it's not 1542A specific,
> I get it with 2 different 1542B's as well

Do you have 1542Bs with which you don't get it?
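
For anyone who wants to check their own disks for the symptom as
described (the first 8 bytes of a block at a multiple of 0x1000 forced
to 0xFF), here is a rough sketch of a checker.  It is not the testblock
program mentioned above, which I haven't reproduced here; the function
name `find_ff_runs` and the idea of comparing against a known-good copy
are my assumptions.

```python
PAGE_SIZE = 0x1000  # corruption reported at multiples of 0x1000
RUN_LENGTH = 8      # reported length of the 0xFF run


def find_ff_runs(good, suspect, page_size=PAGE_SIZE, run=RUN_LENGTH):
    """Return the page-aligned offsets where `suspect` begins a page with
    `run` bytes of 0xFF and the known-good data does not."""
    marker = b"\xff" * run
    hits = []
    for offset in range(0, min(len(good), len(suspect)), page_size):
        window = suspect[offset:offset + run]
        if window == marker and good[offset:offset + run] != marker:
            hits.append(offset)
    return hits


if __name__ == "__main__":
    # Synthetic demonstration: corrupt the start of the third page.
    good = bytes(range(256)) * 64           # 16 KB of known data
    bad = bytearray(good)
    bad[0x2000:0x2008] = b"\xff" * 8
    print([hex(o) for o in find_ff_runs(good, bytes(bad))])
```

To use it on real files, read the reference copy and the copy from the
suspect disk in binary mode and pass both to `find_ff_runs`.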

When I get a bit of time, I intend to install BSD/OS on the same
configuration and see if it has the same problems.

Greg


