Date:      Wed, 12 Jun 1996 17:18:52 +0200 (MET DST)
From:      "Julian H. Stacey" <jhs@freebsd.org>
To:        scsi@freebsd.org
Cc:        fabio@cesar.unicamp.br, fty@mcnc.org, gcrutchr@nightflight.com, j@uriah.heep.sax.de, jc@irbs.com, julian@freebsd.org, kuku@gilberto.physik.rwth-aachen.de, lehey.pad@sni.de, mrm@Sceard.com, nikm@ixa.net, tomppa@fidata.fi, wilko@yedi.iaf.nl, Scott Kelly <scott@relay.forest.com>, jhs@freebsd.org
Subject:   8 * 0xFF bytes at intermittent multiples of 0x1000
Message-ID:  <199606121518.RAA06093@vector.jhs.no_domain>

To scsi@freebsd.org
Cc Adaptec 1542A SCSI Adapter People, Julian Elischer.

[	I last posted to +1542A owners + bugs@ ,
	but scsi@ now seems more appropriate than bugs@.
	I & some other 1542A people are most probably not on scsi@ list,
	so please be careful if trimming CC line.
]

I (Julian Stacey <jhs@freebsd.org>) did a load more hardware changes & tests,
including swapping my Adaptec 1542A for a 1542B, & swapping sd0 & sd1,
& eventually deduced it was not my 1542A that was mis-behaving,
	(returning 8 * 0xFF bytes at intermittent multiples of 0x1000),
but was one of 2 HP 97548S SCSI 1 633MB disks.

Either the disk is faulty, or the scsi code may not be allowing for
some strange sequence.

__HOWEVER__
We can't dismiss it as an isolated equipment fault, as 
	- tomppa@fidata.fi detects similar data corruptions,
	- scott@relay.forest.com seems to be having similar problems, 
	  but with a 1542B,
	- perhaps other people are suffering similar corruption
	  without realising it.

Partial Conclusion:
	1542A people can `relax', to the extent that a 1542B seems to be
	able to trigger the fault too (I don't have a 1542C or 2940 etc).

I've written a test program: testblock/ .c & .1 under my web page ~jhs/src/ .
It merely writes & reads a large file in user mode; you don't need to be root,
& it does nothing nasty (except that it will fill your file system with a
single very large file if you don't use `-l number_of_bytes').
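For readers who want the idea without fetching the source, here is a
minimal sketch of a write-then-read check. It is not testblock.c itself:
the names pattern_byte(), chunk_size() & write_then_verify(), the
return codes & the modulo-251 pattern are all assumptions made for
illustration. The pattern is chosen so that it never contains 0xFF,
which means a byte forced to 0xFF always shows up as a mismatch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed pattern (not taken from testblock.c): values 0..250 only,
 * so a byte forced to 0xFF can never match what was written. */
static unsigned char pattern_byte(long offset)
{
    return (unsigned char)(offset % 251);
}

/* Size of the next chunk: a full block, or whatever remains. */
static size_t chunk_size(long total, long off, size_t block)
{
    long left = total - off;
    return (size_t)left < block ? (size_t)left : block;
}

/* Write `total` pattern bytes to `path` in `block`-sized chunks, then
 * read them back and compare.  Returns -1 if every byte matched, -2 on
 * an I/O or allocation error, else the 0-based offset of the first
 * mismatching byte. */
static long write_then_verify(const char *path, long total, size_t block)
{
    unsigned char *buf;
    FILE *fp;
    long off, result = -1;
    size_t i, n;

    if (total < 0 || block == 0 || (buf = malloc(block)) == NULL)
        return -2;

    /* Pass 1: write the pattern. */
    if ((fp = fopen(path, "wb")) == NULL) { free(buf); return -2; }
    for (off = 0; off < total; off += (long)n) {
        n = chunk_size(total, off, block);
        for (i = 0; i < n; i++)
            buf[i] = pattern_byte(off + (long)i);
        if (fwrite(buf, 1, n, fp) != n) { fclose(fp); free(buf); return -2; }
    }
    if (fclose(fp) != 0) { free(buf); return -2; }

    /* Pass 2: read back and stop at the first byte that differs. */
    if ((fp = fopen(path, "rb")) == NULL) { free(buf); return -2; }
    for (off = 0; off < total && result == -1; off += (long)n) {
        n = chunk_size(total, off, block);
        if (fread(buf, 1, n, fp) != n) { result = -2; break; }
        for (i = 0; i < n; i++) {
            if (buf[i] != pattern_byte(off + (long)i)) {
                result = off + (long)i;
                break;
            }
        }
    }
    fclose(fp);
    free(buf);
    return result;
}
```

A call such as write_then_verify("rubbish1", 10000000L, 61440) mirrors
the sizes seen in the testblock output quoted later in this mail. Note
that a file small enough to stay in the buffer cache may be read back
from memory rather than from the disk, so a clean result from a small
file proves little.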

The previous owner of my disk was also a skilled FreeBSD person, &
he wasn't aware of a problem (& I trust him on that ! :-), so it appears
either my disk went bad when he transferred it to me, or the fault was always
there but, as it does not materialise too often, he didn't notice the
corruption it caused.

This could be a good reason for you to run my testblock.c even if you think
you have no problem - think of it as a free disk check that doesn't disrupt:
no need to run dos, drop to debugger, be root, repartition, backup the file
system or any other hassle :-).

I don't have a scsi analyser, & as the bug does not cause a crash,
I don't know what more I can do, except what I am already doing:
	(treating the disc as a backup with corruption guaranteed,
	 but still useful if another disc has a hard crash ! )
If any scsi people send me test code I'll happily compile & run it.
The system exhibiting the phenomenon is 2.1-Rel, but I have current src/
here too, & can easily cross compile & run a current kernel instead.

------------

> Date: Tue, 11 Jun 1996 16:56:50 -0400
> From: Scott Kelly <scott@relay.forest.com>
> To: jhs@freebsd.org
> Subject: Adaptec 1542A Users (from 12 Apr 1996)
> 
> 
> I seem to be having similar problems, but with a 1542B... Do you know if there
> has been a driver update since April? 

Don't know,
	cd /sys ; find . -type f -print | xargs grep 1542 
I guess a scsi person might want to try:
	vi -c/1542 i386/conf/LINT i386/isa/aha1542.c i386/isa/isa.h \
		scsi/README scsi/sd.c pci/ncr.c

> I'm running 2.1...
Me too (on the box in question).

------------

For reference, I'll append parts of my <jhs> last mail:
> Tomi Vainio <tomppa@fidata.fi>
> Has confirmed he sees the same Adaptec 1542A SCSI adapter bug that I do.
> 
> > I connected sd1 to my 1542A and here are results:
> > 
> > 1. No problems if testblock is only one that generates disk activity.
> > 2. I launched couple find processes to sd0 and at same time I
> >    run testblock. Testblock failed only 1/10 of test runs.
> > 3. I copied files with cp to sd1 when running testblock on
> >    sd1. Testblock failed on every time.
> > 
> >   Tomppa
........
> > 
> > ../testblock -v -l 10000000 /v/fish
> > ../testblock: Neither -w or -r specified, so will both write then read.
> > Using a block size of 61440, to a limit of 10000000.
> > ../testblock writing then reading /v/fish.
> > ../testblock: Started rewinding /v/fish.
> > ../testblock: Finished rewinding /v/fish.
> > ../testblock: In /v/fish, data mismatch at byte 49153 (0xc001), after 0 (0x0) previously checked ok.
> > Byte read 255, byte expected 0
> > ../testblock: With /v/fish, only checked 0 bytes, 10,014,720 failed.
> > ../testblock: Finished.
......
> 
> So it looks like a generic bug in FreeBSD code:
> 	With a 1542A (& not a 1542B, which seems OK),
> 	In simultaneous multiple task write mode to sd1 (or 2 or 3 or 4),
> 	At random multiples of 0x1000 bytes,
> 	The first 8 bytes of a block get forced to 0xFF.
> (Of course it may well be that FreeBSD code is not `in error' but merely
> doesn't allow for some wart in the 1542A, that's fixed in the 1542B,
> but whatever, we need a fix).

As above in this mail, I think I was wrong there: it's not 1542A specific,
I get it with 2 different 1542B's as well.
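The signature described in the quoted text (the first 8 bytes at a
multiple of 0x1000 forced to 0xFF) can be looked for directly in data
that has been read back. The sketch below is not my 8f.c; the names
count_ff_runs(), PAGE_STEP & RUN_LEN are hypothetical, & it assumes the
data is already in a memory buffer whose offset 0 corresponds to offset 0
of the file.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_STEP 0x1000   /* corruption was seen at multiples of this */
#define RUN_LEN   8        /* number of bytes forced to 0xFF */

/* Count the PAGE_STEP-aligned offsets in data[0..len) at which RUN_LEN
 * consecutive bytes are all 0xFF.  If `first` is not NULL and at least
 * one such offset exists, the lowest one is stored in *first. */
static size_t count_ff_runs(const unsigned char *data, size_t len,
                            size_t *first)
{
    static const unsigned char ff[RUN_LEN] = {
        0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
    };
    size_t off, hits = 0;

    /* Only aligned offsets are examined; runs elsewhere are ignored. */
    for (off = 0; off + RUN_LEN <= len; off += PAGE_STEP) {
        if (memcmp(data + off, ff, RUN_LEN) == 0) {
            if (hits == 0 && first != NULL)
                *first = off;
            hits++;
        }
    }
    return hits;
}
```

This only makes sense on files whose correct contents never have 0xFF
in those positions (such as pattern files written by a test program);
on arbitrary data, a legitimate run of 0xFF bytes at an aligned offset
would be counted as a hit.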

> Those who have not yet proven this on their system might like to try something
> like this:
> 	sync ; echo maybe even dump sd1 to tape # See below
>         cd <<<sd1_mount_point>>>/tmp
>         testblock -l 10000000 rubbish1 &
>         testblock -l 10000000 rubbish2 &
>         testblock -l 10000000 rubbish3 &
>         & do some other sd0 to sd1 copying in parallel.
>         Then run my 8f on all the data files you've written.
>         
> Remember if you have a swap partition on sd1, & you swapped,
> the swap may be damaged so you might crash.
> If you're really unlucky, while the system is creating new inodes for the
> rubbish files, & is manipulating the file system, 8 bytes (out of several
> 0x1000 bytes) of file system structure data may get mangled.
> 
> I have supplied CC readers with testblock.c & 8f.c,
> for others interested, I'll toss them in http://www.freebsd.org/~jhs/src/

Since Done.

Julian
--
Julian H. Stacey	jhs@freebsd.org  	http://www.freebsd.org/~jhs/


