Date: Sat, 03 Aug 1996 15:38:03 +0200
From: "Julian H. Stacey" <jhs@freebsd.org>
To: grog@lemis.de (Greg Lehey)
Cc: scsi@freebsd.org, fabio@cesar.unicamp.br, fty@mcnc.org, gcrutchr@nightflight.com, j@uriah.heep.sax.de, jc@irbs.com, julian@freebsd.org, kuku@gilberto.physik.rwth-aachen.de, mrm@Sceard.com, nikm@ixa.net, tomppa@fidata.fi, wilko@yedi.iaf.nl, Scott Kelly <scott@relay.forest.com>
Subject: Re: 8 * 0xFF bytes at intermittent multiples of 0x1000
Message-ID: <199608031338.PAA01488@vector.jhs.no_domain>
In-Reply-To: Your message of "Sun, 14 Jul 1996 16:23:13 +0200." <199607141423.QAA22112@allegro.lemis.de>

Hi,
Reference:
> From: grog@lemis.de (Greg Lehey)
> Date: Sun, 14 Jul 1996 16:23:13 +0200 (MET DST)
>
> In early June 1996, Julian H. Stacey wrote:
> >
> > To scsi@freebsd.org
> > Cc Adaptec 1542A SCSI Adapter People, Julian Elischer.
> >
> > [ I last posted to +1542A owners + bugs@ ,
> >   but scsi@ now seems more appropriate than bugs@.
> >   I & some other 1542A people are most probably not on scsi@ list,
> >   so please be careful if trimming CC line.
> > ]
> >
> > I (Julian Stacey <jhs@freebsd.org>) did a load more hardware changes & tests,
> > including swapping my Adaptec 1542A for a 1542B, & swapping sd0 & sd1,
> > & eventually deduced it was not my 1542A that was mis-behaving,
> > (returning 8 * 0xFF bytes at intermittent multiples of 0x1000),
> > but was one of 2 HP 97548S SCSI 1 633MB disks.
> >
> > Either the disk is faulty, or maybe the scsi code might not be
> > allowing for some strange sequence, or some such.
> >
> > __HOWEVER__
> > We can't dismiss it as an isolated equipment fault, as
> >   - tomppa@fidata.fi detects similar data corruptions,
> >   - scott@relay.forest.com seems to be having similar problems,
> >     but with a 1542B,
> >   - perhaps other people are suffering similar corruption
> >     without realising it.
> >
> > Partial Conclusion:
> > 1542A people can `relax', to the extent that 1542B seems to be
> > able to trigger the fault too (I don't have a 1542C or 2940 etc)
>
> I've just run into this same problem, but I can't confirm your
> findings.

I wasn't clear which findings you can't confirm, so I read ahead,
& conclude you mean you can't confirm my disc hardware error suspicion;
I conclude you suspect software error, like I used to ?

> I'm putting together a machine out of old junk parts.
>
> Currently it has a 486/66 with 16 MB and two full-height 5¼"
> drives:
>
> (aha0:0:0): "CDC 94161-9 6226" type 0 fixed SCSI 1
> sd0(aha0:0:0): Direct-Access 148MB (304605 512 byte sectors)
> (aha0:1:0): "CDC 94171-9 5836" type 0 fixed SCSI 1
> sd1(aha0:1:0): Direct-Access 308MB (631017 512 byte sectors)
>
> Although these drives both claim to be CDC, the second one has a
> Seagate label on it.

My good drive is:
	"HP 97548S 8928" type 0 fixed SCSI 1
	Direct-Access 633MB (1296512 512 byte sectors)
My flaky drive is:
	"HP 97548S C023" type 0 fixed SCSI 1
	Direct-Access 633MB (1296512 512 byte sectors)
(`good` & `flaky` being independent of 1542A or 1542B,
also independent of sd0 & sd1 physical allocation,
also independent of whether running 2.0.5 Rel or 2.1.0 Rel )

> I installed 2.1-RELEASE on the machine from CD-ROM, and immediately
> after booting lots of programs SIGSEGVed. I compared them with the
> original and found almost exactly the same symptoms you describe:
> here's the result of comparing /usr/bin at a later time:
>
> /usr/bin/cu bin/cu differ: char 40961, line 131
> /usr/bin/uucp bin/uucp differ: char 32769, line 97
> /usr/bin/uupick bin/uupick differ: char 32769, line 102
> /usr/bin/uustat bin/uustat differ: char 32769, line 111
> /usr/bin/as bin/as differ: char 81921, line 185
> /usr/bin/awk bin/awk differ: char 32769, line 83
> /usr/bin/bc bin/bc differ: char 32769, line 134
> /usr/bin/cvs bin/cvs differ: char 212993, line 725
> /usr/bin/gdb bin/gdb differ: char 475137, line 5209
> /usr/bin/grep bin/grep differ: char 32771, line 107
> /usr/bin/egrep bin/egrep differ: char 32771, line 107
> /usr/bin/fgrep bin/fgrep differ: char 32771, line 107
> (many more)
>
> It's interesting to note how many come immediately after the first 32
> KB. In the cases I looked at, a number of bytes had been replaced by
> 0xff; the total size of the executable didn't change.
> In most other cases, too, the corruption was at or immediately after the
> beginning of a memory page.

Ah ! new perspective :-) I'd been thinking only in terms of disc PCB ICs,
& size of on disc card buffer chips.

> Another point: I've only seen this corruption on the second disk.

Yes that's what I first saw, but then, observations changed,
can't explain that !

> Considering that they're almost identical, that's interesting. I
> don't know how to explain it, except that maybe it's a coincidence.
>
> The big difference from your experience is that I replaced the 1542A
> with a 1542B, and the problems completely disappeared. Let's look at
> the other responders:
>
> >> Date: Tue, 11 Jun 1996 16:56:50 -0400
> >> From: Scott Kelly <scott@relay.forest.com>
> >> To: jhs@freebsd.org
> >> Subject: Adaptec 1542A Users (from 12 Apr 1996)
> >>
> >>
> >> I seem to be having similar problems, but with a 1542B... Do you know if there
> >> has been a driver update since April?
>
> Are you sure that these are the exact problems? What other hardware
> are you running?
>
> > For reference, I'll append parts of my <jhs> last mail:
> >> Tomi Vainio <tomppa@fidata.fi>
> >> Has confirmed he sees the same Adaptec 1542A SCSI adapter bug that I do.
> >>
> >> > I connected sd1 to my 1542A and here are results:
> >> >
> >> > 1. No problems if testblock is only one that generates disk activity.
> >> > 2. I launched couple find processes to sd0 and at same time I
> >> >    run testblock. Testblock failed only 1/10 of test runs.
> >> > 3. I copied files with cp to sd1 when running testblock on
> >> >    sd1. Testblock failed on every time.
>
> Yes, I had a vague feeling that it was related to the amount of disk
> activity.
>
>
> >> So it looks like a generic bug in FreeBSD code:
> >>   With a 1542A (& not a 1542B, which seems OK),
> >>   In simultaneous multiple task write mode to sd1 (or 2 or 3 or 4),
> >>   At random multiples of 0x1000 bytes,
> >>   The first 8 bytes of a block get forced to 0xFF.
> >> (Of course it may well be that FreeBSD code is not `in error' but merely
> >> doesn't allow for some wart in the 1542A, that's fixed in the 1542B,
> >> but whatever, we need a fix).
> >
> > As above in this mail, I think I'm wrong there, it's not 1542A specific,
> > I get it with 2 different 1542B's as well
>
> Do you have 1542Bs with which you don't get it?

No, I only have 2 1542Bs & 1 A, all show error on same drive.

> When I get a bit of time, I intend to install BSD/OS on the same
> configuration and see if it has the same problems.

Let us know your further deductions from that please :-)

> Greg

I used to feel I had found a bug in the driver, but now tend to view my
problem here as a bad disc, but it's worrying when I hear you observe the
same things I do, & others see similar things too !

I have NetBSD src/ here (but no bins & no BSD/OS), but not much time, & anyway
seem to recall Julian Elischer wrote scsi for both Net & Free, so if Free & Net
are presumably similar scsi code, it'd be a less meaningful test than you
trying BSD/OS on your system.

Anyone else who even just suspects misbehaving discs, is welcome to a copy
of my testblock.c & .man (it runs in user not root mode, & won't destroy
your file systems & data :-)

Julian
--
Julian H. Stacey	jhs@freebsd.org	http://www.freebsd.org/~jhs/
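
The symptom discussed in this thread (the first 8 bytes of a block forced to 0xFF at some multiple of 0x1000) can be checked for mechanically. The following is only a rough sketch of such a check, not the testblock.c mentioned above, whose source is not shown here. The names count_suspect_blocks and offset_in_block are made up for illustration, and the block size and run length are taken from the descriptions in this mail.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 0x1000  /* corruption was reported at multiples of 0x1000 */
#define RUN_LEN    8       /* eight 0xFF bytes per affected block */

/* Return 1 if the RUN_LEN bytes starting at buf[off] are all 0xFF. */
static int is_ff_run(const unsigned char *buf, size_t len, size_t off)
{
    static const unsigned char ff[RUN_LEN] =
        { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };

    if (off > len || len - off < RUN_LEN)
        return 0;                       /* not enough bytes left to compare */
    return memcmp(buf + off, ff, RUN_LEN) == 0;
}

/* Count BLOCK_SIZE-aligned offsets whose first RUN_LEN bytes are all 0xFF.
 * (Hypothetical helper: a hit is only a suspicion, not proof of corruption.) */
size_t count_suspect_blocks(const unsigned char *buf, size_t len)
{
    size_t hits = 0;

    for (size_t off = 0; off < len; off += BLOCK_SIZE)
        hits += (size_t)is_ff_run(buf, len, off);
    return hits;
}

/* cmp(1) reports 1-based byte positions. Convert one to the 0-based
 * distance past the preceding BLOCK_SIZE boundary. */
size_t offset_in_block(unsigned long cmp_char)
{
    return (size_t)((cmp_char - 1) % BLOCK_SIZE);
}
```

Applied to the cmp output quoted above, offset_in_block(32769) is 0 and offset_in_block(32771) is 2, which matches "at or immediately after the beginning of a memory page". Genuine data can of course begin a block with 0xFF bytes, so writing a known pattern and reading it back is a more trustworthy test than scanning arbitrary files.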