Date: Thu, 13 Jul 1995 08:33:54 -0700
From: "Jordan K. Hubbard" <jkh@time.cdrom.com>
To: Karl Denninger <karl@Mcs.Net>
Cc: tom@misery.sdf.com (Tom Samplonius), rgrimes@gndrsh.aac.dev.com, freebsd-hackers@FreeBSD.ORG
Subject: Re: SCSI disk wedge
Message-ID: <4397.805649634@time.cdrom.com>
In-Reply-To: Your message of "Thu, 13 Jul 1995 10:15:24 CDT." <199507131515.KAA01784@Jupiter.mcs.net>

> There are THREE machines involved in our testing:
> [details elided]

Karl,

Thanks for the synopsis!

I think that at this point it'd probably be a good idea if the armchair
generals sort of calmed down a bit and let the core team look into this
for a little while.  I've seen Karl spending a lot of time responding to
various weird and often pointless suggestions in this thread and I'm
sure he's got things to do that he'd much rather be doing.  Match this
with a high probability that Karl's right and this *is* some lurking
bogon in the SCSI code and I see a good argument for curtailing this
discussion for the time being, at least until we can have a chance to
talk with the guys who WROTE the code and see if the reported symptoms
suggest anything.

This may indeed turn out to be some sort of really obscure hardware
problem, but it's still not much help to have people yelling "check the
tires!  look in the carburetor!" at someone whose car is stalled on the
freeway..  One trained mechanic working in peace can generally
accomplish a lot more in such cases than 12 helpful bystanders.. :-)

					Jordan

>
> 1) A new system which has a 2742 and four Micropolis 2G disks.  This
>    one has tagged queueing enabled right now, and runs for anywhere
>    from an hour to four days before locking up.  When it locks, it is
>    with a message about timeouts in the SCSI driver.  This machine
>    *cannot* be tested with BSDI, as the 2742 is not supported.
>
> 2) A second system, new, with the standard BSDI configuration we run
>    here -- 1742A/Seagate Hawk 1G disk, 64MB RAM, ASUS P90 EISA/PCI
>    motherboard.  This one freezes within 24 hours with no messages of
>    any kind.  It's definitely the SCSI system, however, as the kernel IS
>    running (I can ping it, telnet to it -- no login prompt, obviously
>    -- hit CPU-only things that already have connected sockets, etc.)
>
> 3) A THIRD system, which USED to run BSDI 1.x and 2.x for more than
>    6 months, identical to system #2 above in configuration.  Same
>    response as #2 as well.
>
> Note that #2 and #3 only HAVE one disk attached, so removing a "few" of them
> won't be very useful.  #1 has four disks on it, but before we added the
> other three, it showed the same behavior with only one drive.  The disk
> which goes offline first on #1 is random (no pattern detectable).
>
> Note that the hang happens under ALL load conditions.  I have had it happen
> when reading news over NFS (which has nearly no local disk activity), when
> sitting at the shell prompt, when pounding the hell out of the drives, etc.
>
> It *looks* like something is trashing the adapter's idea of the world and it
> is wedging tight in response.  That's a guess, as I don't have a bus probe,
> but note that the hangs happen with the SCSI bus activity light *OFF*.  Most
> of the Adaptec problems I've seen with termination and the like wedge with
> the light *ON*.
>
> I can surmise that the following aren't at issue:
>
> 1) The SCSI bus itself.  System #3 was in production for almost a year
>    with BSDI 2.x and 1.x before it was reloaded, and it has NEVER had
>    disk related problems of any kind.
>
> 2) The disks.  Ditto, as BSDI 2.x does do scatter-gather and clustered
>    I/O, which pounds the heck out of the disk subsystem.  No problems.
>
> 3) The adapter.  Again, system #3 was in production for an extended
>    period without ANY trouble.
>
> 4) The CPU, RAM, or other adapters in the system.  See above.
>
> Now, we beat the hell out of both our hardware and software here, in ways
> which few, if any, other firms and locations do.  For that reason we
> frequently find problems in both hardware and software that others miss.
>
> --
> --
> Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
> Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
> Voice: [+1 312 248-8649]     | 7 Chicagoland POPs, ISDN, 28.8, much more
> Fax:   [+1 312 248-9865]     | Email to "info@mcs.net" WWW: http://www.mcs.net
> ISDN - Get it here TODAY!    | Home of Chicago's only FULL AP Clarinet feed!
