Date: Thu, 13 Jul 1995 10:15:24 -0500 (CDT)
From: Karl Denninger <karl@Mcs.Net>
To: tom@misery.sdf.com (Tom Samplonius)
Cc: karl@Mcs.Net, rgrimes@gndrsh.aac.dev.com, freebsd-hackers@FreeBSD.ORG
Subject: Re: SCSI disk wedge
Message-ID: <199507131515.KAA01784@Jupiter.mcs.net>
In-Reply-To: <Pine.BSF.3.91.950712191425.1872B-100000@misery.sdf.com> from "Tom Samplonius" at Jul 12, 95 07:28:52 pm

> On Wed, 12 Jul 1995, Karl Denninger wrote:
>
> > If FreeBSD is going to be a production platform then it is going to have to
> > start behaving like one. This means that pushing things off on drive
> > vendors is not acceptable.
>
> Ok, I just needed to be convinced :) (I fondly remember the story about
> someone complaining that FreeBSD wouldn't work on their system but Linux
> did, so he went and re-installed Linux and it didn't work either!)
> The 1742 driver has been around for a long time and is very similar to
> the driver in NetBSD. However, the 2742/2842/2942 driver is quite
> recent. It is _very_ odd that you have problems with both adapters.
>
> Chances are slim to none that this can be fixed if someone on the core
> team can not replicate this problem. And so far, it appears that no one has.
>
> Since a system that locks up once a day can't be that useful, would it
> be possible for you to remove some of the drives for a couple of days and see
> if that has any effect?
>
> Tom
>

There are THREE machines involved in our testing:

1) A new system which has a 2742 and four Micropolis 2G disks. This one
   has tagged queueing enabled right now, and runs for anywhere from an
   hour to four days before locking up. When it locks, it is with a
   message about timeouts in the SCSI driver. This machine *cannot* be
   tested with BSDI, as the 2742 is not supported.

2) A second system, new, with the standard BSDI configuration we run
   here -- 1742A/Seagate Hawk 1G disk, 64MB RAM, ASUS P90 EISA/PCI
   motherboard. This one freezes within 24 hours with no messages of
   any kind. It's definitely the SCSI system, however, as the kernel IS
   running (I can ping it, telnet to it -- no login prompt, obviously --
   hit CPU-only things that already have connected sockets, etc.)

3) A THIRD system, which USED to run BSDI 1.x and 2.x for more than 6
   months, identical to system #2 above in configuration. Same response
   as #2 as well.

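As a rough sketch of the check described for system #2 (the kernel still completes TCP connections, but nothing in userland ever answers), a small Python probe along these lines could tell the states apart. The function names, the default port 23, the timeout, and the three state labels are all assumptions made for illustration; none of them come from the tests described above, and "received any bytes" is only a stand-in for "the daemon actually started".

```python
import socket


def classify(connected: bool, got_data: bool) -> str:
    """Map two observations onto a coarse (hypothetical) host state."""
    if not connected:
        return "down-or-unreachable"
    if got_data:
        return "healthy"
    # The TCP handshake completed in the kernel, but no process ever
    # wrote to the socket: consistent with userland blocked on disk I/O.
    return "kernel-alive-userland-stalled"


def probe(host: str, port: int = 23, timeout: float = 10.0) -> str:
    """Connect to host:port and wait up to `timeout` seconds for any bytes."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or timed out during connect.
        return classify(False, False)
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(64)
        except OSError:
            # Includes the receive timeout; treat as "nothing answered".
            data = b""
    return classify(True, bool(data))


if __name__ == "__main__":
    import sys

    print(probe(sys.argv[1]))
```

Run against a wedged machine of the kind described, this would be expected to report the stalled state rather than a dead host, since the connection is accepted but no prompt or negotiation bytes ever arrive. It says nothing about why userland is stalled; that still has to come from the console messages or the driver.
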

Note that #2 and #3 only HAVE one disk attached, so removing a "few" of
them won't be very useful. #1 has four disks on it, but before we added
the other three, it showed the same behavior with only one drive. The
disk which goes offline first on #1 is random (no pattern detectable).

Note that the hang happens under ALL load conditions. I have had it
happen when reading news over NFS (which has nearly no local disk
activity), when sitting at the shell prompt, when pounding the hell out
of the drives, etc.

It *looks* like something is trashing the adapter's idea of the world
and it is wedging tight in response. That's a guess, as I don't have a
bus probe, but note that the hangs happen with the SCSI bus activity
light *OFF*. Most of the Adaptec problems I've seen with termination and
the like wedge with the light *ON*.

I can surmise that the following aren't at issue:

1) The SCSI bus itself. System #3 was in production for almost a year
   with BSDI 2.x and 1.x before it was reloaded, and it has NEVER had
   disk related problems of any kind.

2) The disks. Ditto, as BSDI 2.x does do scatter-gather and clustered
   I/O, which pounds the heck out of the disk subsystem. No problems.

3) The adapter. Again, system #3 was in production for an extended
   period without ANY trouble.

4) The CPU, RAM, or other adapters in the system. See above.

Now, we beat the hell out of both our hardware and software here, in
ways which few, if any, other firms and locations do. For that reason we
frequently find problems in both hardware and software that others miss.

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
Voice: [+1 312 248-8649]     | 7 Chicagoland POPs, ISDN, 28.8, much more
Fax: [+1 312 248-9865]       | Email to "info@mcs.net" WWW: http://www.mcs.net
ISDN - Get it here TODAY!    | Home of Chicago's only FULL AP Clarinet feed!