From owner-freebsd-hackers Wed Jul 12 18:43:14 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6)
    id SAA12922 for hackers-outgoing; Wed, 12 Jul 1995 18:43:14 -0700
Received: from kitten.mcs.com (Kitten.mcs.com [192.160.127.90])
    by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id SAA12915 for ;
    Wed, 12 Jul 1995 18:43:12 -0700
Received: from Jupiter.mcs.net (Jupiter.mcs.net [192.160.127.89])
    by kitten.mcs.com (8.6.10/8.6.9) with ESMTP id UAA21863;
    Wed, 12 Jul 1995 20:43:07 -0500
Received: (from karl@localhost) by Jupiter.mcs.net (8.6.11/8.6.9)
    id UAA00551; Wed, 12 Jul 1995 20:43:04 -0500
From: Karl Denninger
Message-Id: <199507130143.UAA00551@Jupiter.mcs.net>
Subject: Re: SCSI disk wedge
To: tom@misery.sdf.com (Tom Samplonius)
Date: Wed, 12 Jul 1995 20:43:04 -0500 (CDT)
Cc: karl@Mcs.Net, rgrimes@gndrsh.aac.dev.com, freebsd-hackers@FreeBSD.ORG
In-Reply-To: from "Tom Samplonius" at Jul 12, 95 06:36:06 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 1824
Sender: hackers-owner@FreeBSD.ORG
Precedence: bulk

> On Wed, 12 Jul 1995, Karl Denninger wrote:
>
> > This hang is only seen about once a day, and it is NOT load related.  It
> > happens infrequently enough that tracking it is going to be a real bitch.
>
>   I don't see this at all on a 1742 equipped system.  I have seen uptimes
> of 25 days before rebooting for a hardware upgrade.  I have DEC 3210
> drives though.
>
>   It could be that one of the drives has a firmware bug.  This is not that
> uncommon.  It was reported in hackers that some Conner drives have such
> problems.  I also remember getting bug-fix firmware upgrades for old
> Micropolis drives.
>
> Tom

The drives on these machines (1) are less than two months old, (2) have
current firmware, and (3) don't have ANY problems with BSDI.

If FreeBSD is going to be a production platform then it is going to have
to start behaving like one.
This means that pushing things off on drive vendors is not acceptable.
If you have a problem with a device, you *report it*.  Silent death is
never acceptable.

The kernel is running in this case, but the system is hung waiting on I/O
completion.

I am not at all convinced this is a firmware issue.  If it was then the
83 days of uptime on identically-configured BSDI machines wouldn't be
happening.  But they are.

Those 83-day uptimes are recorded on our production NFS servers which run
a much heavier disk load, with the same devices, on a different OS with no
problems.

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
Voice: [+1 312 248-8649]     | 7 Chicagoland POPs, ISDN, 28.8, much more
Fax:   [+1 312 248-9865]     | Email to "info@mcs.net" WWW: http://www.mcs.net
ISDN - Get it here TODAY!    | Home of Chicago's only FULL AP Clarinet feed!