From owner-freebsd-hackers Thu Jul 13 09:09:47 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id JAA14717 for hackers-outgoing; Thu, 13 Jul 1995 09:09:47 -0700
Received: from who.cdrom.com (who.cdrom.com [192.216.222.3]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id JAA14711 for ; Thu, 13 Jul 1995 09:09:43 -0700
Received: from kitten.mcs.com (Kitten.mcs.com [192.160.127.90]) by who.cdrom.com (8.6.11/8.6.11) with ESMTP id JAA09879 for ; Thu, 13 Jul 1995 09:09:34 -0700
Received: from Jupiter.mcs.net (Jupiter.mcs.net [192.160.127.89]) by kitten.mcs.com (8.6.10/8.6.9) with ESMTP id LAA10261; Thu, 13 Jul 1995 11:07:06 -0500
Received: (from karl@localhost) by Jupiter.mcs.net (8.6.11/8.6.9) id LAA01870; Thu, 13 Jul 1995 11:07:06 -0500
From: Karl Denninger
Message-Id: <199507131607.LAA01870@Jupiter.mcs.net>
Subject: Re: SCSI disk wedge
To: rgrimes@gndrsh.aac.dev.com (Rodney W. Grimes)
Date: Thu, 13 Jul 1995 11:07:05 -0500 (CDT)
Cc: karl@Mcs.Net, freebsd-hackers@FreeBSD.ORG
In-Reply-To: <199507130214.TAA19888@gndrsh.aac.dev.com> from "Rodney W. Grimes" at Jul 12, 95 07:14:53 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 6003
Sender: hackers-owner@FreeBSD.ORG
Precedence: bulk

> > The drives on these machines are (1) less than two months old, (2) have
> > current firmware, and (3) don't have ANY problems with BSDI.
>
> Slow down... (1) new drives are often prone to firmware bugs if by
> new you also mean new model. (2) good!! But the ``new'' firmware
> could still have a bug in it (3) This is good, but it does not
> necessarily mean the bug is in FreeBSD. We do things like
> very large I/O requests through the vm system, perhaps one of your
> drives does not like it when we drop a 64K I/O operation to it.

This is possible, but I believe that BSDI 2.x does support it.
> > Those 83-day uptimes are recorded on our production NFS servers which run a
> > much heavier disk load, with the same devices, on a different OS with no
> > problems.
>
> Same _exact_ devices, or same _model_/_pn_/_revision_/_date_code_?

Same EXACT devices in one machine's case, in that the machine WAS running
BSDI 2.x and is now running FreeBSD.

> I know these things:
> a) You have a hang problem on a 2742 with no error message.
> b) You have a hang problem on a 1742 with some error before it, but
> I did not see any error in your mail.
> c) You are using Seagate and Micropolis (I think that is what you said)
> disk drives, but I have no idea as to what models.
> d) You have BSDI running on similar hardware (maybe even the exact
> hardware) with long uptimes.
> e) You crash once a day.
> f) You publicly posted that you get a 200% performance boost running
> FreeBSD over BSDI, telling me we are probably pushing your hardware
> quite a bit harder than BSDI did.
>
> What I do not know:
>
> a) Are you using active termination?

Yes.

> b) Do your SCSI cables meet the SCSI-II spec with respect to all
> parameters (length, impedance, capacitance, etc)?

Yes. We use HP SCSI-II cables in most of our applications. No cheap stuff here.

> c) What exact model of disk drives are you using?
For the one which has the 274X adapter:

ahc1: target 0 synchronous at 10.0MB/s, offset = 0xf
ahc1: target 0 Tagged Queuing Device
(ahc1:0:0): "MICROP 3221-10MZ 1128K1 HT02" type 0 fixed SCSI 2
sd0(ahc1:0:0): Direct-Access 1955MB (4004219 512 byte sectors)
ahc1: target 1 synchronous at 10.0MB/s, offset = 0xf
ahc1: target 1 Tagged Queuing Device
(ahc1:1:0): "MICROP 4221-09MZ Q4D HT02" type 0 fixed SCSI 2
sd1(ahc1:1:0): Direct-Access 1955MB (4004219 512 byte sectors)
ahc1: target 2 synchronous at 10.0MB/s, offset = 0xf
ahc1: target 2 Tagged Queuing Device
(ahc1:2:0): "MICROP 4221-09MZ Q4D HT02" type 0 fixed SCSI 2
sd2(ahc1:2:0): Direct-Access 1955MB (4004219 512 byte sectors)
ahc1: target 3 synchronous at 10.0MB/s, offset = 0xf
ahc1: target 3 Tagged Queuing Device
(ahc1:3:0): "MICROP 4221-09MZ Q4D HT02" type 0 fixed SCSI 2
sd3(ahc1:3:0): Direct-Access 1955MB (4004219 512 byte sectors)

For the systems (2) which have the 1742 adapters:

ahb0 waiting for scsi devices to settle
(ahb0:0:0): "SEAGATE ST31200N 8648" type 0 fixed SCSI 2
sd0(ahb0:0:0): Direct-Access 1006MB (2061108 512 byte sectors)

> d) What is the error message you get?

I'll get it the next time we get a crash; I have posted this one before. It
is a timeout message. We probably have it in the logbook, but I want to make
sure the message matches exactly.

> e) What motherboard are you running on, as much detail as possible.

ASUS Dual Pentia motherboard, single P90 processor. This is the EISA/PCI
model of their product and has been EXTREMELY stable in other applications.
All systems in consideration have 64MB RAM.

> f) What exact model/revision of aha174x and 274x are you using?

I'll have to get this one; both of these boards are *very* recent production
(less than 2 months old for most; the one machine which was converted has a
1742 that is about a year old, but is the same revision, which indicates that
they haven't changed it).

> g) What other I/O cards are in the machine?
Two SMC Ethernet cards in each, one standard (512k) VGA ISA board.

> h) What is the system running as far as a work load, does any one specific
> work load tend to bring the crash out?

Varied workloads; one is a news server (INN), the others run http and user
processes. There is no pattern to the crashes related to time of day or work
in progress at the time.

> i) Are you willing to pay for production type support, or is this the
> reason you switched from BSDI to FreeBSD and now expect to get that
> level of support for free? Contracted support is available from
> several people if you expect that level of service.

Sure, provided we really get the fixes. I am not averse to paying for support
that actually performs. What I won't pay for is support that doesn't get
answers to us in a timely fashion.

> What I am willing to do:
>
> a) As long as you keep answering the questions and filling in the
> details I will continue to follow the thread so that we might
> come to a final resolution of your problem.
>
> b) Resurrect my DX2/66 EISA 1742 based system to run some testing on
> duplicating your environment as much as I can with time permitting
> (and I am one very busy person) to try and duplicate the bug here.

The DX2 may NOT have the problem due to it being significantly slower.

> c) Loan you my aha1742 that I know has worked for 2.5 years with
> FreeBSD without a single hiccup.
>
> d) Since you mentioned ``production'': If you are in a real hurry
> to get it fixed, you can pay me at contracted rates and I will be
> at your site with my equipment within 2 days. This is an expensive
> option, but one that does exist.

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
Voice: [+1 312 248-8649]     | 7 Chicagoland POPs, ISDN, 28.8, much more
Fax:   [+1 312 248-9865]     | Email to "info@mcs.net" WWW: http://www.mcs.net
ISDN - Get it here TODAY!    | Home of Chicago's only FULL AP Clarinet feed!