From owner-freebsd-stable Fri Aug 6 20:33: 8 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from hotmail.com (law-f60.hotmail.com [209.185.131.123])
    by hub.freebsd.org (Postfix) with SMTP id 684F914CE5
    for ; Fri, 6 Aug 1999 20:32:54 -0700 (PDT)
    (envelope-from lightningweb@hotmail.com)
Received: (qmail 17072 invoked by uid 0); 7 Aug 1999 03:32:41 -0000
Message-ID: <19990807033241.17071.qmail@hotmail.com>
Received: from 24.4.2.215 by www.hotmail.com with HTTP;
    Fri, 06 Aug 1999 20:32:40 PDT
X-Originating-IP: [24.4.2.215]
From: "lweb Lightningweb"
To: freebsd-stable@FreeBSD.ORG
Cc: greg@lightningweb.com, jeremy@lightningweb.com,
    keith@lightningweb.com, criter@lightningweb.com
Subject: continued crashes with 3.1-Stable
Date: Fri, 06 Aug 1999 20:32:40 PDT
Mime-Version: 1.0
Content-Type: text/plain; format=flowed
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

We have not resolved the frequent freezes on our web server. We had two
responses to our first mail to this list, but neither one was the
solution.

The problem is that the server will stop responding to ANYTHING except
pings. No telnet, no ssh, no web, no ftp, nothing. Open telnet sessions
don't drop; there's just no response to keyboard activity.

One suggestion was to fix the "pthreads library," which we did.
The other was: "You may have hardware problems."

This server is going down more frequently: three times so far today.
There is no apparent pattern to the crashes. They seem to happen most
often during a MySQL database query, but it has happened many times
without any queries (a few times just by running "pine" with a large
mailbox file).

We cannot recreate the crash on demand; it just crashes at random
times. We've tried hammering it with web requests, database queries,
and benchmarking programs that slam the RAID array, memory, and
processors, but it chugs right along.

We have replaced drives in the RAID array, and we are now replacing
drive caddies.
Next step, I think, will be the RAID controller.

I have a strong gut feeling that it is software, however. There's
nothing to substantiate this, except that, more often than not, the
crash happens during a MySQL query.

Some (NOT ALL) of the suspect errors that we've recorded from the
console during a crash are:

(da0:dpt0:0:0:0): Invalidating pack
biodone: buffer already done
spec_getpages: I/O read failure: (error code=6)
size: 32768, resid: 32768, a_count: 32768, valid: 0x0
nread: 0, reqpage: 0, pindex: 0, pcount: 8

Could everyone please take a second look at this and help us brainstorm
the problem?

I am including a list of the hardware, the original message we sent to
the list, and a recent dmesg:

FreeBSD 3.1-STABLE #1
Dual-Proc PII 450
512MB RAM
DPT PM334UW RAID controller
  - 16MB RAM
  - dual bus Ultra Wide
  - Six 9.1GB Quantum VikingII SCSI3 U2W drives
  - Three drives per bus, RAID5, one drive is hot-spare
Intel EtherExpress Pro 10/100B Ethernet
TOSHIBA CD-ROM XM-6201TA

--------------

I've recently had the job of system administration dumped in my lap.
I'm looking forward to getting on top of it, but I'm a little behind
the 8-ball right now. If my subject matter strays too far from the
allowed context of this list, please don't flame me too badly.

Background:
We are running a dual PII 450 system with a 45 gig RAID array,
controlled by a DPT PM334.
The O/S: FreeBSD 3.1-STABLE #1

For several months this has been rock solid. However, in the past
three weeks, we've had a number of crashes, most of which seem to be
related to MySQL queries. The system would be totally unresponsive to
ssh/telnet and web, but would still return pings.

The server is colocated at our ISP, so it's been tricky to track down
the exact 'on screen' console errors. Today, shortly after we upgraded
our MySQL version, I did see the error.

(da0:dpt0:0:0:0): Invalidating Pack
(da0:dpt0:0:0:0): Invalidating Pack
devstat_end_transaction: HELP!! busy_count for da0 is < 0 (-1)!
biodone: buffer already done
(da0:dpt0:0:0:0): Read (10). CDB: 28 0 3 87 33 1f 0 0 80 0
(da0:dpt0:0:0:0): ILLEGAL REQUEST asc:20,0
(da0:dpt0:0:0:0): Invalid command operation code
devstat_end_transaction: HELP!! busy_count for da0 is < 0 (-1)!
biodone: buffer already done

This was followed by a complete system freeze, including the console.

Some hunting and searching has led us to believe that we are
encountering a driver failure and that we should bring the OS back to
-stable. As I said, I haven't done this before, so I'm a little
anxious. Before I take that step, I would be very grateful to hear some
input from those who surely know more about this than I do.

Is bringing the system back to -stable likely to correct our problem?
Am I missing some indicator in the error above?
Has someone else encountered similar trouble (and found a fix)?

I'll be happy to take replies in private e-mail if this is off topic.

Any help would be great.

Thanks,
Jeremy