From owner-freebsd-fs Mon Oct 16 2:47:34 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from goliath.siemens.de (goliath.siemens.de [194.138.37.131])
    by hub.freebsd.org (Postfix) with ESMTP id BF68737B66C;
    Mon, 16 Oct 2000 02:47:27 -0700 (PDT)
X-Envelope-Sender-Is: andre.albsmeier@mchp.siemens.de (at relayer goliath.siemens.de)
Received: from mail2.siemens.de (mail2.siemens.de [139.25.208.11])
    by goliath.siemens.de (8.11.0/8.11.0) with ESMTP id e9G9lOq13594;
    Mon, 16 Oct 2000 11:47:24 +0200 (MET DST)
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.42.7])
    by mail2.siemens.de (8.11.0/8.11.0) with ESMTP id e9G9lND04089;
    Mon, 16 Oct 2000 11:47:23 +0200 (MET DST)
Received: (from localhost)
    by curry.mchp.siemens.de (8.11.1/8.11.1) id e9G9lNq17303;
Date: Mon, 16 Oct 2000 11:47:23 +0200
From: Andre Albsmeier
To: dbhague@allstor-sw.co.uk
Cc: Andre Albsmeier, freebsd-scsi@FreeBSD.org, freebsd-fs@FreeBSD.org, smcintyre@allstor-sw.co.uk
Subject: Re: Stressed SCSI subsystem locks up the system
Message-ID: <20001016114723.A22193@curry.mchp.siemens.de>
References: <8025697A.00340E6C.00@mail.plasmon.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <8025697A.00340E6C.00@mail.plasmon.co.uk>; from dbhague@allstor-sw.co.uk on Mon, Oct 16, 2000 at 10:28:34AM +0100
X-Echelon: BND CIA NSA Mossad KGB MI6 IRA detonator nuclear assault strike
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon, 16-Oct-2000 at 10:28:34 +0100, dbhague@allstor-sw.co.uk wrote:
> Andre,
> What were your SCSI errors ?

Oct 13 10:05:28 server /ktry: (da2:ahc0:0:2:0): data overrun detected in Data-out phase.
+Tag == 0xe.
Oct 13 10:09:40 server /ktry: (da2:ahc0:0:2:0): Have seen Data Phase. Length = 65536.
+NumSGs = 16.

These appeared with the 3940AU. When replacing it with two 2940 everything
worked great for several days now. I am keeping it this way now.
When Justin does some driver changes, I will try my 3940AU again...

    -Andre

>
> We have one system that has now run for five days without failure. Today we
> will start to deconstruct this unit, any advice would be welcome.
>
> We also ran five systems over the weekend and all but one, the IDE system,
> failed.
> These were:
> A repeat of the passing system above, failed with
>     Bad blocks 135666304, inode 5142534
>     6 seconds later, Bad blocks 135666304, inode 5634466
>     then, panic ffs_blkfree: freeing free frag, this is on the /RAID partition.
> Test run against an IDE disk, still running but slowly
> Test run against a SCSI disk
> Test run using a Symbios dual SCSI card,
> Test running FreeBSD 3.0
>
> Two of the above tests have got stuck in iowait, for example.
> root  451  0.0  0.1  368  172  p0  D  Fri06PM  0:17.77  rm -rf /RAID/5
> root  454  0.0  0.2  368  196  p0  D  Fri06PM  0:17.85  rm -rf /RAID/7
> root  455  0.0  0.2  368  196  p0  D  Fri06PM  0:17.42  rm -rf /RAID/1
> root  457  0.0  0.2  368  196  p0  D  Fri06PM  0:17.44  rm -rf /RAID/2
> root  459  0.0  0.2  368  196  p0  D  Fri06PM  0:17.71  rm -rf /RAID/6
> root  461  0.0  0.2  368  196  p0  D  Fri06PM  0:17.10  rm -rf /RAID/4
> root  463  0.0  0.2  368  196  p0  D  Fri06PM  0:17.56  rm -rf /RAID/3
>
> Just a few minutes ago cron started to die with a signal 10, we don't think this
> is relevant but...
> Oct 16 09:55:02 birch /kernel: pid 3551 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:00:00 birch /kernel: pid 3555 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:00:00 birch /kernel: pid 3556 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:05:01 birch /kernel: pid 3558 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:10:00 birch /kernel: pid 3560 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:15:00 birch /kernel: pid 3562 (cron), uid 0: exited on signal 10 (core dumped)
> Oct 16 10:20:00 birch /kernel: pid 3564 (cron), uid 0: exited on signal 10 (core dumped)
>
> Regards Dave
>

--
Micro$oft: Which virus will you get today?


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message