Date: Fri, 5 Nov 1999 12:09:51 +0100
From: Shaun Jurrens
To: freebsd-scsi@freebsd.org
Subject: scsi bus errors
Message-ID: <19991105120951.C1083@shamz.net>
Reply-To: shaun@shamz.net

Hi,

After reading the lists and trying about everything under the sun to get the errors to abate, I am finally writing. The setup is about the same as all the others with SCB timeout errors. The messages started (as far as I can tell) after a makeworld on June 29, 1999. I have a 2940W on a PCI card with the same system running the 3.x Stable tree since January. Now that I see that IBM disks get the same errors as my two Quantum drives, it can't just be firmware or termination.

The disks run without problems under NT and 95. I have checked the disks with Adaptec's software for bad sectors and no new bad sectors can be detected (to the original default). I have also changed the settings to run the disks below the full transfer rate, based on the ahc man page, but to no avail. The problem is starting to get really annoying because it is clobbering files and crashing programs. I now have to shut down and fsck at least once a week, independent of load.
The system can do a -j16 buildworld without a timeout, but normal operation sometimes causes one. The problem is so irregular that I haven't been able to determine the cause. I do have softupdates enabled (though I don't know for how much longer), but they have also been enabled the entire time.

I tried to determine whether there were any changes to the ahc driver between April and June which might have affected the performance, but since I am not very versed in C, I can't really work with what I have found. The process of elimination brings me down to the driver, because the system itself has no evident hardware problems. I could try fiddling with the BIOS a little (a FIC-2011? with a K6-2 not overclocked, Matrox graphic card, PCI nic from Realtek), but I hate the eternal reboot cycles (feel like I'm using NT again).

I guess if I wasn't convinced that it was a driver problem I wouldn't have written. I left out the logs because they don't seem to have been more than grounds for speculation about termination and such up until now. Seeing the number of people who are getting these messages on various hardware platforms, I would seem to have reason to suspect the one common denominator: the ahc driver. I would suspect a bad disk if it was just one of the two, but the chance of having bad firmware (Quantum, I have heard, has poor firmware anyway) on two totally different models of hard disks seems relatively improbable. The chance that they both begin to have the same problem at the same time is even less probable.

I would appreciate some comments from the maintainer of ahc and could really do without speculation. Assume I have used the scientific method over the last four months and have ruled out the most obvious things first.

A last note: I am currently not subscribed to the lists because of problems with my provider. A cc to me would be greatly appreciated (assuming I'm not bounced for not being on the list in the first place).