From owner-freebsd-scsi Sat Feb 1 07:20:55 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id HAA26321 for freebsd-scsi-outgoing; Sat, 1 Feb 1997 07:20:55 -0800 (PST)
Received: from sax.sax.de (sax.sax.de [193.175.26.33]) by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id HAA26311 for ; Sat, 1 Feb 1997 07:20:48 -0800 (PST)
Received: (from uucp@localhost) by sax.sax.de (8.6.12/8.6.12-s1) with UUCP id QAA07013; Sat, 1 Feb 1997 16:20:42 +0100
Received: (from j@localhost) by uriah.heep.sax.de (8.8.5/8.6.9) id QAA06590; Sat, 1 Feb 1997 16:03:46 +0100 (MET)
Message-ID:
Date: Sat, 1 Feb 1997 16:03:46 +0100
From: j@uriah.heep.sax.de (J Wunsch)
To: Don.Lewis@tsc.tdk.com (Don Lewis)
Cc: freebsd-scsi@freebsd.org
Subject: Re: SCSI disk MEDIUM ERROR with a few twists
References: <199702011424.GAA28908@salsa.gv.tsc.tdk.com>
X-Mailer: Mutt 0.55-PL10
Mime-Version: 1.0
X-Phone: +49-351-2012 669
X-PGP-Fingerprint: DC 47 E6 E4 FF A6 E9 8F 93 21 E0 7D F9 12 D6 4E
Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch)
In-Reply-To: <199702011424.GAA28908@salsa.gv.tsc.tdk.com>; from Don Lewis on Feb 1, 1997 06:24:59 -0800
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

As Don Lewis wrote:

> } It could be the drive itself.
>
> The MEDIUM ERROR itself and the falling offline a week or so later
> are definitely the fault of the drive.  That the error wasn't reported
> to userland lies somewhere between the driver and userland, inclusive.

See my other mail.  For buffered (filesystem) writes, it's no
surprise.  Reads should, however, always report it.

> Jan 18 04:30:33 news /kernel: sd0(ahc0:0:0): MEDIUM ERROR info:14683a asc:11,0 Unrecovered read error field replaceable unit: ea sks:80,11

> Always the same info:#.

Which means: always the same block # (in hex).
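
For illustration only, here is a minimal sketch of pulling the block
number out of such a syslog line and converting it from hex.  The
helper name parse_medium_error is hypothetical, the field layout is
assumed from the single line quoted above, and reading asc:11,0 as a
hex ASC/ASCQ pair is an assumption as well.

```python
import re

# Assumed layout, taken from the one quoted kernel message:
#   "... MEDIUM ERROR info:<hex block> asc:<hex asc>,<hex ascq> ..."
_MEDIUM_ERROR = re.compile(
    r"MEDIUM ERROR\s+info:(?P<info>[0-9a-fA-F]+)\s+"
    r"asc:(?P<asc>[0-9a-fA-F]+),(?P<ascq>[0-9a-fA-F]+)"
)


def parse_medium_error(line):
    """Return block number, ASC and ASCQ as integers, or None if no match.

    Hypothetical helper, not part of any FreeBSD tool.
    """
    match = _MEDIUM_ERROR.search(line)
    if match is None:
        return None
    return {
        "block": int(match.group("info"), 16),  # info field is in hex
        "asc": int(match.group("asc"), 16),     # assumed to be hex too
        "ascq": int(match.group("ascq"), 16),
    }


if __name__ == "__main__":
    sample = (
        "Jan 18 04:30:33 news /kernel: sd0(ahc0:0:0): MEDIUM ERROR "
        "info:14683a asc:11,0 Unrecovered read error "
        "field replaceable unit: ea sks:80,11"
    )
    print(parse_medium_error(sample))
```

For the line above, info:14683a corresponds to block 1337402 in decimal.
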

> I also can't quote messages from its death throes before it wedged,
> because this disk also contains /var and nothing was syslogged until
> after I got the machine running multi-user again.  I *think* the message
> was: "Logical unit is in process of becoming ready", but if so it was
> lying.

Btw., you should no longer see this error message now.  This case is
retried forever, until it either turns into a `real' error, or
eventually succeeds.

> It gave me at least two weeks warning last time.  If it gets sick again,
> then I can at least file a more complete report ;-)  Are there any
> experiments you want me to try?

Well, you could see why the read error isn't reported to userland
then. :-)

> } Also, go through SCSI reformatting it.  This will cause the drive to
> } recreate the bad sector table as necessary.  You can even do this
> } without using the adapter BIOS, there's always /sbin/scsiformat for
> } this.
>
> The painful part is that this is the root disk, and I'm pretty sure the
> 2.1.x fixit disk doesn't contain scsiformat.

scsiformat is simple:

	scsi -s 7200 -f /dev/rsdX.ctl -c "4 0 0 0 0 0"

(Put it into the background if you prefer; once started, you can't
break it with ^Z.)

> Doesn't remapping the sector
> add the original to the drive's grown defect list?

Yes, but reformatting IMHO often does a more complete check, so if an
adjacent sector is flaky, it will more likely be put there as well.

We need a remapping tool as well.  Anybody here who has ever dealt
with defect list management?  Since we already know the block number
(from the info field in the syslog message), it should be easy to add
it to the defect list.

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)