From owner-freebsd-scsi Sat Feb 1 07:37:21 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id HAA26957 for freebsd-scsi-outgoing; Sat, 1 Feb 1997 07:37:21 -0800 (PST)
Received: from gatekeeper.tsc.tdk.com (root@gatekeeper.tsc.tdk.com [207.113.159.21]) by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id HAA26952 for ; Sat, 1 Feb 1997 07:37:19 -0800 (PST)
Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id HAA18417; Sat, 1 Feb 1997 07:37:12 -0800 (PST)
Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.4/8.8.4) with ESMTP id HAA01119; Sat, 1 Feb 1997 07:37:11 -0800 (PST)
Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.4/8.8.4) id HAA28985; Sat, 1 Feb 1997 07:37:09 -0800 (PST)
From: Don Lewis
Message-Id: <199702011537.HAA28985@salsa.gv.tsc.tdk.com>
Date: Sat, 1 Feb 1997 07:37:09 -0800
In-Reply-To: j@uriah.heep.sax.de (J Wunsch) "Re: SCSI disk MEDIUM ERROR with a few twists" (Feb 1, 4:03pm)
X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95)
To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch), Don.Lewis@tsc.tdk.com (Don Lewis)
Subject: Re: SCSI disk MEDIUM ERROR with a few twists
Cc: freebsd-scsi@freebsd.org
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Feb 1, 4:03pm, J Wunsch wrote:
} Subject: Re: SCSI disk MEDIUM ERROR with a few twists
} As Don Lewis wrote:
}
} > I also can't quote messages from it's death throes before it wedged,
} > because this disk also contains /var and nothing was syslogged until
} > after I got the machine running multi-user again. I *think* the message
} > was: "Logical unit is in process of becoming ready", but if so it was
} > lying.
}
} Btw., you should no longer see this error message now. This case is
} retried forever, until it either turns into a `real' error, or
} eventually succeeds.

Actually, this was kind of weird too. When I checked the console, it
was covered with this message. I tapped a few keys on the keyboard and
I got a "press any key to reboot" message. There was no sign of a panic.
That's when it tried to reboot and hung in the SCSI BIOS waiting for the
drive ...

} > It gave me at least two weeks warning last time. If it gets sick again,
} > then I can at least file a more complete report ;-) Are there any
} > experiments you want me to try?
}
} Well, you could see why the read error isn't reported to userland
} then. :-)

If I don't get caught in a maze of twisty little passages ;-) Yeah, I
can try tar again, and dd the raw partition to /dev/null. That should
narrow it down a bit.

} scsiformat is simple:
}
}     scsi -s 7200 -f /dev/rsdX.ctl -c "4 0 0 0 0 0"
}
} (Put it into background if you prefer, once started, you can't break
} it with ^Z.)

Since it's the root disk, I won't be doing much else.

} > Doesn't remapping the sector
} > add the original to the drive's grown defect list?
}
} Yes, but reformatting does IMHO often a more complete check, so if an
} adjacent sector is flakey, it will more likely be put there as well.

I've got another question. I read in the archives why this sector
wouldn't be automagically remapped by the drive on a read failure even
though automagic remapping is turned on. But wouldn't the drive remember
that the sector was bad and remap it the next time it was written
(assuming it hadn't been powered off in between)? I'd be willing to bet
that this sector had been written at least once between the failures
that were logged.

} We need a remapping tool as well. Anybody here who ever dealt with
} defect list management? Since we do already know the block number
} (from the info field in the syslog message), it should be easy to add
} it to the defect list.

I was reading the SCSI spec and thinking about writing something that
would at least dump out the current defect list, but then my brain
started hurting too much :-(

--- Truck
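
For anyone who wants to pick up the defect list dump idea, here is a rough sketch of the two pieces involved, as I read the SCSI-2 description of READ DEFECT DATA (10): building the 10-byte command block and decoding a reply in block format. The function names and the default allocation length are made up for illustration, and actually sending the command to the drive (through scsi(8) or some other pass-through interface) is not shown. Treat it as a starting point to check against the spec, not a finished tool.

```python
import struct

READ_DEFECT_DATA_10 = 0x37   # opcode, per SCSI-2
FORMAT_BLOCK = 0b000         # defect list format: 4-byte logical block addresses


def build_read_defect_cdb(plist=False, glist=True, fmt=FORMAT_BLOCK,
                          alloc_len=4096):
    """Build the 10-byte CDB (hypothetical helper, not an existing API).

    Byte 2 carries PLIST (bit 4), GLIST (bit 3) and the list format
    (bits 2-0); bytes 7-8 carry the allocation length, big-endian.
    """
    flags = (int(plist) << 4) | (int(glist) << 3) | (fmt & 0x07)
    # opcode, reserved, flags, 4 reserved bytes, allocation length, control
    return struct.pack(">BBB4xHB", READ_DEFECT_DATA_10, 0, flags, alloc_len, 0)


def parse_block_format_reply(data):
    """Decode a block-format reply into a list of block numbers.

    The 4-byte header is: reserved, flags, 16-bit defect list length.
    Each descriptor that follows is one 32-bit block address.
    """
    _reserved, _flags, length = struct.unpack_from(">BBH", data, 0)
    body = data[4:4 + length]
    count = len(body) // 4          # ignore a truncated trailing descriptor
    return list(struct.unpack(">%dI" % count, body[:count * 4]))


if __name__ == "__main__":
    print(build_read_defect_cdb().hex())
    # Made-up reply with two grown defects, for demonstration only.
    fake = bytes([0x00, 0x08, 0x00, 0x08]) + struct.pack(">II", 1234, 5678)
    print(parse_block_format_reply(fake))
```

With GLIST set and PLIST clear this asks only for the grown defect list, which is the one a remapped sector should show up in.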