Date:      Sat, 1 Feb 1997 14:29:17 +0100
From:      j@uriah.heep.sax.de (J Wunsch)
To:        Don.Lewis@tsc.tdk.com (Don Lewis)
Cc:        freebsd-fs@freebsd.org, freebsd-scsi@freebsd.org
Subject:   Re: SCSI disk MEDIUM ERROR with a few twists
Message-ID:  <Mutt.19970201142917.j@uriah.heep.sax.de>
In-Reply-To: <199702010811.AAA28411@salsa.gv.tsc.tdk.com>; from Don Lewis on Feb 1, 1997 00:11:02 -0800
References:  <199702010811.AAA28411@salsa.gv.tsc.tdk.com>

As Don Lewis wrote:

(It would help if you could structure your report better.  It's
very hard to browse through: the paragraphs are so densely filled
with words that it's hard to figure out the essence of your problem.)

> 	/etc/daily doesn't report this

(and others don't report this)

Of course.  That's because buffered writes cannot report media errors
to their caller.  The caller has already received an OK indication for
the write operation by the time the device driver finally notices the
write error.  All the driver can do at this point is log the problem
via syslog.
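
A minimal sketch of the point above, assuming a POSIX system: write(2)
returning successfully only means the data reached the buffer cache, so
a program that needs to learn about a failed physical write has to ask
for one explicitly, for example with fsync(2).  The function name
write_and_sync is hypothetical, and whether a given kernel actually
hands a late media error back through fsync() is an assumption here,
not something the report above confirms.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write len bytes to path and ask the kernel to push them to the medium.
 * Returns 0 on success, -1 if open(), write(), fsync() or close() failed.
 * (Hypothetical helper, for illustration only.)
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* Success here normally means "copied into the buffer cache",
     * not "stored on the disk". */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            perror("write");
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* fsync() waits for the buffered data to be written out; this is
     * the first point where an I/O error can be returned to us. */
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        return -1;
    }

    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

A caller that skips the fsync() step gets the behaviour described
above: every call reports success, and the only trace of a later write
error is the driver's message in the system log.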

You ought to check your syslog regularly.  The easiest way is to have
it sent to all your logged-in terminals :) (seriously, I do).

> It could be the filesystem, the SCSI driver, or the drive firmware.

It could be the drive itself.

What MEDIUM ERRORs are these?  You forgot to quote the most important
thing, the driver message.

> I don't know whether the SCSI code isn't reporting this to the filesystem,
> or the filesystem isn't reporting this to userland code, but dump didn't
> seem to see a problem, tar didn't seem to see a problem.

It's interesting to know that dump didn't see the problem, since dump
operates on the raw device, where error reporting is possible.  Are
you sure they were _unrecovered_ medium errors, i.e. the kernel didn't
successfully retry them?  Again, please *quote* the error messages,
instead of assuming we know them.
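
To illustrate why error reporting is possible on the raw device: an
unbuffered read either delivers the data or fails right there, so the
caller sees the error itself.  The following is only a sketch, not what
dump does internally; the function name scan_for_read_errors and the
512-byte block size are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 512          /* assumed sector size */

/*
 * Read path block by block from start to end.
 * Returns the number of blocks that failed with EIO, or -1 if the
 * file could not be opened or failed with some other error.
 * (Hypothetical helper, for illustration only.)
 */
long scan_for_read_errors(const char *path)
{
    char buf[BLOCK_SIZE];
    long bad = 0;
    off_t off = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    for (;;) {
        ssize_t n = pread(fd, buf, sizeof buf, off);
        if (n == 0)
            break;              /* end of file or device */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            if (errno != EIO) { /* not a media problem: give up */
                perror("pread");
                close(fd);
                return -1;
            }
            /* The error comes back to the caller immediately. */
            fprintf(stderr, "read error at offset %lld: %s\n",
                    (long long)off, strerror(errno));
            bad++;
            off += BLOCK_SIZE;  /* skip the unreadable block */
            continue;
        }
        off += n;
    }

    close(fd);
    return bad;
}
```

Run against a raw disk device (as root), a non-zero result would point
at blocks the drive could not deliver even after its own retries; the
matching driver messages in the syslog are still the thing to quote.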

> Before replacing the drive, I decided to run the Adaptec disk verification.
> It found a grand total of one bad sector and remapped it.  The only
> remaining damage was that fsck had deleted my newsgroups file and
> history.pag had one formerly bad sector.  Since the disk didn't appear
> to be hopeless, I replaced the newsgroups file and rebuilt history.pag,
> and things have been working flawlessly ever since.

I wouldn't use that disk for serious work again.  It's certainly good
enough for storing news articles, but no longer reliable enough for
storing your history database.

Also, do a SCSI-level reformat of the drive.  This will cause the drive
to recreate its bad sector table as necessary.  You can even do this
without using the adapter BIOS; there's always /sbin/scsiformat for
this.  I once recovered another Seagate drive that suffered from
medium errors, and am still using it (more than one year after
those problems).  However, I relegated it to a scratch drive for
release testing etc., and no longer use it for mission-critical
work.

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)


