From owner-freebsd-scsi  Tue Oct  8  9:57:18 2002
Message-ID: <3DA30E67.8000206@aurora.regenstrief.org>
Date: Tue, 08 Oct 2002 11:57:11 -0500
From: Gunther Schadow <gunther@aurora.regenstrief.org>
Organization: Regenstrief Institute for Health Care
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2
X-Accept-Language: en-us
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
Subject: SCSI bad block remapping doesn't work!?#@$
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

I can't seem to get my SCSI disk to remap bad blocks. Can someone
please look over my shoulder and see what I may be doing wrong? I have

$ uname -a
FreeBSD ... 4.4-RELEASE FreeBSD 4.4-RELEASE ... i386

$ camcontrol inquiry da1
pass1: <COMPAQPC WDE4360 1.52> Fixed Direct Access SCSI-2 device
pass1: Serial Number WS7010556513
pass1: 20.000MB/s transfers (20.000MHz, offset 15), Tagged Queueing Enabled

So, here are the errors:

(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 90 80 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 a0 70 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 a0 20 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
...

and so on. Apparently only one of my files was affected near the end. So,
I tried to save that file doing this:

$ dd if=thefile of=thebackup bs=8192 conv=noerror

and I seem to have all I can reasonably expect to get.
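
As a side note on the kernel messages above, here is a minimal sketch of how
they can be decoded. It assumes the standard SCSI-2 READ(6) layout (a 21-bit
LBA in CDB bytes 1 to 3, a transfer length in byte 4 where 0 means 256 blocks)
and it assumes that the "info" field of the sense data holds the failing LBA.
The function names read6_range and lba_in_read6 are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of camcontrol or any FreeBSD tool.

# read6_range B1 B2 B3 B4
# Takes CDB bytes 1..4 of a READ(6) as hex (without 0x) and prints
# "first_lba block_count" in decimal.
read6_range() {
    # Only the low 5 bits of byte 1 belong to the LBA (the rest is the LUN).
    r6_lba=$(( ((0x$1 & 0x1f) << 16) | (0x$2 << 8) | 0x$3 ))
    r6_len=$(( 0x$4 ))
    # A transfer length of 0 means 256 blocks in READ(6).
    if [ "$r6_len" -eq 0 ]; then r6_len=256; fi
    echo "$r6_lba $r6_len"
}

# lba_in_read6 INFO B1 B2 B3 B4
# Succeeds if the hex LBA INFO lies inside the range read by the CDB.
lba_in_read6() {
    info=$(( 0x$1 )); shift
    set -- $(read6_range "$@")
    [ "$info" -ge "$1" ] && [ "$info" -lt $(( $1 + $2 )) ]
}
```

For the first CDB above, "read6_range 15 f6 90 80" gives "1439376 128", and
info:15f6b2 is LBA 1439410, which is 34 blocks into that read. These are block
numbers as sent to the drive, not sector numbers relative to a partition.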

Now, I did make sure the auto reallocation is enabled:

$ camcontrol modepage da1 -m 1
AWRE (Auto Write Reallocation Enbld):  1
ARRE (Auto Read Reallocation Enbld):  1
TB (Transfer Block):  0
RC (Read Continuous):  0
EER (Enable Early Recovery):  0
PER (Post Error):  0
DTE (Disable Transfer on Error):  0
DCR (Disable Correction):  0
Read Retry Count:  255
Correction Span:  48
Head Offset Count:  0
Data Strobe Offset Count:  0
Write Retry Count:  255
Recovery Time Limit:  0

and checked the defect lists:

$ camcontrol defects da1 -f phys -P
Got 119 defects:
59:4:-1
77:0:77
77:0:78
...
5133:6:128
5319:3:39
5365:4:121

$ camcontrol defects da1 -f phys -G
Got 0 defects.

The latter makes me suspicious: too good to be true. I figured I needed to
write to this block to get it listed, so I did this:

$ dd if=thefile of=thefile bs=8192 conv=noerror,sync,notrunc

just to check whether we can "refresh" a file in place, an idea
I found pretty neat (thefile is really big, so it comes in handy to save
space).
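
A small sketch of that in-place refresh, wrapped in a hypothetical function
refresh_in_place. One caveat, as far as I understand dd: conv=sync pads a
short final block with zeros up to bs, so the output would run ahead of the
input and a file whose size is not a multiple of 8192 would end up larger.
The sketch therefore refuses such files; that check is my own addition and it
assumes wc -c can report the size without trouble.

```shell
#!/bin/sh
# refresh_in_place FILE (hypothetical helper)
# Rewrites every 8192-byte block of FILE onto itself.
#   noerror: keep going after a read error
#   sync:    pad short or failed input blocks with zeros to a full block
#   notrunc: do not truncate FILE when opening it for output
refresh_in_place() {
    size=$(( $(wc -c < "$1") ))
    if [ $(( size % 8192 )) -ne 0 ]; then
        echo "refusing: size of $1 is not a multiple of 8192" >&2
        return 1
    fi
    dd if="$1" of="$1" bs=8192 conv=noerror,sync,notrunc 2>/dev/null
}
```

On a healthy file this should leave the contents unchanged; whether the
rewrite makes the drive reallocate anything is exactly the open question here.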

When it came to the bad block I got the same errors as above and still:

$ camcontrol defects da1 -f phys -G
Got 0 defects.

redoing the same with

$ dd if=thebackup of=thefile bs=8192

didn't help either. No remapping took place and the bad block was still
there.

Finally I deleted the file and wrote a big file into the directory:

$ dd if=/dev/zero of=bigfile bs=8192

until that command failed because of disk full. Now I thought I should
get the sucker remapped, but still nothing:

$ camcontrol defects da1 -f phys -G
Got 0 defects.

I also tried to read the whole bigfile:

$ dd if=bigfile of=/dev/null bs=8192

and I still get this bad block error! So, now I'm gonna use badsect(8) to
isolate that stupid block so it won't hurt me again. In order to do that
I needed to find out the sector number(s) to use for badsect. So I did

$ dd if=/dev/da1s1e of=/dev/null bs=512

and with that I manually try to find all the bad blocks, which is a pain in
the butt! The conv=noerror option to dd doesn't seem to work: it does not
carry on and report all the errors it finds. So now I use dd with skip and
count to get a list of all bad blocks with their sector numbers relative to
the beginning of the partition:

$ dd if=/dev/da1s1e of=/dev/null bs=512
dd: /dev/da1s1e: Input/output error
1336978+0 records in
...

Then probe that sector in particular:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1336978 count=1
dd: /dev/da1s1e: Input/output error
0+0 records in
...

(Also probe the next one and the previous one to be sure it's just
this one.) Then skip over it and scan the rest:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1336979
dd: /dev/da1s1e: Input/output error
1860+0 records in
...

Now add the number of records read to the skip value we started from to get
the next bad sector number (1336979 + 1860 = 1338839), and probe it to be sure:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1338839 count=1
dd: /dev/da1s1e: Input/output error
0+0 records in
...

and skip over that one again to scan the rest:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1338840
dd: /dev/da1s1e: Input/output error
3480+0 records in
...

and so on until the rest is read without errors.
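
The manual loop above could be automated roughly like this. It is only a
sketch: the function names are made up, and it assumes that dd stops at the
first read error, exits nonzero, and prints an "N+M records in" line on
stderr, as in the output shown above. It also assumes a read error is the
only reason dd fails.

```shell
#!/bin/sh
# Hypothetical helpers automating the manual skip-and-probe loop.

# records_in: reads dd's stderr on stdin and prints the count of full
# records read.
records_in() {
    sed -n 's/^\([0-9][0-9]*\)+[0-9]* records in$/\1/p'
}

# bad_sector SKIP RECORDS: dd started at sector SKIP and read RECORDS full
# sectors before failing, so the unreadable sector is SKIP + RECORDS.
bad_sector() {
    echo $(( $1 + $2 ))
}

# scan_bad_sectors DEVICE: prints the partition-relative number of every
# 512-byte sector that dd cannot read.
scan_bad_sectors() {
    dev=$1
    skip=0
    while :; do
        # dd exits 0 once it reaches the end without an error.
        out=$(dd if="$dev" of=/dev/null bs=512 skip="$skip" 2>&1) && break
        full=$(printf '%s\n' "$out" | records_in)
        if [ -z "$full" ]; then
            echo "could not parse dd output: $out" >&2
            return 1
        fi
        bad=$(bad_sector "$skip" "$full")
        echo "$bad"
        skip=$(( bad + 1 ))   # step over the bad sector and continue
    done
}
```

Something like "scan_bad_sectors /dev/da1s1e" would then print one sector
number per line, ready to be handed to badsect.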

This is a pain in the butt!

Now I have 6 bad sectors (fortunately only 6!). According to badsect(8)
I make a directory BAD in the root directory of that filesystem and
say:

$ badsect BAD 1336978 1338839 1342320 1343737 ...

with all the bad blocks listed. Then I umount that fs and run fsck, answering
yes when it asks whether to hold the bad block. fsck warns about a "softupdate
inconsistency", and I can't get it right the first time, so I give in to its
persistent suggestions to delete the BAD/* files. Then I do it all again with
bigfile deleted, since it was cross-linked with these bad blocks, and this
time it works.

Why did I have to go through all this hassle? Why didn't the SCSI subsystem,
or the disk drive itself, do the bad sector remapping? I remember that 2 years
ago I had the same hassle with a different disk, and I don't remember this
automatic reallocation ever having worked for me, in spite of my turning it
on and double- and triple-checking modepage 1 to see that it was indeed
enabled. What am I doing wrong?

thanks,
-Gunther





