Message-ID: <3DA30E67.8000206@aurora.regenstrief.org>
Date: Tue, 08 Oct 2002 11:57:11 -0500
From: Gunther Schadow
Organization: Regenstrief Institute for Health Care
To: freebsd-scsi@freebsd.org
Subject: SCSI bad block remapping doesn't work!?#@$

Hi,

I can't seem to make my SCSI disk remap bad blocks. Can someone please look over my shoulder and see what I may be doing wrong? I have

$ uname -a
FreeBSD ... 4.4-RELEASE FreeBSD 4.4-RELEASE ... i386

$ camcontrol inquiry da1
pass1: Fixed Direct Access SCSI-2 device
pass1: Serial Number WS7010556513
pass1: 20.000MB/s transfers (20.000MHz, offset 15), Tagged Queueing Enabled

So, here are the errors:

(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 90 80 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 a0 70 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 a0 20 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
...

and so on. Apparently only one of my files was affected, near its end. So, I tried to save that file by doing this:

$ dd if=thefile of=thebackup bs=8192 conv=noerror

and I seem to have all I can reasonably expect to get.

Now, I did make sure that auto reallocation is enabled:

$ camcontrol modepage da1 -m 1
AWRE (Auto Write Reallocation Enbld): 1
ARRE (Auto Read Reallocation Enbld): 1
TB (Transfer Block): 0
RC (Read Continuous): 0
EER (Enable Early Recovery): 0
PER (Post Error): 0
DTE (Disable Transfer on Error): 0
DCR (Disable Correction): 0
Read Retry Count: 255
Correction Span: 48
Head Offset Count: 0
Data Strobe Offset Count: 0
Write Retry Count: 255
Recovery Time Limit: 0

and checked the defect lists:

$ camcontrol defects da1 -f phys -P
Got 119 defects:
59:4:-1
77:0:77
77:0:78
...
5133:6:128
5319:3:39
5365:4:121

$ camcontrol defects da1 -f phys -G
Got 0 defects.

The latter makes me suspicious: too good to be true. I need to write to this block to get it listed, or so I thought, and I did this:

$ dd if=thefile of=thefile bs=8192 conv=noerror,sync,notrunc

just to check whether we can "refresh" a file in place, an idea I found pretty neat (thefile is really big, so it comes in handy to save space). When it came to the bad block I got the same errors as above, and still:

$ camcontrol defects da1 -f phys -G
Got 0 defects.

Redoing the same with

$ dd if=thebackup of=thefile bs=8192

didn't help either. No remapping took place and the bad block would still be there.

Finally I deleted the file and wrote a big file into the directory:

$ dd if=/dev/zero of=bigfile bs=8192

until that command failed because the disk was full.
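As an aside on the READ(06) errors quoted earlier: the CDB bytes and the info field can be decoded by hand to see which absolute block the drive is complaining about. The following is only a minimal sketch, assuming the standard 6-byte READ layout (opcode, 21-bit LBA, one length byte where 0 means 256 blocks, control byte) and assuming the info field holds the failing block address, as it normally does for a medium error. The function name decode_read6 is made up for illustration.

```sh
#!/bin/sh
# decode_read6 B0 B1 B2 B3 B4 B5
# Hypothetical helper: takes the six CDB bytes as printed by the kernel
# (hex, no 0x prefix) and prints "<first LBA> <block count>" in decimal.
decode_read6() {
    # READ(6) keeps a 21-bit LBA in the low 5 bits of byte 1 plus bytes 2 and 3.
    lba=$(( ((0x$2 & 0x1f) << 16) | (0x$3 << 8) | 0x$4 ))
    # Byte 4 is the transfer length; a value of 0 means 256 blocks.
    len=$(( 0x$5 ))
    if [ "$len" -eq 0 ]; then len=256; fi
    echo "$lba $len"
}

decode_read6 8 15 f6 90 80 0    # first failing command from the log
echo $(( 0x15f6b2 ))            # the "info" field from the MEDIUM ERROR line
```

This prints 1439376 128 and then 1439410, so the reported block lies inside the range the first command asked for. These are absolute block numbers on the disk, not offsets within a partition.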
Now I thought I should get the sucker remapped, but still nothing:

$ camcontrol defects da1 -f phys -G
Got 0 defects.

I also tried to read the whole bigfile:

$ dd if=bigfile of=/dev/null bs=8192

and I still get this bad block error!

So, now I'm gonna use badsect(8) to isolate that stupid block so it won't hurt me again. To do that I needed to find out the sector number(s) to use for badsect. So I did

$ dd if=/dev/da1s1e of=/dev/null bs=512

and with that I manually try to get all the bad blocks, which is a pain in the butt! The conv=noerror option to dd doesn't seem to work and report all the errors that it finds. So now I use dd with skip and count to get a list of all bad blocks with their relative sector numbers, counted from the beginning of the partition:

$ dd if=/dev/da1s1e of=/dev/null bs=512
dd: /dev/da1s1e: Input/output error
1336978+0 records in
...

Probe for this sector in particular:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1336978 count=1
dd: /dev/da1s1e: Input/output error
0+0 records in
...

(Also probe the next one and the previous one to be sure it's just him.) Then skip over it and scan the rest:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1336979
dd: /dev/da1s1e: Input/output error
1860+0 records in
...

Now add the number of records in to where we started to get the next sector number (1336979 + 1860 = 1338839), and probe it to be sure:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1338839 count=1
dd: /dev/da1s1e: Input/output error
0+0 records in
...

and skip over that one again to scan the rest:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1338840
dd: /dev/da1s1e: Input/output error
3480+0 records in
...

and so on until the rest is read without errors. This is a pain in the butt!

Now I have 6 bad sectors (fortunately only 6!). According to badsect(8), I make a directory BAD in the root directory of that filesystem and say:

$ badsect BAD 1336978 1338839 1342320 1343737 ...

listing all the bad blocks.
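The skip-and-rescan procedure above can be put in a loop. This is a rough sketch, not a tested tool: it assumes dd stops at the first unreadable sector, exits non-zero, and reports "N+M records in" on stderr, as in the transcript. The function name scan_bad is hypothetical.

```sh
#!/bin/sh
# scan_bad DEVICE
# Hypothetical helper: prints the relative sector number of every 512-byte
# sector that dd fails on, one per line, by repeating the manual procedure:
# read until an error, add the records read to the starting offset, step
# over the bad sector, and continue.
scan_bad() {
    dev=$1
    skip=0
    while :; do
        # A clean read to the end of the device means there is nothing left to find.
        out=$(LC_ALL=C dd if="$dev" of=/dev/null bs=512 skip="$skip" 2>&1) && break
        # Pull N out of the "N+M records in" line.
        n=$(printf '%s\n' "$out" |
            sed -n 's/^\([0-9][0-9]*\)+[0-9]* records in.*/\1/p')
        if [ -z "$n" ]; then
            # dd failed for some other reason (for example, it could not open the device).
            printf '%s\n' "$out" >&2
            return 1
        fi
        bad=$(( skip + n ))     # e.g. 1336979 + 1860 = 1338839
        echo "$bad"
        skip=$(( bad + 1 ))     # step over the bad sector and rescan
    done
}

# Example (not run here): scan_bad /dev/da1s1e
```

Each number it prints is a candidate for the badsect command line. It is still worth probing each one with skip=N count=1, as done by hand above, before trusting the list.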
Now umount that fs and run fsck, answering yes when it asks whether to hold the bad block. fsck warns about a "softupdate inconsistency", and I can't get it right the first time, so I give in to its persistent suggestions to delete the BAD/* files. Then I do it again with the bigfile deleted (it was cross-linked with these bad blocks), and this time it works.

Why did I have to go through all this hassle? Why didn't the SCSI subsystem, or the disk drive itself, do the bad sector remapping? I remember that 2 years ago I had the same hassle with a different disk, and I don't remember this automatic reallocation ever having worked for me, in spite of my turning it on and double and triple checking in mode page 1 that it was indeed enabled.

What am I doing wrong????

thanks,
-Gunther