Date:      Sat, 9 Oct 2004 13:50:07 -0600 (MDT)
From:      Doug Russell <drussell@saturn-tech.com>
To:        John Von Essen <john@essenz.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: hacking SCO....
Message-ID:  <20041009125651.H41465-100000@mxb.saturn-tech.com>
In-Reply-To: <0D27BFB8-1A20-11D9-883D-0003933DDCFA@essenz.com>


On Sat, 9 Oct 2004, John Von Essen wrote:

> The SCSI card is an old Adaptec, AIC-7880 and I believe it does not
> support automatic bad block detection/redirection.

If it has a BIOS it should have the verify tool in there...

All the verify tool does, though, is issue a verify command to each
sector.  You can do this yourself, even on a running system.

> This disk came from a spares kits, so even though it is "new" and never
> used, it is still 5-6 years old. There were 6 bad blocks, once they
> were put in the bad block table, everything was fine.

Exactly.  Most of my SCSI disks are old spares, or old surplus disks.
They've likely been moved several times, bumped around, and time itself
can make a marginal section of the magnetic coating stop holding data
perfectly.

> Is sformat the freebsd equivalent of the badtrk utility. I have always

No, I don't believe so.  I think badtrk is probably like the old bad144
system that was abandoned because it is unnecessary on all modern disks.
All modern IDE disks have built-in re-allocation tables, similar to the
way SCSI disks work, but you can't manipulate them manually as easily as
you can on a SCSI disk.

sformat is a handy formatting utility written by Joerg Schilling.
It has special options for partitioning disks on SUN machines, but the
format routines and defect scans will run on virtually any platform.

It has full pattern testing abilities.  First it writes and verifies every
byte on the drive as all  00000000's  then all  11111111's...  then
01010101  and  10101010, etc. etc...  stressing the media in every
possible combination.  (Many, many errors won't show up if you just write
all zeros or ones, for example.  It is much harder to store a zero next to
a one, so trying every combination pre-tests anything that might be
written.)
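
A minimal sketch of that kind of write-and-verify pattern pass, in Python.
This is not sformat's implementation; the function name, the 512-byte block
size and the in-memory buffer standing in for a disk are all assumptions
made for illustration.

```python
import io

# Bit patterns named in the text: all zeros, all ones, then the two
# alternating patterns 01010101 and 10101010.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)

def pattern_test(dev, size, block_size=512):
    """Write each pattern over the whole device, read it back, and
    return a sorted list of block numbers that failed verification.

    `dev` is any seekable binary file object (hypothetical stand-in
    for a raw disk device).
    """
    bad = set()
    nblocks = size // block_size
    for pat in PATTERNS:
        block = bytes([pat]) * block_size
        # Write pass: fill every block with the current pattern.
        dev.seek(0)
        for _ in range(nblocks):
            dev.write(block)
        # Verify pass: read every block back and compare.
        dev.seek(0)
        for n in range(nblocks):
            if dev.read(block_size) != block:
                bad.add(n)
    return sorted(bad)

if __name__ == "__main__":
    fake_disk = io.BytesIO(bytes(16 * 512))
    print(pattern_test(fake_disk, 16 * 512))
```

A bit that is stuck at zero passes the all-zeros pass and only shows up once
a pattern with that bit set is written, which is why more than one pattern
is needed.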

This kind of testing is done at the factory, and limited pattern testing
is generally done by the built-in low-level format routine on most drives.

> used Ultra2 LVD SCSI and higher on FreeBSD and have never had this
> issue of bad blocks. Is that because those newer SCSI disks and
> controllers have better ECC handling and take care of the bad blocks
> internally without notifying the user?

If you play with the SCSI modepages, you can tell the drive to post an
error or not under various conditions (a correctable ECC error, an
uncorrectable error, a re-allocated block, etc.)

You've probably never seen errors before because the drives had
Automatic (Write / Read) Reallocation Enable (AWRE and ARRE) set in
modepage 1.

This disk you're working with now obviously has ARRE and probably AWRE
disabled, so it isn't trying to re-allocate the blocks when it finds an
error.  I'd check that and turn it on.  Then, the next time you try to
read the bad block, the drive should remap it on its own.
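
For reference, the flags in question live in byte 2 of modepage 1 (the
Read-Write Error Recovery page).  The Python sketch below decodes that byte
and shows which bits would need to be set.  The helper names are made up for
the example, and it only does the bit arithmetic; it does not talk to a
drive.

```python
# Bit masks for byte 2 of the Read-Write Error Recovery mode page
# (page 0x01), as laid out in the SCSI block command specifications.
FLAGS = {
    "AWRE": 0x80,  # Automatic Write Reallocation Enable
    "ARRE": 0x40,  # Automatic Read Reallocation Enable
    "TB":   0x20,  # Transfer Block
    "RC":   0x10,  # Read Continuous
    "EER":  0x08,  # Enable Early Recovery
    "PER":  0x04,  # Post Error (report recovered errors)
    "DTE":  0x02,  # Disable Transfer on Error
    "DCR":  0x01,  # Disable Correction
}

def decode_recovery_flags(byte2):
    """Return a dict mapping each flag name to True/False."""
    return {name: bool(byte2 & mask) for name, mask in FLAGS.items()}

def enable_reallocation(byte2):
    """Return byte 2 with AWRE and ARRE turned on, other bits untouched."""
    return byte2 | FLAGS["AWRE"] | FLAGS["ARRE"]

if __name__ == "__main__":
    current = 0x04                      # hypothetical value: only PER set
    print(decode_recovery_flags(current))
    print(hex(enable_reallocation(current)))
```

On FreeBSD the page itself can be viewed with  camcontrol modepage daX -m 1
and edited by adding -e.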

The exact behavior varies by drive and by settings in the modepages.  Some
drives may have AWRE and ARRE enabled, but not re-allocate a certain block
because they can't get the data off the sector in the retry time allowed.
Cranking up the retry timer (when available) might work, or else you have
to do it manually by sending the re-allocate block command.

I sure love SCSI disks...  They let you do virtually ANYTHING to them if
you know the technical details of how to send the commands.  (The
technical manuals for the disks are very handy in this regard.)

On FreeBSD, you can see what was in the defect table at the factory by
doing a  camcontrol defects daX -f phys -P  (I like phys, as it shows the
actual physical cylinder and head number with a sector number -- you can
actually 'see' the areas that are defective).  You can see the GROWN
defect list (rather than the primary) with -G.  VERY often the grown
defects are simply the next sector or two to the sides of an existing
defect.  If a series of defects span several cylinders on the same head at
about the same offset, it's probably a media defect 'scratch' across the
disk.  If it is on the same cylinder on several heads at about the same
place, it is probably from a mini-head-crash due to the disk getting
bumped, etc, where it actually damaged spots on several platters at once when
the heads touched (or almost touched).  etc. etc.
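
The 'scratch' pattern can also be picked out mechanically.  Below is a rough
Python sketch, assuming each defect line reads cylinder:head:sector as in
the listings in this message.  The function names and the thresholds
(min_len, sector_slop) are arbitrary choices for the example, and a second
defect on the same head and cylinder will split a run in two.

```python
from collections import defaultdict

def parse_defects(text):
    """Parse 'cylinder:head:sector' lines; ignore everything else."""
    defects = []
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        fields = words[0].split(":")
        if len(fields) == 3 and all(f.isdigit() for f in fields):
            defects.append(tuple(int(f) for f in fields))
    return defects

def find_scratches(defects, min_len=4, sector_slop=1):
    """Return (head, first_cyl, last_cyl) for runs of defects on
    consecutive cylinders of one head at about the same sector."""
    by_head = defaultdict(list)
    for cyl, head, sector in defects:
        by_head[head].append((cyl, sector))
    runs = []
    for head, items in sorted(by_head.items()):
        items.sort()
        start = prev = items[0]
        length = 1
        for cur in items[1:]:
            if cur[0] == prev[0] + 1 and abs(cur[1] - prev[1]) <= sector_slop:
                length += 1
            else:
                if length >= min_len:
                    runs.append((head, start[0], prev[0]))
                start, length = cur, 1
            prev = cur
        if length >= min_len:
            runs.append((head, start[0], prev[0]))
    return runs

if __name__ == "__main__":
    sample = "Got 6 defects:\n3:0:105\n4:0:105\n5:0:105\n6:0:105\n14:1:10\n21:19:34\n"
    print(find_scratches(parse_defects(sample)))
```

Swapping the roles of head and cylinder in the grouping would give a
similar check for the head-crash pattern.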

Interestingly enough, the way sformat sends the block format commands,
some disks add the new defects found to their PRIMARY defect list, instead
of the GROWN list, as if they had been re-tested at the factory.
There is a command to clear the GROWN list, but not the PRIMARY.  (Some
cheesy drives re-do their primary table when you send them the single
low-level-format command, but most just add to the table.  If you ever
have FEWER primary defects after sending a LLF command, it would be a VERY
good idea to use sformat to better pattern test the drive before service.)

Here are some examples from a couple of disks here:

Script started on Sat Oct  9 13:13:01 2004
ROOT mxb:/home/drussell 101 > camcontrol defects da0 -f phys -P

	/* This is the PRIMARY (factory) defect table */

Got 277 defects:
3:0:105
4:0:105
5:0:105
6:0:105
7:0:105
8:0:105
9:0:105
10:0:105	/* See, must be a 'scratch' across the first 60 tracks */
11:0:105	/* on head 0.	*/
12:0:105
13:0:105
14:0:105
14:1:10
15:0:105
16:0:105
17:0:105
18:0:105
19:0:105
20:0:105
21:0:105
21:19:34	/* One other defect on head 19 (this disk has 11 media */
22:0:105	/* platters with 21 heads (one unused surface)	       */
23:0:105	/* the defect is on 19, the second-last head	       */
24:0:105
25:0:105
26:0:105
27:0:105
28:0:105
29:0:105
30:0:105
31:0:105
41:0:105
42:0:105
43:0:105
44:0:105
45:0:105
46:0:105
47:0:105
48:0:105
49:0:105
50:0:105
51:0:105
52:0:105
53:0:105
54:0:105
55:0:105
56:0:105
57:0:105
58:0:105
59:0:105
60:0:105
114:1:91
135:4:89
227:14:63
252:11:97
299:2:74
343:5:56
396:17:119
449:20:115
463:2:110
522:5:45
523:8:45
531:15:103
569:15:15
598:17:42
685:7:103
687:15:57
692:17:35
737:16:9
801:3:79
863:20:29
902:8:112
919:15:18
974:20:29
975:20:29
976:20:29
977:20:29
1054:2:38
1092:13:70
1116:8:78
1134:17:75
1236:5:28
1276:17:40
1343:16:46
1351:13:60
1352:13:60
1353:13:60
1354:13:60
1355:13:60
1356:13:60
1357:13:60
1358:13:60
1359:13:60
1360:13:60
1361:13:60
1362:13:60
1363:13:60
1364:13:60
1404:18:6
1440:20:103
1596:5:96
1598:5:96
1634:12:64
1698:19:54
1699:19:54
1700:19:54
1701:19:54
1702:19:54
1703:19:54
1704:19:54
1705:19:54
1706:19:54
1707:19:54
1708:19:54
1709:19:54
1710:19:54
1711:19:54
1712:19:54
1713:19:54
1714:19:54
1715:19:54
1716:19:54
1717:19:54
1746:20:100
1791:19:13
1829:13:94
1835:14:63
1836:14:63
1852:13:92
1853:13:92
1854:13:92
1855:13:92
1856:13:92
1875:15:80
1935:15:89
2008:14:52
2009:14:52
2010:14:52
2010:14:83
2011:14:52
2012:14:52
2013:14:52
2014:14:52
2015:14:52
2016:14:52
2017:14:52
2018:14:52
2019:14:52
2020:14:52
2021:14:52
2022:14:52
2028:7:46
2028:20:27
2038:18:53
2151:5:66
2153:17:65
2198:6:97
2278:15:10
2279:15:10
2280:15:10
2281:15:10
2447:16:7
2521:1:68
2612:20:6
2625:15:56
2767:8:69
2828:7:81
2865:13:59
2866:13:59
2867:13:59
2868:13:59
2909:13:18
2958:3:10
3001:13:58
3002:13:58
3003:13:58
3004:13:58
3005:13:58
3006:13:58
3007:13:58
3008:13:58
3009:5:49
3009:13:58
3010:13:58
3011:13:58
3012:13:58
3013:13:58
3014:13:58
3015:13:58
3016:13:58
3017:13:58
3018:13:58
3019:13:58
3020:13:58
3021:13:58
3022:13:58
3023:13:58
3024:13:58
3025:13:58
3026:13:58
3027:13:58
3028:13:58
3078:13:58
3079:13:58
3080:13:58
3081:13:58
3082:13:58
3083:13:58
3084:13:58
3085:13:58
3086:13:58
3087:13:57
3088:13:57
3089:13:57
3090:13:57
3091:13:57
3213:2:54
3255:17:42
3256:17:42
3257:17:42
3258:17:42
3259:17:42
3260:17:42
3261:17:42
3262:17:42
3263:17:42
3264:17:42
3265:17:42
3266:17:42
3267:17:42
3268:17:42
3269:17:42
3270:17:42
3271:17:42
3272:17:42
3273:17:42
3290:2:39
3331:20:38
3332:20:38
3333:20:38
3334:20:37
3335:20:37
3336:20:37
3337:20:37
3338:20:37
3339:20:37
3340:20:37
3341:20:37
3342:20:37
3343:20:37
3344:20:37
3650:20:28
3651:20:28
3652:20:28
3653:20:28
3654:20:28
3655:20:28
3656:20:28
3657:20:28
3658:20:28
3658:20:29
3659:20:28
3660:20:28
3661:20:28
3662:20:28
3663:20:28
3664:20:28
3665:20:28
3666:20:28
3676:15:14
3690:14:31
3702:20:41
3703:20:41
3704:20:41
3705:20:41
3710:3:31
3711:3:31

ROOT mxb:/home/drussell 102 > ^P^G
camcontrol defects da0 -f phys -G
Got 0 defects.

	/* no new defects detected since last 'factory' format */

ROOT mxb:/home/drussell 103 > ^0^1
camcontrol defects da1 -f phys -G
Got 0 defects.

	/* no grown defects on da1, either */

ROOT mxb:/home/drussell 104 > ^G^P
camcontrol defects da1 -f phys -P

	/* this disk has 4 platters, and all 8 surfaces have r/w heads */

Got 156 defects:
86:7:84
86:7:85
151:5:224
150:5:224
149:5:224
327:5:258
327:5:259
355:3:189
355:3:190
395:3:244
395:3:245
394:3:244
394:3:245
609:4:195
609:4:196
656:7:17
687:3:126
687:3:127
687:3:128
687:3:129
687:3:130
711:6:84
711:6:85
827:7:12
827:7:13
933:3:248
933:3:249
1053:4:186
1053:4:187
1058:7:11
1058:7:12
1058:7:13
1086:4:59
1086:4:60
1267:4:184
1315:5:133
1315:5:134
1513:4:183
1513:4:184
1580:4:198
1583:4:185
1592:2:197
1592:2:198
1593:3:72
1749:6:101
1787:5:72
1790:6:99
1909:6:99
2008:2:140
2043:5:96
2043:5:97
2078:6:99
2184:2:80
2259:5:114
2276:6:96
2276:6:97
2354:6:96
2354:6:97
2356:6:96
2356:6:97
2390:6:96
2390:6:97
2402:6:96
2402:6:97
2425:6:96
2425:6:97
2428:6:96
2428:6:97
2449:3:224
2449:3:225
2528:4:174
2528:4:175
2528:4:176
2528:4:177
2536:6:96
2536:6:97
2565:3:42
2565:3:43
2626:4:213
2640:4:179
2716:6:93
2716:6:94
2813:6:93
2813:6:94
2831:4:159
2909:4:186
2909:4:187
2984:6:93
2984:6:94
3000:2:77
3000:2:78
3003:2:204
3024:6:93
3024:6:94
3058:2:77
3058:6:93
3058:6:94
3164:6:91
3202:4:179
3240:5:173
3348:2:71
3346:7:18
3346:7:19
3413:4:182
3488:4:182
3488:4:183
3494:7:40
3494:7:41
3525:4:22
3525:4:23
3657:7:15
3657:7:16
3788:4:176
4024:5:154
4024:5:155
4108:4:4
4139:6:45
4139:6:46
4158:7:18
4158:7:19
4228:2:69
4228:2:70
4253:4:152
4287:6:117
4287:6:118
4311:2:63
4311:2:64
4337:6:155
4383:2:63
4507:2:62
4546:6:18
4566:5:87
4566:5:88
4593:7:74
4593:7:75
4668:6:167
4807:4:162
4889:2:71
4926:4:17
5038:4:54
5038:4:55
5554:7:57
5554:7:58
5741:4:132
5805:3:86
5802:5:80
5862:7:72
5862:7:73
5899:4:169
5953:7:19
6038:7:187
6029:7:25
6081:4:139
6081:4:140
6689:3:10
6866:4:89

Script done on Sat Oct  9 13:13:46 2004

Script started on Sat Oct  9 13:15:20 2004

ROOT killarney:/home/drussell 101 > camcontrol defects da0 -f phys -P

Got 144 defects:
0:13:84
5:20:3
31:13:50
43:9:70
140:10:94
222:10:66
281:17:65
282:17:65
283:17:65
284:17:65
285:17:65
286:17:65
287:17:65
288:17:65
289:17:65
290:17:65
291:17:65
292:17:65
293:17:65
294:17:65
330:12:57
331:12:57
332:12:57
333:12:57
463:11:123
477:16:71
487:19:46
504:19:40
510:19:46
586:17:16
675:0:76
703:12:31
772:9:120
774:10:105
894:10:78
941:0:35
974:3:16
1019:4:11
1029:20:21
1068:16:72
1069:16:72
1070:16:72
1071:16:72
1131:14:78
1132:14:78
1133:14:78
1134:14:78
1135:14:78	/* another good-sized defect that looks like a 'scratch' */
1136:14:78
1137:14:78
1138:14:78
1139:14:78
1140:14:78
1141:14:78
1142:14:78
1143:14:78
1144:14:78
1145:14:78
1146:14:78
1147:14:78
1148:14:78
1149:14:77
1150:14:77
1151:14:77
1152:14:77
1153:14:77
1154:14:77
1155:14:77
1156:14:77
1157:14:77
1158:14:77
1159:14:77
1160:14:77
1172:14:76
1173:14:76
1174:14:76
1175:14:76
1176:14:76
1177:14:76
1178:14:76
1179:14:76
1180:14:76
1181:14:76
1182:14:76
1183:14:76
1184:14:76
1185:14:76
1186:14:76
1187:14:76
1188:14:76
1189:14:76
1190:14:76
1191:14:76
1192:14:76
1193:14:76
1194:14:76
1720:13:72
1721:13:72
1722:13:72
1723:13:72
1724:13:72
1832:18:44
1833:18:44
1834:18:44
1835:18:44
1836:18:44
1967:16:84
1973:13:97
1997:2:74
2130:14:56
2251:8:8
2443:15:83
2444:15:83
2445:15:83
2446:8:45
2446:15:83
2466:19:85
2766:11:61
2767:11:61
2768:11:61
2769:11:61
2796:17:66
3012:20:24
3038:20:92
3426:17:64
3525:10:73
3527:17:5
3529:3:44
3656:10:15
3664:0:47
3686:19:10
3687:19:10
3688:19:10
3689:19:10
3690:19:10
3691:19:10
3692:19:10
3693:19:10
3694:19:10
3695:19:10
3696:19:10
3697:19:10
3698:19:10
3699:19:10

ROOT killarney:/home/drussell 102 > ^P^G
camcontrol defects da0 -f phys -G

Got 18 defects:
1702:10:66
1703:10:66
1704:10:66	/* This one's got grown defects that I detected using  */
1705:10:66	/* sformat before putting it into service (old spare)  */
1706:10:66	/* these defects were almost certainly caused by heads */
1707:10:66	/* hitting platters while spinning.  Same area,        */
1708:10:66	/* different platters, with the cylinders right after  */
1709:10:66	/* each other...  The drive was seeking the heads when */
1718:14:66	/* it was jarred hard enough to cause a head crash     */
1719:14:66	/* it would certainly seem..			       */
1720:14:66
1721:14:66
1722:14:66
1723:14:66
1724:14:66
1725:14:66
2990:10:77
3386:10:29

Script done on Sat Oct  9 13:15:42 2004

I keep logs of the contents of the PLIST and the GLIST from before and after I
sformat a disk prior to putting it into service.  Checking this disk now (which has
AWRE and ARRE enabled) shows that the GLIST is still the same as it was at
least 25000 power-on-hours ago.  It is probably about time to take this
disk out of service for another sformat run (been running non-stop for
over 3 years) as a periodic test, but then it can go right back into
service.
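
Comparing two such logs is easy to script.  Here is a small Python sketch,
assuming the logs are saved  camcontrol defects  output with one
cylinder:head:sector entry per line; the function names are invented for
the example.

```python
def load_defects(text):
    """Collect 'cylinder:head:sector' entries from saved camcontrol output."""
    entries = set()
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        fields = words[0].split(":")
        if len(fields) == 3 and all(f.isdigit() for f in fields):
            entries.add(tuple(map(int, fields)))
    return entries

def diff_defects(before_text, after_text):
    """Return (added, removed) defect entries between two snapshots."""
    before = load_defects(before_text)
    after = load_defects(after_text)
    return sorted(after - before), sorted(before - after)

if __name__ == "__main__":
    old = "Got 2 defects:\n1702:10:66\n1703:10:66\n"
    new = "Got 3 defects:\n1702:10:66\n1703:10:66\n2990:10:77\n"
    added, removed = diff_defects(old, new)
    print("added:", added)
    print("removed:", removed)
```

Two empty lists mean the table has not changed.  Entries disappearing from
a primary list would be the warning sign mentioned earlier.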

Just because that drive had a couple new defects that were found in
testing, doesn't mean it was about to die.  (On the contrary; it has
performed flawlessly since going into service)...  Would it have remapped
those on the fly if it were just put into service without tests?  Perhaps,
eventually, but I'd rather do a dedicated heavy-duty test myself, first.

THIS is why sformat is your friend, my fellow -hackers.  :)

Oh, and thanks should go out to Joerg Schilling for writing such a good
utility that I've never had to go back and write my own.  (Wrote a MUCH
less advanced one back in the day for my MFM drives on the perstor to
watch for ECC errors, even correctable ones, and make a real 'map'.)

Long live SCSI disks!  :)

Later......						<Doug>


