Date: Sat, 9 Oct 2004 13:50:07 -0600 (MDT)
From: Doug Russell <drussell@saturn-tech.com>
To: John Von Essen <john@essenz.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: hacking SCO....
Message-ID: <20041009125651.H41465-100000@mxb.saturn-tech.com>
In-Reply-To: <0D27BFB8-1A20-11D9-883D-0003933DDCFA@essenz.com>

On Sat, 9 Oct 2004, John Von Essen wrote:

> The SCSI card is an old Adaptec, AIC-7880 and I believe it does not
> support automatic bad block detection/redirection.

If it has a BIOS it should have the verify tool in there... All the
verify tool does, though, is issue a verify command to each sector.
You can do this yourself, too, even on a running system.

> This disk came from a spares kits, so even though it is "new" and never
> used, it is still 5-6 years old. There were 6 bad blocks, once they
> were put in the bad block table, everything was fine.

Exactly. Most of my SCSI disks are old spares, or old surplus disks.
They've likely been moved several times and bumped around, and time
itself can make a marginal section of the magnetic coating stop holding
data perfectly.

> Is sformat the freebsd equivalent of the badtrk utility. I have always

No, I don't believe so. I think badtrk is probably like the old bad144
system that was abandoned because it is unnecessary on all modern disks.
All modern IDE disks have built-in re-allocation tables, similar to the
way SCSI disks work, but you can't manipulate them manually as easily as
you can on a SCSI disk.

sformat is a handy formatting utility written by Joerg Schilling. It
has special options for partitioning disks on Sun machines, but the
format routines and defect scans will run on virtually any platform.

It has full pattern testing abilities. First it writes and verifies
every byte on the drive as all 00000000's, then all 11111111's... then
01010101 and 10101010, etc. etc... stressing the media in every possible
combination. (Many, many errors won't show up if you just write all
zeros or ones, for example. It is much harder to store a zero next to a
one, so trying every combination pre-tests anything that might be
written.)

This kind of testing is done at the factory, and limited pattern testing
is generally done by the built-in low-level format routine on most
drives.

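To make the write-then-verify idea above concrete, here is a minimal sketch in Python. It is not sformat's actual algorithm, which is not shown in this message. It only cycles the four byte patterns named above (0x00, 0xFF, 0x55, 0xAA) over a seekable file-like object and reports blocks that fail to read back. The names `pattern_test`, `PATTERNS` and `BLOCK_SIZE` are made up for illustration, and the 512-byte block size is an assumption. It overwrites everything in the target, so point it only at scratch data.

```python
import io

# Byte patterns named in the text: all zeros, all ones, and the two
# alternating patterns (01010101 = 0x55, 10101010 = 0xAA).
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)
BLOCK_SIZE = 512  # assumed block size for this sketch


def pattern_test(dev, num_blocks, block_size=BLOCK_SIZE, patterns=PATTERNS):
    """Write each pattern to every block, read it back, and return the
    sorted numbers of the blocks that failed to verify at least once.

    `dev` is any seekable binary file-like object. DESTRUCTIVE: the
    first num_blocks * block_size bytes of `dev` are overwritten.
    """
    bad = set()
    for pattern in patterns:
        expected = bytes([pattern]) * block_size
        # Write pass: fill the whole target with the current pattern.
        for block in range(num_blocks):
            dev.seek(block * block_size)
            dev.write(expected)
        dev.flush()
        # Verify pass: read every block back and compare.
        for block in range(num_blocks):
            dev.seek(block * block_size)
            if dev.read(block_size) != expected:
                bad.add(block)
    return sorted(bad)


if __name__ == "__main__":
    # An in-memory buffer stands in for a disk here.
    scratch = io.BytesIO(bytes(16 * BLOCK_SIZE))
    print("blocks that failed verification:", pattern_test(scratch, 16))
```

Running several patterns matters even in this toy form: a bit that is stuck at 1 passes the all-ones pass and only shows up when a pattern with a 0 in that position is written.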
> used Ultra2 LVD SCSI and higher on FreeBSD and have never had this
> issue of bad blocks. Is that because those newer SCSI disks and
> controllers have better ECC handling and take care of the bad blocks
> internally without notifying the user?

If you play with the SCSI modepages, you can tell the drive to post an
error or not under various conditions (a correctable ECC error, an
uncorrectable error, a re-allocated block, etc.)

You've probably never seen errors before because the drives had
Automatic Write / Read Reallocation Enable (AWRE and ARRE) set in
modepage 1. This disk you're working with now obviously has ARRE and
probably AWRE disabled, so it isn't trying to re-allocate the blocks
when it finds an error. I'd check that and turn it on. Then, the next
time you try to read the bad block, the drive should remap it on its
own.

The exact behavior varies by drive and by settings in the modepages.
Some drives may have AWRE and ARRE enabled, but not re-allocate a
certain block because they can't get the data off the sector in the
retry time allowed. Cranking up the retry timer (when available) might
work, or else you have to do it manually by sending the re-allocate
block command.

I sure love SCSI disks... They let you do virtually ANYTHING to them if
you know the technical details of how to send the commands. (The
technical manuals for the disks are very handy in this regard.)

On FreeBSD, you can see what was in the defect table at the factory by
doing a

    camcontrol defects daX -f phys -P

(I like phys, as it shows the actual physical cylinder and head number
with the sector position on the track, so you can actually 'see' the
areas that are defective.) You can see the GROWN defect list (rather
than the primary) with -G.

VERY often the grown defects are simply the next sector or two to the
sides of an existing defect. If a series of defects spans several
cylinders on the same head at about the same offset, it's probably a
media defect 'scratch' across the disk.
If it is on the same cylinder on several heads at about the same place,
it is probably from a mini-head-crash due to the disk getting bumped,
etc., where it actually damaged spots on several platters at once when
the heads touched (or almost touched).

Interestingly enough, the way sformat sends the block format commands,
some disks add the new defects found to their PRIMARY defect list,
instead of the GROWN list, as if they had been re-tested at the factory.
There is a command to clear the GROWN list, but not the PRIMARY. (Some
cheesy drives re-do their primary table when you send them the single
low-level-format command, but most just add to the table. If you ever
have FEWER primary defects after sending a LLF command, it would be a
VERY good idea to use sformat to better pattern test the drive before
service.)

Here are some examples from a couple of disks here:

Script started on Sat Oct 9 13:13:01 2004
ROOT mxb:/home/drussell 101 > camcontrol defects da0 -f phys -P
/* This is the PRIMARY (factory) defect table */
Got 277 defects:
3:0:105 4:0:105 5:0:105 6:0:105 7:0:105 8:0:105 9:0:105 10:0:105 /* See, must be a 'scratch' across the first 60 tracks */ 11:0:105 /* on head 0.
*/ 12:0:105 13:0:105 14:0:105 14:1:10 15:0:105 16:0:105 17:0:105 18:0:105 19:0:105 20:0:105 21:0:105 21:19:34 /* One other defect on head 19 (this disk has 11 media */ 22:0:105 /* platters with 21 heads (one unused surface) */ 23:0:105 /* the defect is on 19, the second-last head */ 24:0:105 25:0:105 26:0:105 27:0:105 28:0:105 29:0:105 30:0:105 31:0:105 41:0:105 42:0:105 43:0:105 44:0:105 45:0:105 46:0:105 47:0:105 48:0:105 49:0:105 50:0:105 51:0:105 52:0:105 53:0:105 54:0:105 55:0:105 56:0:105 57:0:105 58:0:105 59:0:105 60:0:105 114:1:91 135:4:89 227:14:63 252:11:97 299:2:74 343:5:56 396:17:119 449:20:115 463:2:110 522:5:45 523:8:45 531:15:103 569:15:15 598:17:42 685:7:103 687:15:57 692:17:35 737:16:9 801:3:79 863:20:29 902:8:112 919:15:18 974:20:29 975:20:29 976:20:29 977:20:29 1054:2:38 1092:13:70 1116:8:78 1134:17:75 1236:5:28 1276:17:40 1343:16:46 1351:13:60 1352:13:60 1353:13:60 1354:13:60 1355:13:60 1356:13:60 1357:13:60 1358:13:60 1359:13:60 1360:13:60 1361:13:60 1362:13:60 1363:13:60 1364:13:60 1404:18:6 1440:20:103 1596:5:96 1598:5:96 1634:12:64 1698:19:54 1699:19:54 1700:19:54 1701:19:54 1702:19:54 1703:19:54 1704:19:54 1705:19:54 1706:19:54 1707:19:54 1708:19:54 1709:19:54 1710:19:54 1711:19:54 1712:19:54 1713:19:54 1714:19:54 1715:19:54 1716:19:54 1717:19:54 1746:20:100 1791:19:13 1829:13:94 1835:14:63 1836:14:63 1852:13:92 1853:13:92 1854:13:92 1855:13:92 1856:13:92 1875:15:80 1935:15:89 2008:14:52 2009:14:52 2010:14:52 2010:14:83 2011:14:52 2012:14:52 2013:14:52 2014:14:52 2015:14:52 2016:14:52 2017:14:52 2018:14:52 2019:14:52 2020:14:52 2021:14:52 2022:14:52 2028:7:46 2028:20:27 2038:18:53 2151:5:66 2153:17:65 2198:6:97 2278:15:10 2279:15:10 2280:15:10 2281:15:10 2447:16:7 2521:1:68 2612:20:6 2625:15:56 2767:8:69 2828:7:81 2865:13:59 2866:13:59 2867:13:59 2868:13:59 2909:13:18 2958:3:10 3001:13:58 3002:13:58 3003:13:58 3004:13:58 3005:13:58 3006:13:58 3007:13:58 3008:13:58 3009:5:49 3009:13:58 3010:13:58 3011:13:58 3012:13:58 3013:13:58 3014:13:58 
3015:13:58 3016:13:58 3017:13:58 3018:13:58 3019:13:58 3020:13:58 3021:13:58 3022:13:58 3023:13:58 3024:13:58 3025:13:58 3026:13:58 3027:13:58 3028:13:58 3078:13:58 3079:13:58 3080:13:58 3081:13:58 3082:13:58 3083:13:58 3084:13:58 3085:13:58 3086:13:58 3087:13:57 3088:13:57 3089:13:57 3090:13:57 3091:13:57 3213:2:54 3255:17:42 3256:17:42 3257:17:42 3258:17:42 3259:17:42 3260:17:42 3261:17:42 3262:17:42 3263:17:42 3264:17:42 3265:17:42 3266:17:42 3267:17:42 3268:17:42 3269:17:42 3270:17:42 3271:17:42 3272:17:42 3273:17:42 3290:2:39 3331:20:38 3332:20:38 3333:20:38 3334:20:37 3335:20:37 3336:20:37 3337:20:37 3338:20:37 3339:20:37 3340:20:37 3341:20:37 3342:20:37 3343:20:37 3344:20:37 3650:20:28 3651:20:28 3652:20:28 3653:20:28 3654:20:28 3655:20:28 3656:20:28 3657:20:28 3658:20:28 3658:20:29 3659:20:28 3660:20:28 3661:20:28 3662:20:28 3663:20:28 3664:20:28 3665:20:28 3666:20:28 3676:15:14 3690:14:31 3702:20:41 3703:20:41 3704:20:41 3705:20:41 3710:3:31 3711:3:31 ROOT mxb:/home/drussell 102 > ^P^G camcontrol defects da0 -f phys -G Got 0 defects. /* no new defects detected since last 'factory' format */ ROOT mxb:/home/drussell 103 > ^0^1 camcontrol defects da0 -f phys -G Got 0 defects. 
/* no grown defects on da1, either */ ROOT mxb:/home/drussell 104 > ^G^P camcontrol defects da0 -f phys -P /* this disk has 4 platters, and all 8 surfaces have r/w heads */ Got 156 defects: 86:7:84 86:7:85 151:5:224 150:5:224 149:5:224 327:5:258 327:5:259 355:3:189 355:3:190 395:3:244 395:3:245 394:3:244 394:3:245 609:4:195 609:4:196 656:7:17 687:3:126 687:3:127 687:3:128 687:3:129 687:3:130 711:6:84 711:6:85 827:7:12 827:7:13 933:3:248 933:3:249 1053:4:186 1053:4:187 1058:7:11 1058:7:12 1058:7:13 1086:4:59 1086:4:60 1267:4:184 1315:5:133 1315:5:134 1513:4:183 1513:4:184 1580:4:198 1583:4:185 1592:2:197 1592:2:198 1593:3:72 1749:6:101 1787:5:72 1790:6:99 1909:6:99 2008:2:140 2043:5:96 2043:5:97 2078:6:99 2184:2:80 2259:5:114 2276:6:96 2276:6:97 2354:6:96 2354:6:97 2356:6:96 2356:6:97 2390:6:96 2390:6:97 2402:6:96 2402:6:97 2425:6:96 2425:6:97 2428:6:96 2428:6:97 2449:3:224 2449:3:225 2528:4:174 2528:4:175 2528:4:176 2528:4:177 2536:6:96 2536:6:97 2565:3:42 2565:3:43 2626:4:213 2640:4:179 2716:6:93 2716:6:94 2813:6:93 2813:6:94 2831:4:159 2909:4:186 2909:4:187 2984:6:93 2984:6:94 3000:2:77 3000:2:78 3003:2:204 3024:6:93 3024:6:94 3058:2:77 3058:6:93 3058:6:94 3164:6:91 3202:4:179 3240:5:173 3348:2:71 3346:7:18 3346:7:19 3413:4:182 3488:4:182 3488:4:183 3494:7:40 3494:7:41 3525:4:22 3525:4:23 3657:7:15 3657:7:16 3788:4:176 4024:5:154 4024:5:155 4108:4:4 4139:6:45 4139:6:46 4158:7:18 4158:7:19 4228:2:69 4228:2:70 4253:4:152 4287:6:117 4287:6:118 4311:2:63 4311:2:64 4337:6:155 4383:2:63 4507:2:62 4546:6:18 4566:5:87 4566:5:88 4593:7:74 4593:7:75 4668:6:167 4807:4:162 4889:2:71 4926:4:17 5038:4:54 5038:4:55 5554:7:57 5554:7:58 5741:4:132 5805:3:86 5802:5:80 5862:7:72 5862:7:73 5899:4:169 5953:7:19 6038:7:187 6029:7:25 6081:4:139 6081:4:140 6689:3:10 6866:4:89 Script done on Sat Oct 9 13:13:46 2004 Script started on Sat Oct 9 13:15:20 2004 ROOT killarney:/home/drussell 101 > camcontrol defects da0 -f phys -P Got 144 defects: 0:13:84 5:20:3 31:13:50 43:9:70 140:10:94 
222:10:66 281:17:65 282:17:65 283:17:65 284:17:65 285:17:65 286:17:65 287:17:65 288:17:65 289:17:65 290:17:65 291:17:65 292:17:65 293:17:65 294:17:65 330:12:57 331:12:57 332:12:57 333:12:57 463:11:123 477:16:71 487:19:46 504:19:40 510:19:46 586:17:16 675:0:76 703:12:31 772:9:120 774:10:105 894:10:78 941:0:35 974:3:16 1019:4:11 1029:20:21 1068:16:72 1069:16:72 1070:16:72 1071:16:72 1131:14:78 1132:14:78 1133:14:78 1134:14:78 1135:14:78 /* another good-sized defect that looks like a 'scratch' */ 1136:14:78 1137:14:78 1138:14:78 1139:14:78 1140:14:78 1141:14:78 1142:14:78 1143:14:78 1144:14:78 1145:14:78 1146:14:78 1147:14:78 1148:14:78 1149:14:77 1150:14:77 1151:14:77 1152:14:77 1153:14:77 1154:14:77 1155:14:77 1156:14:77 1157:14:77 1158:14:77 1159:14:77 1160:14:77 1172:14:76 1173:14:76 1174:14:76 1175:14:76 1176:14:76 1177:14:76 1178:14:76 1179:14:76 1180:14:76 1181:14:76 1182:14:76 1183:14:76 1184:14:76 1185:14:76 1186:14:76 1187:14:76 1188:14:76 1189:14:76 1190:14:76 1191:14:76 1192:14:76 1193:14:76 1194:14:76 1720:13:72 1721:13:72 1722:13:72 1723:13:72 1724:13:72 1832:18:44 1833:18:44 1834:18:44 1835:18:44 1836:18:44 1967:16:84 1973:13:97 1997:2:74 2130:14:56 2251:8:8 2443:15:83 2444:15:83 2445:15:83 2446:8:45 2446:15:83 2466:19:85 2766:11:61 2767:11:61 2768:11:61 2769:11:61 2796:17:66 3012:20:24 3038:20:92 3426:17:64 3525:10:73 3527:17:5 3529:3:44 3656:10:15 3664:0:47 3686:19:10 3687:19:10 3688:19:10 3689:19:10 3690:19:10 3691:19:10 3692:19:10 3693:19:10 3694:19:10 3695:19:10 3696:19:10 3697:19:10 3698:19:10 3699:19:10 ROOT killarney:/home/drussell 102 > ^P^G camcontrol defects da0 -f phys -P Got 18 defects: 1702:10:66 1703:10:66 1704:10:66 /* This one's got grown defects that I detected using */ 1705:10:66 /* sformat before putting it into service (old spare) */ 1706:10:66 /* these defects were almost certainly caused by heads */ 1707:10:66 /* hitting platters while spinning. 
Same area, */
1708:10:66 /* different platters, with the cylinders right after */
1709:10:66 /* each other... The drive was seeking the heads when */
1718:14:66 /* it was jarred hard enough to cause a head crash */
1719:14:66 /* it would certainly seem.. */
1720:14:66
1721:14:66
1722:14:66
1723:14:66
1724:14:66
1725:14:66
2990:10:77
3386:10:29
Script done on Sat Oct 9 13:15:42 2004

I keep logs of the contents of the PLIST and the GLIST from before and
after I sformat a disk, before it goes into service. Checking this disk
now (which has AWRE and ARRE enabled) shows that the GLIST is still the
same as it was at least 25000 power-on-hours ago.

It is probably about time to take this disk out of service for another
sformat run (been running non-stop for over 3 years) as a periodic test,
but then it can go right back into service.

Just because that drive had a couple of new defects that were found in
testing doesn't mean it was about to die. (On the contrary; it has
performed flawlessly since going into service)... Would it have
remapped those on the fly if it were just put into service without
tests? Perhaps, eventually, but I'd rather do a dedicated heavy-duty
test myself, first.

THIS is why sformat is your friend, my fellow -hackers. :)

Oh, and thanks should go out to Joerg Schilling for writing such a good
utility that I've never had to go back and write my own. (Wrote a MUCH
less advanced one back in the day for my MFM drives on the Perstor to
watch for ECC errors, even correctable ones, and make a real 'map')

Long live SCSI disks! :)

Later......

<Doug>
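
The eyeball analysis of the defect lists above (a run of consecutive cylinders on one head looks like a 'scratch'; the same cylinder hit on several heads looks like a bump) can be roughed out in a few lines of Python. This is only a sketch of that heuristic, not anything sformat or camcontrol does. The function names, the `sector_slack` tolerances and the minimum run length are all assumptions picked for illustration. Entries are read as cylinder:head:sector triples, which is how the `-f phys` output above is laid out.

```python
import re
from collections import defaultdict

# One "cylinder:head:sector" entry. Feed this only the defect lines of
# a transcript: a timestamp such as 13:13:01 would also match.
ENTRY = re.compile(r"\b(\d+):(\d+):(\d+)\b")


def parse_defects(text):
    """Return (cylinder, head, sector) tuples found in text; comments
    and other words between the entries are ignored."""
    return [tuple(int(g) for g in m.groups()) for m in ENTRY.finditer(text)]


def radial_runs(defects, sector_slack=1, min_len=3):
    """Find 'scratch'-like runs: consecutive cylinders on one head whose
    sector drifts by at most sector_slack per cylinder.
    Returns (head, first_cylinder, last_cylinder) tuples."""
    by_head = defaultdict(list)
    for cyl, head, sec in defects:
        by_head[head].append((cyl, sec))
    runs = []
    for head, items in sorted(by_head.items()):
        open_runs = []  # runs on this head that may still be extended
        for cyl, sec in sorted(items):
            for run in open_runs:
                last_cyl, last_sec = run[-1]
                if cyl - last_cyl == 1 and abs(sec - last_sec) <= sector_slack:
                    run.append((cyl, sec))
                    break
            else:
                open_runs.append([(cyl, sec)])
        for run in open_runs:
            if len(run) >= min_len:
                runs.append((head, run[0][0], run[-1][0]))
    return runs


def multi_head_cylinders(defects, sector_slack=2):
    """Find cylinders where two or more different heads have a defect at
    about the same sector. Returns {cylinder: sorted list of heads}."""
    by_cyl = defaultdict(list)
    for cyl, head, sec in defects:
        by_cyl[cyl].append((head, sec))
    hits = {}
    for cyl, items in by_cyl.items():
        heads = set()
        for i, (h1, s1) in enumerate(items):
            for h2, s2 in items[i + 1:]:
                if h1 != h2 and abs(s1 - s2) <= sector_slack:
                    heads.update((h1, h2))
        if heads:
            hits[cyl] = sorted(heads)
    return hits


if __name__ == "__main__":
    # The 18-entry list from the second killarney listing above.
    grown = parse_defects(
        "1702:10:66 1703:10:66 1704:10:66 1705:10:66 1706:10:66 "
        "1707:10:66 1708:10:66 1709:10:66 1718:14:66 1719:14:66 "
        "1720:14:66 1721:14:66 1722:14:66 1723:14:66 1724:14:66 "
        "1725:14:66 2990:10:77 3386:10:29"
    )
    for head, first, last in radial_runs(grown):
        print(f"head {head}: cylinders {first}-{last}")
    print("cylinders hit on several heads:", multi_head_cylinders(grown))
```

On that list the run finder picks out the two blocks discussed in the comments (head 10 and head 14, at the same sector on neighbouring cylinder ranges). Whether a given cluster really is a scratch or a head touch is still a judgment call. Sector numbers on different heads need not line up physically, so treat the second function's output as a hint only.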