Date:      Sun, 2 Apr 2006 09:19:09 -0700
From:      "Gayn Winters" <gayn.winters@bristolsystems.com>
To:        "'Shane Ambler'" <Shane@007Marketing.com>
Cc:        freebsd-questions@freebsd.org
Subject:   RE: Hard Disk problems
Message-ID:  <03aa01c65671$2ec95f00$6501a8c0@workdog>
In-Reply-To: <C054A3A1.3EC92%Shane@007Marketing.com>

> [mailto:owner-freebsd-questions@freebsd.org] On Behalf Of Shane Ambler
> Sent: Saturday, April 01, 2006 3:10 AM
> To: FreeBSD Mailing Lists
> Subject: Hard Disk problems
> 
> 
> A few days ago I started getting some disk errors and can't seem to find a
> reference to find a way to fix them (other than the obvious re-format)
> 
> 
> The daily security run output contains the following (abbreviated)
> 
> Checking setuid files and devices:
> find: /usr/ports/databases/db43/work/db-4.3.28/db: Input/output error
> find: /usr/ports/devel/git/Makefile: Input/output error
> 
> ~ repeated 32 times for different files (thankfully all in the ports tree)
> 
> tower.home.com kernel log messages:
> > ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102367
> > ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102367
> 
> These 2 error codes are repeated a total of 38 times all with the same LBA
> 
> If I start in single user mode and do fsck it takes about half an hour to
> get through and repeats similar errors many times for just about every
> check it does.
> 
> Running #fsck -y >> fsckout (while in multiuser mode) is as follows -
> followed by dmesg output since boot
> 
> > cat fsckout 
> ** /dev/ad0s1a (NO WRITE)
> ** Last Mounted on /
> ** Root file system
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 2259 files, 44188 used, 82651 free (251 frags, 10300 blocks, 0.2% fragmentation)
> ** /dev/ad0s1e (NO WRITE)
> ** Last Mounted on /tmp
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 591 files, 4501 used, 122338 free (242 frags, 15262 blocks, 0.2% fragmentation)
> ** /dev/ad0s1f (NO WRITE)
> ** Last Mounted on /usr
> ** Phase 1 - Check Blocks and Sizes
> 
> CANNOT READ BLK: 135486944
> UNEXPECTED SOFT UPDATE INCONSISTENCY
> 
> CONTINUE? yes
> 
> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 135486944, 135486945,
> 135486946, 135486947, 135486948, 135486949, 135486950, 135486951,
> 135486952, 135486953, 135486954, 135486955, 135486956, 135486957,
> 135486958, 135486959, 135486960, 135486961, 135486962, 135486963,
> 135486964, 135486965, 135486966, 135486967, 135486968, 135486969,
> 135486970,
> ** Phase 2 - Check Pathnames
> UNALLOCATED  I=5049385  OWNER=squid MODE=100600
> SIZE=15032 MTIME=Apr  1 21:07 2006
> FILE=/local/squid/cache/00/26/000026C2
> 
> UNEXPECTED SOFT UPDATE INCONSISTENCY
> 
> REMOVE? no
> 
> UNALLOCATED  I=5049875  OWNER=squid MODE=100600
> SIZE=10825 MTIME=Apr  1 21:07 2006
> FILE=/local/squid/cache/00/26/000026CA
> 
> UNEXPECTED SOFT UPDATE INCONSISTENCY
> 
> REMOVE? no
> 
> UNALLOCATED  I=5049896  OWNER=squid MODE=100600
> SIZE=15008 MTIME=Apr  1 21:07 2006
> FILE=/local/squid/cache/00/26/000026D1
> 
> UNEXPECTED SOFT UPDATE INCONSISTENCY
> 
> REMOVE? no
> 
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> LINK COUNT FILE I=5740857  OWNER=squid MODE=0
> SIZE=0 MTIME=Apr  1 21:09 2006  COUNT 0 SHOULD BE -1
> ADJUST? no
> 
> LINK COUNT FILE I=5792561  OWNER=squid MODE=0
> SIZE=0 MTIME=Apr  1 21:07 2006  COUNT 0 SHOULD BE -1
> ADJUST? no
> 
> LINK COUNT FILE I=5875155  OWNER=squid MODE=0
> SIZE=0 MTIME=Apr  1 21:09 2006  COUNT 0 SHOULD BE -1
> ADJUST? no
> 
> LINK COUNT FILE I=5970461  OWNER=squid MODE=0
> SIZE=0 MTIME=Apr  1 21:09 2006  COUNT 0 SHOULD BE -1
> ADJUST? no
> 
> ** Phase 5 - Check Cyl groups
> SUMMARY INFORMATION BAD
> SALVAGE? no
> 
> ALLOCATED FRAGS 1936880-1936911 MARKED FREE
> ALLOCATED FRAGS 1936976-1936983 MARKED FREE
> BLK(S) MISSING IN BIT MAPS
> SALVAGE? no
> 
> ALLOCATED FILE 5740857 MARKED FREE
> ALLOCATED FRAG 22922007 MARKED FREE
> ALLOCATED FILE 5792561 MARKED FREE
> ALLOCATED FILE 5856663 MARKED FREE
> ALLOCATED FILE 5875155 MARKED FREE
> ALLOCATED FRAG 23448111 MARKED FREE
> ALLOCATED FILE 5970461 MARKED FREE
> ALLOCATED FRAG 23889647 MARKED FREE
> ALLOCATED FILE 6077762 MARKED FREE
> ALLOCATED FRAG 24353503 MARKED FREE
> ALLOCATED FRAGS 26021808-26021813 MARKED FREE
> ALLOCATED FRAGS 26301688-26301690 MARKED FREE
> 1534559 files, 15746410 used, 21222026 free (2172530 frags, 2381187 blocks, 5.9% fragmentation)
> ** /dev/ad0s1d (NO WRITE)
> ** Last Mounted on /var
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> UNREF FILE I=8278  OWNER=mysql MODE=100600
> SIZE=0 MTIME=Apr  1 19:13 2006
> CLEAR? no
> 
> UNREF FILE I=8301  OWNER=mysql MODE=100600
> SIZE=0 MTIME=Apr  1 19:13 2006
> CLEAR? no
> 
> UNREF FILE I=8306  OWNER=mysql MODE=100600
> SIZE=0 MTIME=Apr  1 19:13 2006
> CLEAR? no
> 
> UNREF FILE I=25696  OWNER=root MODE=140666
> SIZE=0 MTIME=Apr  1 19:13 2006
> CLEAR? no
> 
> ** Phase 5 - Check Cyl groups
> 3681 files, 59732 used, 67107 free (1275 frags, 8229 blocks, 1.0% fragmentation)
> 
> > cat dmesg output
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102367
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102368
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102369
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102370
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102371
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102372
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102373
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102374
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102375
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102376
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102377
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102378
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102379
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102380
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102381
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102382
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102383
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102384
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102385
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102386
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102387
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102388
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102389
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102390
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102391
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102392
> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102393
> 
> 
> 
> 
> -- 
> 
> Shane Ambler

Looks to me like your disk subsystem is dying: the kernel is reporting
UNCORRECTABLE read errors on a run of consecutive sectors.  Most likely it
is just the disk ad0.  If you don't have a good backup, make one
immediately.  Get a new disk in there and test it thoroughly (with the
manufacturer's diagnostics).  If all is well, restore to it.  You'll
probably want to reread the section in the Handbook on Moving to a Larger
Disk, since this is a good time to rethink the sizes of your partitions.
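A quick way to see how localized the damage is: the plain sh sketch below tallies the failing LBAs from kernel messages, then compares the sector range fsck could not read on /usr with the LBA range in the dmesg output quoted above. `bad_lbas` is a hypothetical helper name. The comparison assumes fsck_ffs numbers 512-byte sectors from the start of ad0s1f while the kernel numbers absolute LBAs on ad0; if that holds, two equal-length ranges with a constant offset would be the same physical spot on the disk.

```sh
#!/bin/sh
# Sketch: how localized is the damage on ad0?

# bad_lbas (hypothetical helper): reads kernel messages on stdin, e.g.
# `dmesg | bad_lbas`, and counts how often each LBA failed, most frequent first.
bad_lbas() {
    grep 'ad0: FAILURE - READ_DMA' |
        grep -o 'LBA=[0-9][0-9]*' |
        sort | uniq -c | sort -rn
}

# Ranges copied from the fsck and dmesg output in this thread.
# ASSUMPTION: fsck_ffs numbers 512-byte sectors from the start of ad0s1f,
# while the kernel numbers absolute LBAs on ad0.
fsck_first=135486944; fsck_last=135486970   # "COULD NOT BE READ" sectors
lba_first=139102367;  lba_last=139102393    # READ_DMA failure LBAs

echo "fsck range:  $((fsck_last - fsck_first + 1)) sectors"
echo "dmesg range: $((lba_last - lba_first + 1)) sectors"
echo "offsets:     $((lba_first - fsck_first)) and $((lba_last - fsck_last))"
```

With the numbers from this thread both ranges are 27 sectors long and both offsets come out as 3615423, so under that assumption fsck and the kernel are tripping over the same patch of the disk. `fdisk ad0` and `bsdlabel ad0s1` show the slice and partition layout that would confirm or refute this.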

Incidentally, you can just install the new disk (as ad1), install FBSD
on it, and dump|restore from ad0 to ad1. 
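A sketch of the dump|restore step in plain sh, under loudly stated assumptions: the ad1 partitions already exist, have been freshly newfs'd (restore -r expects a pristine filesystem), and are mounted under /mnt. The mount points /mnt, /mnt/var and /mnt/usr, the helper `copy_fs` and the `DRY_RUN` switch are all hypothetical. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```sh
#!/bin/sh
# Sketch only: copy ad0's filesystems onto the new disk ad1 with dump|restore.
# ASSUMPTIONS (hypothetical layout): each ad1 partition was freshly newfs'd
# and is mounted under /mnt, mirroring the layout on ad0.
# DRY_RUN defaults to 1, which only prints the commands; DRY_RUN=0 runs them.

copy_fs() {
    src=$1   # mounted filesystem on ad0, e.g. /usr
    dst=$2   # mount point of the matching empty ad1 filesystem, e.g. /mnt/usr
    # dump:    -0 full (level 0) dump, -L dump from a snapshot of the live
    #          filesystem, -a skip tape-length calculations, -f - write to stdout.
    # restore: -r rebuild the whole filesystem into the current directory,
    #          -f - read the dump from stdin.
    cmd="dump -0Laf - $src | (cd $dst && restore -rf -)"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$cmd"
    else
        sh -c "$cmd"
    fi
}

# Hypothetical source/target pairs; adjust to your own partitions.
for pair in "/ /mnt" "/var /mnt/var" "/usr /mnt/usr"; do
    set -- $pair
    copy_fs "$1" "$2"
done
```

restore -r leaves a restoresymtable file in each target directory, which can be deleted once the restores are finished. Expect dump to report read errors when it reaches the bad sectors on /usr; whatever lived there will not come across intact.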

Once restored, you'll still have to clean up the damage.  This is easier
if your new disk has a separate partition for user data, since you can
use a fresh install of the OS, the ports, etc. and worry only about
repairing the user data as best you can.

Good luck!

-gayn

Bristol Systems Inc.
714/532-6776
www.bristolsystems.com 




