Date: Fri, 18 Aug 2006 21:52:02 -0500
From: Brooks Davis <brooks@one-eyed-alien.net>
To: Antony Mawer <fbsd-stable@mawer.org>
Cc: Kirk Strauser <kirk@daycos.com>, freebsd-stable@freebsd.org
Subject: Re: The need for initialising disks before use?
Message-ID: <20060819025202.GA11181@lor.one-eyed-alien.net>
In-Reply-To: <44E65027.6060605@mawer.org>
References: <44E47092.7050104@mawer.org> <200608180919.04651.kirk@daycos.com> <20060818142925.GA2463@lor.one-eyed-alien.net> <44E65027.6060605@mawer.org>
On Fri, Aug 18, 2006 at 01:41:27PM -1000, Antony Mawer wrote:
> On 18/08/2006 4:29 AM, Brooks Davis wrote:
> >On Fri, Aug 18, 2006 at 09:19:04AM -0500, Kirk Strauser wrote:
> >>On Thursday 17 August 2006 8:35 am, Antony Mawer wrote:
> >>
> >>>A quick question - is it recommended to initialise disks before using
> >>>them to allow the disks to map out any "bad spots" early on?
> >>
> >>Note: once you actually start seeing bad sectors, the drive is
> >>almost dead.  A drive can remap a pretty large number internally, but
> >>once that pool is exhausted (and the number of errors is still growing
> >>exponentially), there's not a lot of life left.
> >
> >There are some exceptions to this.  The drive can not remap a sector
> >which fails to read.  You must perform a write to cause the remap to
> >occur.  If you get a hard write failure it's game over, but read
> >failures aren't necessarily a sign the disk is hopeless.  For example,
> >the drive I've had in my laptop for most of the last year developed a
> >three-sector[0] error within a week or so of arrival.  After dd'ing
> >zeros over the problem sectors I've had no problems.
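The zero-over-the-bad-sectors fix described above can be sketched as follows. To keep the example safe to run, it targets a file-backed image rather than real hardware; on a live system, of= would be the raw device (e.g. /dev/ad0) and the LBA would come from the kernel's READ_DMA error line. The image path and the small LBA used here are illustrative assumptions.

```shell
# Sketch: overwrite a run of suspect sectors with zeros so the drive
# gets a write to those sectors and a chance to remap them.
# Demonstrated against a file-backed image, NOT a real disk.
IMG=/tmp/fakedisk.img
LBA=128        # illustrative; a real LBA looks like 66293984
COUNT=3        # number of consecutive suspect sectors to refresh

# Create a small stand-in "disk" of 1024 sectors.
dd if=/dev/zero of="$IMG" bs=512 count=1024 2>/dev/null

# Zero exactly COUNT sectors starting at LBA.  conv=notrunc keeps dd
# from truncating the rest of the image (and matters just as much when
# of= is a real device).
dd if=/dev/zero of="$IMG" bs=512 seek="$LBA" count="$COUNT" \
    conv=notrunc 2>/dev/null

# The image size is unchanged: the write landed inside it.
wc -c < "$IMG"
```

On a real device the first dd is of course omitted; only the targeted seek/count write is performed, once per bad region reported in the logs.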
> 
> This is what prompted it -- I've been seeing lots of drives that are
> showing up with huge numbers of read errors - for instance:
> 
> >Aug 19 04:02:27 server kernel: ad0: FAILURE - READ_DMA
> >status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=66293984
> >Aug 19 04:02:27 server kernel:
> >g_vfs_done():ad0s1f[READ(offset=30796791808, length=16384)]error = 5
> >Aug 19 04:02:31 server kernel: ad0: FAILURE - READ_DMA
> >status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=47702304
> >Aug 19 04:02:31 server kernel:
> >g_vfs_done():ad0s1f[READ(offset=21277851648, length=16384)]error = 5
> >Aug 19 04:02:36 server kernel: ad0: FAILURE - READ_DMA
> >status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=34943296
> >Aug 19 04:02:36 server kernel:
> >g_vfs_done():ad0s1f[READ(offset=14745239552, length=16384)]error = 5
> >Aug 19 04:03:08 server kernel: ad0: FAILURE - READ_DMA
> >status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=45514848
> >Aug 19 04:03:08 server kernel:
> >g_vfs_done():ad0s1f[READ(offset=20157874176, length=16384)]error = 5
> 
> I have /var/log/messages flooded with incidents of these "FAILURE -
> READ_DMA" messages.  I've seen it on more than one machine with
> relatively "young" drives.
> 
> I'm trying to determine if running a dd if=/dev/zero over the whole
> drive prior to use will help reduce the incidence of this, or if it is
> likely that these are developing after the initial install, in which
> case this will make negligible difference...

I really don't know.  The only way I can think of to find out is to own
a large number of machines and perform an experiment.  We (the general
computing public) don't have the kind of models needed to really say
anything definitive.  Drives are too darn opaque.

> Once I do start seeing these, is there an easy way to:
> 
> a) determine what file/directory entry might be affected?
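As a side note, the quoted log lines are internally consistent: each READ_DMA LBA and its matching g_vfs_done() byte offset differ by the same amount, which is the byte offset of the ad0s1f partition on the disk. A quick shell check, assuming 512-byte sectors:

```shell
# For each pair (LBA from the ATA error, byte offset from g_vfs_done),
# LBA*512 - offset should equal the partition start in bytes.  All
# four pairs from the log yield the same value, 3145728000 bytes
# (= sector 6144000), assuming 512-byte sectors.
for pair in "66293984 30796791808" "47702304 21277851648" \
            "34943296 14745239552" "45514848 20157874176"; do
    set -- $pair
    echo $(( $1 * 512 - $2 ))     # prints 3145728000 each time
done
```

That the four errors back-compute to one consistent partition offset is a good sign the LBAs in the kernel messages can be trusted for targeted repair.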
Not easily, but this question has been asked and answered on the mailing
lists recently (I don't remember the answer, but I think there were some
ports that can help).

> b) dd if=/dev/zero over the affected sectors only, in order to
>    trigger a sector remapping without nuking the whole drive

You can use src/tools/tools/recoverdisk to refresh all of the disk
except the parts that don't work and then use dd and the console error
output to do the rest.

> c) depending on where that sector is allocated, I presume I'm
>    either going to end up with:
>    i) zero'd bytes within a file (how can I tell which?!)
>    ii) a destroyed inode
>    iii) ???

Presumably it will be one of i, ii, or a mangled superblock.  I don't
know how you'd tell which off the top of my head.  This is one of the
reasons I think Sun is on the right track with ZFS's checksum-everything
approach.  At least that way you actually know when something goes
wrong.

> Any thoughts/comments/etc appreciated...
> 
> How do other operating systems handle this - Windows, Linux, Solaris,
> MacOSX ...?  I would have hoped this would be a condition the OS would
> make some attempt to trigger a sector remap... or are OSes typically
> ignorant of such things?

The OS is generally unaware of such events except to the extent that it
knows a fatal read error occurred, or that it reads the SMART data from
the drive in the case of write failures.

-- Brooks
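On mapping a bad sector back to a file (point a), one approach worth sketching: convert the kernel-reported absolute LBA into a partition-relative block number, then ask fsdb(8) which inode owns it via its findblk command. Both the partition start sector used below (6144000, back-computed from the quoted log excerpts) and the exact units findblk expects are assumptions to verify against bsdlabel(8) and fsdb(8) on the affected machine.

```shell
# Sketch: turn an absolute disk LBA into a partition-relative block
# number for fsdb(8)'s findblk (units assumed to be 512-byte blocks
# relative to the partition start; verify before trusting).
LBA=66293984          # from the READ_DMA error line
PART_START=6144000    # ad0s1f start sector, assumed from bsdlabel
REL=$(( LBA - PART_START ))
echo "$REL"           # prints 60149984

# The interactive step (not run here) would then be roughly:
#   fsdb -r /dev/ad0s1f
#   fsdb> findblk $REL
#   fsdb> quit
```

If findblk reports an inode, `find /mountpoint -inum N` can then name the affected file; if it reports nothing, the block is likely unallocated and can be zeroed with no data loss.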