Date: Thu, 5 Jul 2018 19:28:55 -0600
From: Alan Somers <asomers@freebsd.org>
To: "Rodney W. Grimes" <freebsd-rwg@pdx.rh.cn85.dnsmgr.net>
Cc: Wojciech Puchar <wojtek@puchar.net>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, Stefan Blachmann <sblachmann@gmail.com>, Lev Serebryakov <lev@freebsd.org>, George Mitchell <george+freebsd@m5p.com>
Subject: Re: Confusing smartd messages
Message-ID: <CAOtMX2j91ptTZN8Z8WUofQzueAWv%2BeYVk6Yo%2B%2Bm__zyMsVZKhg@mail.gmail.com>
In-Reply-To: <201807060106.w6616Bs4049980@pdx.rh.CN85.dnsmgr.net>
References: <CAOtMX2ijjJ5jdSU_effzY-rF9Pyg%2Bb09dmNcOZprN=dx7Sy-ww@mail.gmail.com> <201807060106.w6616Bs4049980@pdx.rh.CN85.dnsmgr.net>
On Thu, Jul 5, 2018 at 7:06 PM, Rodney W. Grimes
<freebsd-rwg@pdx.rh.cn85.dnsmgr.net> wrote:

> > On Thu, Jul 5, 2018 at 12:15 PM, Rodney W. Grimes
> > <freebsd-rwg@pdx.rh.cn85.dnsmgr.net> wrote:
> >
> > > > On Thu, Jul 5, 2018 at 11:03 AM, Wojciech Puchar <wojtek@puchar.net>
> > > > wrote:
> > > >
> > > > >> Rewriting suspicious sectors is useless in this day and age.
> > > > >> HDDs and SSDs already do it internally and have for years.
> > > > >> Even healthy sectors get
> > > > >>
> > > > >
> > > > > unreadable sectors cannot be rewritten by drive electronics as it
> > > > > doesn't know what to rewrite. it may possibly remap it but still
> > > > > report read error until some data will be written - unless giving
> > > > > no error and returning meaningless data is an accepted behaviour.
> > > > >
> > > >
> > > > But if that disk is already managed by ZFS, the pool is redundant,
> > > > and the bad sector is allocated by ZFS, then ZFS will immediately
> > > > rewrite the unreadable sector.
> > >
> > > ZFS, if it gets a read error, will rewrite the unreadable sector
> > > to a DIFFERENT block, not over the top of the bad spot.
> > >
> >
> > Are you sure? For read errors, I think ZFS rewrites the data in-place,
> > so it doesn't have to rewrite it on all other members of the same
> > mirror/raid group. For persistent write errors of course, it would
> > have to move it to a different LBA as you describe.
>
> You're right, I am not sure exactly what happens during a scrub that
> finds a checksum error, or encounters a low level device I/O error.
> I was wrongly assuming that given the COW nature of the whole system
> that it would never overwrite anything.
>
> I wonder if you can send ZFS into a loop with a hard write failing
> sector.
>

Not if you have zfsd enabled. zfsd will fault the device after too many
errors.
And even without zfsd, I think ZFS must give up on that sector after a
while, but I'm not positive. If a single bad sector could cause an
endless resilver loop, I think I would've seen it by now.

>
> > > > >
> > > > > only on write it can be done properly.
> > > > >
> > > > > that the HDD/SSD won't fix itself would be a checksum error.
> > > > > Those are
> > > > >>
> > > > >
> > > > > yes and this will happen if you powerdown your disk on write. or
> > > > > get some power spike or other source of noise that would affect
> > > > > electronic components.
> > > > >
> > > >
> > > > It happens surprisingly rarely. Even on a sudden power loss, the
> > > > drive is usually able to finish its current write operation. When
> > > > you run into problems would be if the power loss were coincident
> > > > with a mechanical shock that knocks the head off-track, or
> > > > something like that.
> > >
> > > I agree that "power failure" is a rare cause of write errors. To get
> > > an idea of how often this might have happened, look at the emergency
> > > retract counter; if you're getting lots of those you should try to
> > > find out why and stop that. Vibration has become a serious problem
> > > though; at today's head flight height drives are sensitive to this,
> > > you can even cause a drive to do retries by yelling at it with a
> > > loud voice :-) Look at the "high fly" counter to see if you're
> > > getting this issue.
> > >
> > > > > performing full disk rewrite (so not zfs rebuilds) and THEN
> > > > > looking at smart stats and THEN performing regular
> > > > > smartctl -t long will tell the truth.
> > > > >
> > > > > which usually is "drive is fine" in my practice. really faulty
> > > > > drive will QUICKLY develop new problems.
> > > > >
> > > >
> > > > Yeah, that should make the error go away. It takes a long time,
> > > > though. With a SCSI drive, you can get the exact LBAs affected
> > > > with a "READ DEFECTS" command.
> > > > But there isn't a vendor-independent equivalent for SATA,
> > > > unfortunately.
> > >
> > > My bitch exactly about ATA missing this. Though there are vendor
> > > specific commands to get it.
> > >
> > > --
> > > Rod Grimes
> > > rgrimes@freebsd.org
> > >
>
> --
> Rod Grimes
> rgrimes@freebsd.org
>
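
For readers following the "fault the device after too many errors" point above, here is a toy model of that kind of policy: count I/O errors in a sliding time window and mark the device faulted once the count passes a threshold. This is only a sketch for illustration. The class name, the method name, and both thresholds are hypothetical and are not taken from zfsd's source; the real daemon's rules may differ.

```python
from collections import deque


class ErrorWindow:
    """Toy model: fault a device once too many I/O errors land in a time window.

    Hypothetical illustration only; not zfsd's actual logic or thresholds.
    """

    def __init__(self, max_errors=5, window_seconds=60.0):
        # Both defaults are made-up values for the sketch.
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self.timestamps = deque()
        self.faulted = False

    def record_error(self, now):
        """Record one I/O error at time `now` (seconds); return True if faulted."""
        if self.faulted:
            # Once faulted, the device stays faulted in this model.
            return True
        self.timestamps.append(now)
        # Drop errors that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) > self.max_errors:
            self.faulted = True
        return self.faulted
```

Under this model a sector that fails on every retry produces a burst of errors, trips the threshold, and takes the device out of service, so the retry loop ends. Widely spaced errors age out of the window and never fault the device.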