From owner-freebsd-hackers@freebsd.org  Fri Jul  6 01:28:58 2018
MIME-Version: 1.0
Sender: asomers@gmail.com
Received: by 2002:ab3:1b91:0:0:0:0:0 with HTTP;
 Thu, 5 Jul 2018 18:28:55 -0700 (PDT)
In-Reply-To: <201807060106.w6616Bs4049980@pdx.rh.CN85.dnsmgr.net>
References: <CAOtMX2ijjJ5jdSU_effzY-rF9Pyg+b09dmNcOZprN=dx7Sy-ww@mail.gmail.com>
 <201807060106.w6616Bs4049980@pdx.rh.CN85.dnsmgr.net>
From: Alan Somers <asomers@freebsd.org>
Date: Thu, 5 Jul 2018 19:28:55 -0600
X-Google-Sender-Auth: kMXWvTdbxbwS0-cs0H4q_vVtisM
Message-ID: <CAOtMX2j91ptTZN8Z8WUofQzueAWv+eYVk6Yo++m__zyMsVZKhg@mail.gmail.com>
Subject: Re: Confusing smartd messages
To: "Rodney W. Grimes" <freebsd-rwg@pdx.rh.cn85.dnsmgr.net>
Cc: Wojciech Puchar <wojtek@puchar.net>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>, 
 Stefan Blachmann <sblachmann@gmail.com>, Lev Serebryakov <lev@freebsd.org>, 
 George Mitchell <george+freebsd@m5p.com>
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.27
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>

On Thu, Jul 5, 2018 at 7:06 PM, Rodney W. Grimes <
freebsd-rwg@pdx.rh.cn85.dnsmgr.net> wrote:

> > On Thu, Jul 5, 2018 at 12:15 PM, Rodney W. Grimes <
> > freebsd-rwg@pdx.rh.cn85.dnsmgr.net> wrote:
> >
> > > > On Thu, Jul 5, 2018 at 11:03 AM, Wojciech Puchar <wojtek@puchar.net>
> > > > wrote:
> > > >
> > > > >
> > > > >> Rewriting suspicious sectors is useless in this day and age.
> > > > >> HDDs and SSDs already do it internally and have for years.
> > > > >> Even healthy sectors get
> > > > >>
> > > > >
> > > > > Unreadable sectors cannot be rewritten by the drive electronics,
> > > > > because the drive doesn't know what to write.  It may remap the
> > > > > sector but still report a read error until new data is written,
> > > > > unless returning meaningless data with no error is accepted
> > > > > behaviour.
> > > > >
> > > >
> > > > But if that disk is already managed by ZFS, the pool is redundant,
> > > > and the bad sector is allocated by ZFS, then ZFS will immediately
> > > > rewrite the unreadable sector.
> > >
> > > ZFS, if it gets a read error, will rewrite the unreadable sector
> > > to a DIFFERENT block, not over the top of the bad spot.
> > >
> >
> > Are you sure?  For read errors, I think ZFS rewrites the data in-place,
> > so it doesn't have to rewrite it on all other members of the same
> > mirror/raid group.  For persistent write errors of course, it would have
> > to move it to a different LBA as you describe.
>
> You're right, I am not sure exactly what happens during a scrub that finds
> a checksum error, or encounters a low level device I/O error.  I was
> wrongly assuming that, given the COW nature of the whole system, it would
> never overwrite anything.
>
> I wonder if you can send ZFS into a loop with a hard write failing sector.
>

Not if you have zfsd enabled.  zfsd will fault the device after too many
errors.  And even without zfsd, I think ZFS must give up on that sector
after a while, but I'm not positive.  If a single bad sector could cause an
endless resilver loop, I think I would've seen it by now.
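
As a rough illustration of the "fault the device after too many errors"
idea, here is a minimal Python sketch of a sliding-window error counter.
It is not zfsd's code: the class name ErrorWindow and the threshold values
are assumptions made up for this example, so check zfsd(8) for the real
policy.

```python
from collections import deque


class ErrorWindow:
    """Count I/O errors for one device inside a sliding time window."""

    def __init__(self, max_errors=50, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not zfsd settings.
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self.timestamps = deque()
        self.faulted = False

    def record_error(self, now):
        """Record one error at time `now` (seconds); return True once faulted."""
        if self.faulted:
            return True  # a faulted device stays faulted
        self.timestamps.append(now)
        # Drop errors that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) > self.max_errors:
            self.faulted = True
        return self.faulted
```

With a policy like this, a sector that fails on every retry pushes the
counter past the limit quickly and the device is faulted, so the retries
stop.  Occasional errors spread far apart in time never trip it.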


>
> >
> > >
> > > > > only on write it can be done properly.
> > > > >
> > > > > that the HDD/SSD won't fix itself would be a checksum error.  Those
> > > > > are
> > > > >>
> > > > >
> > > > > yes, and this will happen if you power down your disk during a write,
> > > > > or get a power spike or other source of noise that affects the
> > > > > electronic components.
> > > > >
> > > >
> > > > It happens surprisingly rarely.  Even on a sudden power loss, the
> > > > drive is usually able to finish its current write operation.  Where
> > > > you run into problems is when the power loss coincides with a
> > > > mechanical shock that knocks the head off-track, or something like
> > > > that.
> > >
> > > I agree that power failures are a rare cause of write errors.  To get
> > > an idea of how often this might have happened, look at the emergency
> > > retract counter; if you're getting lots of those you should try to find
> > > out why and stop that.   Vibration has become a serious problem though:
> > > at today's head fly heights drives are sensitive to it, and you can
> > > even cause a drive to do retries by yelling at it with a loud
> > > voice :-)   Look at the "high fly" counter to see if you're getting
> > > this issue.
> > >
> > > > > performing full disk rewrite (so not zfs rebuilds) and THEN looking
> > > > > at smart stats and THEN performing regular smartctl -t long will
> > > > > tell the truth.
> > > > >
> > > > > which usually is "drive is fine" in my practice. really faulty drive
> > > > > will QUICKLY develop new problems.
> > > > >
> > > >
> > > > Yeah, that should make the error go away.  It takes a long time,
> > > > though.  With a SCSI drive, you can get the exact LBAs affected with
> > > > a "READ DEFECT DATA" command.  But there isn't a vendor-independent
> > > > equivalent for SATA, unfortunately.
> > >
> > > My bitch exactly about ATA missing this.  Though there are
> > > vendor-specific commands to get it.
> > >
> > > --
> > > Rod Grimes
> > > rgrimes@freebsd.org
> > >
>
> --
> Rod Grimes
> rgrimes@freebsd.org
>