From owner-freebsd-stable@freebsd.org Mon Apr 29 14:26:43 2019
From: Warner Losh
Date: Mon, 29 Apr 2019 08:26:29 -0600
Subject: Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20) [[UPDATE w/more tests]]
To: Karl Denninger
Cc: FreeBSD-STABLE Mailing List, Steven Hartland
List-Id: Production branch of FreeBSD source code

On Sun, Apr 28, 2019 at 4:03 PM Karl Denninger wrote:

> On 4/20/2019 15:56, Steven Hartland wrote:
> > Thanks for extra info, the next question would be have you eliminated
> > that corruption exists before the disk is removed?
> >
> > Would be interesting to add a zpool scrub to confirm this isn't the
> > case before the disk removal is attempted.
> >
> > Regards
> > Steve
> >
> > On 20/04/2019 18:35, Karl Denninger wrote:
> >> On 4/20/2019 10:50, Steven Hartland wrote:
> >>> Have you eliminated geli as possible source?
> >> No; I could conceivably do so by re-creating another backup volume
> >> set without geli-encrypting the drives, but I do not have an extra
> >> set of drives of the capacity required laying around to do that.  I
> >> would have to do it with lower-capacity disks, which I can attempt if
> >> you think it would help.  I *do* have open slots in the drive
> >> backplane to set up a second "test" unit of this sort.  For reasons
> >> below it will take at least a couple of weeks to get good data on
> >> whether the problem exists without geli, however.
>
> Ok, following up on this with more data....
>
> First step taken was to create a *second* backup pool (I have the
> backplane slots open, fortunately) with three different disks but *no
> encryption.*
>
> I ran both side-by-side for several days, with the *unencrypted* one
> operating with one disk detached and offline (pulled physically) just as
> I do normally.  Then I swapped the two using the same paradigm.
>
> The difference was *dramatic* -- the resilver did *not* scan the entire
> disk; it only copied the changed blocks and was finished FAST.  A
> subsequent scrub came up 100% clean.
>
> Next I put THOSE disks in the vault (so as to make sure I didn't get
> hosed if something went wrong) and re-initialized the pool in question,
> leaving only the "geli" alone (in other words I zpool destroy'd the pool
> and then re-created it with all three disks connected and
> geli-attached.)  The purpose for doing this was to eliminate the
> possibility of old corruption somewhere, or some sort of problem with
> multiple, spanning years, in-place "zpool upgrade" commands.
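[As a point of reference, the re-creation step Karl describes could look
roughly like the sketch below.  The GPT labels match the pool layout shown
later in this message, but the exact geli options and keying are his
configuration, not shown here -- this is an illustration of the sequence,
not his actual commands.]

```shell
# Hypothetical sketch of destroying and re-creating the geli-backed
# three-way mirror.  Labels are taken from the zpool status output in
# this thread; geli keying details are omitted/assumed.
recreate_backup_pool() {
  zpool destroy backup
  for label in backup61 backup62-1 backup62-2; do
    geli attach "gpt/${label}"      # prompts for passphrase/key as configured
  done
  # Re-create the pool on the freshly attached .eli providers
  zpool create backup mirror \
    gpt/backup61.eli gpt/backup62-1.eli gpt/backup62-2.eli
}
```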
> Then I ran a base backup to initialize all three volumes, detached one
> and yanked it out of the backplane, as would be the usual, leaving the
> other two online and operating.
>
> I ran backups as usual for most of last week after doing this, with the
> 61.eli and 62-1.eli volumes online, and 62-2 physically out of the
> backplane.
>
> Today I swapped them again as I usually do (e.g. offline 62.1, geli
> detach, camcontrol standby and then yank it -- then insert the 62-2
> volume, geli attach and zpool online) and this is happening:
>
> [\u@NewFS /home/karl]# zpool status backup
>   pool: backup
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Sun Apr 28 12:57:47 2019
>         2.48T scanned at 202M/s, 1.89T issued at 154M/s, 3.27T total
>         1.89T resilvered, 57.70% done, 0 days 02:37:14 to go
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         backup                    DEGRADED     0     0     0
>           mirror-0                DEGRADED     0     0     0
>             gpt/backup61.eli      ONLINE       0     0     0
>             11295390187305954877  OFFLINE      0     0     0  was /dev/gpt/backup62-1.eli
>             gpt/backup62-2.eli    ONLINE       0     0     0
>
> errors: No known data errors
>
> The "3.27T" number is accurate (by "zpool list") for the space in use.
>
> There is not a snowball's chance in Hades that anywhere near 1.89T of
> that data (thus far, and it ain't done as you can see!) was modified
> between when all three disks were online and when the 62-2.eli volume
> was swapped back in for 62.1.eli.  No possible way.  Maybe some
> 100-200Gb of data has been touched across the backed-up filesystems in
> the last three-ish days but there's just flat-out no way it's more than
> that; this would imply an entropy of well over 50% of the writeable data
> on this box in less than a week!  That's NOT possible.  Further it's not
> 100%; it shows 2.48T scanned but 1.89T actually written to the other drive.
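[The swap paradigm in the parenthetical above -- offline, geli detach, spin
down, pull; then attach and online the incoming disk -- can be sketched as
two shell functions.  The pool/label/device arguments are placeholders for
illustration; this only shows the ordering of the steps, it is not a tested
script.]

```shell
# Hypothetical sketch of the disk-swap procedure described above.
swap_out() {  # usage: swap_out <pool> <gpt-label> <cam-device>
  zpool offline "$1" "gpt/$2.eli"   # take the mirror member offline
  geli detach "gpt/$2"              # tear down the encrypted provider
  camcontrol standby "$3"           # spin the disk down before pulling it
}
swap_in() {   # usage: swap_in <pool> <gpt-label>
  geli attach "gpt/$2"              # re-attach the encrypted provider
  zpool online "$1" "gpt/$2.eli"    # rejoin the mirror; resilver starts here
}
```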
> So something is definitely foooged here and it does appear that geli is
> involved in it.  Whatever is foooging zfs, the resilver process thinks it
> has to recopy MOST (but not all!) of the blocks in use, it appears, from
> the 61.eli volume to the 62-2.eli volume.
>
> The question is what would lead ZFS to think it has to do that -- it
> clearly DOES NOT, as a *much* smaller percentage of the total TXG set on
> 61.eli was modified while 62-2.eli was offline and 62.1.eli was online.
>
> Again I note that on 11.1 and previous this resilver was a rapid
> operation; whatever was actually changed got copied but the system never
> copied *nearly everything* on a resilver, including data that had not
> been changed at all, on a mirrored set.
>
> Obviously on a Raidz volume you have to go through the entire data
> structure because parity has to be recomputed and blocks regenerated, but
> on a mirror only the changed TXGs need to be looked at and copied.  TXGs
> that were on both disks at the time the second one was taken offline do
> not need to be touched *at all* since they're already on the target!
>
> What's going on here?

Silly question, and maybe I missed you doing this, but have you tried
dd'ing zeros to each of the eli volumes that you are putting the zpool on?
Does that make the problem go away?  Maybe there's something weird going on
with un-written sectors and/or reusing sectors with weird content that is
accentuated with geli, or that causes an error path to trigger that doesn't
with !geli.

Otherwise, this testing sounds quite complete.

Warner
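[Warner's zero-fill suggestion, sketched as a helper.  This is destructive
-- it overwrites every sector of the named providers -- so it would only be
run before re-creating the pool.  Labels are placeholders; `bs=1m` is
FreeBSD dd(1) syntax.]

```shell
# Hypothetical sketch: write zeros through each geli-attached provider so
# that no stale on-disk content survives, then detach.  DESTRUCTIVE.
zero_eli_providers() {
  for label in "$@"; do
    geli attach "gpt/${label}"
    # dd exits non-zero when it runs off the end of the device; that is
    # the expected way for a full-device fill to finish.
    dd if=/dev/zero of="/dev/gpt/${label}.eli" bs=1m || true
    geli detach "gpt/${label}"
  done
}
```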