From: Steven Hartland
To: Karl Denninger, freebsd-stable@freebsd.org
Date: Sat, 20 Apr 2019 16:50:38 +0100
Subject: Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
Message-ID: <758d5611-c3cf-82dd-220f-a775a57bdd0b@multiplay.co.uk>

Have you eliminated geli as a possible source?

I've just set up an old server that has an LSI 2008 running old firmware (11.0), so I was going to have a go at reproducing this. Apart from the disconnect steps below, is there anything else needed, e.g. a read/write workload during the disconnect?

mps0: port 0xe000-0xe0ff mem 0xfaf3c000-0xfaf3ffff,0xfaf40000-0xfaf7ffff irq 26 at device 0.0 on pci3
mps0: Firmware: 11.00.00.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 185c

Regards
Steve

On 20/04/2019 15:39, Karl Denninger wrote:
> I can confirm that 20.00.07.00 does *not* stop this.
> The previous write/scrub on this device was on 20.00.07.00.  It was
> swapped back in from the vault yesterday, resilvered without incident,
> but a scrub says....
>
> root@NewFS:/home/karl # zpool status backup
>   pool: backup
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://illumos.org/msg/ZFS-8000-9P
>   scan: scrub repaired 188K in 0 days 09:40:18 with 0 errors on Sat Apr 20 08:45:09 2019
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         backup                    DEGRADED     0     0     0
>           mirror-0                DEGRADED     0     0     0
>             gpt/backup61.eli      ONLINE       0     0     0
>             gpt/backup62-1.eli    ONLINE       0     0    47
>             13282812295755460479  OFFLINE      0     0     0  was /dev/gpt/backup62-2.eli
>
> errors: No known data errors
>
> So this is firmware-invariant (at least between 19.00.00.00 and
> 20.00.07.00); the issue persists.
>
> Again, in my instance these devices are never removed "unsolicited" so
> there can't be (or at least shouldn't be able to) unflushed data in the
> device or kernel cache.  The procedure is and remains:
>
> zpool offline .....
> geli detach .....
> camcontrol standby ...
>
> Wait a few seconds for the spindle to spin down.
>
> Remove disk.
>
> Then of course on the other side after insertion and the kernel has
> reported "finding" the device:
>
> geli attach ...
> zpool online ....
>
> Wait...
>
> If this is a boogered TXG that's held in the metadata for the
> "offline"'d device (maybe "off by one"?) that's potentially bad in that
> if there is an unknown failure in the other mirror component the
> resilver will complete but data has been irrevocably destroyed.
>
> Granted, this is a very low probability scenario (the area where the bad
> checksums are has to be where the corruption hits, and it has to happen
> between the resilver and access to that data.)  Those are long odds but
> nonetheless a window of "you're hosed" does appear to exist.
>
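For anyone trying to reproduce this, a minimal Python sketch of the quoted swap procedure follows. It also includes a check that flags any vdev whose READ, WRITE or CKSUM counter is nonzero after a scrub, as happened with gpt/backup62-1.eli. This is an illustration under stated assumptions, not Karl's actual tooling. The pool and provider names are taken from the zpool status output above. The raw disk name "da5" passed to camcontrol is hypothetical. Any geli attach options (keyfile, passphrase) are left out because the original elides them. The helper function names are made up for this example. By default the commands are only printed, not executed.

```python
import re
import subprocess

# Assumptions (hypothetical, adjust to the real system):
# POOL and PROVIDER come from the zpool status output above;
# DISK, the raw device handed to camcontrol, is a made-up example.
POOL = "backup"
PROVIDER = "gpt/backup62-2"
DISK = "da5"


def detach_commands(pool, provider, disk):
    """Steps before pulling the disk: offline the vdev, detach geli, spin down."""
    return [
        ["zpool", "offline", pool, provider + ".eli"],
        ["geli", "detach", provider + ".eli"],
        ["camcontrol", "standby", disk],
    ]


def attach_commands(pool, provider):
    """Steps after the kernel has found the reinserted disk.

    geli keyfile/passphrase options are omitted (elided in the original).
    """
    return [
        ["geli", "attach", provider],
        ["zpool", "online", pool, provider + ".eli"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)


# One config row: NAME STATE READ WRITE CKSUM. Only plain integer counters
# are matched; abbreviated counts such as "1.2K" are skipped by this sketch.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)(?=\s|$)")


def devices_with_errors(status_text):
    """Map vdev name -> (read, write, cksum) for rows with a nonzero counter."""
    bad = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            counts = tuple(int(g) for g in m.group(3, 4, 5))
            if any(counts):
                bad[m.group(1)] = counts
    return bad
```

Calling run(detach_commands(POOL, PROVIDER, DISK)) only prints the three detach commands. Passing dry_run=False executes them. After the resilver and scrub, feeding the text of "zpool status backup" to devices_with_errors would return {"gpt/backup62-1.eli": (0, 0, 47)} for the output shown above.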