Subject: Re: zfs mirrored pool dead after a disk death and reset
From: "Eugene M. Zheganin" <eugene@zhegan.in>
To: stable@freebsd.org
Date: Fri, 25 Feb 2022 18:53:43 +0500

Hello,

On 25.02.2022 18:30, Steven Hartland wrote:
> Have you tried removing the dead disk physically? I've seen in the past a bad disk sending bad data to the controller, causing knock-on issues.

Yup, I did. I've even built 13.0 and tried to import the pool there. 13.0 complains differently, but still refuses to import:


# zpool import
   pool: data
     id: 15967028801499953224
  state: ONLINE
 status: One or more devices contains corrupted data.
 action: The pool can be imported using its name or numeric identifier.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
 config:

        data        ONLINE
          nvd0      UNAVAIL  corrupted data
          nvd1      ONLINE


And when I try to import it:

# zpool import -FX data
cannot import 'data': one or more devices is currently unavailable

and I see the following in dmesg:

Feb 25 16:44:41 db0 ZFS[4857]: failed to load zpool data
Feb 25 16:44:41 db0 ZFS[4873]: failed to load zpool data
Feb 25 16:44:41 db0 ZFS[4889]: failed to load zpool data
Feb 25 16:44:41 db0 ZFS[4909]: failed to load zpool data
Feb 25 16:45:13 db0 ZFS[4940]: pool log replay failure, zpool=data
Feb 25 16:45:13 db0 ZFS[4952]: pool log replay failure, zpool=data
Feb 25 16:45:13 db0 ZFS[4964]: pool log replay failure, zpool=data
Feb 25 16:45:13 db0 ZFS[4976]: pool log replay failure, zpool=data
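
Before reaching for -X again, I'm planning to try the gentler variants first (nothing below has been run yet, so take it as the plan rather than results):

# zpool import -F -n data
  (dry-run rewind: reports whether a recovery import would succeed, without writing anything)

# zpool import -o readonly=on -f data
  (read-only import, which as far as I understand also skips the ZIL replay that is failing above)

# zpool import -d /dev -o readonly=on -f data
  (same, but pointing the import explicitly at the device nodes)

If anyone sees a reason these could make things worse on a pool in this state, please shout.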


> Also the output doesn't show multiple devices, only nvd0. I'm hoping you didn't use nv raid to create the mirror, as that means there's no ZFS protection?
Nope, I'm aware of that. Actually, the redundant drive is still there, but it's already dead; it's the FAULTED device 9566965891719887395 quoted below.


> [root@db0:~]# zpool import
>    pool: data
>      id: 15967028801499953224
>   state: FAULTED
>  status: One or more devices contains corrupted data.
>  action: The pool cannot be imported due to damaged devices or data.
>          The pool may be active on another system, but can be imported using
>          the '-f' flag.
>     see: http://illumos.org/msg/ZFS-8000-5E
>  config:
>
>         data                   FAULTED  corrupted data
>           9566965891719887395  FAULTED  corrupted data
>           nvd0                 ONLINE
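
For completeness, I can also post the label dumps from both devices in case the uberblock/TXG state tells us something:

# zdb -l /dev/nvd0
# zdb -l /dev/nvd1

(I haven't captured that output yet, and the device names above are simply what this box uses at the moment.)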


Thanks.

Eugene.

