From: "Eugene M. Zheganin"
Subject: zfs mirrored pool dead after a disk death and reset
To: stable@freebsd.org
Date: Fri, 25 Feb 2022 16:07:07 +0500
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

Hello.

Recently a disk died in one of my servers running 12.2 (12.2-RELEASE-p2). When it died, I got a bunch of dmesg errors saying that a number of I/O commands were stuck, and the OS became partially livelocked (I could still log in, but could barely do anything). Considering this is a mirrored pool, and "I have done it many times before, nothing could be safer!", I sent a reset to the server via IPMI.

It was quite discouraging to find this after a successful boot-up from the intact zroot (yes, I have already tried zpool import -F after an export, so initially the pool was already imported, showing the same devastating state):

[root@db0:~]# zpool import
   pool: data
     id: 15967028801499953224
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: http://illumos.org/msg/ZFS-8000-5E
 config:

         data                   FAULTED  corrupted data
           9566965891719887395  FAULTED  corrupted data
           nvd0                 ONLINE

# zpool import -F data
cannot import 'data': one or more devices is currently unavailable

Well, yes, I do have a replica and I didn't lose one bit of data, but it is still a tragedy to lose a pool after one silly reset (and I have done this literally a hundred times before on various servers and FreeBSD versions).

So, a couple of questions:

- Is it worth trying FreeBSD 13 to recover? (Just for the experience, if the pool can still be recovered.)
- Is this because it is more dangerous with NVMe drives, or would it also happen on SSD/rotational drives?
- Would zpool checkpoint have saved me in this case?

Thanks.
Eugene.