From nobody Tue Sep 10 10:35:25 2024
X-Original-To: freebsd-fs@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4X30TX1cT1z5W7c1
	for <freebsd-fs@mlmmj.nyi.freebsd.org>; Tue, 10 Sep 2024 10:35:28 +0000 (UTC)
	(envelope-from andy@time-domain.co.uk)
Received: from mail0.time-domain.net (mail0.time-domain.net [62.3.122.138])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id 4X30TW4BzYz4kSd;
	Tue, 10 Sep 2024 10:35:27 +0000 (UTC)
	(envelope-from andy@time-domain.co.uk)
Authentication-Results: mx1.freebsd.org;
	none
Received: from mail0.time-domain.net (localhost [127.0.0.1])
	by mail0.time-domain.net (8.15.2/8.15.2) with ESMTP id 48AAZPUk075075;
	Tue, 10 Sep 2024 11:35:25 +0100 (BST)
	(envelope-from andy@time-domain.co.uk)
Received: from localhost (andy-tds@localhost)
	by mail0.time-domain.net (8.15.2/8.15.2/Submit) with ESMTP id 48AAZPTs075072;
	Tue, 10 Sep 2024 11:35:25 +0100 (BST)
	(envelope-from andy@time-domain.co.uk)
X-Authentication-Warning: mail0.time-domain.net: andy-tds owned process doing -bs
Date: Tue, 10 Sep 2024 11:35:25 +0100 (BST)
From: andy thomas <andy@time-domain.co.uk>
X-X-Sender: andy-tds@mail0.time-domain.net
To: Allan Jude <allanjude@freebsd.org>
cc: freebsd-fs@freebsd.org
Subject: Re: Does a failed separate ZIL disk mean the entire zpool is lost?
In-Reply-To: <dabea42c-65d7-40ea-bd37-840148e855c5@freebsd.org>
Message-ID: <alpine.BSF.2.22.395.2409101105040.74876@mail0.time-domain.net>
References: <alpine.BSF.2.22.395.2409091634020.50467@mail0.time-domain.net> <535969cf-0b0b-48ca-a163-fc238f316bb7@gmx.at> <dabea42c-65d7-40ea-bd37-840148e855c5@freebsd.org>
User-Agent: Alpine 2.22 (BSF 395 2020-01-19)
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
List-Help: <mailto:freebsd-fs+help@freebsd.org>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Subscribe: <mailto:freebsd-fs+subscribe@freebsd.org>
List-Unsubscribe: <mailto:freebsd-fs+unsubscribe@freebsd.org>
Sender: owner-freebsd-fs@FreeBSD.org
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="951801389-2008165308-1725964525=:74876"
X-Spamd-Bar: ----
X-Rspamd-Pre-Result: action=no action;
	module=replies;
	Message is reply to one we originated
X-Spamd-Result: default: False [-4.00 / 15.00];
	REPLY(-4.00)[];
	ASN(0.00)[asn:13037, ipnet:62.3.64.0/18, country:GB]
X-Rspamd-Queue-Id: 4X30TW4BzYz4kSd

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--951801389-2008165308-1725964525=:74876
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8BIT

Thank you, but I'm afraid I didn't use two mirrored ZIL devices: I didn't 
know this was possible when I set this server up (late 2017, before I was 
even aware of the 'FreeBSD Mastery: ZFS' book!). There were also no spare 
disk bays in the server's chassis for another device, and PCIe-to-NVMe 
adapters were not available at the time. For data resilience I relied on 
an identical mirror server in the same rack, linked via a 2 x 10 Gbit/s 
bonded point-to-point network link, but this server also failed in the 
data centre meltdown...

It looks like the data is now lost, so I won't waste any more time trying 
to recover it. Hopefully this incident will persuade my employer to heed 
the advice given years ago: locate mirror servers in a different data 
centre, linked by a fast multi-gigabit connection.

Andy

PS: the ZFS and Advanced ZFS books are truly excellent, by the way!
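
For anyone finding this thread in the archive, the import suggested in the 
replies quoted below might look roughly like the sketch here. It is only a 
sketch: zil_import_plan is a hypothetical helper name, it prints the 
commands rather than running them, and it assumes the pool name clustor2 
from my original report. It does not cover disabling the partial import at 
boot, which Allan says may be needed first.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands one might use to
# import a pool whose separate log (ZIL) device is gone.
zil_import_plan() {
    pool="$1"
    # 1. List the pools visible for import; nothing is imported at this step.
    printf '%s\n' "zpool import"
    # 2. -m lets the import proceed with a missing log device; writes that
    #    existed only on that device are discarded.
    printf '%s\n' "zpool import -m $pool"
    # 3. Check the pool's state once the import has been attempted.
    printf '%s\n' "zpool status $pool"
}

# Assumption: clustor2 is the pool name shown in the original report.
zil_import_plan clustor2
```

Printing the commands first makes it possible to review them before 
running each one by hand as root.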

On Mon, 9 Sep 2024, Allan Jude wrote:

> As the last person mentioned, you should be able to import with the -m flag, 
> and only lose about 5 seconds worth of writes.
>
> The pool is already partially imported at boot by the other mechanisms, you 
> might need to disable that to prevent the partial import at boot, so you can 
> do the manual import.
>
> On 2024-09-09 12:20 p.m., infoomatic wrote:
>> did you use two mirrored ZIL devices?
>> 
>> You can "zpool import -m", but you will probably be confronted with some
>> errors - you will probably lose the data the ZIL has not committed, but
>> most of your data in your pool should be there
>> 
>> 
>> On 09.09.24 17:51, andy thomas wrote:
>>> A server I look after had a 65TB ZFS RAIDz1 pool with 8 x 8TB hard disks
>>> plus one hot spare and separate ZFS intent log (ZIL) and L2ARC cache
>>> disks that used a pair of 256GB SSDs. This ran really well for 6 years
>>> until 2 weeks ago, when the main cooling system in the data centre where
>>> it was installed failed and the backup cooling system failed to start up.
>>> 
>>> The upshot was the ZIL SSD went short-circuit across its power
>>> connector, shorting out the server's PSUs and shutting down the server.
>>> After replacing the failed SSD and verifying all the spinning hard disks
>>> and the cache SSD are undamaged, attempts to import the pool fail with
>>> the following message:
>>> 
>>> NAME      SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH   ALTROOT
>>> clustor2     -      -     -        -         -     -    -      -  UNAVAIL  -
>>> 
>>> Does this mean the pool's contents are now lost and unrecoverable?
>>> 
>>> Andy
>>> 
>> 
>
>


----------------------------
Andy Thomas,
Time Domain Systems

Tel: +44 (0)7866 556626
http://www.time-domain.co.uk
--951801389-2008165308-1725964525=:74876--