Date: Tue, 10 Sep 2024 11:35:25 +0100 (BST)
From: andy thomas
To: Allan Jude
Cc: freebsd-fs@freebsd.org
Subject: Re: Does a failed separate ZIL disk mean the entire zpool is lost?
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Thank you, but I'm afraid I didn't use two mirrored ZIL devices, since I
didn't know this was possible at the time I set this server up (late 2017,
and before I was even aware of the 'FreeBSD Mastery: ZFS' book!). There were
also no spare disk bays in the server's chassis to add another device, and at
the time PCIe-to-NVMe adapters were not available.

For data resilience I relied on an identical mirror server in the same rack,
linked via a 2 x 10 Gbit/s bonded point-to-point network link, but this server
also failed in the data centre meltdown...

It looks like the data is now lost, so I won't waste any more time trying to
recover it - this incident will hopefully persuade my employer to heed advice
given years ago about locating mirror servers in a different data centre
linked by a fast multi-gigabit connection.

Andy

PS: the ZFS and Advanced ZFS books are truly excellent, by the way!

On Mon, 9 Sep 2024, Allan Jude wrote:

> As the last person mentioned, you should be able to import with the -m flag,
> and only lose about 5 seconds worth of writes.
>
> The pool is already partially imported at boot by the other mechanisms, you
> might need to disable that to prevent the partial import at boot, so you can
> do the manual import.
>
> On 2024-09-09 12:20 p.m., infoomatic wrote:
>> did you use two mirrored ZIL devices?
>>
>> You can "zpool import -m", but you will probably be confronted with some
>> errors - you will probably lose the data the ZIL has not committed, but
>> most of your data in your pool should be there
>>
>>
>> On 09.09.24 17:51, andy thomas wrote:
>>> A server I look after had a 65TB ZFS RAIDz1 pool with 8 x 8TB hard disks
>>> plus one hot spare and separate ZFS intent log (ZIL) and L2ARC cache
>>> disks that used a pair of 256GB SSDs. This ran really well for 6 years
>>> until 2 weeks ago, when the main cooling system in the data centre where
>>> it was installed failed and the backup cooling system failed to start up.
>>>
>>> The upshot was the ZIL SSD went short-circuit across its power
>>> connector, shorting out the server's PSUs and shutting down the server.
>>> After replacing the failed SSD and verifying all the spinning hard disks
>>> and the cache SSD are undamaged, attempts to import the pool fail with
>>> the following message:
>>>
>>> NAME      SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH   ALTROOT
>>> clustor2     -      -     -        -         -     -    -      -  UNAVAIL  -
>>>
>>> Does this mean the pool's contents are now lost and unrecoverable?
>>>
>>> Andy
>>>
>>
>

----------------------------
Andy Thomas,
Time Domain Systems

Tel: +44 (0)7866 556626
http://www.time-domain.co.uk
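
For anyone finding this thread later with a pool whose data disks survived, the "zpool import -m" suggestion quoted above might be tried roughly as in the sketch below. This is not a tested recovery procedure. The pool name clustor2 comes from the original post; the run and recover_pool helpers and the POOL and DRY_RUN variables are hypothetical names made up for this example. The -N flag (import without mounting any datasets) is an extra precaution that nobody in the thread suggested. By default the script only prints the commands, and it does not cover preventing the partial import at boot, which depends on how the system is configured.

```shell
#!/bin/sh
# Sketch only: import a pool whose separate log (ZIL) device is missing.
# POOL and DRY_RUN are hypothetical knobs for this example.
POOL="${POOL:-clustor2}"      # pool name taken from the original post
DRY_RUN="${DRY_RUN:-1}"       # 1 = print the commands, anything else = run them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

recover_pool() {
    # With no arguments, list the pools that are available for import.
    run zpool import
    # -m: allow the import even though the log device is missing.
    # -N: import without mounting datasets (extra caution, an assumption here).
    run zpool import -m -N "$POOL"
    # Check pool health; the missing log vdev should be listed here.
    run zpool status -v "$POOL"
}

recover_pool
```

Run it with DRY_RUN=0 to execute the commands for real. If the import succeeds, the dead log vdev can then be taken out of the pool with "zpool remove", using the identifier that "zpool status" reports for it. As noted in the thread, writes that had reached only the ZIL are lost.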