Date: Tue, 10 Sep 2024 14:02:10 -0400
From: Charles Sprickman <spork@bway.net>
To: andy thomas <andy@time-domain.co.uk>
Cc: Allan Jude <allanjude@freebsd.org>, freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Does a failed separate ZIL disk mean the entire zpool is lost?
Message-ID: <94C1B563-5C1A-4D48-AE88-9ABEE8841D99@bway.net>
In-Reply-To: <alpine.BSF.2.22.395.2409101105040.74876@mail0.time-domain.net>
References: <alpine.BSF.2.22.395.2409091634020.50467@mail0.time-domain.net> <535969cf-0b0b-48ca-a163-fc238f316bb7@gmx.at> <dabea42c-65d7-40ea-bd37-840148e855c5@freebsd.org> <alpine.BSF.2.22.395.2409101105040.74876@mail0.time-domain.net>
I don't think your data is gone!

Note that the "-m" option to ignore the log device and possibly lose a
few transactions does NOT require a mirrored ZIL:

     -m      Allows a pool to import when there is a missing log device.
             Recent transactions can be lost because the log device will
             be discarded.

It seems like it's absolutely worth trying.

Charles

> On Sep 10, 2024, at 6:35 AM, andy thomas <andy@time-domain.co.uk> wrote:
>
> Thank you, but I'm afraid I didn't use two mirrored ZIL devices since I
> didn't know this was possible at the time I set this server up (late 2017,
> and before I was even aware of the 'FreeBSD Mastery: ZFS' book!). There
> were no spare disk bays in the server's chassis to add another device,
> and at the time PCIe NVMe adapters were not available. For data
> resilience I relied on an identical mirror server in the same rack,
> linked via a 2 x 10 Gbit/sec bonded point-to-point network link, but this
> server also failed in the data centre melt-down...
>
> It looks like the data is now lost, so I won't waste any more time trying
> to recover it - this incident will hopefully persuade my employer to heed
> advice given years ago regarding locating mirror servers in a different
> data centre linked by a fast multi-gigabit connection.
>
> Andy
>
> PS: the ZFS and Advanced ZFS books are truly excellent, by the way!
>
> On Mon, 9 Sep 2024, Allan Jude wrote:
>
>> As the last person mentioned, you should be able to import with the -m
>> flag and only lose about 5 seconds' worth of writes.
>>
>> The pool is already partially imported at boot by the other mechanisms;
>> you might need to disable that to prevent the partial import at boot,
>> so you can do the manual import.
>>
>> On 2024-09-09 12:20 p.m., infoomatic wrote:
>>> Did you use two mirrored ZIL devices?
>>> You can "zpool import -m", but you will probably be confronted with
>>> some errors - you will probably lose the data the ZIL has not
>>> committed, but most of the data in your pool should be there.
>>>
>>> On 09.09.24 17:51, andy thomas wrote:
>>>> A server I look after had a 65TB ZFS RAIDz1 pool with 8 x 8TB hard
>>>> disks plus one hot spare, and separate ZFS intent log (ZIL) and L2ARC
>>>> cache disks that used a pair of 256GB SSDs. This ran really well for
>>>> 6 years until 2 weeks ago, when the main cooling system in the data
>>>> centre where it was installed failed and the backup cooling system
>>>> failed to start up. The upshot was that the ZIL SSD went
>>>> short-circuit across its power connector, shorting out the server's
>>>> PSUs and shutting down the server.
>>>>
>>>> After replacing the failed SSD and verifying that all the spinning
>>>> hard disks and the cache SSD are undamaged, attempts to import the
>>>> pool fail with the following message:
>>>>
>>>> NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
>>>> clustor2      -      -      -        -         -      -      -      -   UNAVAIL  -
>>>>
>>>> Does this mean the pool's contents are now lost and unrecoverable?
>>>>
>>>> Andy
>>
>
>
> ----------------------------
> Andy Thomas,
> Time Domain Systems
>
> Tel: +44 (0)7866 556626
> http://www.time-domain.co.uk
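As a rough sketch of the recovery sequence suggested in the thread, assuming
the pool is named clustor2 as in Andy's status output: the export step and
the log-device placeholder below are assumptions for illustration (the thread
only confirms that "zpool import -m" is the relevant command), not something
Andy has run.

    # Clear any partial import left over from the boot-time mechanisms
    # Allan mentions (only needed if the pool shows up as imported at all).
    zpool export clustor2

    # List pools that are visible for import; clustor2 should appear,
    # flagged as having a missing log device.
    zpool import

    # Import while discarding the missing log device. Any transactions
    # still only in the ZIL (roughly the last few seconds of synchronous
    # writes) are lost, as Charles's quoted man page text says.
    zpool import -m clustor2

    # Check the pool, then drop the dead log vdev from the configuration.
    # <failed-log-device> is a placeholder for the missing device's name
    # or GUID as shown by "zpool status".
    zpool status clustor2
    zpool remove clustor2 <failed-log-device>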