Date: Tue, 17 Sep 2024 12:16:20 +0100
From: Frank Leonhardt <freebsd-doc@fjl.co.uk>
To: questions <questions@freebsd.org>
Subject: Re: Zpool status -- why does a suboptimal pool show as "ONLINE"?
Message-ID: <0290d22f5be2eb0b324254b663076924@fjl.co.uk>
In-Reply-To: <312af967-e5bf-4e83-b48b-7c2841719373@app.fastmail.com>
References: <378D100E-FFE1-4DA7-9C52-219863A50A24@gushi.org> <312af967-e5bf-4e83-b48b-7c2841719373@app.fastmail.com>

On 2024-09-12 14:29, Dave Cottlehuber wrote:
> On Thu, 12 Sep 2024, at 13:05, Dan Mahoney (Ports) wrote:
>> Hey there all,
>>
>> I have a nagios check that assumes that if I have a suboptimal zfs
>> zpool, that the word “DEGRADED” will appear in the output. One disk of
>> a two-disk mirror seems to have faulted, but the pool still shows as
>> “ONLINE”. I know I’ve seen the word “DEGRADED” in the past. What’s
>> different?
>>
>>   pool: zroot
>>  state: ONLINE
>> status: One or more devices are faulted in response to persistent errors.
>>         Sufficient replicas exist for the pool to continue functioning in a
>>         degraded state.
>> action: Replace the faulted device, or use 'zpool clear' to mark the device
>>         repaired.
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         zroot       ONLINE       0     0     0
>>           mirror-0  ONLINE       0     0     0
>>             ada0p3  FAULTED      4   372     0  too many errors
>>             ada1p3  ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> 14.1, if it matters, the disks are two innolite SATADOM’s.
>
> Hi Dan
>
> I agree that I would expect the mirror-0 at least to report DEGRADED
> or similar. Hopefully one of the zfs people clarifies the logic here.
>
> Practically, what I do is run:
>
> zpool status | grep -v 'with 0 errors' | sha256
>
> and check that this hash remains the same over time. It's obviously
> different for each pool. Could that help for nagios?

I agree. A faulted drive always used to appear as "FAULTED", and the
vdev and pool should both have been tagged "DEGRADED" (cascading
upwards). A faulted drive isn't necessarily taken offline, although
"too many errors" suggests it should be. If this isn't a bug I'd like
to know the reason why.

Regards, Frank.
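
Since the pool-level "state:" line cannot be relied on in the case quoted above, a check could look at the STATE column of every row in the config table instead of searching for the word "DEGRADED". The following is only a minimal sketch under stated assumptions: the function name zpool_bad_rows is hypothetical, it reads `zpool status` text on stdin, it assumes the table layout shown in the quoted output, it treats ONLINE (and AVAIL, for idle hot spares) as the only healthy states, and it uses exit code 2 to follow the Nagios convention for CRITICAL.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS or Nagios): reads `zpool status`
# output on stdin and prints "NAME STATE" for every config row whose
# state is not ONLINE or AVAIL.
# Exit status: 0 if no such row was found, 2 otherwise.
zpool_bad_rows() {
    awk '
        # The config table starts after the "NAME STATE ..." header line.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # A blank line or the "errors:" line ends the table.
        in_cfg && (NF == 0 || $1 == "errors:") { in_cfg = 0; next }
        # Skip one-word section headings such as "logs" or "spares".
        in_cfg && NF >= 2 && $2 != "ONLINE" && $2 != "AVAIL" {
            print $1, $2
            bad = 1
        }
        END { exit (bad ? 2 : 0) }
    '
}
```

It would be used as `zpool status | zpool_bad_rows`. For the output quoted above it prints the ada0p3 row and returns 2, even though the pool and the mirror both report ONLINE.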