Date:      Tue, 17 Sep 2024 12:16:20 +0100
From:      Frank Leonhardt <freebsd-doc@fjl.co.uk>
To:        questions <questions@freebsd.org>
Subject:   Re: Zpool status -- why does a suboptimal pool show as "ONLINE"?
Message-ID:  <0290d22f5be2eb0b324254b663076924@fjl.co.uk>
In-Reply-To: <312af967-e5bf-4e83-b48b-7c2841719373@app.fastmail.com>
References:  <378D100E-FFE1-4DA7-9C52-219863A50A24@gushi.org> <312af967-e5bf-4e83-b48b-7c2841719373@app.fastmail.com>

On 2024-09-12 14:29, Dave Cottlehuber wrote:
> On Thu, 12 Sep 2024, at 13:05, Dan Mahoney (Ports) wrote:
>> Hey there all,
>> 
>> I have a nagios check that assumes that if I have a suboptimal zfs
>> zpool, that the word “DEGRADED” will appear in the output.  One disk of
>> a two-disk mirror seems to have faulted, but the pool still shows as
>> “ONLINE”.  I know I’ve seen the word “DEGRADED” in the past.  What’s
>> different?
>> 
>>   pool: zroot
>>  state: ONLINE
>> status: One or more devices are faulted in response to persistent errors.
>>         Sufficient replicas exist for the pool to continue functioning in a
>>         degraded state.
>> action: Replace the faulted device, or use 'zpool clear' to mark the device
>>         repaired.
>> config:
>> 
>>         NAME        STATE     READ WRITE CKSUM
>>         zroot       ONLINE       0     0     0
>>           mirror-0  ONLINE       0     0     0
>>             ada0p3  FAULTED      4   372     0  too many errors
>>             ada1p3  ONLINE       0     0     0
>> 
>> errors: No known data errors
>> 
>> 14.1, if it matters, the disks are two innolite SATADOM’s.
> 
> Hi Dan
> 
> I agree that I would expect the mirror-0 at least to report DEGRADED
> or similar. Hopefully one of the zfs people clarifies the logic here.
> 
> Practically, what I do is run:
> 
>     zpool status | grep -v 'with 0 errors' | sha256
> 
> and check that this hash remains the same over time. It's obviously
> different for each pool. Could that help for nagios?

I agree. A faulted drive always used to appear as "FAULTED", and the 
vdev and pool should both have been tagged "DEGRADED" (cascading 
upwards).

A faulted drive isn't necessarily taken offline, although "too many 
errors" suggests it should be.

If this isn't a bug I'd like to know the reason why.
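
In the meantime, a check that reads the per-device STATE column, rather 
than searching for the word "DEGRADED", would have flagged this pool. 
Below is a minimal sketch, assuming the usual "zpool status" layout 
quoted above. The function name check_zpool_states is made up for 
illustration, and treating every state other than ONLINE (or AVAIL, for 
hot spares) as a problem is my own assumption rather than documented 
behaviour. Exit status 2 follows the Nagios convention for CRITICAL.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status` output on stdin, prints
# "<device> <state>" for every row of the config table whose STATE is not
# ONLINE or AVAIL, and returns 2 (Nagios CRITICAL) if any were found,
# 0 (Nagios OK) otherwise.
check_zpool_states() {
    awk '
        # The device table starts after the NAME/STATE header line ...
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # ... and ends at the "errors:" summary line.
        /^errors:/ { in_config = 0 }
        # Rows with fewer than two fields (blank lines, section labels
        # such as "logs" or "spares") are skipped.
        in_config && NF >= 2 && $2 != "ONLINE" && $2 != "AVAIL" {
            print $1, $2
            bad = 1
        }
        END { exit (bad ? 2 : 0) }
    '
}
```

Usage would be along the lines of:

    zpool status zroot | check_zpool_states

which, for the output Dan posted, should print "ada0p3 FAULTED" and 
return 2 even though the pool itself reports ONLINE.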

Regards, Frank.


