Date:      Mon, 07 Apr 2025 16:36:31 +0000
From:      "Dave Cottlehuber" <dch@skunkwerks.at>
To:        "Andrea Venturoli" <ml@netfence.it>, "Mike Tancsa" <mike@sentex.net>, freebsd-questions <freebsd-questions@freebsd.org>
Subject:   Re: Sudden zpool checksums errors
Message-ID:  <3b4abe36-7fda-4ab7-8893-d8fe07288ab7@app.fastmail.com>
In-Reply-To: <0e703e40-1d87-4c4b-a2b1-f370933f713a@netfence.it>
References:  <6aeb488d-b3c3-4393-80ca-0b89c1ebc446@netfence.it> <3ddfecf7-2cb3-472c-bfce-93356e57b898@app.fastmail.com> <032776db-a8a1-4134-a395-a59effbc4c30@netfence.it> <4c6b64ec-0e59-4f64-8faf-117c7686a87d@sentex.net> <0e703e40-1d87-4c4b-a2b1-f370933f713a@netfence.it>


On Mon, 7 Apr 2025, at 15:15, Andrea Venturoli wrote:
> On 4/7/25 15:07, mike tancsa wrote:
> All "non-error" drives report:
> SCT Error Recovery Control:
>             Read: Disabled
>            Write: Disabled
>
> All "error" drives report:
> SCT Error Recovery Control:
>             Read:    655 (65.5 seconds)
>            Write:    670 (67.0 seconds)
>
> I wonder if this could be the culprit...
> I guess I should enable or disable it on all drives; however I've been 
> reading mixed opinions on whether this is good or bad for ZFS.
>
> Any suggestion?

I would set a short timeout and rely on ZFS to handle the cleanup. The
thinking is that it is better for latency if the drive fails fast and
ZFS returns the correct data from redundancy, then repairs the bad
sector afterwards, than to risk ZFS marking the entire drive offline
because of the longer delay.

Does this seem reasonable?

https://github.com/AMDmi3/scterc-rc.d & https://forums.truenas.com/t/checking-for-tler-erc-etc-support-on-a-drive/1497 may be useful.
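
For setting it by hand, here is a rough sketch rather than a tested recipe. smartctl takes the two timeouts in units of 100 ms, which is why your 655 reads as 65.5 seconds. The 7 second value and the ada0/ada1 device names are only assumptions for illustration, and erc_cmd is a made-up helper that prints the command instead of running it:

```sh
#!/bin/sh
# Sketch: build the smartctl command that sets SCT ERC on a drive.
# smartctl expects the read and write timeouts in units of 100 ms,
# so 7 seconds becomes 70.

# erc_cmd SECONDS DEVICE
# Hypothetical helper: prints the command (dry run), does not execute it.
erc_cmd() {
    deci=$(( $1 * 10 ))    # seconds -> tenths of a second
    echo "smartctl -l scterc,${deci},${deci} $2"
}

# Assumed device names; substitute the actual pool members.
for disk in ada0 ada1; do
    erc_cmd 7 "/dev/${disk}"
done
```

Once the printed commands look right they can be run as root, and `smartctl -l scterc /dev/ada0` shows the current values. On many drives the setting does not survive a power cycle, so it usually has to be reapplied at boot.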

A+
Dave


