Date: Mon, 7 Apr 2025 17:15:02 +0200
From: Andrea Venturoli <ml@netfence.it>
To: mike tancsa <mike@sentex.net>, freebsd-questions <freebsd-questions@freebsd.org>
Subject: Re: Sudden zpool checksums errors
Message-ID: <0e703e40-1d87-4c4b-a2b1-f370933f713a@netfence.it>
In-Reply-To: <4c6b64ec-0e59-4f64-8faf-117c7686a87d@sentex.net>
References: <6aeb488d-b3c3-4393-80ca-0b89c1ebc446@netfence.it> <3ddfecf7-2cb3-472c-bfce-93356e57b898@app.fastmail.com> <032776db-a8a1-4134-a395-a59effbc4c30@netfence.it> <4c6b64ec-0e59-4f64-8faf-117c7686a87d@sentex.net>
On 4/7/25 15:07, mike tancsa wrote:
> What does the smartctl -a /dev/da# show for the temperatures of
> the hard drives ?
Temperatures vary between drives (probably due to their slot position in
the chassis): over the last month, the coldest one averaged 30C with a
max of 35C; the hottest averaged 39C, with a peak of 48C.
There does not seem to be a correlation between temperatures and errors
(some drives that gave errors are colder than others that didn't).
> Does smartctl -x show any interesting log entries for
> the drives that threw errors vs the ones that did not ?
All "non-error" drives report:
SCT Error Recovery Control:
    Read: Disabled
    Write: Disabled
All "error" drives report:
SCT Error Recovery Control:
    Read: 655 (65.5 seconds)
    Write: 670 (67.0 seconds)
I wonder if this could be the culprit...
I guess I should either enable or disable it consistently on all drives;
however, I've been reading mixed opinions on whether it is good or bad
for ZFS.
Any suggestions?
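To compare the setting across many drives without reading each report by eye, something like the following might help. It is only a minimal sketch in Python: it assumes the text comes from `smartctl -x /dev/da#` and looks like the excerpts quoted above, and the function name `parse_scterc` is made up for illustration. The numbers smartctl prints are in tenths of a second, which is why 655 is shown as 65.5 seconds.

```python
import re


def parse_scterc(smartctl_output):
    """Extract the SCT Error Recovery Control timers from smartctl text.

    Returns a dict such as {"Read": 655, "Write": 670}, with values in
    tenths of a second as printed by smartctl. A value of None means the
    timer is reported as "Disabled". An empty dict means no Read/Write
    lines were found after the section header.
    """
    result = {}
    in_section = False
    for line in smartctl_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("SCT Error Recovery Control"):
            in_section = True
            continue
        if not in_section:
            continue
        match = re.match(r"(Read|Write):\s+(Disabled|\d+)", stripped)
        if match is None:
            break  # first line that is not Read/Write ends the section
        name, value = match.groups()
        result[name] = None if value == "Disabled" else int(value)
    return result


def describe(timers):
    """Render the parsed timers as a short human-readable string."""
    parts = []
    for name in ("Read", "Write"):
        if name not in timers:
            parts.append(f"{name}: not reported")
        elif timers[name] is None:
            parts.append(f"{name}: disabled")
        else:
            parts.append(f"{name}: {timers[name] / 10:.1f} s")
    return ", ".join(parts)
```

Feeding each drive's report through `parse_scterc` and grouping drives by the result would show at a glance whether the split between "Disabled" and the 65.5/67.0 second timers really matches the split between drives with and without checksum errors.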
"Errored" drives show a few "Resets Between Cmd Acceptance and
Completion", "Number of Hardware Resets", "Number of ASR Events",
"Transition from drive PhyRdy to drive PhyNRdy" and "Device-to-host
register FISes sent due to a COMRESET".
Due to my ignorance I cannot tell what might be the cause and what the
effect :(
bye & Thanks
av.
