Date: Mon, 7 Apr 2025 09:07:11 -0400
From: mike tancsa <mike@sentex.net>
To: Andrea Venturoli <ml@netfence.it>, freebsd-questions <freebsd-questions@freebsd.org>
Subject: Re: Sudden zpool checksums errors
Message-ID: <4c6b64ec-0e59-4f64-8faf-117c7686a87d@sentex.net>
In-Reply-To: <032776db-a8a1-4134-a395-a59effbc4c30@netfence.it>
References: <6aeb488d-b3c3-4393-80ca-0b89c1ebc446@netfence.it> <3ddfecf7-2cb3-472c-bfce-93356e57b898@app.fastmail.com> <032776db-a8a1-4134-a395-a59effbc4c30@netfence.it>
On 4/5/2025 5:01 AM, Andrea Venturoli wrote:
> On 4/4/25 20:59, Dave Cottlehuber wrote:
>
> Thanks to all.
> I'll answer here collectively.
>
>> I have had marginal power supplies, backplane issues or break out
>> cables from the controller manifest errors like that. I would check
>> the power supply first, backplane next, controller 3rd.
>
> How would I go about this? How do I check these components?
> Does IPMI provide something useful?
>
ipmitool sensor will show the current sensor readings. ipmitool sel list
will tell you the actual errors logged. What does smartctl -a /dev/da#
show for the temperatures of the hard drives? Does smartctl -x show any
interesting log entries for the drives that threw errors vs. the ones
that did not?

>> - actually really bad ECC memory
>
> Any way to test?
>
memtest will help a bit. But if it's ECC memory, errors typically do get
logged by the BMC, and ipmitool sel list will typically show them.

>
>> does ipmitool sel list show anything btw ? (kldload ipmi and pkg
>> install ipmitool if you don't have it already)
>
>> # ipmitool sel list
>> 1 | 05/06/24 | 18:16:23 CEST | Temperature #0xcc | Upper Non-critical going high | Asserted
>> 2 | 05/06/24 | 21:25:42 CEST | Temperature #0xcc | Upper Critical going high | Asserted
>> 3 | 05/07/24 | 15:49:00 CEST | Temperature #0xcc | Upper Critical going high | Deasserted
>> 4 | 05/07/24 | 16:00:43 CEST | Temperature #0xcc | Upper Non-critical going high | Deasserted
>> 5 | 06/13/24 | 11:54:52 CEST | Drive Slot / Bay #0x77 | Drive Present | Asserted
>> 6 | 06/13/24 | 11:55:24 CEST | Drive Slot / Bay #0x73 | Drive Present | Asserted
>> 7 | 06/13/24 | 14:21:04 CEST | Drive Slot / Bay #0x73 | Drive Present | Deasserted
>> 8 | 06/13/24 | 14:21:04 CEST | Drive Slot / Bay #0x77 | Drive Present | Deasserted
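
If the SEL grows long, it can help to summarise it rather than read it by eye. The following is only a rough sketch, not part of ipmitool: it assumes the pipe-separated six-column layout seen in the output quoted in this thread, assumes the log is in chronological order, and uses made-up helper names (parse_sel, still_asserted). It reports which (sensor, event) pairs were last logged as Asserted without a later Deasserted.

```python
# Sample lines copied from the `ipmitool sel list` output quoted in this thread.
SEL_OUTPUT = """\
1 | 05/06/24 | 18:16:23 CEST | Temperature #0xcc | Upper Non-critical going high | Asserted
2 | 05/06/24 | 21:25:42 CEST | Temperature #0xcc | Upper Critical going high | Asserted
3 | 05/07/24 | 15:49:00 CEST | Temperature #0xcc | Upper Critical going high | Deasserted
4 | 05/07/24 | 16:00:43 CEST | Temperature #0xcc | Upper Non-critical going high | Deasserted
5 | 06/13/24 | 11:54:52 CEST | Drive Slot / Bay #0x77 | Drive Present | Asserted
6 | 06/13/24 | 11:55:24 CEST | Drive Slot / Bay #0x73 | Drive Present | Asserted
7 | 06/13/24 | 14:21:04 CEST | Drive Slot / Bay #0x73 | Drive Present | Deasserted
8 | 06/13/24 | 14:21:04 CEST | Drive Slot / Bay #0x77 | Drive Present | Deasserted
"""

# Column names are our own labels for the six fields (assumption, not ipmitool's).
KEYS = ("id", "date", "time", "sensor", "event", "state")


def parse_sel(text):
    """Split each 'a | b | ...' line into a dict; skip lines without six fields."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != len(KEYS):
            continue
        events.append(dict(zip(KEYS, fields)))
    return events


def still_asserted(events):
    """Return (sensor, event) pairs whose most recent logged state is 'Asserted'."""
    last_state = {}
    for e in events:  # later entries overwrite earlier ones
        last_state[(e["sensor"], e["event"])] = e["state"]
    return [key for key, state in last_state.items() if state == "Asserted"]


events = parse_sel(SEL_OUTPUT)
temperature_events = [e for e in events if e["sensor"].startswith("Temperature")]
print(len(events), "events,", len(temperature_events), "temperature-related")
print("still asserted:", still_asserted(events))
```

To run it against a live machine, one could save the output of ipmitool sel list to a file and pass its contents to parse_sel instead of the sample string. Other BMCs may print different columns, so the field count check may need adjusting.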
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4c6b64ec-0e59-4f64-8faf-117c7686a87d>
