Date: Fri, 04 Apr 2025 18:59:35 +0000
From: "Dave Cottlehuber" <dch@skunkwerks.at>
To: "Andrea Venturoli" <ml@netfence.it>
Cc: freebsd-questions <freebsd-questions@freebsd.org>
Subject: Re: Sudden zpool checksums errors
Message-ID: <3ddfecf7-2cb3-472c-bfce-93356e57b898@app.fastmail.com>
In-Reply-To: <6aeb488d-b3c3-4393-80ca-0b89c1ebc446@netfence.it>
On Fri, 4 Apr 2025, at 15:42, Andrea Venturoli wrote:
> Hello.

> I'm finding it hard to believe that 7 disks out of 12 are failing or
> just happened to misbehave all on the same day.
> BTW, SMART says they are OK.

Not saying it's not ZFS, but it's probably not ZFS... fingers crossed!

> I'm reluctant to blame RAM (since it's ECC) and power supply (as it's
> redundant 2x800W).

If it's memory, and your mainboard supports it, you'll see failures
in dmesg, reported as MCA errors. Some good examples:

https://lists.freebsd.org/pipermail/freebsd-hackers/2015-January/046878.html
https://forums.freebsd.org/threads/mca-errors.88909/
https://forums.freebsd.org/threads/solved-weird-mca-errors.94800/

> Disks are 16TB TOSHIBA MG09ACA1 connected to a MegaRAID SAS-3 3108 (of
> course not operating as RAID and with mrsas driver).

Look for SCSI or CAM errors in your logs too, and for disconnects.

I have seen storms of checksum errors in at least these situations:

- faulty or failing storage / SCSI controller
- insufficient power (or failing power supplies) under load
- overclocking
- overheating on mainboard, or controller, or drives
- actually really bad ECC memory
- drive cables that have worked loose over time
- over 50 disks failing within 2 days in a 200+ disk array
- all disks failing within 20 days of deployment in a 24 disk chassis

Sometimes, vendors produce batches of Bad Disks: firmware bugs,
physical defects, unexpected dust inside the sealed platters.
Failures are far more correlated than you'd want to believe.

External vibrations can cause problems.

A slow process of upgrading firmware & checking each component,
reseating all cables, is the best way to deal with this.

A+
Dave
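
The log checks suggested above (MCA reports in dmesg, SCSI or CAM errors
from the controller) could be roughly automated along these lines. This is
only a sketch: the three match strings are assumptions about what the
kernel and driver typically print, and the names PATTERNS, classify, scan
and scan_file are made up for illustration. Adjust the patterns to
whatever your own logs actually contain.

```python
import re
from collections import Counter

# Assumed patterns, not an exhaustive or authoritative list.
PATTERNS = {
    "mca": re.compile(r"\bMCA:"),                        # machine check reports
    "cam": re.compile(r"CAM status:"),                   # CAM layer errors
    "scsi": re.compile(r"SCSI (sense|status)", re.IGNORECASE),
}


def classify(line):
    """Return the names of all patterns that match one log line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]


def scan(lines):
    """Count matching lines per category across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(classify(line))
    return counts


def scan_file(path):
    """Scan a log file on disk, tolerating stray non-UTF-8 bytes."""
    with open(path, errors="replace") as fh:
        return scan(fh)
```

Calling scan_file("/var/log/messages"), or feeding the saved output of
dmesg to scan(), gives a per-category count. A sudden burst in one
category around the time the checksum errors appeared is a hint about
which component to inspect first; a count of zero proves nothing, since
not every mainboard or controller reports these events.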
