Date: Wed, 16 Sep 2020 16:34:04 -0500
From: Valeri Galtsev <galtsev@kicp.uchicago.edu>
To: freebsd-questions@freebsd.org
Subject: Re: partitioning server with 2 hard drives
Message-ID: <53910bc6-c16d-44eb-4dee-56d95226a217@kicp.uchicago.edu>
In-Reply-To: <20200916210325.08868ecc@gumby.homeunix.com>
References: <MWHPR06MB32479D288A8D10AD73FC6A329A200@MWHPR06MB3247.namprd06.prod.outlook.com>
 <20200915231901.e767350415aad298732f72cc@sohara.org>
 <CAHu1Y70fkMavOh7Dfw5j9ifC1BwihoCxCua3ufauOtTqt=v_CQ@mail.gmail.com>
 <20200916065432.508e19c3b9b5c0e44a72da3f@sohara.org>
 <20200916210325.08868ecc@gumby.homeunix.com>

On 2020-09-16 15:03, RW via freebsd-questions wrote:
> On Wed, 16 Sep 2020 06:54:32 +0100
> Steve O'Hara-Smith wrote:
>
>> On Tue, 15 Sep 2020 16:10:05 -0700
>> Michael Sierchio <kudzu@tenebras.com> wrote:
>
>>> When, not if, a drive fails, will you find one with precisely the
>>> same geometry / capacity to replace the failed drive and remirror?
>>> Can you do
>>
>> You don't need to, you just need one big enough, and 1TB
>> drives are very easy to find.
>
> Just out of curiosity, does ZFS leave a small amount of space unused on
> a raw drive to allow for a replacement being very slightly smaller?
>
>
>>> so before the other drive fails (which might be statistically
>>> likely)?
>>
>> Unless you do something daft like RMA the drive and wait then
>> yes IME.
>
> In the early days of Fastmail they had a major outage where at least 3
> drives failed. IIRC the other failures occurred while the RAID array was
> being rebuilt after the first failure.

I remember someone who came to the department with a strong opinion that
RAIDs can suffer multiple drive failures, leading to loss of data,
especially when rebuilding after replacement of a failed drive. That
person came from Japan, where people are very smart and do not always use
professional sysadmins [at universities?]; more often, smart people just
set up the computers, servers, and number crunchers themselves, including
the ones with hardware RAIDs. It took me some effort to convince him that
their hardware RAIDs (based on venerable 3ware hardware) were just not
configured correctly.

A good configuration includes a weekly task: verify RAID. This task walks
through the whole surface of the drives, reading them stripe by stripe,
and verifies that the redundancy stripes match the mathematical
composition of the other stripes, as they should. If this is not done
monthly or, better, weekly, potential bad areas of drives are not
discovered until a rebuild happens, which can lead to a "multiple drive
failure" during the rebuild.
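
As a rough illustration of what such a verify pass checks, here is a minimal Python sketch. It assumes single-parity (RAID-5 style) redundancy computed as a bytewise XOR of the data chunks in each stripe; the function names and the toy in-memory "array" are hypothetical and do not come from 3ware or any other controller's tooling.

```python
from functools import reduce
from operator import xor


def xor_parity(chunks):
    """Bytewise XOR of equally sized chunks (single-parity, RAID-5 style)."""
    return bytes(reduce(xor, column) for column in zip(*chunks))


def verify_stripes(stripes):
    """Return the indices of stripes whose stored parity does not match
    the parity recomputed from the data chunks."""
    return [
        index
        for index, (data_chunks, parity) in enumerate(stripes)
        if xor_parity(data_chunks) != parity
    ]


def rebuild_chunk(surviving_chunks, parity):
    """Recreate the chunk of a failed drive from the survivors and parity."""
    return xor_parity(list(surviving_chunks) + [parity])


if __name__ == "__main__":
    # Toy array: three data "drives", one parity chunk per stripe.
    data = [[b"AAAA", b"BBBB", b"CCCC"], [b"DDDD", b"EEEE", b"FFFF"]]
    stripes = [(chunks, xor_parity(chunks)) for chunks in data]
    print(verify_stripes(stripes))  # consistent array

    # A latent bad area on one drive in stripe 1, not yet noticed.
    stripes[1][0][0] = b"DDDX"
    print(verify_stripes(stripes))  # the verify pass flags stripe 1
```

Because `rebuild_chunk` depends on every surviving chunk in the stripe, a bad area that no verify pass has caught only surfaces at rebuild time, which is the situation described above.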

So routine weekly RAID verifications (known by other names as well)
prevent "multiple drive failure" during a rebuild. That most likely was
the failure mode in the incident you mentioned.

Incidentally, if the checksum of a RAID-5 stripe does not match, there is
no precise way to pinpoint which drive holds the wrong information for
that stripe. There are indirect indications, such as a drive responding
more slowly (because it attempts to recover the data through multiple
reads and superimposition of those reads, and to reallocate the bad
block), and similar. If a stripe is wrong on a RAID-6, it is possible to
pinpoint the wrong drive (assuming that only one drive has wrong
information). Yet RAIDs are not designed to ensure that the information
read from them is correct; every ZFS proponent will tell you that, and
they are correct.

> The drives were identical in
> every way, and I guess the the stress of a rebuild can play a part in
> synchronising failures.

I would disagree with that, based on my experience. We have had several
dozen hardware and software RAIDs over a decade and a half, all of them
running 24/7/365, and some of the RAIDs are themselves over a decade and
a half old. Failed drives are routinely replaced hot (in hardware RAIDs)
or cold (in software RAIDs), and no other drive in the same RAID fails
during the rebuild. Drives are chosen from reliable brands/models; still,
they do fail occasionally. I don't remember the exact rate of drive
failures in our boxes, but I would say I have to replace less than one
drive a month. And we have about 2 petabytes of data in total on our
boxes.

So I respectfully disagree: normally "rebuild stress" should not be any
different from normal routine RAID [device] use.

Valeri

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++