FreeBSD Mail Archives

Date:      Wed, 16 Sep 2020 16:34:04 -0500
From:      Valeri Galtsev <galtsev@kicp.uchicago.edu>
To:        freebsd-questions@freebsd.org
Subject:   Re: partitioning server with 2 hard drives
Message-ID:  <53910bc6-c16d-44eb-4dee-56d95226a217@kicp.uchicago.edu>
In-Reply-To: <20200916210325.08868ecc@gumby.homeunix.com>
References:  <MWHPR06MB32479D288A8D10AD73FC6A329A200@MWHPR06MB3247.namprd06.prod.outlook.com> <20200915231901.e767350415aad298732f72cc@sohara.org> <CAHu1Y70fkMavOh7Dfw5j9ifC1BwihoCxCua3ufauOtTqt=v_CQ@mail.gmail.com> <20200916065432.508e19c3b9b5c0e44a72da3f@sohara.org> <20200916210325.08868ecc@gumby.homeunix.com>

On 2020-09-16 15:03, RW via freebsd-questions wrote:
> On Wed, 16 Sep 2020 06:54:32 +0100
> Steve O'Hara-Smith wrote:
> 
>> On Tue, 15 Sep 2020 16:10:05 -0700
>> Michael Sierchio <kudzu@tenebras.com> wrote:
> 
>>> When, not if, a drive fails, will you find one with precisely the
>>> same geometry / capacity to replace the failed drive and remirror?
>>> Can you do
>>
>> 	You don't need to, you just need one big enough, and 1TB
>> drives are very easy to find.
> 
> Just out of curiosity, does ZFS leave a small amount of space unused on
> a raw drive to allow for a replacement being very slightly smaller?
> 
>   
>>> so before the other drive fails (which might be statistically
>>> likely)?
>>
>> 	Unless you do something daft like RMA the drive and wait then
>> yes IME.
> 
> In the early days of Fastmail they had a major outage where at least 3
> drives failed. IIRC the other failures occurred while the RAID array was
> being rebuilt after the first failure.

I remember someone who came to the department with a strong opinion
that RAIDs can suffer multiple drive failures, leading to loss of data,
especially when rebuilding after replacement of a failed drive. That
person came from Japan, where people are very smart, and where
professional sysadmins are not always used [at universities?]; more
often smart people just set up the computers, servers, and number
crunchers themselves, including the ones with hardware RAIDs.

It took me some effort to convince him that their hardware RAIDs (based
on venerable 3ware hardware) were just not configured correctly. A good
configuration includes a weekly task: verify the RAID. This task walks
through the whole surface of the drives, reading them stripe by stripe,
and verifies that the redundancy stripes match the mathematical
composition of the other stripes, as they should. If this is not done
monthly or, better, weekly, potential bad areas of the drives are not
discovered until a rebuild happens, which can lead to a "multiple drive
failure" during the rebuild. So routine weekly RAID verification (known
by other names as well) prevents "multiple drive failure" during the
rebuild.
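
As a rough illustration of what such a verify pass checks, here is a
minimal sketch for single-parity (RAID-5 style) stripes. It is not what
3ware firmware actually runs; the names xor_blocks and verify_stripes
are made up for this example, and each stripe is simply modelled as a
list of data blocks plus one parity block held in memory.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def verify_stripes(stripes):
    """Return the indices of stripes whose stored parity does not match.

    `stripes` is a list of (data_blocks, parity_block) pairs; this is a
    hypothetical in-memory model, not a real controller interface.
    """
    return [
        index
        for index, (data_blocks, parity) in enumerate(stripes)
        if xor_blocks(data_blocks) != parity
    ]


good = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(good)
damaged = [b"\x01\x02", b"\x10\x21", b"\xaa\xbb"]  # one flipped bit

print(verify_stripes([(good, parity), (damaged, parity)]))  # [1]
```

The point of running this regularly is that the mismatch in stripe 1 is
found while all the other drives are still healthy, not in the middle
of a rebuild.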

That was most likely the mode of failure in the known incident you
mentioned.

Incidentally, if the checksum of a RAID-5 does not match for some
stripe, there is no precise way to pinpoint which drive holds wrong
information for that stripe. There are indirect indications, such as a
drive responding more slowly (because it attempts to recover the data
through multiple reads and superimposition of the reads, and to
reallocate the bad block), and similar. If a stripe is wrong on a
RAID-6, it is possible to pinpoint the wrong drive (on the assumption
that only one drive has wrong information). Yet RAIDs are not designed
to ensure that the information read from them is correct; every zfs
proponent will tell you that, and they are correct.
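
To show why the second parity makes the difference, here is a small
sketch of locating a single silently corrupted drive from P and Q. It
assumes the common RAID-6 construction over GF(2^8) with polynomial
0x11d and generator 2; a given controller may use something else. The
names pq_parity and locate_bad_drive are invented for this example, and
each drive contributes just one byte to keep it short.

```python
# Log/antilog tables for GF(2^8), polynomial 0x11d, generator 2 (assumed).
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def pq_parity(data):
    """Return (P, Q) for one byte per data drive."""
    p = q = 0
    for index, byte in enumerate(data):
        p ^= byte                       # P is the plain XOR
        q ^= gf_mul(EXP[index], byte)   # Q weights drive i by 2**i
    return p, q


def locate_bad_drive(data, p, q):
    """Return None if consistent, 'P', 'Q', or the bad data drive index.

    Only valid under the assumption that at most one drive is wrong.
    """
    p_now, q_now = pq_parity(data)
    sp, sq = p ^ p_now, q ^ q_now       # the two syndromes
    if sp == 0 and sq == 0:
        return None
    if sq == 0:
        return "P"
    if sp == 0:
        return "Q"
    index = (LOG[sq] - LOG[sp]) % 255   # sq / sp == 2**index
    if index >= len(data):
        raise ValueError("more than one drive appears to be wrong")
    return index


data = [0x11, 0x22, 0x33, 0x44]
p, q = pq_parity(data)
corrupted = list(data)
corrupted[2] ^= 0x5A                    # silent corruption on drive 2
print(locate_bad_drive(corrupted, p, q))  # 2
```

With P alone the only syndrome is sp, which is the same value whichever
drive was damaged, so there is nothing to divide by and no way to name
the drive.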

> The drives were identical in
> every way, and I guess the stress of a rebuild can play a part in
> synchronising failures.

I would disagree with that, based on my experience. We have had several
dozen hardware and software RAIDs over a decade and a half, all of them
running 24/7/365, and some of the RAIDs are themselves a decade and a
half old. Failed drives are routinely replaced, hot (in hardware RAIDs)
or cold (in software RAIDs), and no other drive in the same RAID fails
during the rebuild. Drives are chosen from reliable brands/models;
still, they do fail occasionally. I don't remember the exact rate of
drive failures in our boxes, but I would say I have to replace less
than one drive a month. And we have about 2 petabytes of data in total
on our boxes. So I respectfully disagree: normally "rebuild stress"
should not be any different from normal routine RAID [device] use.

Valeri

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?53910bc6-c16d-44eb-4dee-56d95226a217>