Date: Sun, 27 Jul 2008 02:55:36 +0200
From: Ivan Voras <ivoras@freebsd.org>
To: freebsd-questions@freebsd.org
Subject: Re: graid3
Message-ID: <g6gh2a$a7m$1@ger.gmane.org>
In-Reply-To: <20080725114402.G5386@wojtek.tensor.gdynia.pl>
References: <20080725114402.G5386@wojtek.tensor.gdynia.pl>
Wojciech Puchar wrote:
> i read the graid3 manual and http://www.acnc.com/04_01_03.html to make
> sure i know what's RAID3 and i don't understand few things.
>
> 1)
>
> "The number of components must be equal to 3, 5, 9, 17, etc.
> (2^n + 1)."
>
> why it can't be say 5 disks+parity?

The reason is in the definition of "RAID 3", which says that updates to
the RAID device must be atomic. In some ideal universe, RAID 3 is
implemented in hardware and operates on individual bytes, but here we
cannot write to the drives in units other than the sector size, and the
sector size is 512 bytes.

Parity needs to be calculated for each sector, so at the sector level
the minimum is three sectors: two for data and one for parity. This
means the high-level atomic sectorsize is 2*512=1024 bytes. If you
inspect your RAID 3 devices, you'll see just that:

# diskinfo -v /dev/raid3/homes
/dev/raid3/homes
        1024            # sectorsize
        107374181376    # mediasize in bytes (100G)
        104857599       # mediasize in sectors

But each drive has a normal sectorsize of 512:

# diskinfo -v /dev/ad4
/dev/ad4
        512             # sectorsize
        80026361856     # mediasize in bytes (75G)
        156301488       # mediasize in sectors

Sector sizes cannot be arbitrary for various reasons, mostly dealing
with how memory pages and virtual memory are managed. In short, they
need to be powers of two. This restricts us to high-level ("big") sector
sizes that are exactly one of the following values: 1024, 2048, 4096,
8192, etc. Since drive sectors are fixed at 512 bytes, the number of
*data* drives must also be a power of two: 2, 4, 8, 16, etc. Add one
more drive for the parity and you get the starting sequence: 3, 5, 9, 17.
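As a rough illustration of the arithmetic above, here is a small Python sketch. It assumes 512-byte drive sectors, as in the diskinfo output for /dev/ad4; the function names are made up for this example and are not part of graid3.

```python
DRIVE_SECTOR_SIZE = 512  # bytes; assumed, matches the /dev/ad4 output above


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def array_sector_size(components):
    """Sector size of the array for a given component count.

    Returns None when the count is not of the form 2^n + 1 (n >= 1),
    i.e. when the data drives would not add up to a power-of-two sector.
    """
    data_drives = components - 1  # one component holds the parity
    if data_drives < 2 or not is_power_of_two(data_drives):
        return None
    return data_drives * DRIVE_SECTOR_SIZE


if __name__ == "__main__":
    for n in range(3, 18):
        size = array_sector_size(n)
        verdict = f"{size} byte sectors" if size else "not usable"
        print(f"{n:2d} components: {verdict}")
```

With 6 components (5 data + parity) the array sector would be 5*512 = 2560 bytes, which is not a power of two, so the sketch reports that count as not usable.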
In practice, this means that if you have 17 drives in RAID3, the
sectorsize of the array itself will be 16*512 = 8192. Each write to the
array will update all 17 drives before returning (one sector on each
drive, ensuring an atomic operation). Note that the file system created
on such an array will also have its characteristics modified to the
sector size (the fragment size will be the sector size).

> 2) "-r Use parity component for reading in round-robin fashion.
> Without this option the parity component is not used at
> all for reading operations when the device is in a complete state.
> With this option specified random I/O read operations are even 40%
> faster, but sequential reads are slower. One cannot use this option
> if the -w option is also specified."
>
>
> how parity disk could speed up random I/O?

It will work well only when the number of drives is small (i.e. three
drives), by using the parity drive as a valid source of data, avoiding
some seeks to all drives. I think that, theoretically, you can save at
most 0.33 (1/3) of all seeks - I don't know where the 40% number comes
from.
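To show what "using the parity drive as a valid source of data" can mean, here is a minimal Python sketch for the three-drive case. It assumes the parity is the bytewise XOR of the two data sectors (the usual RAID 3 scheme) and that a 1024-byte array sector is simply split in half across the data drives; the names are invented for the example and this is not graid3's actual code.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# One 1024-byte array sector, split across two 512-byte data drives
# (assumed layout, for illustration only).
array_sector = bytes(range(256)) * 4
d0, d1 = array_sector[:512], array_sector[512:]

# The parity drive stores the XOR of the two data sectors.
parity = xor_bytes(d0, d1)

# Normal read: fetch both data drives.
read_normal = d0 + d1

# Alternative read: fetch d0 and the parity, then rebuild d1 from them.
# The second data drive is not touched at all for this request.
read_with_parity = d0 + xor_bytes(d0, parity)

assert read_normal == array_sector
assert read_with_parity == array_sector
```

Any two of the three drives are enough to answer a read, so in principle one drive can be left free to serve another request.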