Date: Sun, 6 Mar 2016 22:04:09 -0800
From: Richard Elling <richard.elling@gmail.com>
To: zfs@lists.illumos.org
Cc: developer@lists.open-zfs.org, "smartos-discuss@lists.smartos.org" <smartos-discuss@lists.smartos.org>, developer <developer@open-zfs.org>, illumos-developer <developer@lists.illumos.org>, omnios-discuss <omnios-discuss@lists.omniti.com>, Discussion list for OpenIndiana <openindiana-discuss@openindiana.org>, "zfs-discuss@list.zfsonlinux.org" <zfs-discuss@list.zfsonlinux.org>, "freebsd-fs@FreeBSD.org" <freebsd-fs@freebsd.org>, "zfs-devel@freebsd.org" <zfs-devel@freebsd.org>
Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built
Message-ID: <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com>
In-Reply-To: <CALi05Xxm9Sdx9dXCU4C8YhUTZOwPY+NQqzmMEn5d0iFeOES6gw@mail.gmail.com>
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <CACTb9pxJqk__DPN_pDy4xPvd6ETZtbF9y=B8U7RaeGnn0tKAVQ@mail.gmail.com> <CAJjvXiH9Wh+YKngTvv0XG1HtikWggBDwjr_MCb8=Rf276DZO-Q@mail.gmail.com> <56D87784.4090103@broken.net> <A5A6EA4AE9DCC44F8E7FCB4D6317B1D203178F1DD392@SH-MAIL.ISSI.COM> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> <CALi05Xxm9Sdx9dXCU4C8YhUTZOwPY+NQqzmMEn5d0iFeOES6gw@mail.gmail.com>
> On Mar 6, 2016, at 9:06 PM, Fred Liu <fred.fliu@gmail.com> wrote:
>
> 2016-03-06 22:49 GMT+08:00 Richard Elling <richard.elling@richardelling.com>:
>
>> On Mar 3, 2016, at 8:35 PM, Fred Liu <Fred_Liu@issi.com> wrote:
>>
>> Hi,
>>
>> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC RAID introduction,
>> the interesting survey -- the zpool with the most disks you have ever built -- popped into my brain.
>
> We test to 2,000 drives. Beyond 2,000 there are some scalability issues that impact failover times.
> We've identified these and know what to fix, but need a real customer at this scale to bump it to
> the top of the priority queue.
>
> [Fred]: Wow! 2,000 drives would need almost 4-5 whole racks!
>
>> Since ZFS doesn't support nested vdevs, the maximum fault tolerance should be three (from raidz3).
>
> Pedantically, it is N, because you can have N-way mirroring.
>
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in theory and rarely happens in reality.
>
>> That is limiting if you want to build a very large pool.
>
> Scaling redundancy by increasing parity improves data loss protection by about 3 orders of
> magnitude. Adding capacity by striping reduces data loss protection by 1/N. This is why there is
> not much need to go beyond raidz3. However, if you do want to go there, adding raidz4+ is
> relatively easy.
>
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2,000 drives. If that is true, the probability of 4 failures out of 2,000 will not be so low.
> Plus, resilvering takes longer when a single disk has bigger capacity. Further, over-provisioning spare disks vs. raidz4+ would be a worthwhile
> trade-off for a storage mesh at the scale of 2,000 drives.
Please don't assume, you'll just hurt yourself :-)
For example, do not assume the only option is striping across raidz3 vdevs. Clearly, there are many different options.
 -- richard
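[Editor's note: the parity-vs-striping trade-off discussed above can be made concrete with a simple binomial model. The sketch below is illustrative only -- it assumes independent disk failures with a made-up per-disk failure probability and a hypothetical 200 x 10-wide vdev layout, none of which come from the thread -- but it shows the direction of both effects Richard describes: each added parity level sharply cuts the chance a vdev loses data, while striping more vdevs multiplies the exposure.]

```python
from math import comb

def p_vdev_loss(width, parity, p_disk):
    """Probability that more than `parity` of `width` independent disks
    are failed at once, i.e. the raidz vdev loses data."""
    return sum(comb(width, k) * p_disk**k * (1 - p_disk)**(width - k)
               for k in range(parity + 1, width + 1))

def p_pool_loss(n_vdevs, width, parity, p_disk):
    """A pool striped across vdevs loses data if ANY one vdev does."""
    return 1 - (1 - p_vdev_loss(width, parity, p_disk))**n_vdevs

# Hypothetical layout: 2,000 drives as 200 x 10-wide raidz vdevs,
# with a 1% chance a given disk is failed/unreadable in the window
# of interest (an illustrative number, not a measured rate).
for parity in (1, 2, 3):
    print(f"raidz{parity}: pool-loss probability "
          f"{p_pool_loss(200, 10, parity, 0.01):.2e}")
```

Under these assumed numbers, each extra parity level shrinks the loss probability by roughly a factor of width x p_disk, while going from 1 vdev to 200 striped vdevs multiplies it by nearly 200 -- consistent with the "parity helps by orders of magnitude, striping hurts by 1/N" argument above.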