From: Fred Liu
Date: Mon, 7 Mar 2016 13:06:59 +0800
Subject: Re: [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built
To: developer@lists.open-zfs.org
Cc: "smartos-discuss@lists.smartos.org", developer, illumos-developer, omnios-discuss, Discussion list for OpenIndiana, illumos-zfs, "zfs-discuss@list.zfsonlinux.org", "freebsd-fs@FreeBSD.org", "zfs-devel@freebsd.org"
In-Reply-To: <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com>
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com>

2016-03-06 22:49 GMT+08:00 Richard Elling:

> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote:
>
> Hi,
>
> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC RAID
> introduction,
> the interesting survey -- the zpool with most disks you have ever built
> popped in my brain.
>
> We test to 2,000 drives. Beyond 2,000 there are some scalability issues
> that impact failover times.
> We’ve identified these and know what to fix, but need a real customer at
> this scale to bump it to
> the top of the priority queue.

[Fred]: Wow! 2000 drives almost need 4~5 whole racks!
>
> For zfs doesn't support nested vdev, the maximum fault tolerance should be
> three(from raidz3).
>
> Pedantically, it is N, because you can have N-way mirroring.

[Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in
theory but rarely happens in practice.

> It is stranded if you want to build a very huge pool.
>
> Scaling redundancy by increasing parity improves data loss protection by
> about 3 orders of
> magnitude. Adding capacity by striping reduces data loss protection by
> 1/N. This is why there is
> not much need to go beyond raidz3. However, if you do want to go there,
> adding raidz4+ is
> relatively easy.

[Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2000
drives. If that is true, the probability of 4 drives out of 2000 failing will
not be so low. Plus, resilvering takes longer when each disk has a larger
capacity. And further, the cost of over-provisioning spare disks versus
raidz4+ becomes a trade-off worth considering for a storage mesh at the scale
of 2000 drives.

Thanks.

Fred

> --
>
> Richard.Elling@RichardElling.com
> +1-760-896-4422
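
The parity-versus-striping trade-off argued in this thread can be put into rough numbers. The Python sketch below is only an illustration under assumptions that do not come from the thread: each disk fails independently with probability p within one resilver window, the pool is a stripe of identical raidz vdevs, and the figures used (100 vdevs of 20 disks, p = 0.01) are hypothetical. Real failures are correlated, so treat the output as an order-of-magnitude guide at best.

```python
from math import comb, expm1, log1p


def vdev_loss_probability(width, parity, p):
    """Probability that more than `parity` of `width` disks fail.

    Assumes independent failures, each with probability p (a simplification).
    """
    return sum(
        comb(width, k) * p**k * (1 - p) ** (width - k)
        for k in range(parity + 1, width + 1)
    )


def pool_loss_probability(n_vdevs, width, parity, p):
    """Probability that at least one vdev in a striped pool is lost."""
    q = vdev_loss_probability(width, parity, p)
    if q >= 1.0:
        return 1.0
    # Equivalent to 1 - (1 - q)**n_vdevs, but stable for very small q.
    return -expm1(n_vdevs * log1p(-q))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-disk failure probability per resilver window
    for parity in (3, 4):  # raidz3 versus a hypothetical raidz4
        loss = pool_loss_probability(100, 20, parity, p)
        print(f"parity={parity}: pool loss probability ~ {loss:.3e}")
```

For small per-vdev loss probabilities the pool figure grows roughly linearly with the number of striped vdevs, which is the 1/N effect Richard describes. How much an extra parity level buys depends strongly on p and on the vdev width.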