From: Fred Liu
To: "smartos-discuss@lists.smartos.org"
Cc: illumos-zfs, developer@lists.open-zfs.org, developer, illumos-developer, omnios-discuss, Discussion list for OpenIndiana, "zfs-discuss@list.zfsonlinux.org", "freebsd-fs@FreeBSD.org", "zfs-devel@freebsd.org"
Date: Mon, 7 Mar 2016 14:18:24 +0800
Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-07 14:04 GMT+08:00 Richard Elling:

> On Mar 6, 2016, at 9:06 PM, Fred Liu wrote:
>
> 2016-03-06 22:49 GMT+08:00 Richard Elling <richard.elling@richardelling.com>:
>
>> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote:
>>
>> Hi,
>>
>> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC
>> RAID introduction,
>> the interesting survey -- the zpool with most disks you have ever built
>> popped in my brain.
>>
>> We test to 2,000 drives. Beyond 2,000 there are some scalability issues
>> that impact failover times.
>> We've identified these and know what to fix, but need a real customer at
>> this scale to bump it to the top of the priority queue.
>>
>> [Fred]: Wow! 2000 drives almost need 4~5 whole racks!
>
>> Since ZFS doesn't support nested vdevs, the maximum fault tolerance should
>> be three (from raidz3).
>>
>> Pedantically, it is N, because you can have N-way mirroring.
>
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works
> in theory and rarely happens in reality.
>
>> It is stranded if you want to build a very huge pool.
>>
>> Scaling redundancy by increasing parity improves data loss protection by
>> about 3 orders of magnitude. Adding capacity by striping reduces data loss
>> protection by 1/N. This is why there is not much need to go beyond raidz3.
>> However, if you do want to go there, adding raidz4+ is relatively easy.
>
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of
> 2000 drives. If that is true, the probability of 4/2000 will not be so low.
> Plus, resilvering takes longer if a single disk has a bigger capacity.
> Further, the cost of over-provisioning spare disks vs. raidz4+ will be a
> worthwhile trade-off when the storage mesh is at the scale of 2000 drives.
>
> Please don't assume, you'll just hurt yourself :-)
> For example, do not assume the only option is striping across raidz3
> vdevs. Clearly, there are many different options.

[Fred]: Yeah. Assumptions always go far away from facts! ;-) Is designing a
storage mesh with 2000 drives a business secret? Or is it just too
complicated to elaborate? Never mind. ;-)

Thanks.

Fred
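
The parity-versus-striping argument quoted above can be sketched with a toy calculation. This is only an illustration, not anything taken from the thread: it assumes disks fail independently with the same probability during one resilver window, and the failure probability, vdev width, and vdev count below are made-up numbers. The function names are hypothetical as well. Note that under this model a pool is lost only when more than `parity` disks fail inside the same vdev, not when any four of the 2000 disks fail.

```python
from math import comb, expm1, log1p


def vdev_loss_probability(width: int, parity: int, p_disk: float) -> float:
    """Chance that more than `parity` of `width` disks fail in one window.

    Assumes independent failures, each with probability `p_disk`
    (a simplification; real failures are often correlated).
    """
    return sum(
        comb(width, k) * p_disk**k * (1.0 - p_disk) ** (width - k)
        for k in range(parity + 1, width + 1)
    )


def pool_loss_probability(vdevs: int, width: int, parity: int, p_disk: float) -> float:
    """Chance that at least one of `vdevs` striped vdevs is lost."""
    q = vdev_loss_probability(width, parity, p_disk)
    # Equivalent to 1 - (1 - q)**vdevs, written to stay accurate for tiny q.
    return -expm1(vdevs * log1p(-q))


if __name__ == "__main__":
    P_DISK = 1e-3  # assumed per-disk failure probability per resilver window
    WIDTH = 20     # assumed number of disks per raidz vdev
    VDEVS = 100    # 100 vdevs x 20 disks = 2000 drives
    for parity in (1, 2, 3):
        print(
            f"raidz{parity}:",
            f"vdev loss = {vdev_loss_probability(WIDTH, parity, P_DISK):.3e},",
            f"pool loss = {pool_loss_probability(VDEVS, WIDTH, parity, P_DISK):.3e}",
        )
```

With these assumed inputs, striping across 100 vdevs raises the loss probability roughly 100-fold, while each extra parity level lowers it by a factor of more than 100. The exact factors depend entirely on the assumed width and per-disk failure probability, so the sketch shows the shape of the trade-off rather than reproducing any figure quoted in the thread.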