From owner-freebsd-fs@freebsd.org Mon Mar 7 06:04:13 2016
Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built
From: Richard Elling
Date: Sun, 6 Mar 2016 22:04:09 -0800
Cc: developer@lists.open-zfs.org, "smartos-discuss@lists.smartos.org", developer, illumos-developer, omnios-discuss, Discussion list for OpenIndiana, "zfs-discuss@list.zfsonlinux.org", "freebsd-fs@FreeBSD.org", "zfs-devel@freebsd.org"
Message-Id: <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com>
To: zfs@lists.illumos.org

> On Mar 6, 2016, at 9:06 PM, Fred Liu wrote:
>
> 2016-03-06 22:49 GMT+08:00 Richard Elling:
>
>> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote:
>>
>> Hi,
>>
>> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC RAID introduction,
>> the interesting survey -- the zpool with most disks you have ever built popped into my head.
>
> We test to 2,000 drives. Beyond 2,000 there are some scalability issues that impact failover times.
> We’ve identified these and know what to fix, but need a real customer at this scale to bump it to
> the top of the priority queue.
>
> [Fred]: Wow! 2000 drives need almost 4~5 whole racks!
>>
>> Since ZFS doesn't support nested vdevs, the maximum fault tolerance should be three (from raidz3).
>
> Pedantically, it is N, because you can have N-way mirroring.
>
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in theory and rarely happens in reality.
>
>> That is limiting if you want to build a very large pool.
>
> Scaling redundancy by increasing parity improves data loss protection by about 3 orders of
> magnitude. Adding capacity by striping reduces data loss protection by 1/N. This is why there is
> not much need to go beyond raidz3. However, if you do want to go there, adding raidz4+ is
> relatively easy.
>
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2000 drives. If that is true, the probability of 4 failures among 2000 drives will not be so low.
> Plus, resilvering takes longer when a single disk has bigger capacity. And further, the cost of over-provisioning spare disks vs raidz4+ will be a worthwhile
> trade-off when the storage mesh is at the scale of 2000 drives.

Please don't assume, you'll just hurt yourself :-)

For example, do not assume the only option is striping across raidz3 vdevs.
Clearly, there are many different options.

 -- richard
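A toy model makes the two scaling claims in this thread concrete: extra parity buys orders of magnitude, while striping across N vdevs costs roughly a factor of N. This is a minimal sketch under loudly stated assumptions: disks fail independently, each with a fixed per-disk failure probability p during one resilver window, and every vdev is 10 disks wide. The value of p, the vdev width, the 200-vdev layout, and the function names are all hypothetical illustrations, not anything measured or described by the posters. ZFS itself offers raidz1 to raidz3; the parity=4 row models the raidz4+ discussed above and does not correspond to an existing vdev type.

```python
import math


def vdev_loss_probability(width: int, parity: int, p: float) -> float:
    """Hypothetical helper: probability that one raidz vdev loses data,
    i.e. more than `parity` of its `width` disks fail in the same repair
    window, assuming independent failures with per-disk probability p."""
    return sum(
        math.comb(width, i) * p**i * (1 - p) ** (width - i)
        for i in range(parity + 1, width + 1)
    )


def pool_loss_probability(n_vdevs: int, width: int, parity: int, p: float) -> float:
    """Hypothetical helper: a pool striped across n_vdevs loses data if any
    single vdev does. Computes 1 - (1 - x)**n via log1p/expm1 so tiny
    per-vdev probabilities keep their precision."""
    x = vdev_loss_probability(width, parity, p)
    return -math.expm1(n_vdevs * math.log1p(-x))


if __name__ == "__main__":
    p = 1e-3      # assumed per-disk failure probability per resilver window
    width = 10    # assumed raidz vdev width
    n_vdevs = 200  # 200 vdevs x 10 disks = 2,000 drives
    for parity in (1, 2, 3, 4):  # parity=4 is the hypothetical raidz4
        single = vdev_loss_probability(width, parity, p)
        pool = pool_loss_probability(n_vdevs, width, parity, p)
        print(f"raidz{parity}: one vdev {single:.2e}, {n_vdevs} vdevs {pool:.2e}")
```

With these made-up inputs, each added parity level cuts the loss probability by roughly 2 to 3 orders of magnitude, while striping across N vdevs multiplies it by about N when the per-vdev probability is small. The model deliberately ignores correlated failures and the longer resilvers of larger disks that Fred raises, both of which would raise the effective p.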