Subject: Re: quantifying zpool performance with number of vdevs
From: Paul Kraus <paul@kraus-haus.org>
Date: Fri, 29 Jan 2016 16:10:16 -0500
To: Graham Allan, FreeBSD Filesystems <freebsd-fs@freebsd.org>
In-Reply-To: <56ABAA18.90102@physics.umn.edu>
Message-Id: <7E3F58C9-94ED-4491-A0FD-7AAB413F2E03@kraus-haus.org>

On Jan 29, 2016, at 13:06, Graham Allan wrote:

> In many of the storage systems I built to date I was slightly conservative (?) in wanting to keep any one pool confined to a single JBOD chassis.
> In doing this I've generally been using the Supermicro 45-drive chassis with pools made of 4x (8+2) raidz2, other slots being kept for spares, ZIL and L2ARC.
>
> Obviously theory says that iops should scale with the number of vdevs, but it would be nice to try and quantify it.
>
> Getting relevant data out of iperf seems problematic on machines with 128GB+ RAM - it's hard to blow out the ARC.

In a previous life, where I was responsible for over 200 TB of storage (in 2008, back when that was a lot), I did some testing for both reliability and performance before committing to a configuration for our new storage system. It was not FreeBSD but Solaris, and we had 5 x J4400 chassis (each with 24 drives), all dual SAS attached on four HBA ports. This link https://docs.google.com/spreadsheets/d/13sLzYKkmyi-ceuIlUS2q0oxcmRnTE-BRvBYHmEJteAY/edit?usp=sharing has some of the performance testing I did. I did not look at Sequential Read as that was not in our workload; in hindsight I should have. By limiting the ARC, the entire ARC, to 4 GB I was able to get reasonably accurate results. The number of vdevs made very little difference to Sequential Writes, but Random Reads and Writes scaled nearly linearly with the number of top level vdevs.

Our eventual config was RAIDz2 based because we could not meet the space requirements with mirrors, especially as we would have had to go with 3-way mirrors to get the same MTTDL as with the RAIDz2. The production pool consisted of 22 top level vdevs, each a 5-drive RAIDz2 with each drive in a different disk chassis. So all of the drives in slots 0 and 1 were hot spares, all of the drives in slot 2 made up one vdev, all of the drives in slot 3 made up one vdev, etc. So we were striping data across 22 vdevs. During pre-production testing we completely lost connectivity to 2 of the 5 disk chassis and had no loss of data or availability. When those chassis came back, they resilvered and went along their merry way (just as they should).

Once the system went live we took hourly snapshots and replicated them both locally and remotely for backup purposes. We estimated that it would have taken over 3 weeks to restore all the data from tape if we had to, and that was unacceptable. The only issue we ran into was related to resilvering after a drive failure: due to the large number of snapshots and the ongoing snapshot creation, a resilver could take over a week.

--
Paul Kraus
paul@kraus-haus.org
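
P.S. For anyone who wants to repeat this kind of test on FreeBSD, a rough sketch of what I mean by capping the ARC and by "one drive per chassis" vdevs follows. The 4 GB figure is the one I used; the pool name and da* device names are just placeholders, and on older FreeBSD releases vfs.zfs.arc_max is a boot-time loader tunable rather than something you can change at runtime, so check your release.

    # /boot/loader.conf - cap the ARC at 4 GB (value in bytes) so the
    # benchmark actually hits the disks instead of being served from cache
    vfs.zfs.arc_max="4294967296"

    # scaled-down sketch of the layout described above: each raidz2 vdev
    # is built from one drive per chassis, with whole slots reserved as
    # hot spares (device names are hypothetical)
    zpool create tank \
        raidz2 da0 da10 da20 da30 da40 \
        raidz2 da1 da11 da21 da31 da41 \
        spare  da8 da18
    zpool status tank

The production pool was the same idea, just with 22 raidz2 vdevs instead of 2.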