Date: Fri, 29 Jan 2016 16:10:16 -0500
From: Paul Kraus <paul@kraus-haus.org>
To: Graham Allan <allan@physics.umn.edu>, FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: quantifying zpool performance with number of vdevs
Message-ID: <7E3F58C9-94ED-4491-A0FD-7AAB413F2E03@kraus-haus.org>
In-Reply-To: <56ABAA18.90102@physics.umn.edu>
References: <56ABAA18.90102@physics.umn.edu>
On Jan 29, 2016, at 13:06, Graham Allan <allan@physics.umn.edu> wrote:

> In many of the storage systems I built to date I was slightly conservative (?) in wanting to keep any one pool confined to a single JBOD chassis. In doing this I've generally been using the Supermicro 45-drive chassis with pools made of 4x (8+2) raidz2, other slots being kept for spares, ZIL and L2ARC.
>
> Obviously theory says that iops should scale with number of vdevs but it would be nice to try and quantify.
>
> Getting relevant data out of iperf seems problematic on machines with 128GB+ RAM - it's hard to blow out the ARC.

In a previous life, where I was responsible for over 200 TB of storage (in 2008, back when that was a lot), I did some testing for both reliability and performance before committing to a configuration for our new storage system. It was not FreeBSD but Solaris, and we had 5 x J4400 chassis (each with 24 drives), all dual SAS attached on four HBA ports. This link https://docs.google.com/spreadsheets/d/13sLzYKkmyi-ceuIlUS2q0oxcmRnTE-BRvBYHmEJteAY/edit?usp=sharing has some of the performance testing I did. I did not look at Sequential Read as that was not in our workload; in hindsight I should have. By limiting the ARC, the entire ARC, to 4 GB I was able to get reasonably accurate results. The number of vdevs made very little difference to Sequential Writes, but Random Reads and Writes scaled very linearly with the number of top-level vdevs.

Our eventual config was RAIDz2-based because we could not meet the space requirements with mirrors, especially as we would have had to go with 3-way mirrors to get the same MTTDL as with the RAIDz2. The production pool consisted of 22 top-level vdevs, each a 5-drive RAIDz2 where each drive was in a different disk chassis. So all of the drives in slots 0 and 1 were hot spares, all of the drives in slot 2 made up one vdev, all of the drives in slot 3 made up one vdev, etc. So we were striping data across 22 vdevs. During pre-production testing we completely lost connectivity to 2 of the 5 disk chassis and had no loss of data or availability. When those chassis came back, they resilvered and went along their merry way (just as they should).

Once the system went live we took hourly snapshots and replicated them both locally and remotely for backup purposes. We estimated that it would have taken over 3 weeks to restore all the data from tape if we had to, and that was unacceptable. The only issue we ran into related to resilvering after a drive failure: due to the large number of snapshots and the ongoing snapshot creation, a resilver could take over a week.
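For anyone wanting to set up something similar, here is a rough sketch of the test-time ARC cap and the chassis-spanning layout in command form. The pool name and device names are purely illustrative (one disk per chassis per vdev, slot number as the target), not the actual commands we ran:

    # Cap the entire ARC at 4 GB so the benchmark actually hits the disks.
    # Solaris: add to /etc/system and reboot (FreeBSD equivalent:
    # vfs.zfs.arc_max in /boot/loader.conf).
    set zfs:zfs_arc_max = 0x100000000

    # One 5-drive RAIDz2 per drive slot, slots 2 through 23 (22 vdevs),
    # taking the same slot from each of the five chassis (c1..c5).
    # Slots 0 and 1 on every chassis are kept as hot spares.
    vdevs=""
    for slot in 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23; do
        vdevs="$vdevs raidz2 c1t${slot}d0 c2t${slot}d0 c3t${slot}d0 c4t${slot}d0 c5t${slot}d0"
    done
    zpool create tank $vdevs \
        spare c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
              c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0

--
Paul Kraus
paul@kraus-haus.org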