From: Jan Bramkamp
Date: Tue, 9 Feb 2016 18:28:22 +0100
To: freebsd-stable@freebsd.org
Subject: Re: Best practices for ZFS setup for a strictly SSD based system?
Message-ID: <56BA21B6.3070308@rlwinm.de>
In-Reply-To: <2D296837-3B06-4E72-B8B0-A33AE6CE48AE@punkt.de>

On 09/02/16 16:54, Patrick M. Hausen wrote:
> Hi, all,
>
> while there is quite a bit of documentation on how to improve ZFS performance
> by using a combination of rotating disks and SSDs, I have not found much about
> an SSD only setup.
>
> We are planning to try a hosting server with 8 SATA SSDs with ZFS. Things I am
> not at all sure about:
>
> * Does the recommended limit of 6 disks for a RAIDZ2 still
>   hold? 2x 4 disks is quite a bit of overhead, could I use all 8
>   in one vdev and get away with it?
>   (The maximum of 6 recommendation is in some old Sun doc)

There are multiple reasons to limit the number of disks per RAID-Z VDEV.

* Resilver time: ZFS has to process all objects ordered by transaction
  id to resilver a RAID-Z. Resilvering is a torture test for the
  remaining disks of your degraded RAID-Z, and with the ratio of
  bandwidth to capacity of current hard disks resilvering takes too
  long. This isn't an issue for SSDs.

* For performance estimations think of the RAID-Z as one huge disk
  with larger blocks but the same IOPS as the slowest disk in the
  RAID-Z. Databases perform disk I/O in small blocks, limiting your
  RAID-Z to the performance of about one of its member disks.

* A ZFS pool can only grow by adding whole VDEVs or replacing all
  disks in a VDEV one at a time. Using mirrors allows the pool to grow
  in smaller increments.

> * Will e.g. MySQL still profit from residing on a mirror
>   instead of a RAIDZ2, even if all disks are SSDs?

Yes. OpenZFS schedules reads on mirrors to the disk with the shortest
queue, so a mirror offers about the sum of its member disks in read
performance (IOPS and bandwidth) and the minimum of its member disks in
write performance (IOPS and bandwidth). A pool with as many mirrored
VDEVs as possible will offer the optimal performance for a given number
of disks.

For write heavy workloads the quality of the SSDs matters a lot as
well. Cheap consumer SSDs can't sustain high write rates for any length
of time. Even medium quality SSDs have a lot of jitter and suffer from
throughput degradation under sustained write loads. Optimized server
SSDs can sustain random write workloads with little jitter and bounded
latency.

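
The rules of thumb above (a RAID-Z behaves like its slowest member for
small-block I/O, a mirror reads at about the sum and writes at the
minimum of its members) can be turned into a rough back-of-the-envelope
estimate. The following is only a sketch: the per-disk IOPS figures are
made-up placeholders, the names Disk, raidz_iops, mirror_iops and
pool_iops are invented for illustration, and it assumes that a pool's
throughput is roughly the sum over its VDEVs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Disk:
    read_iops: int
    write_iops: int


def raidz_iops(disks: List[Disk]) -> Tuple[int, int]:
    """Small-block I/O: a RAID-Z VDEV is modelled as its slowest member."""
    return (min(d.read_iops for d in disks),
            min(d.write_iops for d in disks))


def mirror_iops(disks: List[Disk]) -> Tuple[int, int]:
    """Reads are spread over all members; every write hits every member."""
    return (sum(d.read_iops for d in disks),
            min(d.write_iops for d in disks))


def pool_iops(vdevs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Assumption of this sketch: per-VDEV estimates simply add up."""
    return (sum(r for r, _ in vdevs), sum(w for _, w in vdevs))


# Hypothetical SATA SSD; these numbers are placeholders, not measurements.
ssd = Disk(read_iops=50_000, write_iops=20_000)

layouts = {
    "1 x 8-disk RAID-Z2": pool_iops([raidz_iops([ssd] * 8)]),
    "2 x 4-disk RAID-Z2": pool_iops([raidz_iops([ssd] * 4)] * 2),
    "4 x 2-way mirror": pool_iops([mirror_iops([ssd] * 2)] * 4),
}

for name, (reads, writes) in layouts.items():
    print(f"{name}: ~{reads} read IOPS, ~{writes} write IOPS")
```

The model ignores caching, record size and bandwidth limits of the
controller, so it is only useful for comparing layouts against each
other, not for predicting absolute numbers.
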
An NVMe SSD can offer an additional order of magnitude performance
increase over SATA SSDs, but at a significant increase in price. With
multiple NVMe SSDs you will run into the current scalability limits of
ZFS and GEOM.

> * Does a separate ZIL and/or ARC cache device still
>   make sense?

Most likely not.

Another optimization is splitting the log and table space and creating
a dedicated ZFS dataset for each. Create the dataset containing the
table space with its record size set to the fixed page size used by
your MySQL backend. ZFS also offers a lot more consistency and
atomicity guarantees than required by a minimal POSIX file system. This
allows you to further reduce the syncing overhead by tuning MySQL to
take advantage of the ZFS guarantees.
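
As a rough sketch of the dataset split described above, the helper
below only builds the zfs(8) command lines and prints them; it does not
run anything. The pool name "tank", the dataset names and the function
name are hypothetical, and the 16 KiB default assumes InnoDB with its
default page size. Adjust it to whatever your backend actually uses.

```python
import shlex
from typing import List


def mysql_dataset_commands(pool: str = "tank",
                           page_size_kib: int = 16) -> List[List[str]]:
    """Build zfs create commands for separate table space and log datasets.

    The pool and dataset names are placeholders. page_size_kib should match
    the fixed page size of the MySQL backend (assumed here: InnoDB, 16 KiB).
    """
    # ZFS record sizes are powers of two; reject anything else early.
    if page_size_kib <= 0 or page_size_kib & (page_size_kib - 1):
        raise ValueError("page size must be a power of two (in KiB)")

    parent = f"{pool}/mysql"
    return [
        # Parent dataset, created first so the children can be created.
        ["zfs", "create", parent],
        # Table space: recordsize matched to the database page size.
        ["zfs", "create", "-o", f"recordsize={page_size_kib}K",
         f"{parent}/data"],
        # Log: left at the default recordsize in this sketch.
        ["zfs", "create", f"{parent}/log"],
    ]


for argv in mysql_dataset_commands():
    print(shlex.join(argv))
```

Pointing the MySQL data directory and log directory at the two datasets
is left to the MySQL configuration and is not shown here.
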