From owner-svn-doc-projects@FreeBSD.ORG Wed Feb 26 23:49:38 2014
Delivered-To: svn-doc-projects@freebsd.org
Message-Id: <201402262349.s1QNncLC072675@svn.freebsd.org>
From: Warren Block
Date: Wed, 26 Feb 2014 23:49:38 +0000 (UTC)
To: doc-committers@freebsd.org, svn-doc-projects@freebsd.org
Subject: svn commit: r44084 - projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs
X-SVN-Group: doc-projects
List-Id: SVN commit messages for doc projects trees

Author: wblock
Date: Wed Feb 26 23:49:37 2014
New Revision: 44084
URL: http://svnweb.freebsd.org/changeset/doc/44084

Log:
  ZFS tuning content additions by Allan Jude.

  Submitted by:  Allan Jude

Modified:
  projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml

Modified: projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml
==============================================================================
--- projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml  Wed Feb 26 23:44:33 2014  (r44083)
+++ projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml  Wed Feb 26 23:49:37 2014  (r44084)

@@ -675,7 +675,11 @@ errors: No known data errors

ideally at least once every three months.  The scrub operation is very
disk-intensive and will reduce performance while running.  Avoid
high-demand periods when scheduling scrub, or use vfs.zfs.scrub_delay
to adjust the relative priority of the scrub to prevent it from
interfering with other workloads.

&prompt.root; zpool scrub mypool
&prompt.root; zpool status

@@ -890,7 +894,8 @@ errors: No known data errors

After the scrub operation has completed and all the data has been
synchronized from ada0 to ada1, the error messages can be cleared from
the pool status by running zpool clear.

@@ -2014,7 +2019,258 @@ mypool/compressed_dataset  logicalused

ZFS Tuning

There are a number of tunables that can be adjusted to make ZFS perform
best for different workloads.

vfs.zfs.arc_max - Sets the maximum size of the ARC.  The default is all
RAM less 1 GB, or 1/2 of RAM, whichever is more.  However, a lower value
should be used if the system will be running any other daemons or
processes that may require memory.  This value can only be adjusted at
boot time, and is set in /boot/loader.conf.
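As a minimal sketch, a boot-time tunable such as this one is set with a
single line in /boot/loader.conf.  The 4G figure below is purely an
illustrative value, not a recommendation, and should be sized to leave
enough memory for the system's other workloads:

vfs.zfs.arc_max="4G"  # example value only; size to fit the system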
vfs.zfs.arc_meta_limit - Limits the portion of the ARC that can be used
to store metadata.  The default is 1/4 of vfs.zfs.arc_max.  Increasing
this value will improve performance if the workload involves operations
on a large number of files and directories, or frequent metadata
operations, at the cost of less file data fitting in the ARC.  This
value can only be adjusted at boot time, and is set in
/boot/loader.conf.

vfs.zfs.arc_min - Sets the minimum size of the ARC.  The default is 1/2
of vfs.zfs.arc_meta_limit.  Adjust this value to prevent other
applications from pressuring out the entire ARC.  This value can only
be adjusted at boot time, and is set in /boot/loader.conf.

vfs.zfs.vdev.cache.size - A preallocated amount of memory reserved as a
cache for each device in the pool.  The total amount of memory used
will be this value multiplied by the number of devices.  This value can
only be adjusted at boot time, and is set in /boot/loader.conf.

vfs.zfs.prefetch_disable - Toggles prefetch; a value of 0 means
prefetch is enabled and 1 means it is disabled.  The default is 0,
unless the system has less than 4 GB of RAM.  Prefetch works by reading
larger blocks than were requested into the ARC in hopes that the data
will be needed soon.  If the workload has a large number of random
reads, disabling prefetch may actually improve performance by reducing
unnecessary reads.  This value can be adjusted at any time with
&man.sysctl.8;.

vfs.zfs.vdev.trim_on_init - Controls whether new devices added to the
pool have the TRIM command run on them.  This ensures the best
performance and longevity for SSDs, but takes extra time.  If the
device has already been securely erased, disabling this setting will
make the addition of the new device faster.  This value can be adjusted
at any time with &man.sysctl.8;.

vfs.zfs.write_to_degraded - Controls whether new data is written to a
vdev that is in the DEGRADED state.  Defaults to 0, preventing writes
to any top-level vdev that is in a degraded state.  The administrator
may wish to allow writing to degraded vdevs to prevent the amount of
free space across the vdevs from becoming unbalanced, which would
reduce read and write performance.  This value can be adjusted at any
time with &man.sysctl.8;.

vfs.zfs.vdev.max_pending - Limits the number of pending I/O requests
per device.  A higher value will keep the device command queue full and
may give higher throughput.  A lower value will reduce latency.  This
value can be adjusted at any time with &man.sysctl.8;.

vfs.zfs.top_maxinflight - The maximum number of outstanding I/Os per
top-level vdev.  Limits the depth of the command queue to prevent high
latency.  The limit is per top-level vdev, meaning the limit applies to
each mirror, RAID-Z, or other vdev independently.  This value can be
adjusted at any time with &man.sysctl.8;.

vfs.zfs.l2arc_write_max - Limits the amount of data written to the
L2ARC per second.  This tunable is designed to extend the longevity of
SSDs by limiting the amount of data written to the device.  This value
can be adjusted at any time with &man.sysctl.8;.
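As an illustration of a runtime adjustment, the current limit can be
read and, if desired, raised with &man.sysctl.8;.  The 33554432 (32 MB)
figure below is only an example value, not a recommendation:

&prompt.root; sysctl vfs.zfs.l2arc_write_max
&prompt.root; sysctl vfs.zfs.l2arc_write_max=33554432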
vfs.zfs.l2arc_write_boost - The value of this tunable is added to
vfs.zfs.l2arc_write_max and increases the write speed to the SSD until
the first block is evicted from the L2ARC.  This "Turbo Warmup Phase"
is designed to reduce the performance loss from an empty L2ARC after a
reboot.  This value can be adjusted at any time with &man.sysctl.8;.

vfs.zfs.no_scrub_io - Disables scrub I/O.  Causes scrub to not actually
read the data blocks and verify their checksums, effectively turning
any scrub in progress into a no-op.  This may be useful if a scrub is
interfering with other operations on the pool.  This value can be
adjusted at any time with &man.sysctl.8;.

Note: If this tunable is set to cancel an in-progress scrub, be sure to
unset it afterwards or else all future scrub and resilver operations
will be ineffective.

vfs.zfs.scrub_delay - Determines the milliseconds of delay inserted
between each I/O during a scrub.  To ensure that a scrub does not
interfere with the normal operation of the pool, if any other I/O is
happening the scrub will delay between each command.  This value limits
the total IOPS (I/Os Per Second) generated by the scrub.  The default
value is 4, resulting in a limit of: 1000 ms / 4 = 250 IOPS.  Using a
value of 20 would give a limit of: 1000 ms / 20 = 50 IOPS.  The speed
of scrub is only limited when there has been recent activity on the
pool, as determined by vfs.zfs.scan_idle.  This value can be adjusted
at any time with &man.sysctl.8;.

vfs.zfs.resilver_delay - Determines the milliseconds of delay inserted
between each I/O during a resilver.  To ensure that a resilver does not
interfere with the normal operation of the pool, if any other I/O is
happening the resilver will delay between each command.  This value
limits the total IOPS (I/Os Per Second) generated by the resilver.  The
default value is 2, resulting in a limit of: 1000 ms / 2 = 500 IOPS.
Returning the pool to an Online state may be more important if another
device failing could Fault the pool, causing data loss.  A value of 0
will give the resilver operation the same priority as other operations,
speeding the healing process (see the example following this list).
The speed of resilver is only limited when there has been other recent
activity on the pool, as determined by vfs.zfs.scan_idle.  This value
can be adjusted at any time with &man.sysctl.8;.

vfs.zfs.scan_idle - The number of milliseconds since the last operation
before the pool is considered idle.  When the pool is idle, the rate
limiting for scrub and resilver is disabled.  This value can be
adjusted at any time with &man.sysctl.8;.

vfs.zfs.txg.timeout - The maximum number of seconds between transaction
groups.  The current transaction group will be written to the pool and
a fresh transaction group started if this amount of time has elapsed
since the previous transaction group.  A transaction group may be
triggered earlier if enough data is written.  The default value is
5 seconds.  A larger value may improve read performance by delaying
asynchronous writes, but this may cause uneven performance when the
transaction group is written.  This value can be adjusted at any time
with &man.sysctl.8;.
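As a concrete example of the vfs.zfs.resilver_delay entry above, the
delay can be dropped to 0 at runtime to give a resilver full priority.
The output line shown is only what would appear if the default value of
2 were still in effect:

&prompt.root; sysctl vfs.zfs.resilver_delay=0
vfs.zfs.resilver_delay: 2 -> 0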
@@ -2356,6 +2612,76 @@

Transaction Group (TXG)

Transaction Groups are the way changed blocks are grouped together and
eventually written to the pool.  Transaction groups are the atomic unit
that ZFS uses to assert consistency.  Each transaction group is
assigned a unique, consecutive 64-bit identifier.  There can be up to
three active transaction groups at a time, one in each of these three
states:

Open - When a new transaction group is created, it is in the open state
and accepts new writes.  There is always a transaction group in the
open state; however, the transaction group may refuse new writes if it
has reached a limit.  Once the open transaction group has reached a
limit, or vfs.zfs.txg.timeout has been reached, the transaction group
advances to the next state.

Quiescing - A short state that allows any pending operations to finish
without blocking the creation of a new open transaction group.  Once
all of the transactions in the group have completed, the transaction
group advances to the final state.

Syncing - All of the data in the transaction group is written to stable
storage.  This process will in turn modify other data, such as metadata
and space maps, that will also need to be written to stable storage.
The process of syncing involves multiple passes.  The first and biggest
pass writes all of the changed data blocks, followed by the metadata,
which may take multiple passes to complete.  Since allocating space for
the data blocks generates new metadata, the syncing state cannot finish
until a pass completes that does not allocate any additional space.
The syncing state is also where synctasks are completed.  Synctasks are
administrative operations, such as creating or destroying snapshots and
datasets, that modify the uberblock.  Once the syncing state is
complete, the transaction group in the quiescing state is advanced to
the syncing state.

All administrative functions, such as creating snapshots, are written
as part of the transaction group.  When a synctask is created, it is
added to the currently open transaction group, and that group is
advanced as quickly as possible to the syncing state in order to reduce
the latency of administrative commands.

Adaptive Replacement Cache (ARC)

@@ -2419,12 +2745,13 @@

L2ARC is limited to the sum of the write limit and the boost limit, and
afterwards limited to the write limit.  A pair of sysctl values control
these rate limits; vfs.zfs.l2arc_write_max controls how many bytes are
written to the cache per second, while vfs.zfs.l2arc_write_boost adds
to this limit during the "Turbo Warmup Phase" (Write Boost).

@@ -2682,7 +3009,7 @@ vfs.zfs.vdev.cache.size="5M"

@@ -2746,7 +3073,12 @@ vfs.zfs.vdev.cache.size="5M"

scrub makes sure even infrequently used blocks are checked for silent
corruption.  This improves the security of the data, especially in
archival storage situations.  The relative priority of scrub can be
adjusted with vfs.zfs.scrub_delay to prevent the scrub from degrading
the performance of other workloads on the pool.
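For example, to reduce the impact of a running scrub, the delay can be
raised at runtime and the scrub's progress then checked with zpool
status.  The value 20 matches the 50 IOPS calculation given earlier,
and the output line shown assumes the default of 4 was still in place:

&prompt.root; sysctl vfs.zfs.scrub_delay=20
vfs.zfs.scrub_delay: 4 -> 20
&prompt.root; zpool status mypool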