From owner-svn-doc-projects@FreeBSD.ORG Thu Aug 15 01:08:24 2013
Delivered-To: svn-doc-projects@freebsd.org
Message-Id: <201308150108.r7F18OET056618@svn.freebsd.org>
From: Warren Block
Date: Thu, 15 Aug 2013 01:08:24 +0000 (UTC)
To: doc-committers@freebsd.org, svn-doc-projects@freebsd.org
Subject: svn commit: r42544 - projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs
X-SVN-Group: doc-projects
List-Id: SVN commit messages for doc projects trees

Author: wblock
Date: Thu Aug 15 01:08:23 2013
New Revision: 42544
URL: http://svnweb.freebsd.org/changeset/doc/42544

Log:
  Move Terms section to end.

Modified:
  projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml

Modified: projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml
==============================================================================
--- projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml	Thu Aug 15 01:04:54 2013	(r42543)
+++ projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml	Thu Aug 15 01:08:23 2013	(r42544)
@@ -33,636 +33,6 @@
designed to prevent data write corruption and to overcome some of the limitations of hardware RAID.

ZFS Features and Terminology

ZFS is a fundamentally different file system because it is more than just a file system. ZFS combines the roles of file system and volume manager, enabling additional storage devices to be added to a live system and having the new space available on all of the existing file systems in that pool immediately. By combining the traditionally separate roles, ZFS is able to overcome previous limitations that prevented RAID groups from growing. Each top-level device in a zpool is called a vdev, which can be a simple disk or a RAID transformation such as a mirror or RAID-Z array. ZFS file systems (called datasets) each have access to the combined free space of the entire pool. As blocks are allocated, the free space available to each file system in the pool is decreased. This approach avoids the common pitfall of extensive partitioning, where free space becomes fragmented across the partitions.

zpool

A storage pool is the most basic building block of ZFS.
A pool is made up of one or more vdevs, the underlying devices that store the data. A pool is then used to create one or more file systems (datasets) or block devices (volumes). These datasets and volumes share the pool of remaining free space. Each pool is uniquely identified by a name and a GUID. The zpool also controls the version number and therefore the features available for use with ZFS.

Note: &os; 9.0 and 9.1 include support for ZFS version 28. Future versions use ZFS version 5000 with feature flags. This allows greater cross-compatibility with other implementations of ZFS.

vdev Types

A zpool is made up of one or more vdevs, which themselves can be a single disk or a group of disks, in the case of a RAID transform. When multiple vdevs are used, ZFS spreads data across the vdevs to increase performance and maximize usable space.

Disk - The most basic type of vdev is a standard block device. This can be an entire disk (such as /dev/ada0 or /dev/da0) or a partition (/dev/ada0p3). Contrary to the Solaris documentation, on &os; there is no performance penalty for using a partition rather than an entire disk.

File - In addition to disks, ZFS pools can be backed by regular files; this is especially useful for testing and experimentation. Use the full path to the file as the device path in the zpool create command. All vdevs must be at least 128 MB in size.

Mirror - When creating a mirror, specify the mirror keyword followed by the list of member devices for the mirror. A mirror consists of two or more devices; all data is written to all member devices. A mirror vdev will only hold as much data as its smallest member. A mirror vdev can withstand the failure of all but one of its members without losing any data.

Note: A regular single-disk vdev can be upgraded to a mirror vdev at any time using the zpool attach command.

RAID-Z - ZFS implements RAID-Z, a variation on standard RAID-5 that offers better distribution of parity and eliminates the "RAID-5 write hole" in which the data and parity information become inconsistent after an unexpected restart. ZFS supports three levels of RAID-Z, which provide varying levels of redundancy in exchange for decreasing levels of usable storage. The types are named RAID-Z1 through RAID-Z3 based on the number of parity devices in the array and the number of disks the pool can lose while continuing to operate.

In a RAID-Z1 configuration with 4 disks, each 1 TB, usable storage will be 3 TB, and the pool will still be able to operate in degraded mode with one faulted disk. If an additional disk goes offline before the faulted disk is replaced and resilvered, all data in the pool can be lost.

In a RAID-Z3 configuration with 8 disks of 1 TB, the volume would provide 5 TB of usable space and still be able to operate with three faulted disks. Sun recommends no more than 9 disks in a single vdev. If the configuration has more disks, it is recommended to divide them into separate vdevs, and the pool data will be striped across them.

A configuration of 2 RAID-Z2 vdevs consisting of 8 disks each would create something similar to a RAID 60 array. A RAID-Z group's storage capacity is approximately the size of the smallest disk, multiplied by the number of non-parity disks.
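As a minimal sketch of how these vdev types are specified at pool creation time (the pool name mypool and the disks ada1 through ada4 are placeholders, not part of the original text; the commands show alternative layouts, not a sequence):

&prompt.root; zpool create mypool mirror ada1 ada2
&prompt.root; zpool create mypool raidz1 ada1 ada2 ada3 ada4

A single-disk vdev created earlier with zpool create mypool ada1 could later be converted to a mirror:

&prompt.root; zpool attach mypool ada1 ada2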
For example, 4x 1 TB disks in Z1 has an effective size of approximately 3 TB, and an 8x 1 TB array in Z3 will yield 5 TB of usable space.

Spare - ZFS has a special pseudo-vdev type for keeping track of available hot spares. Note that installed hot spares are not deployed automatically; they must manually be configured to replace the failed device using the zpool replace command.

Log - ZFS Log Devices, also known as the ZFS Intent Log (ZIL), move the intent log from the regular pool devices to a dedicated device. The ZIL accelerates synchronous transactions by using storage devices (such as SSDs) that are faster than those used for the main pool. When data is being written and the application requests a guarantee that the data has been safely stored, the data is written to the faster ZIL storage, then later flushed out to the regular disks, greatly reducing the latency of synchronous writes. Log devices can be mirrored, but RAID-Z is not supported. When specifying multiple log devices, writes will be load-balanced across all devices.

Cache - Adding a cache vdev to a zpool will add the storage of the cache to the L2ARC. Cache devices cannot be mirrored. Since a cache device only stores additional copies of existing data, there is no risk of data loss.

Adaptive Replacement Cache (ARC)

ZFS uses an Adaptive Replacement Cache (ARC), rather than a more traditional Least Recently Used (LRU) cache. An LRU cache is a simple list of items in the cache, sorted by when each object was most recently used; new items are added to the top of the list, and once the cache is full, items from the bottom of the list are evicted to make room for more active objects. An ARC consists of four lists: the Most Recently Used (MRU) and Most Frequently Used (MFU) objects, plus a ghost list for each. These ghost lists track recently evicted objects to prevent them from being added back to the cache. This increases the cache hit ratio by avoiding objects that have a history of only being used occasionally. Another advantage of using both an MRU and an MFU is that scanning an entire file system would normally evict all data from an MRU or LRU cache in favor of this freshly accessed content. With ZFS, since there is also an MFU that only tracks the most frequently used objects, the cache of the most commonly accessed blocks remains.

L2ARC

The L2ARC is the second level of the ZFS caching system. The primary ARC is stored in RAM, but since the amount of available RAM is often limited, ZFS can also make use of cache vdevs. Solid State Disks (SSDs) are often used as these cache devices due to their higher speed and lower latency compared to traditional spinning disks. An L2ARC is entirely optional, but having one will significantly increase read speeds for files that are cached on the SSD instead of having to be read from the regular spinning disks. The L2ARC can also speed up deduplication, since a DDT that does not fit in RAM but does fit in the L2ARC will be much faster than a DDT that has to be read from disk. The rate at which data is added to the cache devices is limited to prevent prematurely wearing out the SSD with too many writes.
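A hypothetical sketch of adding the auxiliary vdev types described above to an existing pool (mypool and the device names ada5 through ada7 are placeholders):

&prompt.root; zpool add mypool log ada5
&prompt.root; zpool add mypool cache ada6
&prompt.root; zpool add mypool spare ada7

The log device becomes a dedicated ZIL, the cache device extends the L2ARC, and the spare is tracked but, as noted above, still has to be activated manually when a disk fails.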
Until the cache is full (the first block has been evicted to make room), writing to the L2ARC is limited to the sum of the write limit and the boost limit, and after that to the write limit alone. A pair of sysctl values control these rate limits: vfs.zfs.l2arc_write_max controls how many bytes are written to the cache per second, while vfs.zfs.l2arc_write_boost adds to this limit during the "Turbo Warmup Phase" (Write Boost).

Copy-On-Write

Unlike a traditional file system, when data is overwritten on ZFS, the new data is written to a different block rather than overwriting the old data in place. Only once this write is complete is the metadata then updated to point to the new location of the data. This means that in the event of a shorn write (a system crash or power loss in the middle of writing a file), the entire original contents of the file are still available and the incomplete write is discarded. This also means that ZFS does not require a fsck after an unexpected shutdown.

Dataset

Dataset is the generic term for a ZFS file system, volume, snapshot, or clone. Each dataset has a unique name in the format poolname/path@snapshot. The root of the pool is technically a dataset as well. Child datasets are named hierarchically like directories; for example, in mypool/home, the home dataset is a child of mypool and inherits properties from it. This can be expanded further by creating mypool/home/user. This grandchild dataset will inherit properties from the parent and grandparent. It is also possible to set properties on a child to override the defaults inherited from the parents and grandparents. ZFS also allows administration of datasets and their children to be delegated.

Volume

In addition to regular file system datasets, ZFS can also create volumes, which are block devices. Volumes have many of the same features, including copy-on-write, snapshots, clones, and checksumming. Volumes can be useful for running other file system formats on top of ZFS, such as UFS, or for virtualization or exporting iSCSI extents.

Snapshot

The copy-on-write design of ZFS allows for nearly instantaneous, consistent snapshots with arbitrary names. After taking a snapshot of a dataset (or a recursive snapshot of a parent dataset that will include all child datasets), new data is written to new blocks (as described above), but the old blocks are not reclaimed as free space. There are then two versions of the file system: the snapshot (what the file system looked like before) and the live file system; however, no additional space is used. As new data is written to the live file system, new blocks are allocated to store this data. The apparent size of the snapshot will grow as blocks are no longer used in the live file system, but only in the snapshot. These snapshots can be mounted (read only) to allow for the recovery of previous versions of files. It is also possible to roll back a live file system to a specific snapshot, undoing any changes that took place after the snapshot was taken. Each block in the zpool has a reference counter which indicates how many snapshots, clones, datasets, or volumes make use of that block. As files and snapshots are deleted, the reference count is decremented; once a block is no longer referenced, it is reclaimed as free space.
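A short, hedged sketch of the snapshot workflow just described (the dataset mypool/home and the snapshot name before_upgrade are placeholders):

&prompt.root; zfs snapshot mypool/home@before_upgrade
&prompt.root; zfs list -t snapshot
&prompt.root; zfs rollback mypool/home@before_upgrade

The snapshot costs no space when taken; the rollback discards every change made to the live file system after the snapshot.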
Snapshots can also be marked with a hold; once a snapshot is held, any attempt to destroy it will return an EBUSY error. Each snapshot can have multiple holds, each with a unique name. The release command removes the hold so the snapshot can then be deleted. Snapshots can be taken on volumes, but they can only be cloned or rolled back, not mounted independently.

Clone

Snapshots can also be cloned; a clone is a writable version of a snapshot, allowing the file system to be forked as a new dataset. As with a snapshot, a clone initially consumes no additional space; only as new data is written to a clone and new blocks are allocated does the apparent size of the clone grow. As blocks are overwritten in the cloned file system or volume, the reference count on the previous block is decremented. The snapshot upon which a clone is based cannot be deleted because the clone is dependent upon it (the snapshot is the parent, and the clone is the child). Clones can be promoted, reversing this dependency and making the clone the parent and the previous parent the child. This operation requires no additional space, but it will change the way the used space is accounted.

Checksum

Every block that is allocated is also checksummed (which algorithm is used is a per-dataset property, see zfs set). ZFS transparently validates the checksum of each block as it is read, allowing ZFS to detect silent corruption. If the data that is read does not match the expected checksum, ZFS will attempt to recover the data from any available redundancy (mirrors, RAID-Z). Validation of all checksums can be triggered with the scrub command. The available checksum algorithms include:

fletcher2
fletcher4
sha256

The fletcher algorithms are faster, but sha256 is a strong cryptographic hash and has a much lower chance of collisions, at the cost of some performance. Checksums can be disabled, but it is inadvisable.

Compression

Each dataset in ZFS has a compression property, which defaults to off. This property can be set to one of a number of compression algorithms, which will cause all new data written to this dataset to be compressed as it is written. In addition to the reduction in disk usage, this can also increase read and write throughput, as only the smaller compressed version of the file needs to be read or written.

Note: LZ4 compression is only available after &os; 9.2.

Deduplication

ZFS has the ability to detect duplicate blocks of data as they are written (thanks to the checksumming feature). If deduplication is enabled, instead of writing the block a second time, the reference count of the existing block will be increased, saving storage space. In order to do this, ZFS keeps a deduplication table (DDT) in memory, containing the list of unique checksums, the location of that block, and a reference count. When new data is written, the checksum is calculated and compared to the list. If a match is found, the data is considered to be a duplicate. When deduplication is enabled, the checksum algorithm is changed to SHA256 to provide a secure cryptographic hash. ZFS deduplication is tunable; if dedup is on, then a matching checksum is assumed to mean that the data is identical.
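These per-dataset properties are set with zfs set, as in the following hedged sketch (the dataset name mypool/home is a placeholder; dedup=verify is explained next):

&prompt.root; zfs set checksum=sha256 mypool/home
&prompt.root; zfs set compression=lz4 mypool/home
&prompt.root; zfs set dedup=verify mypool/home
&prompt.root; zfs get compressratio,dedup mypool/home

Only data written after the property change is affected; existing blocks keep their original checksum and compression.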
If dedup is set to verify, then the data in the two blocks is checked byte-for-byte to ensure it is actually identical; if it is not, the hash collision is noted by ZFS and the two blocks are stored separately. Because the DDT must store the hash of each unique block, it consumes a very large amount of memory (a general rule of thumb is 5-6 GB of RAM per 1 TB of deduplicated data). In situations where it is not practical to have enough RAM to keep the entire DDT in memory, performance will suffer greatly as the DDT must be read from disk before each new block is written. Deduplication can make use of the L2ARC to store the DDT, providing a middle ground between fast system memory and slower disks. It is advisable to consider using ZFS compression instead, which often provides nearly as much space savings without the additional memory requirement.

Scrub

In place of a consistency check like fsck, ZFS has the scrub command, which reads all data blocks stored on the pool and verifies their checksums against the known good checksums stored in the metadata. This periodic check of all the data stored on the pool ensures the recovery of any corrupted blocks before they are needed. A scrub is not required after an unclean shutdown, but it is recommended to run one at least once each quarter. ZFS compares the checksum for each block as it is read in the normal course of use, but a scrub operation makes sure even infrequently used blocks are checked for silent corruption.

Dataset Quota

ZFS provides very fast and accurate dataset, user, and group space accounting in addition to quotas and space reservations. This gives the administrator fine-grained control over how space is allocated and allows critical file systems to reserve space to ensure other file systems do not take all of the free space.

ZFS supports different types of quotas: the dataset quota, the reference quota (refquota), the user quota, and the group quota.

Quotas limit the amount of space that a dataset and all of its descendants (snapshots of the dataset, child datasets, and the snapshots of those datasets) can consume.

Note: Quotas cannot be set on volumes, as the volsize property acts as an implicit quota.

Reference Quota

A reference quota limits the amount of space a dataset can consume by enforcing a hard limit on the space used. However, this hard limit includes only space that the dataset references and does not include space used by descendants, such as file systems or snapshots.

User Quota

User quotas are useful to limit the amount of space that can be used by the specified user.

Group Quota

The group quota limits the amount of space that a specified group can consume.

Dataset Reservation

The reservation property makes it possible to guarantee a minimum amount of space for the use of a specific dataset and its descendants. This means that if a 10 GB reservation is set on storage/home/bob and another dataset tries to use all of the free space, at least 10 GB of space is reserved for this dataset. If a snapshot is taken of storage/home/bob, the space used by that snapshot is counted against the reservation. The refreservation property works in a similar way, except that it excludes descendants, such as snapshots.
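As a small illustration of inspecting this space accounting (the dataset name mypool/home is a placeholder; zfs list accepts any dataset property as a column):

&prompt.root; zfs list -o name,used,available,referenced,quota,reservation mypool/home

The used and referenced columns show why the two kinds of limits differ: quota and reservation count descendants and snapshots, while refquota and refreservation only count the space the dataset itself references.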
Reservations of any sort are useful in many situations, such as planning and testing the suitability of disk space allocation in a new system, or ensuring that enough space is available on file systems for audio logs or system recovery procedures and files.

Reference Reservation

The refreservation property makes it possible to guarantee a minimum amount of space for the use of a specific dataset excluding its descendants. This means that if a 10 GB reservation is set on storage/home/bob and another dataset tries to use all of the free space, at least 10 GB of space is reserved for this dataset. In contrast to a regular reservation, space used by snapshots and descendant datasets is not counted against the reservation. As an example, if a snapshot were taken of storage/home/bob, enough disk space would have to exist outside of the refreservation amount for the operation to succeed, because descendants of the main dataset are not counted by the refreservation amount and so do not encroach on the space set.

Resilver

When a disk fails and must be replaced, the new disk must be filled with the data that was lost. This process of calculating and writing the missing data to the new drive, using the parity information distributed across the remaining drives, is called resilvering.

What Makes ZFS Different

@@ -1019,443 +389,1073 @@ config: errors: No known data errors

As shown from this example, everything appears to be normal.

Data Verification

ZFS uses checksums to verify the integrity of stored data. These are enabled automatically upon creation of file systems and may be disabled using the following command:

&prompt.root; zfs set checksum=off storage/home

Doing so is not recommended, as checksums take very little storage space and are used to check data integrity in a process known as scrubbing. To verify the data integrity of the storage pool, issue this command:

&prompt.root; zpool scrub storage

This process may take considerable time depending on the amount of data stored. It is also very I/O intensive, so much so that only one scrub may be run at any given time. After the scrub has completed, the status is updated and may be viewed by issuing a status request:

&prompt.root; zpool status storage
  pool: storage
 state: ONLINE
 scrub: scrub completed with 0 errors on Sat Jan 26 19:57:37 2013
config:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    da0     ONLINE       0     0     0
	    da1     ONLINE       0     0     0
	    da2     ONLINE       0     0     0

errors: No known data errors

The completion time is displayed and helps to ensure data integrity over a long period of time.

Refer to &man.zfs.8; and &man.zpool.8; for other ZFS options.
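A hedged sketch of the resilver workflow described earlier, assuming the pool mypool, a failed disk ada1, and a replacement disk ada4 (all placeholder names):

&prompt.root; zpool status -x
&prompt.root; zpool replace mypool ada1 ada4
&prompt.root; zpool status mypool

zpool status -x lists only pools with problems; after zpool replace, zpool status shows the resilver progress while the parity data is rebuilt onto the new disk.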
<command>zpool</command> Administration

Creating & Destroying Storage Pools

Adding & Removing Devices

Dealing with Failed Devices

Importing & Exporting Pools

Upgrading a Storage Pool

Checking the Status of a Pool

Performance Monitoring

Splitting a Storage Pool

<command>zfs</command> Administration

Creating & Destroying Datasets

Creating & Destroying Volumes

Renaming a Dataset

Setting Dataset Properties

Managing Snapshots

Managing Clones

ZFS Replication

Dataset, User and Group Quotas

To enforce a dataset quota of 10 GB for storage/home/bob, use the following:

&prompt.root; zfs set quota=10G storage/home/bob

To enforce a reference quota of 10 GB for storage/home/bob, use the following:

&prompt.root; zfs set refquota=10G storage/home/bob

The general format is userquota@user=size, and the user's name must be in one of the following formats:

  POSIX compatible name such as joe.
  POSIX numeric ID such as 789.
  SID name such as joe.bloggs@example.com.
  SID numeric ID such as S-1-123-456-789.

For example, to enforce a user quota of 50 GB for a user named joe, use the following:

&prompt.root; zfs set userquota@joe=50G

To remove the quota or make sure that one is not set, instead use:

&prompt.root; zfs set userquota@joe=none

Note: User quota properties are not displayed by zfs get all. Non-root users can only see their own quotas unless they have been granted the userquota privilege. Users with this privilege are able to view and set everyone's quota.

The general format for setting a group quota is: groupquota@group=size.

To set the quota for the group firstgroup to 50 GB, use:

&prompt.root; zfs set groupquota@firstgroup=50G

To remove the quota for the group firstgroup, or to make sure that one is not set, instead use:

&prompt.root; zfs set groupquota@firstgroup=none

As with the user quota property, non-root users can only see the quotas associated with the groups that they belong to. However, root or a user with the groupquota privilege can view and set all quotas for all groups.

To display the amount of space consumed by each user on the specified file system or snapshot, along with any specified quotas, use zfs userspace. For group information, use zfs groupspace. For more information about supported options or how to display only specific options, refer to &man.zfs.1;.

Users with sufficient privileges and root can list the quota for storage/home/bob using:

&prompt.root; zfs get quota storage/home/bob
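As a brief sketch of the per-user and per-group accounting mentioned above (storage/home follows the document's example dataset; the column selection is illustrative):

&prompt.root; zfs userspace -o name,used,quota storage/home
&prompt.root; zfs groupspace -o name,used,quota storage/home

Each command prints one row per user or group with the space it consumes and any quota that applies.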
Reservations

The general format of the reservation property is reservation=size, so to set a reservation of 10 GB on storage/home/bob, use:

&prompt.root; zfs set reservation=10G storage/home/bob

To make sure that no reservation is set, or to remove a reservation, use:

&prompt.root; zfs set reservation=none storage/home/bob

The same principle can be applied to the refreservation property for setting a refreservation, with the general format refreservation=size.

To check if any reservations or refreservations exist on storage/home/bob, execute one of the following commands:

&prompt.root; zfs get reservation storage/home/bob
&prompt.root; zfs get refreservation storage/home/bob

Compression

Deduplication

Delegated Administration

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***