From owner-freebsd-doc  Thu Feb  7 12:26:57 2002
Delivered-To: freebsd-doc@freebsd.org
Received: from mta02-svc.ntlworld.com (mta02-svc.ntlworld.com [62.253.162.42])
	by hub.freebsd.org (Postfix) with ESMTP id 55C7537B422
	for ; Thu, 7 Feb 2002 12:26:44 -0800 (PST)
Received: from hukins.hn.org ([62.253.83.115]) by mta02-svc.ntlworld.com
	(InterMail vM.4.01.03.27 201-229-121-127-20010626) with SMTP
	id <20020207202637.YKSS8848.mta02-svc.ntlworld.com@hukins.hn.org>
	for ; Thu, 7 Feb 2002 20:26:37 +0000
Received: (qmail 20000 invoked by uid 1001); 7 Feb 2002 20:26:03 -0000
Date: Thu, 7 Feb 2002 20:26:03 +0000
From: Tom Hukins
To: Eric Ferguson
Cc: doc@FreeBSD.org
Subject: Re: Doc mistake.
Message-ID: <20020207202603.A19497@eborcom.com>
Mail-Followup-To: Tom Hukins, Eric Ferguson, doc@FreeBSD.org
References: <20020207140922.Y50387-100000@res147b-129.rh.rit.edu>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="CE+1k2dSO48ffgeK"
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <20020207140922.Y50387-100000@res147b-129.rh.rit.edu>; from etf2954@rit.edu on Thu, Feb 07, 2002 at 02:13:43PM -0500
Sender: owner-freebsd-doc@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

--CE+1k2dSO48ffgeK
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Feb 07, 2002 at 02:13:43PM -0500, Eric Ferguson wrote:
> 
> Sorry to bother you, but I found a error in the docs.

Thanks!  It's always worth reporting any errors you find, and never a
bother.

> This is in the FreeBSD Handbook Section 6.9 Tuning Disks
> 
> In section 6.9.2.1 in the third last sentence (beginning: A rm -f for
> ...) of the second paragraph (beginning: There are two classical ...) the
> sentence reads "... but every single of these directory changes ..." and I
> believe that it should read "... but every single ONE of these directory
> changes ..."

I've looked through that section, and the English isn't great
throughout.  I'd appreciate it if people could review and comment on
the attached patch.  If I don't hear any criticism, I'll commit it in
the next few days.
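While reviewing, it may help to look at the two variables the section
talks about on a live system.  Something like this should do (untested,
and the output shown is just the defaults the text describes):

    # sysctl vfs.vmiodirenable
    vfs.vmiodirenable: 1
    # sysctl hw.ata.wc
    hw.ata.wc: 1

vfs.vmiodirenable can be changed on a running system with
sysctl -w vfs.vmiodirenable=1, but as far as I know hw.ata.wc only
takes effect when set from the boot loader.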
Tom

--CE+1k2dSO48ffgeK
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="config-chapter.patch"

Index: config/chapter.sgml
===================================================================
RCS file: /home/ncvs/doc/en_US.ISO8859-1/books/handbook/config/chapter.sgml,v
retrieving revision 1.37
diff -u -r1.37 chapter.sgml
--- config/chapter.sgml	23 Jan 2002 11:59:32 -0000	1.37
+++ config/chapter.sgml	7 Feb 2002 20:06:45 -0000
@@ -816,17 +816,16 @@
           The vfs.vmiodirenable sysctl variable
-          defaults to 1 (on) and may
-          be set to 0 (off) or 1 (on).  This parameter controls how
+          may be set to either 0 (off) or 1 (on).  It is 1 by
+          default.  This parameter controls how
           directories are cached by the system.  Most directories are
-          small and use but a single fragment (typically 1K) in the
-          filesystem and even less (typically 512 bytes) in the buffer
+          small, using just a single fragment (typically 1K) in the
+          filesystem and less (typically 512 bytes) in the buffer
           cache.  However, when operating in the default mode the
           buffer cache will only cache a fixed number of directories
           even if you have a huge amount of memory.  Turning on this
           sysctl allows the buffer cache to use the VM Page Cache to
           cache the
-          directories.  The advantage is that all of memory is now
-          available for caching directories.  The disadvantage is that
+          directories, making all the memory
+          available for caching directories.  However,
           the minimum in-core memory used to cache a directory is
           the physical page size (typically 4K) rather than 512
           bytes.  We recommend turning this option on if you are running any
@@ -847,15 +846,15 @@
           FreeBSD 4.3 flirted with turning off IDE write caching.
           This reduced write bandwidth to IDE disks but was considered
           necessary due to serious data consistency issues introduced
-          by hard drive vendors.  Basically the problem is that IDE
+          by hard drive vendors.  The problem is that IDE
           drives lie about when a write completes.  With IDE write
-          caching turned on, IDE hard drives will not only write data
-          to disk out of order, they will sometimes delay some of the
+          caching turned on, IDE hard drives not only write data
+          to disk out of order, but will sometimes delay writing some
           blocks indefinitely when under heavy disk loads.  A crash or
-          power failure can result in serious filesystem corruption.
-          So our default was changed to be safe.  Unfortunately, the
-          result was such a huge loss in performance that we caved in
-          and changed the default back to on after the release.  You
+          power failure may cause serious filesystem corruption.
+          FreeBSD's default was changed to be safe.  Unfortunately, the
+          result was such a huge performance loss that we
+          changed write caching back to on by default after the release.  You
           should check the default on your system by observing the
           hw.ata.wc sysctl variable.  If IDE write
           caching is turned off, you can turn it back on by setting
@@ -898,44 +897,44 @@
           updating the physical disk.  If your system crashes you may
           lose more work than otherwise.

           Secondly, Soft Updates delays the freeing of filesystem
           blocks.  If you have a filesystem (such as the root
-          filesystem) which is close to full, doing a major update of it, e.g.
-          make installworld, can run it out of space and
-          cause the update to fail.
+          filesystem) which is almost full, doing a major update of it, e.g.
+          make installworld, can cause the filesystem to run out of space,
+          causing the update to fail.

           More details about Soft Updates
           Soft Updates (Details)

-          There are two classical approaches how to write metadata of
-          a filesystem back to disk.  (Metadata updates are updates to
-          non-content data like i-nodes or directories.)
+          There are two traditional approaches to writing a filesystem's
+          meta-data back to disk.  (Meta-data updates are updates to
+          non-content data like inodes or directories.)

           Historically, the default behaviour was to write out
-          metadata updates synchronously.  If a directory had been
+          meta-data updates synchronously.  If a directory had been
           changed, the system waited until the change was actually
           written to disk.  The file data buffers (file contents) have
           been passed through the buffer cache however, and backed up
           to disk later on asynchronously.  The advantage of this
-          implementation is that it is operating very safely.  If there is
-          a failure during an update the metadata are always in a
-          consistent state.  A file has either been completely created
+          implementation is that it operates safely.  If there is
+          a failure during an update, the meta-data are always in a
+          consistent state.  A file has either been created completely
           or not at all.  If the data blocks of a file did not find
           their way out of the buffer cache onto the disk by the time
-          of the crash, &man.fsck.8; is able to recognize this and to
-          repair the filesystem (e. g. the file length will be set to
+          of the crash, &man.fsck.8; is able to recognize this and
+          repair the filesystem by setting the file length to
           0).  Additionally, the implementation is clear and simple.

-          The disadvantage is that metadata changes are very slow.  A
-          rm -r for instance touches all files of a
-          directory sequentially, but every single of these directory
-          changes (deletion of a file) will be written synchronously
+          The disadvantage is that meta-data changes are slow.  A
+          rm -r for instance touches all the files in a
+          directory sequentially, but each directory
+          change (deletion of a file) will be written synchronously
           to the disk.  This includes updates to the directory itself,
-          to the i-node table, and possibly to indirect blocks
+          to the inode table, and possibly to indirect blocks
           allocated by the file.  Similar considerations apply for
-          unrolling large hierachies (tar -x).
+          unrolling large hierarchies (tar -x).

-          The second case are asynchronous metadata updates.  This
-          is e. g. the default for Linux/ext2fs or achieved by
+          The second case is asynchronous meta-data updates.  This
+          is the default for Linux/ext2fs or achieved by
           mount -o async for *BSD ufs.  All
           metadata updates are simply being passed through the buffer
           cache too, that is, they will be intermixed with the updates
@@ -948,101 +947,101 @@
           risk for bugs creeping into the code.  The disadvantage is
           that there is no guarantee at all for a consistent state of
           the filesystem.  If there is a failure during an operation
-          that updated large amounts of metadata (like a power
+          that updated large amounts of meta-data (like a power
           failure, or someone pressing the reset button), the file system
-          will be left in an unpredictable state.  There is no chance
+          will be left in an unpredictable state.  There is no opportunity
           to examine the state of the file system when the system comes
           up again; the data blocks of a file could already have
-          been written to the disk while the updates of the i-node
+          been written to the disk while the updates of the inode
           table or the associated directory were not.  It is actually
           impossible to implement a fsck which is able
           to clean up the resulting chaos (because the necessary
-          information is just not available on the disk).  If the
+          information is not available on the disk).  If the
           filesystem has been damaged beyond repair, the only choice is
           to newfs it and restore it from backup.
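+
+          For example, to mount a ufs filesystem with asynchronous
+          meta-data updates (the device and mount point here are
+          purely illustrative):
+
+          # mount -o async /dev/da0s1e /mnt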

-          The usual solution for this problem was to implement a
-          dirty region logging (sometimes also
-          referred to as journalling, albeit that
-          term has not been used consistently and occasionally applied
-          to other forms of transaction logging as well).  Metadata
-          updates are still written out synchronously, but only into a
-          small region of the disk.  Later on they will be distributed
-          from there to their proper location.  Because the logging
-          area is only a small, contiguous region on the disk, there
+          The usual solution for this problem was to implement
+          dirty region logging, which is also
+          referred to as journalling, although that
+          term is not used consistently and is occasionally applied
+          to other forms of transaction logging as well.  Meta-data
+          updates are still written synchronously, but only into a
+          small region of the disk.  Later on they will be moved
+          to their proper location.  Because the logging
+          area is a small, contiguous region on the disk, there
           are no long distances for the disk heads to move, even
-          during heavy operations, so these operations are accelerated
-          quite a bit compared to the classical synchronous updates.
+          during heavy operations, so these operations are quicker
+          than synchronous updates.
           Additionally the complexity of the implementation is fairly
-          limited and thus the risk for bugs still low.  A disadvatage
-          is that all metadata are written twice (once into the
+          limited, so the risk of bugs being present is low.  A disadvantage
+          is that all meta-data are written twice (once into the
           logging region and once to the proper location) so for
           normal work, a performance pessimization
           might result.  On the other hand, in case of a crash, all
-          pending metadata operations can be quickly either rolled-back
+          pending meta-data operations can be quickly either rolled back
           or completed from the logging area after the system comes
           up again, resulting in a fast filesystem startup.

-          Now, Kirk McKusick's (the developer of Berkeley FFS)
-          solution to the problem are Soft Updates: all pending
-          metadata updates are kept in memory and written out to disk
-          in a sorted sequence (ordered metadata
+          Kirk McKusick (the developer of Berkeley FFS)
+          solved this problem with Soft Updates: all pending
+          meta-data updates are kept in memory and written out to disk
+          in a sorted sequence (ordered meta-data
           updates).  This has the effect that, in case of
-          heavy metadata operations, later updates of a certain item
-          catch the earlier ones if those are still in
+          heavy meta-data operations, later updates to an item
+          catch the earlier ones if the earlier ones are still in
           memory and have not already been written to disk.  So all
-          operations on, say, a directory are generally done still in
+          operations on, say, a directory are generally done in
           memory before the update is written to disk (the data
-          blocks are sorted to their according position as well so
+          blocks are sorted according to their position so
           that they will not be on the disk ahead of their metadata).
-          In case of a crash this causes an implicit log
+          If the system crashes, this causes an implicit log
           rewind: all operations which did not find their way to
           the disk appear as if they had never happened.  A
           consistent filesystem state is maintained that appears to
-          be the one of 30--60 seconds earlier.  The
-          algorithm used guarantees that all actually used resources
-          are marked as such in their appropriate bitmaps: blocks and i-nodes.
+          be the one of 30 to 60 seconds earlier.  The
+          algorithm used guarantees that all used resources
+          are marked as such in their appropriate bitmaps: blocks and inodes.
           After a crash, the only resource allocation error
-          that occur are that resources are
-          marked as used which actually are free.
-          &man.fsck.8; then recognizes this situation,
-          and free up those no longer used resources.  It is safe to
-          ignore the dirty state of the filesystem after a crash, by
+          that occurs is that resources are
+          marked as used which are actually free.
+          &man.fsck.8; recognizes this situation,
+          and frees the resources that are no longer used.  It is safe to
+          ignore the dirty state of the filesystem after a crash by
           forcibly mounting it with mount -f.  In
-          order to free up possibly unused resources, &man.fsck.8;
+          order to free up resources that may be unused, &man.fsck.8;
           needs to be run at a later time.  This is the idea behind the
           background fsck: at system startup
-          time, only a snapshot from the
-          filesystem is recorded, that fsck can be
-          run against later on.  All filesystems can then be mounted
-          dirty, and system startup proceeds to
+          time, only a snapshot of the
+          filesystem is recorded, so fsck can be
+          run later on.  All filesystems can then be mounted
+          dirty, so the system startup proceeds in
           multiuser mode.  Then, background fscks
-          will be scheduled for all filesystems that need it, to free
-          up possibly unused resources.  (Filesystems that do not use
+          will be scheduled for all filesystems where this is required, to free
+          resources that may be unused.  (Filesystems that do not use
           soft updates still need the usual foreground
           fsck though.)
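+
+          For example, a dirty filesystem can be forcibly mounted
+          without a foreground fsck like this (the
+          device and mount point here are purely illustrative):
+
+          # mount -f /dev/da0s1e /usr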

-          The advantage is that metadata operations are nearly as
-          fast as asynchronous updates (i. e. faster than with
+          The advantage is that meta-data operations are nearly as
+          fast as asynchronous updates (i.e. faster than with
           logging, which has to write the metadata
           twice).  The disadvantages are the complexity of the code
           (implying a higher risk for bugs in an area that is highly
           sensitive regarding loss of user data), and a higher memory
           consumption.  Additionally there are some
           idiosyncrasies one has to get used to.
           After a crash, the state of the filesystem appears to be
-          somewhat older; e. g. in situations where
+          somewhat older.  In situations where
           the standard synchronous approach would have caused some
           zero-length files to remain after the
           fsck, these files do not exist at all
-          with a soft updates filesystem because neither the metadata
+          with a Soft Updates filesystem because neither the meta-data
           nor the file contents have ever been written to disk.
-          After a rm, the released disk space is
-          not instantly available but only after the updates have
-          written to disk.  This can in particular cause problems
-          when installing large amounts of data into a filesystem
+          Disk space is not released until the updates have been
+          written to disk, which may take place some time after
+          running rm.  This may cause problems
+          when installing large amounts of data on a filesystem
           that does not have enough free space to hold all the files
           twice.

--CE+1k2dSO48ffgeK--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message