Date: Fri, 16 Oct 2015 17:21:09 +1100 (EST)
From: Bruce Evans
To: Warner Losh
cc: src-committers@freebsd.org, svn-src-all@freebsd.org,
    svn-src-head@freebsd.org
Subject: Re: svn commit: r289405 - head/sys/ufs/ffs
In-Reply-To: <201510160306.t9G3622O049128@repo.freebsd.org>
Message-ID: <20151016151349.W1280@besplex.bde.org>

On Fri, 16 Oct 2015, Warner Losh wrote:

> Log:
>   Do not relocate extents to make them contiguous if the underlying drive can do
>   deletions.
>   Ability to do deletions is a strong indication that this
>   optimization will not help performance. It will only generate extra write
>   traffic. These devices are typically flash based and have a limited number of
>   write cycles. In addition, making the file contiguous in LBA space doesn't
>   improve the access times from flash devices because they have no seek time.
>
>   Reviewed by: mckusick@

Actually, making the file contiguous does improve the access time,
probably by relatively more for flash devices, since for fast devices the
number of i/o's per second is a bottleneck and discontiguous files need
many more i/o's per second.

E.g., suppose the block size is 16K and the transfer rate is 1GB/sec.
This requires 64K i/o's per second (iops) to keep up with the data and
many more to keep up with the metadata. Completely discontiguous files
are limited to this rate. But clustering of large contiguous files
increases the block size to 128K, so you only need 8K iops to keep up
with the data.

I think turning off reallocation always gives 1 discontiguous block for
medium-sized files, but not many more than that. That still doubles or
triples the number of data i/o's for files of size about 128K (1 block
is often split into 3 by a seek in the middle).

> Modified: head/sys/ufs/ffs/ffs_alloc.c
> ==============================================================================
> --- head/sys/ufs/ffs/ffs_alloc.c    Fri Oct 16 03:03:04 2015    (r289404)
> +++ head/sys/ufs/ffs/ffs_alloc.c    Fri Oct 16 03:06:02 2015    (r289405)
> @@ -481,9 +481,19 @@ ffs_reallocblks(ap)
>               struct cluster_save *a_buflist;
>       } */ *ap;
>   {
> +     struct ufsmount *ump;
>
> -     if (doreallocblks == 0)

The correct way to configure this is a mount option, not this sysctl
variable. I think the variable was only intended for turning off
reallocation when it was buggy. In 4.4BSD-Lite2, this variable wasn't
even private for ffs, and old versions of FreeBSD misused it in ext2fs.

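The i/o-rate arithmetic in the reply above (16K blocks versus 128K clusters at 1GB/sec) can be checked with a few lines of C. This is only a rough sketch: the helper name iops_needed is hypothetical and not part of ffs, and it assumes binary units (1GB = 2^30 bytes, 1K = 1024 bytes), which is what makes the figures come out to 64K and 8K.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not part of ffs).
 * Returns the number of data i/o's per second needed to sustain
 * bytes_per_sec when every i/o transfers io_size bytes, rounded up.
 */
uint64_t
iops_needed(uint64_t bytes_per_sec, uint64_t io_size)
{
	assert(io_size > 0);
	return ((bytes_per_sec + io_size - 1) / io_size);
}
```

Under those assumptions, iops_needed(1ULL << 30, 16 * 1024) evaluates to 65536 (64K iops) for a completely discontiguous file, and iops_needed(1ULL << 30, 128 * 1024) evaluates to 8192 (8K iops) once clustering raises the effective transfer size to 128K. Metadata i/o's are not counted.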
The related sysctl variables noclusterr and noclusterw were misconfigured
similarly in 4.4BSD-Lite2, but FreeBSD fixed this by turning them into
mount options, despite them probably being less important than
doreallocblks. I only use them to see if vfs clustering is still useful.
Unfortunately, it still is in most cases. It is too complicated, and too
heavyweight. But its weight is still smaller than that of more i/o's for
smaller blocks, at least on non-memory disks. I think the cleanup was
motivated mainly by non-automatic use of the flags on memory disks in
pc98.

Configuration of memory disks in main memory is also badly supported. I
think md(4) still gives double-caching for all types of backing store, so
if iops is not a problem then doreallocblks and cluster[rw] should be
turned off in all cases for md to recover a small part of the loss from
the double-caching, but md doesn't know anything about this.

Oops, actually md does try to avoid the double-caching, but it does this
for all reads (by using IO_DIRECT) for all types of backing store, and
this destroys performance for at least the case of vnode-backed disks
with the vnode on a hard disk. IO_DIRECT certainly prevents clustering.
Then if it works as intended to avoid double-caching, it also gives many
more i/o's than necessary if there is a block size mismatch. Perhaps 128
times as many for a 64K:512 mismatch (128 reads of different virtual
512-blocks are mapped to 128 reads of the same physical 64K-block).
IO_DIRECT prevents caching of the physical block. The virtual blocks
should be clustered into part of one 128K-block, but don't seem to be.

Bruce
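The 64K:512 estimate above follows from a very simple model, sketched here in C. The function names are hypothetical and the model is an assumption, not a description of what md(4) actually does: it supposes that nothing is cached, that each read of a virtual block causes one full read of the physical block containing it, and that the physical block size is a multiple of the virtual one.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, for illustration only. Number of physical reads
 * issued while reading every virtual block inside one physical block,
 * assuming no caching and one physical read per virtual read.
 */
uint64_t
uncached_reads_per_phys_block(uint64_t phys_bsize, uint64_t virt_bsize)
{
	assert(virt_bsize > 0 && phys_bsize % virt_bsize == 0);
	return (phys_bsize / virt_bsize);
}

/*
 * Bytes fetched from the backing store, under the same assumptions,
 * to deliver one physical block's worth of data to the reader.
 */
uint64_t
bytes_fetched_per_phys_block(uint64_t phys_bsize, uint64_t virt_bsize)
{
	return (uncached_reads_per_phys_block(phys_bsize, virt_bsize) *
	    phys_bsize);
}
```

With a 64K physical block and 512-byte virtual blocks, the first function gives 128 reads where one would do, and the second gives 8M fetched to deliver 64K of data. Clustering the virtual reads would collapse most of those reads, which is why its absence matters in this case.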