From: Terry Lambert
Message-Id: <200011031849.LAA20781@usr07.primenet.com>
Subject: Re: Like to commit my diskprep
To: dhh@androcles.com (Duane H. Hesser)
Date: Fri, 3 Nov 2000 18:49:14 +0000 (GMT)
Cc: des@ofug.org (Dag-Erling Smorgrav), arch@FreeBSD.ORG,
    rjesup@wgate.com (Randell Jesup), mbendiks@eunet.no (Marius Bendiksen),
    dillon@earth.backplane.com (Matt Dillon),
    Cy.Schubert@uumail.gov.bc.ca (Cy Schubert - ITSD Open Systems Group)
In-Reply-To: from "Duane H. Hesser" at Nov 03, 2000 08:50:38 AM

> You are too optimistic, when you say "nearly ten years".  McKusick,
> et al's paper "A Fast Filesystem for Unix", which describes the
> design of the 4.2BSD FFS, and some of the testing upon which it
> was based, is marked as "Revised July 27, 1983" in my copy of the
> 4.2BSD manuals printed by Usenix for 4.2BSD.  The copy in
> /usr/share/doc/smm/05.fastfs/ is "Revised February 18, 1984".

[ ... ]

> Perhaps it *is* time to rethink defaults.  Probably should be done
> at least once every millennium.

The defaults were rethought once.  The fictional geometry that
FreeBSD uses today ignores the track-to-track seek times.
This was changed in the mid-1990s to account for disks that lied
about their geometry.  Using the fictional geometry, all of the
optimizations related to seek reduction, one of the primary foci
of the FFS paper, are disabled.

The block/cluster issue is one of fragmentation, not really of
optimization.  The ability to do clusters effectively prevents
fragmentation, taking it down to 50% of a frag size, on average;
for a 4k block size FS, the frag size is 512b, and for an 8k, it's
1k, yielding unused frag averages of 256b and 512b, respectively.

The clustering code is meant to ensure relative locality of much
data within a single cylinder, while not penalizing the multiple
process locality case with too much seeking or rotational latency.
Some of the assumptions there have changed: drives now use inverted
track recording order to ensure sequential reads are in the cache
(basically, prefetched by starting to read wherever you seek to,
and returning data once the sector you had asked for has been
read), and the number of sectors in a track has changed as well.

One potential performance benefit for large files would be to
increase the number of sectors in a cylinder group.  This is not
necessarily as big a win as you might think, since most DB access
is random.  The only thing that would change this is if the average
data object was larger than one cluster in size; even then, the
actual optimal cluster size would depend on the data; for fixed
size records, it would be "exactly one record".

If the records weren't stored on at least 512b boundaries, this
would turn into a loss, since, given random I/O (poor locality),
you will still span a cluster at the start and end with an average
probability of:

	oddness = (cluster_size % record_size) ? 1 : 0
	r_per_c = (cluster_size / record_size)
	P = r_per_c : oddness + .5

... the probability of a record spanning any given boundary.
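
To make the boundary-spanning concern concrete, here is a minimal
sketch that simply counts how many fixed-size records cross a
cluster boundary.  It is not the formula above; it assumes (my
assumption, for illustration only) that records are packed back to
back starting at offset 0, and the function name is hypothetical.

```python
def spanning_fraction(cluster_size: int, record_size: int, n_records: int) -> float:
    """Fraction of back-to-back fixed-size records that cross a cluster boundary.

    Assumption (not from the original post): record i occupies bytes
    [i * record_size, (i + 1) * record_size - 1], with no padding.
    """
    spanning = 0
    for i in range(n_records):
        first_byte = i * record_size
        last_byte = first_byte + record_size - 1
        # A record spans a boundary when its first and last bytes
        # fall in different clusters.
        if first_byte // cluster_size != last_byte // cluster_size:
            spanning += 1
    return spanning / n_records


if __name__ == "__main__":
    cluster = 9 * 4096  # 9 blocks per cluster at a 4k block size
    for record in (512, 1000, 4096):
        print(record, spanning_fraction(cluster, record, 4608))
```

Records whose size divides the cluster size evenly never cross a
boundary in this layout; any other size pays for a second cluster
on some fraction of accesses, which is the loss described above.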

Generally, a DB that valued speed would fragment its storage so
that a record never spans a physical media block size boundary
(though for small records, it would probably put more than one per
block, if it had a reasonable confidence that it wouldn't have to
move them during a record expansion later).

If you guys want to experiment with log and block structured FSs,
by all means, do so, but I don't think that you'll end up
optimizing things, unless your average object size is larger than
1/2 of a cluster in size, which, to my mind, is a very large object
indeed (18k/36k for 4k/8k block size @ 9 blocks per cluster).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.