From: Terry Lambert
Subject: Re: New timeout capability (was Re: cvs commit:....)
To: dyson@FreeBSD.ORG
Cc: syssgm@dtir.qld.gov.au, freebsd-current@FreeBSD.ORG
Date: Wed, 24 Sep 1997 06:17:23 +0000 (GMT)
In-Reply-To: <199709230920.EAA00190@dyson.iquest.net> from "John S. Dyson" at Sep 23, 97 04:20:00 am

> I could possibly imagine a reasonable use for a 16K basic allocation size.

8k is where I typically stop, mostly because of frag size. 1k frags
are about my limit. 8-).

> I think that 4K performs pretty darned well anyway though. In the
> real world, I wouldn't think that one would see much of a performance
> difference between 4K and 16K.

For 8k, there used to be about a 40% improvement over 4k for iozone; I
haven't really tried this for about 5 months now, though.

I expect a bit of a drop for 16k because of the 2k frags, actually. I'd
think that 32k would go back up -- perhaps way up -- because 4k
page-aligned frags are good for you.

It really matters how sequentially you are accessing your files. For
random writes smaller than, or not aligned to, 4k, there is a
requirement of read-before-write.

Technically, you could take this down to 512b, since the VM has the
bitmap for it. If so, block sizes over 4k (with frags larger than a
disk block) would get relatively more expensive *fast*, as long as you
were doing I/O on block boundaries.

I'm not sure whether I/O on a block boundary for a page causes a read
before write or not. It probably does; this is technically not needed,
so there's a tiny optimization there for better iozone numbers. 8-).

If the read-before-write could be done on a block basis, using a block
bitmap to indicate which 512b chunks had been read and which hadn't,
and you were guaranteed read-before-write, and if you wrote a whole
block, you'd mark it read without actually reading, and you respected
this bitmap when responding to the dirty bit, well... that'd be a lot
of work. 8-).

It would also give a more uniform win for block-aligned accesses in
block increments (ndbm?), and certainly make IOZONE happier, as well
as making the MSDOS FS happier.

So to recap: a 512b-aligned write of block 3 in a new 4k page would
result in b00001000 in the bitmap, and the dirty bit set on the page.
A 43b write in block 5, not crossing a block boundary, would result in
b00100000 in the bitmap, a 512b read of that block from disk, and a
43b write somewhere in the block, with the dirty bit set on the page.
(A rough sketch of this bookkeeping follows at the end of this
message.)

This is probably a useful optimization for fixed-size record based
random record I/O for records of 2k or smaller (so page locality is
less of an issue, and so that you shouldn't just read the whole page
anyway).

I don't know what the impact would be on the pager in the general
case; probably not pretty at all, actually. Maybe John could comment
(probably to say I'm insane ;-)).
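To make the recap concrete, here is a minimal userland sketch of the
per-page block bitmap bookkeeping described above. It is purely
illustrative: struct pgmap, pg_write(), and the printf stand-ins for
disk reads are hypothetical names, not actual FreeBSD VM interfaces.

/*
 * Minimal userland sketch of the per-page block bitmap described
 * above. Hypothetical: struct pgmap and pg_write() are illustrative
 * names only, not actual FreeBSD VM interfaces.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE	4096
#define BLK_SIZE	512
#define BLKS_PER_PAGE	(PAGE_SIZE / BLK_SIZE)	/* 8 blocks -> 8-bit map */

struct pgmap {
	uint8_t	valid;	/* bit N set: 512b block N read or fully written */
	int	dirty;	/* page-level dirty bit, as in the recap */
};

/* Apply a write of len bytes at byte offset off within one page. */
static void
pg_write(struct pgmap *pg, unsigned off, unsigned len)
{
	unsigned first = off / BLK_SIZE;
	unsigned last = (off + len - 1) / BLK_SIZE;
	unsigned b;

	for (b = first; b <= last; b++) {
		unsigned bstart = b * BLK_SIZE;
		int whole = off <= bstart && off + len >= bstart + BLK_SIZE;

		if (!whole && !(pg->valid & (1u << b)))
			/* Partial write into an unread block: read just
			   that 512b block, not the whole page. */
			printf("read-before-write of block %u\n", b);
		pg->valid |= 1u << b;	/* block is now valid either way */
	}
	pg->dirty = 1;
}

int
main(void)
{
	struct pgmap pg = { 0, 0 };

	/* 512b-aligned write of block 3: no read, bitmap -> b00001000. */
	pg_write(&pg, 3 * BLK_SIZE, BLK_SIZE);
	printf("valid = 0x%02x, dirty = %d\n", (unsigned)pg.valid, pg.dirty);

	/* 43b write within block 5: one 512b read, bit 5 set as well. */
	pg_write(&pg, 5 * BLK_SIZE + 100, 43);
	printf("valid = 0x%02x, dirty = %d\n", (unsigned)pg.valid, pg.dirty);
	return (0);
}

Run against the two writes from the recap, this prints 0x08
(b00001000) and then 0x28 (the page accumulates bits 3 and 5), with a
single 512b read for the partial write, which matches the bookkeeping
above.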
Regards,
				Terry Lambert
				terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.