Date: Sat, 31 May 1997 06:16:07 +1000
From: Bruce Evans <bde@zeta.org.au>
To: bde@zeta.org.au, dfr@nlsystems.com
Cc: current@freebsd.org
Subject: Re: disk cache challenged by small block sizes
Message-ID: <199705302016.GAA19539@godzilla.zeta.org.au>
>> It seems to fix all the speed problems (except ufs is still slower with
>> a larger fs blocksize) and the block leak in ext2fs.
>
>If you roll vfs_bio.c back to rev 1.115, does it affect the speed of ufs
>with 8k blocksize? I am not sure whether my changes to vfs_bio would
>affect that.
That wasn't it. The slowdown was caused by ffs deciding to allocate all
the blocks starting with the first indirect block on a slower part of
the disk. It attempts to pessimize all cases, but is confused by fuzzy
rounding :-).
Details:
1. The file system has size 96MB (exactly).
2. The defaults for a block size of 4K give 10 cylinder groups (cg's) with
9 of size 10MB and one smaller one (slightly less than 6MB because of
special blocks before the first cg). The average size is about 9.6MB.
3. The defaults for a block size of 8K give 3 cg's with 2 of size 32MB
and one slightly smaller one. The average size is about 32MB.
4. I ran iozone on a new file system, so there was just one directory and
one file.
5. The inode for the file was allocated in cg #0 in both cases.
6. The direct blocks were allocated in the same cg as the inode in both
cases.
7. The first indirect block and the data blocks following it are allocated
in the first cg found with at least the average number of free blocks.
(The comments before ffs_blkpref() about using a rotor are wrong;
fs->fs_cgrotor is never used.)
8. In case (2), cg #0 is chosen because it has almost 10MB-metadata free
and the average is about 9.6MB-metadata.
9. In case (3), cg #1 is chosen since it has significantly less than
32MB-metadata free and the average is about 32MB-metadata.
10. In case (3), cg #1 starts a full 1/3 of the way towards the slowest
parts of the disk and the speed is significantly slower there.
I think the combination of algorithms behind (6) and (7) is often wrong.
It's silly to put the direct blocks on a different cg than the indirect
blocks immediately following them. The silliest case is for a new file
system with all cg's of the same size. Then exact calculation of the
average number of free blocks would result in the indirect blocks always
starting on cg #1 despite cg #0 being almost empty when the first indirect
block is allocated.
I added a bias towards using the same cg as the inode for the first indirect
block. This is probably too strong.
Bruce
diff -c2 ffs_alloc.c~ ffs_alloc.c
*** ffs_alloc.c~ Mon Mar 24 14:21:27 1997
--- ffs_alloc.c Sat May 31 03:08:56 1997
***************
*** 689,692 ****
--- 686,700 ----
startcg %= fs->fs_ncg;
avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
+ /*
+ * Prefer the same cg as the inode if this allocation
+ * is for the first block in an indirect block.
+ */
+ if (lbn == NDADDR) {
+ cg = ino_to_cg(fs, ip->i_number);
+ if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree / 2) {
+ fs->fs_cgrotor = cg;
+ return (fs->fs_fpg * cg + fs->fs_frag);
+ }
+ }
for (cg = startcg; cg < fs->fs_ncg; cg++)
if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
***************
*** 694,698 ****
return (fs->fs_fpg * cg + fs->fs_frag);
}
! for (cg = 0; cg <= startcg; cg++)
if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
fs->fs_cgrotor = cg;
--- 702,706 ----
return (fs->fs_fpg * cg + fs->fs_frag);
}
! for (cg = 0; cg < startcg; cg++)
if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) {
fs->fs_cgrotor = cg;
