Date:      Thu, 12 Aug 1999 21:02:32 -0400 (EDT)
From:      Zhihui Zhang <zzhang@cs.binghamton.edu>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        Poul-Henning Kamp <phk@critter.freebsd.dk>, roberto@keltia.freenix.fr, freebsd-fs@FreeBSD.ORG
Subject:   Re: Help with understand file system performance
Message-ID:  <Pine.GSO.3.96.990812202049.1878A-100000@sol.cs.binghamton.edu>
In-Reply-To: <199908122314.QAA23506@usr04.primenet.com>



On Thu, 12 Aug 1999, Terry Lambert wrote:

> The filesystem block allocation table in directories is unique, in
> that it is generally used as a convenience for locating physical
> blocks, rather than using the standard filesystem block access
> mechanisms, when reading or writing directories.

Directory files have the same on-disk structure as regular files.
However, they can never have holes, and they can only grow at the end of
the file in device-block-sized chunks. No directory entry may cross a
device block boundary, which guarantees that each update is atomic.
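
As a rough illustration of the boundary rule above, here is a minimal C
sketch that checks whether an entry would straddle two directory blocks.
The names DIR_BLOCK_SIZE and entry_fits_in_block() are hypothetical, and
the 512-byte block size is an assumption for the example, not a value
taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed directory block size (stand-in for the device block size). */
#define DIR_BLOCK_SIZE 512u

/*
 * Return 1 if an entry of reclen bytes starting at byte offset `offset`
 * lies entirely within a single directory block, 0 if it would cross a
 * block boundary.
 */
int entry_fits_in_block(size_t offset, size_t reclen)
{
    /* An entry must be non-empty and no larger than one block. */
    assert(reclen > 0 && reclen <= DIR_BLOCK_SIZE);

    size_t first_block = offset / DIR_BLOCK_SIZE;
    size_t last_block = (offset + reclen - 1) / DIR_BLOCK_SIZE;
    return first_block == last_block;
}
```

With these assumptions, a 16-byte entry at offset 496 fits in the first
block, while the same entry at offset 500 would cross into the second
block and would have to be placed elsewhere.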

However, I do not know why you say the block map (direct and indirect
blocks) of a directory is only used as a convenience. There is still a
need to call VOP_BMAP() on a directory file: the routine ffs_blkatoff()
calls bread(), which in turn calls VOP_BMAP(). The in-core inode does have
several fields to facilitate the insertion of new directory entries, but
we still need the block map (block allocation table).

Directory files are also special in that we cannot write to them with
the write() system call as we can with normal files.  They grow through a
special routine, i.e., ufs_direnter().  By the way, we can still use the
read() system call to read directory files as we do with normal files.

> There are a number of performance penalties for this, especially
> on large directories, where it is not possible to trigger sequential
> readahead through use of the getdents() system call sequentially
> accessing sequential 512b/physical_block_size extents.

I do not understand this. The read-ahead mechanism should work on any
file. I thought the reorganization of directory entries within a directory
block when you delete an entry is an inefficiency.

Does this issue have anything to do with the VMIO directory issue
discussed earlier this year? 
 
> The frag size can be tuned down below this (i.e. 1/4, 1/2, 1).
> 
> The only case where 1024 bytes of physical disk would be used is at
> a filesystem block size of 8192 (or greater), which, divided by 8,
> gives 1024b (or greater).

I did not realize this before.  The maximum ratio of block size to
fragment size is 8.  So if the filesystem block size is 8192, the
allocation unit (fragment size) cannot be 512, because 8192/512 = 16 > 8.
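
To make the arithmetic concrete, here is a small C sketch of the ratio
limit described above. The function names and the constants
MAX_FRAGS_PER_BLOCK and SECTOR_SIZE are hypothetical names chosen for
this example; the sketch assumes a 512-byte sector as the smallest
possible fragment and is not the actual newfs or FFS code.

```c
#include <assert.h>

#define MAX_FRAGS_PER_BLOCK 8u   /* maximum block-to-fragment ratio */
#define SECTOR_SIZE         512u /* assumed smallest fragment size */

/* Smallest fragment size allowed for a given filesystem block size. */
unsigned min_frag_size(unsigned block_size)
{
    /* Block size must be a power of two and at least one sector. */
    assert(block_size >= SECTOR_SIZE &&
           (block_size & (block_size - 1)) == 0);

    unsigned frag = block_size / MAX_FRAGS_PER_BLOCK;
    return frag < SECTOR_SIZE ? SECTOR_SIZE : frag;
}

/* Return 1 if frag_size is usable with block_size, 0 otherwise. */
int frag_size_is_valid(unsigned block_size, unsigned frag_size)
{
    return frag_size >= SECTOR_SIZE &&
           frag_size <= block_size &&
           block_size % frag_size == 0 &&
           block_size / frag_size <= MAX_FRAGS_PER_BLOCK;
}
```

Under these assumptions, a block size of 8192 gives a minimum fragment
size of 1024, and a block size of 4096 still allows 512-byte fragments.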

> This is called an encapsulated two stage commit, in database terms.
> 
> For inodes, indirect blocks, and directory entry blocks, there is
> no two stage commit, because there is no indirection of their data
> contents.

I guess you mean that their data are not managed by any higher-level
metadata that must be updated together with them.

Thanks for your help.

-Zhihui






