Date:      Mon, 17 Dec 2001 06:35:14 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Lamont Granquist <lamont@scriptkiddie.org>
Cc:        freebsd-hackers@FreeBSD.org
Subject:   Re: What a FBSD FS needs to do?
Message-ID:  <3C1E02A1.98BFFE5@mindspring.com>
References:  <20011217014953.G15950-100000@coredump.scriptkiddie.org>


Lamont Granquist wrote:
> 
> Can anyone give a brief overview (or point to one) of what a FS in FreeBSD
> needs to do to interact with the rest of the OS?  The general picture I've
> got is of some code which interacts with the VFS layer above it and the
> block I/O layer down below it.  Is this correct?  And what are the APIs
> in those layers?  (and how does the FS interact with the VM?)

Briefly, there are ~185 kernel entry points which are consumed by
the FFS code.  To see these, go into the directory where you build
your kernel and have object files lying around, and count them,
e.g.:

	# cd /sys/compile/GENERIC
	# sh
	# ld -o /tmp/ffsobj ffs* ufs* >/tmp/ffs.link 2>&1
	# cd /tmp
	# vi ffs.link
	:1,$g/:$/d
	:1,$g/more undefined/d
	:1,$s/'$//
	:1,$s/^.*`//
	:x
	# sort -o ffs.sort < ffs.link
	# uniq < ffs.sort > ffs.uniq
	# wc -l ffs.uniq
	185

I have attached an example of the result, from my older 4.x based
system, to this email.

If you look at these, you will see five broad categories:

1)	Kernel support services.

	These are things like bzero, copyin, printf, uiomove, timeout,
	tsleep, untimeout, etc., and are required support functions
	that aren't really FS specific.  Another OS would call them
	"generic kernel services", but that wouldn't be the whole
	story.

2)	VFS services.

	These are things like vfs_add_vnodeops, vfs_export, vfs_timestamp,
	etc., and are required for registration and recognition of the
	FS as a VFS.  There are also services for manipulation of VFS
	specific kernel resources in this category.

3)	Vnode services.

	These are things like all of the vop_* operations, vget, vgone,
	NDFREE, and so on.  These services represent both VFS
	services, which the VFS can call for stacking reasons (it calls
	these services, rather than calling the VFS specific routines
	it defines, in order to abstract the VFS so that you can do VFS
	stacking and things won't break), and VFS specific resources
	that are managed by the OS (such as vnodes, etc.).

	Note: The NDFREE reference is actually an implementation error,
	since it breaks the "caller allocates/caller frees" paradigm;
	this is a long-standing layering issue.

4)	Virtual memory and I/O services.

	These are things like malloc, free, cache_enter, getblk, bread,
	vm_object_deallocate, vinvalbuf, etc., and they represent
	the VFS' interaction with the VM system, and, as a result, the
	buffer cache.  The spl* functions, which are used for concurrency
	control, as well as the locking primitives, fall into this
	category.

	It's important to note that most of these operations only exist
	in "local media" FSs... if your VFS were implementing a
	stacking layer, almost none of these would be used by
	it, since the services consumed would be pretty much covered
	in #3, above.

5)	Miscellaneous functions.

	Into this category I lump all of the inconvenient-to-explain
	functions, like the spec_* functions, which implement the
	special device operations exported by the VFS (when you
	look up a device, you actually get a specfs vnode back,
	instead of an FFS vnode, but since the backing object is
	an FFS object, you have to reference it through the FFS),
	and, similarly, the fifo_* operations (which are used to
	manage named pipes -- FIFO objects -- in the same way).
	You would also see "__divdi3" here, as well as other
	synthetic functions which are, in reality, artifacts of
	the compiler.

Practically, nearly half of these undefined symbols could be made
to go away, with little to no effect on performance.  In particular,
the descriptor references could be factored out at FS instance time,
when the mount takes place and a stack is "frozen" as a mounted FS
instance.  The way you would do this is to sort the VOP and VFSOP
lists, respectively, and then build direct references, rather than
descriptor references, and access them by index rather than by
descriptor (this would be slightly faster, too).

Other references could additionally be eliminated, as they are
really the result of sloppy references (e.g. the spec_* and fifo_*
entries: the first by mount-based externalization and inheritance,
and the second by pure inheritance, enforced at instance time).

A lot of the b* buffer cache operations should probably go through
an ops structure dereference; this means an additional pointer
dereference at runtime, so some of the wins you got by sorting
the VOP list and using an index, instead of a reverse lookup of
the descriptor reference, get paid back at that time, but overall
you are still better off.  If you have ever programmed an IFS under
Windows, you are familiar with the concept of function table
reference definition at IFS registration time: this is basically
the same approach.

The total external exposure could therefore be dropped to perhaps
30 or so symbols, which would make understanding things a whole
lot easier.


As far as the externally exposed symbols are concerned, the place
to look for these is in the VFSOPS and VOPS tables; these are
contained in /sys/ufs/ffs/ffs_vfsops.c and /sys/ufs/ffs/ffs_vnops.c,
in descriptor tables.  These tables define the VFS consumer interface
used to talk to an FS by any VFS layer consumer.  There are three
consumers of the VFS layer at present: the system calls, the static
references to things like the ufs_*, fifo_*, and spec_* operations
by the ffs_* code, and the NFS server code.  Putatively, there are
also the VFS stacking modules, but only trivial versions of those
actually work (they have an overly complex interaction with the VM
system, in particular with cache coherency), so they don't really
count as something you have to worry about supporting at this point,
at least not any more than any other VFS supports them directly.

This would all be significantly better handled in the context of
a journal of a new, independent FS port to FreeBSD, since it would
be possible to address the more arcane issues, and the issues of
ideal vs. practical kernel interaction, at a much more abstract (and
thus useful to future FS writers) level.  Using the FFS as your
example is not generally a good idea, particularly since it has some
additional complexity for things like Soft Updates and legacy stuff
that makes it a really bad example of "how to do things the right
way when you are starting from scratch".

I'm pretty sure Kirk and others would agree with this assessment.

In any case, there's your "brief" overview.

You would do well to read John Heidemann's thesis, and the documentation
for the FICUS framework out of UCLA, on which the stacking code is based,
as well as Matt Dillon's small articles that give a brief overview of the
FreeBSD unified VM and buffer cache system.  See:

	ftp://ftp.cs.ucla.edu/pub/ficus/
	http://www.daemonnews.org/

-- Terry
[Attachment: ffs.uniq]

M_TEMP
NDFREE
__divdi3
__moddi3
addaliasu
addlog
allocbuf
bawrite
bcmp
bcopy
bdevvp
bdirty
bdwrite
biodone
biowait
bowrite
bqrelse
bread
breadn
brelse
bremfree
buf_wmesg
bwillwrite
bwrite
bzero
cache_enter
cache_purge
cluster_read
cluster_write
copyin
copyinstr
copyout
copystr
crfree
curproc
desiredvnodes
dev2udev
devsw
devtoname
dsname
fifo_printinfo
fifo_vnodeop_p
fifo_vnoperate
free
getblk
geteblk
getmicrouptime
getnewvnode
groupmember
hashinit
iftovt_tab
incore
knote
lbolt
lf_advlock
lockinit
lockmgr
lockmgr_printinfo
log
major
makedev
malloc
malloc_init
malloc_uninit
minor
mntvnode_slock
module_register_init
mountlist
namei
nchstats
panic
pmap_zero_page
printf
psignal
random
relookup
rootdev
rootvp
scanc
securelevel
skpc
spec_vnodeop_p
spec_vnoperate
speedup_syncer
splbio
splx
suser_xxx
sysctl__debug_children
sysctl__vfs_children
sysctl_handle_int
tablefull
time_second
timeout
tsleep
uiomove
untimeout
uprintf
vcount
vflush
vfs_add_vnodeops
vfs_bio_awrite
vfs_bio_clrbuf
vfs_busy_pages
vfs_cache_lookup
vfs_export
vfs_export_lookup
vfs_getnewfsid
vfs_getvfs
vfs_modevent
vfs_mountedon
vfs_object_create
vfs_rm_vnodeops
vfs_stdextattrctl
vfs_stduninit
vfs_timestamp
vget
vgone
vinvalbuf
vm_freeze_copyopts
vm_object_reference
vm_object_vndeallocate
vm_page_free_toq
vm_page_zero_invalid
vn_close
vn_isdisk
vn_lock
vn_open
vn_rdwr
vnode_pager_generic_getpages
vnode_pager_generic_putpages
vnode_pager_setsize
vop_access_desc
vop_advlock_desc
vop_balloc_desc
vop_bmap_desc
vop_bwrite_desc
vop_cachedlookup_desc
vop_close_desc
vop_create_desc
vop_default_desc
vop_defaultop
vop_freeblks_desc
vop_fsync_desc
vop_getattr_desc
vop_getpages_desc
vop_inactive_desc
vop_ioctl_desc
vop_islocked_desc
vop_link_desc
vop_lock_desc
vop_lookup_desc
vop_mkdir_desc
vop_mknod_desc
vop_mmap_desc
vop_open_desc
vop_pathconf_desc
vop_poll_desc
vop_print_desc
vop_putpages_desc
vop_read_desc
vop_readdir_desc
vop_readlink_desc
vop_reallocblks_desc
vop_reclaim_desc
vop_remove_desc
vop_rename_desc
vop_rmdir_desc
vop_setattr_desc
vop_stdislocked
vop_stdlock
vop_stdpoll
vop_stdunlock
vop_strategy_desc
vop_symlink_desc
vop_unlock_desc
vop_whiteout_desc
vop_write_desc
vprint
vput
vrecycle
vref
vrele
vtruncbuf
vttoif_tab
wakeup
