Date:      Tue, 4 Apr 2000 15:15:29 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        vova@express.ru
Cc:        ticso@cicely.de (Bernd Walter), freebsd-fs@FreeBSD.ORG
Subject:   Re: SCSI bus
Message-ID:  <200004041515.IAA08886@usr06.primenet.com>
In-Reply-To: <Pine.BSF.4.21.0004021534050.24770-100000@lanturn.kmost.express.ru> from "vova@express.ru" at Apr 02, 2000 03:38:10 PM

> > > > I don't expect this to work because the readonly side
> > > > can't know when the incore informations outdates.
> > > 
> > > Yes, it can be a problem, but may be this may be solved by
> > > disabling any cache on read-only side (or setting expire
> > > time in one sec) ?
> > 
> > You can't disable read caching completely.
> > Say you need the inode, then you will read it and finally use it.
> > You don't reread it for every single byte you access which
> > creates some kind of read cache. And there are much more
> > complex points like this sample.
> 
> Ok, have kernel algorithm to "expire" cached vnodes ? Or vnodes only
> pushed out by new pages ?
> 
> In my case writes - relative rare case then reads and I can wait for some
> timeout while my in-kernel vnodes will dropped to see new from disk


There is a general problem here.  It stems from the ability to
disassociate the ihash cache from the vnode cache; this is a
direct result of the operating system, not the file system,
owning the vnodes in question.

Fixing this is complex.  It requires coupling the pool retention
times on all file systems, such that the OS can ask the file
systems to flush cached data in low resource situations.

Is this worth it?

In general, "yes, a little", and in specific cases, "yes, a lot".

The "yes, a little" case is the locality of reference case
for freed vnodes with cached clean data, which have not been
reclaimed.  This is, in effect, any clean vnode from which
the ihash entry has been divorced.  Even though the data is
available in core without going to disk, you must go to disk
to reread the data, because the association between the inode
and the vnode cannot be recovered.

In effect, this is the difference between a "write through" and
a "write back" cache: the "write back" cache has significantly
better associativity, and therefore significantly better
performance.
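
The "yes, a little" case can be illustrated with a toy model.
This is only a sketch under stated assumptions, not FreeBSD
code: the class ToyVnodeCache and all of its fields are
hypothetical, with a dict standing in for the ihash and another
for the OS-owned vnodes.  It shows clean data staying in core
after the ihash entry is divorced, while the next read still
goes to disk.

```python
class ToyVnodeCache:
    """Toy model: the FS owns the inode hash, the OS owns the vnodes."""

    def __init__(self):
        self.ihash = {}        # inode number -> vnode id (file system side)
        self.vnode_data = {}   # vnode id -> cached clean data (OS side)
        self.disk_reads = 0
        self._next_vnode = 0

    def _read_from_disk(self, ino):
        self.disk_reads += 1
        return f"data-for-inode-{ino}"

    def read(self, ino):
        vid = self.ihash.get(ino)
        if vid is None:
            # No inode -> vnode association: go to disk, even if an
            # orphaned vnode still holds the same data in core.
            vid = self._next_vnode
            self._next_vnode += 1
            self.vnode_data[vid] = self._read_from_disk(ino)
            self.ihash[ino] = vid
        return self.vnode_data[vid]

    def release(self, ino):
        """Free the vnode: the ihash entry goes away, the data stays."""
        self.ihash.pop(ino, None)


cache = ToyVnodeCache()
cache.read(7)      # first read: goes to disk
cache.read(7)      # association intact: served from core
cache.release(7)   # ihash entry divorced from the vnode
cache.read(7)      # data is still in core, but we go to disk again
orphaned = len(cache.vnode_data) - len(cache.ihash)
```

After this sequence the toy has done two disk reads for one
inode and holds one orphaned copy of the data that nothing can
find.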


The specific "yes, a lot" case is another associativity case,
and in particular, comes into play when the machine is being
used as a file server for Windows machines (SAMBA, etc.).

You can get a 35% increase in speed for directory and file
name manipulation operations out of SAMBA very quickly.

A peculiar property of jamming the file operations into the
DOS interface is that the DOS file system has historically
been based on "FAT", or "file allocation tables".  This is
well known, but the net upshot of this architecture is often
ignored: in this architecture, the directory entry _is_ the
inode.  This is very un-UNIX-like.

The main impact this has on file operations is that any BIOS
or even protected mode directory lookup operation, including
file opens, will return stat information.

In UNIX, this requires that the SMB (or NCP, etc.) file server
running hosted on the OS perform two operations: the requested
operation, and an additional "stat" operation to get the rest
of the data.  The client may not use that data, but it might,
and there is no way to tell, so it must be returned.
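
As a rough user-space illustration of the two-operation pattern
(a sketch, not SAMBA code; the helper list_then_stat is a
hypothetical name), a directory listing that must carry stat
data costs one directory read plus one stat per entry.  The
counter tracks logical operations issued by the server, not
actual system calls.

```python
import os
import tempfile


def list_then_stat(path):
    """UNIX-style: one call for the names, then one stat per name."""
    calls = 1  # the directory read itself
    sizes = {}
    for name in os.listdir(path):
        sizes[name] = os.stat(os.path.join(path, name)).st_size
        calls += 1  # the extra stat the client never asked for
    return sizes, calls


with tempfile.TemporaryDirectory() as d:
    for name, payload in (("a.txt", b"x"), ("b.txt", b"yy")):
        with open(os.path.join(d, name), "wb") as f:
            f.write(payload)
    sizes, calls = list_then_stat(d)
```

For the two files created here, the server issues three logical
operations to answer one client request.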

Setting aside the performance increases available from providing
a separate system call interface that mimics the interface these
clients expect the file system to implement (all operations
return stat information, thus saving 50% of all system calls;
and globbing in the kernel, which would allow pushing back only
the data relevant to the client request across the user/kernel
protection domain boundary), there is still a significant win
to be had.
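
The shape of such a combined interface can be sketched in user
space.  This is an emulation only: glob_with_stat is a
hypothetical name, not an existing FreeBSD or SAMBA call, and
the real savings would require the matching and the stat to
happen on the kernel side of the boundary.  The sketch shows
only what would cross it: matching entries, each already paired
with its stat data.

```python
import fnmatch
import os
import tempfile


def glob_with_stat(path, pattern):
    """Hypothetical combined call: matching names plus their sizes."""
    matches = []
    with os.scandir(path) as entries:
        for entry in entries:
            # Entries that do not match never reach the caller.
            if fnmatch.fnmatch(entry.name, pattern):
                matches.append((entry.name, entry.stat().st_size))
    return sorted(matches)


with tempfile.TemporaryDirectory() as d:
    for name in ("report.doc", "notes.txt", "todo.txt"):
        with open(os.path.join(d, name), "wb") as f:
            f.write(name.encode())  # file size equals the name length
    found = glob_with_stat(d, "*.txt")
```

Here the caller makes one request and receives two records; the
non-matching "report.doc" is filtered out before anything is
returned.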

Prefault the vnodes when iterating directory entries, opening
files, or performing other operations.

This technique realizes an immediate performance boost for
SAMBA and other file server software that services Windows
clients.
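
A toy model of prefaulting, under loud assumptions: ToyFS, its
prefault switch, and its counters are all hypothetical, and the
model counts demand faults only, not I/O cost.  It shows that
when the directory iteration faults the vnodes in, the stat
calls that a Windows-facing server always makes afterwards find
them already cached.

```python
class ToyFS:
    """Toy file system with an optional prefault on directory reads."""

    def __init__(self, files, prefault=False):
        self.files = dict(files)   # name -> size "on disk"
        self.prefault = prefault
        self.vnode_cache = {}      # name -> cached stat data
        self.faults = 0            # stat calls serviced on demand

    def _fault_in(self, name):
        self.vnode_cache[name] = {"size": self.files[name]}

    def readdir(self):
        names = sorted(self.files)
        if self.prefault:
            # Fault every vnode in while we are iterating anyway.
            for name in names:
                self._fault_in(name)
        return names

    def stat(self, name):
        if name not in self.vnode_cache:
            self.faults += 1
            self._fault_in(name)
        return self.vnode_cache[name]


def serve_listing(fs):
    """What the file server does: list, then stat every entry."""
    return [(name, fs.stat(name)["size"]) for name in fs.readdir()]


files = {"a": 1, "b": 2, "c": 3}
plain = ToyFS(files)
eager = ToyFS(files, prefault=True)
listing_plain = serve_listing(plain)
listing_eager = serve_listing(eager)
```

Both servers return the same listing; the plain one takes three
demand faults and the prefaulting one takes none.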

There are a couple of problems with this approach.  First,
performance on most normal UNIX applications is reduced.  Second,
without kernel globbing, you will fault a lot of vnodes that
don't need faulting, because they are not relevant to the client.
The first can be addressed by making it a flag on the process;
the easiest way to do this is to have the process open its own
/proc entry and set the flag on itself, and have the prefault
occur in the FS when this flag is set.  The second wants another
API (which is just as well; if we define that a directory is not
a file, then we can ignore the atime update requirements of
POSIX, since they only apply to the POSIX defined getdents(2)
interface -- getdirentries(2), for BSD-land).


The operating system ownership of vnodes damages the utility of
these techniques, but not enough to make them useless.  On
the other hand, an operating system where the file system owns
its vnodes, and where there are provisions for kernel globbing
and call collapse (putting the stat in with the open and directory
and file name manipulation code), etc., will be able to achieve
much better performance than an operating system without these
capabilities (such as the current FreeBSD).  UnixWare has some
of these capabilities built in; that's to be expected, since
they were invented by Ed Lane, Dan Fritch, Dan Grice, myself,
and others at Novell in the early 1990s.  They are among the
reasons that NetWare for UNIX, written in C and running on UnixWare,
outperformed Native NetWare, written in assembly, and running on
the bare silicon, when run on identical Intel hardware.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





