From owner-freebsd-hackers Sat Nov 18 01:33:31 1995
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6)
	id BAA05605 for hackers-outgoing; Sat, 18 Nov 1995 01:33:31 -0800
Received: from hq.icb.chel.su (icb-rich-gw.icb.chel.su [193.125.10.34])
	by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id BAA05554
	for ; Sat, 18 Nov 1995 01:32:53 -0800
Received: from localhost (babkin@localhost) by hq.icb.chel.su (8.6.5/8.6.5)
	id OAA01222; Sat, 18 Nov 1995 14:33:22 +0500
From: "Serge A. Babkin"
Message-Id: <199511180933.OAA01222@hq.icb.chel.su>
Subject: Re: NFS client caching in UNIX
To: terry@lambert.org (Terry Lambert)
Date: Sat, 18 Nov 1995 14:33:21 +0500 (GMT+0500)
Cc: terry@lambert.org, hackers@freebsd.org
In-Reply-To: <199511171721.KAA05681@phaeton.artisoft.com> from "Terry Lambert"
	at Nov 17, 95 10:21:35 am
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 5022
Sender: owner-hackers@freebsd.org
Precedence: bulk

> > > Since the client won't make one INT 21 call and then make another
> > > while that one is pending (DOS is a non-reentrant real mode
> > > interrupt handler), you can't cache one and not return.  You
> > > *must* return.
> >
> > How about using a similar algorithm for a Unix client?
>
> NFS client caching for UNIX is possible to make safe in a restricted
> case.  There are many detailed papers on this that are a better source
> than me and that cover the topic in greater detail.  But here are some
> highlights:

Thank you!

> You can client-cache fairly safely if the file is opened read-only.
> This won't work in some circumstances: mostly programs that
> incorrectly use files for IPC.  Most of these would be older programs.
>
> You can client-cache if you have advisory notification (a la madvise)
> to tell you what rules of access the application will follow.
>
> You can client-cache (in fact, it is a major win in many ways) if the
> file is being opened to execute it.
> There is unfortunately no distinction in the kernel in the way that a
> file is opened, in order to provide cache hints to the underlying FS.
> This is (IMO) a deficiency in the kernel-level file I/O interface used
> for both file and executable image opening, and it prevents cleanup of
> the "execute a program from an NFS server using the file as swap
> store" VEXEC non-propagation bug that lets you crash programs on an
> NFS client from either the server or, in some cases, another client.
> The fix involves changing the internal-use interface, providing a
> "hint flag" which is part of the internal-use interface semantics, and
> flagging the NFS file system as "remote" (this last is the only part
> which is implemented).  Then you still have to implement the
> client-side cache, which will be complicated by the unified VM/cache
> model.
>
> You can use write-locking as a lease to cache, *if* you have working
> NFS locking.  You *must* flush the cache when the lock goes away.  I
> believe this is thrice flawed.  First, an application that makes
> multiple changes to a single record on disk is flawed in the first
> place... the changes should be made in core to reduce the implied
> state.  Second, the cache flush must be synchronous, so it's
> questionable whether trading a "delay now" for a "delay later" isn't
> itself inherently flawed.  Finally, there exists implied state of a
> small amount of data using index locking, a typical approach to
> reduce real locking calls for databases in third normal form (or
> higher).  Caching will fail to be asserted for file data, which may
> in fact be the majority of operations.
>
> In closing, it's arguable that any application that uses database
> techniques should be implemented as a transaction-oriented database
> client and server.  If this is done, it's unlikely that occasions
> that allow NFS client caching (other than file execution) will ever
> occur.

You are describing the read cache here.  I'm speaking about the write
cache.
Consider the logic of read and write.  A read costs a round-trip time
per request if we can't predict the request sequence.  A write does not
need a round trip: once the request has been handed to the network (in
the case of an absolutely reliable network) we can forget about it and
let the program go on and generate the next write request.  In the case
of an unreliable network we can use a windowed protocol for writes, so
that while one write request travels through the network, is executed,
and its reply travels back, the next write request(s) can be produced.

So, obviously, writes should be more efficient than reads.  But what do
we see with NFS?  Reads are about 5 times more efficient than writes.
Why?  Because the network is unreliable and we can get an error (in the
case of a soft mount) that must be reported to the application
immediately, and because the application can rely on the ordering of
writes (possibly in different files) to implicitly synchronize its
"transactions".

But if the application uses explicit synchronization and does not care
precisely which write() reports a failure (the presence of at least one
failure during a "transaction" means that the whole "transaction"
fails), we can delay reporting the failure until any later write
request within the "transaction", or until the end of the "transaction"
itself.  A "transaction" can be committed by the calls close(),
fsync(), unlock() and possibly lock().  So we can have windowed writes
between the "transaction" delimiters.

Yes, not all applications will work well under these assumptions, but
most will.  So we can add such a write cache as an option.  In most
cases we will get a significant write performance increase; in all
other cases we can simply disable the option for the mounts that need
synchronous writes.

Serge Babkin

! (babkin@hq.icb.chel.su)
! Headquarter of Joint Stock Commercial Bank "Chelindbank"
! Chelyabinsk, Russia