From owner-freebsd-current  Wed Nov 15 13:04:17 1995
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id NAA25923
          for current-outgoing; Wed, 15 Nov 1995 13:04:17 -0800
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id NAA25857
          for <current@FreeBSD.ORG>; Wed, 15 Nov 1995 13:03:56 -0800
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id NAA01590; Wed, 15 Nov 1995 13:58:27 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199511152058.NAA01590@phaeton.artisoft.com>
Subject: Re: ISP state their FreeBSD concerns
To: babkin@hq.icb.chel.su (Serge A. Babkin)
Date: Wed, 15 Nov 1995 13:58:27 -0700 (MST)
Cc: terry@lambert.org, karl@mcs.com, current@FreeBSD.ORG
In-Reply-To: <199511150924.OAA05961@hq.icb.chel.su> from "Serge A. Babkin" at Nov 15, 95 02:24:57 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 6783      
Sender: owner-current@FreeBSD.ORG
Precedence: bulk

> > > > Well, NFS lockd, for one.
> > > 
> > > I'm sorry, I was unclear. I meant the file-based "implicit" locking methods.
> > > I think lockd must synchronize anyway, and it must flush the caches.
> > 
> > No, no, evil, no!
> 
> Even the _client_ cache (if present) of the file on which the lock
> is taken? I think that file locking is the simplest example of
> [limited] transaction processing. IMHO, when the file (or part of it) gets
> unlocked, everyone who then reads from it must get the updated data,
> not the old data. And when the file (or part of it) gets locked, it means
> that the process wants to see the current state of the locked data and
> change it without interference.

The lockd can't sync data in client cache.  Only the client can do that.

> As I understand it, they are even explicitly prohibited (your original
> "1)" paragraph). But why? Is there some fundamental problem, or has
> nobody implemented an async NFS client yet (or have I simply never seen one)?

The principle is that a server is like a disk: when the write returns,
you are guaranteed that a read several days later will return the
same data, by virtue of the semantics.

Caching breaks this because cache commit order is not guaranteed to be
the same as write order, and a series of idempotent operations will
not result in the same ordering on cache commits, unless you put a
lot of work into the cache code to make it so.
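To make the ordering problem concrete, here is a toy model of a write-back cache. It is an illustration, not any real NFS client's cache. The cache coalesces writes per block and commits dirty blocks in ascending block order. The names `wb_cache`, `cache_write`, and `cache_flush`, and the elevator-style flush policy, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 16

/* Toy write-back cache (hypothetical): at most one pending write per block. */
struct wb_cache {
    int dirty[MAX_BLOCKS];      /* 1 if the block has an uncommitted write */
};

/* Application-level write: only marks the block dirty, so a second write
 * to the same block before a flush coalesces into one pending commit. */
static void cache_write(struct wb_cache *c, int blk)
{
    assert(blk >= 0 && blk < MAX_BLOCKS);
    c->dirty[blk] = 1;
}

/* Commit dirty blocks in ascending block order (an assumed elevator-style
 * policy), recording the order in which the server sees them.
 * Returns the number of blocks committed. */
static size_t cache_flush(struct wb_cache *c, int commit_order[MAX_BLOCKS])
{
    size_t n = 0;
    for (int blk = 0; blk < MAX_BLOCKS; blk++) {
        if (c->dirty[blk]) {
            commit_order[n++] = blk;
            c->dirty[blk] = 0;
        }
    }
    return n;
}
```

Suppose the application writes blocks 7, 2, 7. Those three writes become two commits, in the order 2, 7. A crash between the two commits leaves the server holding block 2 without block 7, even though the application wrote 7 first. A server that behaves like a disk never exposes that state.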

> > If the client did a "window" worth of async writes and did an fsync() before
> > letting go, then it would work.
> 
> How about this algorithm:
> 
> client_nfs_fsync():
> 	If the file is marked as "write failed" return ERROR;
> 	Make a local simple lock of file to prevent write()s during
> 	fsync();
> 	Wait until all outstanding write() requests are completed;
> 	Unlock the file;
> 	Return OK;

An "fsync()" implies a cache.  What gets synced is the client cache
contents, not any server contents.  The writes at fsync() time are
as synchronous as the writes without caching.

Like I said, you'd have to put a lot of work into the cache for this.
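Serge's quoted client_nfs_fsync() proposal can be sketched in C as follows. Pthread mutexes and condition variables stand in for kernel sleep/wakeup. The struct and its fields (`outstanding`, `write_failed`, `fsync_in_progress`) are hypothetical, invented for illustration. The sketch makes one deliberate change to the proposal: it re-checks the failure flag after the wait, since a WRITE can fail while fsync() is waiting.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-file state for an async NFS client. */
struct nfs_file {
    pthread_mutex_t mtx;
    pthread_cond_t  idle;       /* broadcast when outstanding drops to 0
                                   or when an fsync releases the file */
    int outstanding;            /* async WRITEs sent but not yet acknowledged */
    int write_failed;           /* sticky: set if any WRITE reply was an error */
    int fsync_in_progress;      /* the proposal's "local simple lock" */
};

#define NFS_FILE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

/* write() path: wait out any fsync, then count the new async WRITE. */
static void nfs_write_start(struct nfs_file *f)
{
    pthread_mutex_lock(&f->mtx);
    while (f->fsync_in_progress)
        pthread_cond_wait(&f->idle, &f->mtx);
    f->outstanding++;
    pthread_mutex_unlock(&f->mtx);
}

/* RPC reply path: retire one WRITE, remembering failures. */
static void nfs_write_done(struct nfs_file *f, int ok)
{
    pthread_mutex_lock(&f->mtx);
    if (!ok)
        f->write_failed = 1;
    if (--f->outstanding == 0)
        pthread_cond_broadcast(&f->idle);
    pthread_mutex_unlock(&f->mtx);
}

/* The proposed client_nfs_fsync(): 0 on success, -1 on error. */
static int client_nfs_fsync(struct nfs_file *f)
{
    int rv;

    pthread_mutex_lock(&f->mtx);
    if (f->write_failed) {              /* "write failed" mark: fail fast */
        pthread_mutex_unlock(&f->mtx);
        return -1;
    }
    f->fsync_in_progress = 1;           /* block new write()s */
    while (f->outstanding > 0)          /* wait for every WRITE reply */
        pthread_cond_wait(&f->idle, &f->mtx);
    f->fsync_in_progress = 0;           /* unlock the file */
    pthread_cond_broadcast(&f->idle);   /* release writers held off above */
    rv = f->write_failed ? -1 : 0;      /* a WRITE may have failed meanwhile */
    pthread_mutex_unlock(&f->mtx);
    return rv;
}
```

The sketch orders and drains only this client's own WRITEs. Nothing in it touches another client's cache, and that is the part that makes the problem hard.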

The problem is implied state in the update.

Consider the case of a data file and an index file for that data.
The relationship between the files is based on implied state in the
application.  For simplicity, we'll assume a two-stage commit, so that
we can place the write-ordering requirement on the cache and require
that the client update not be cached across the transaction.  Caching
across the transaction will incorrectly allow the commit state to
advance in the client application: it thinks it is OK to do a write
because it thinks the previous write in the staged transaction has
gone to permanent media.
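The data/index ordering can be shown with ordinary POSIX calls. In this minimal sketch, the index entry is written only after fsync() on the data file returns. The file layout, the record format (raw text records, `long` offsets in the index), and append_record() are assumptions made for illustration. The scheme is only as good as that fsync(). A cache that acknowledges writes it has not committed lets stage 2 start early, which is exactly the failure described in the paragraph above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical two-stage commit: append rec to the data file, then record
 * its starting offset in the index file. Returns 0 on success, -1 on error. */
static int append_record(int data_fd, int index_fd, const char *rec)
{
    size_t len = strlen(rec);
    off_t off = lseek(data_fd, 0, SEEK_END);   /* where this record starts */
    long entry;

    if (off == (off_t)-1)
        return -1;
    /* Stage 1: the data must be on permanent media... */
    if (write(data_fd, rec, len) != (ssize_t)len || fsync(data_fd) != 0)
        return -1;
    /* Stage 2: ...before any index entry is allowed to point at it. */
    entry = (long)off;
    if (write(index_fd, &entry, sizeof entry) != (ssize_t)sizeof entry)
        return -1;
    return fsync(index_fd);
}
```

After appending "alpha" and "beta", the index holds the offsets 0 and 5. If the fsync() calls were no-ops, a crash could leave the index pointing past the end of the data.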

DOS has an "fsync()" mechanism to handle this: INT 21h, AH=0Dh.  And
Win32 has a similar mechanism implemented at the IFS layer using
FS_CloseFile() with flags values of CLOSE_HANDLE or CLOSE_FOR_PROCESS,
neither of which is a real resource deallocation; both cause the
buffers to be flushed.  But most programs do not expect a cache and
thus do not use these functions.  If they aren't called, hooking them
does no good.

> client_nfs_lock/unlock():
> 	If the file is marked as "write failed" return ERROR;
> 	Make a local simple lock of file to prevent write()s during
> 	fsync();
> 	Wait until all outstanding write() requests are completed;
> 	Issue an NFS lock/unlock request and wait until it completes;
> 	Unlock the file;
> 	Return OK;

I believe that if you are to use locking as the trigger and the guard,
you have to have the lock asserted during the entire cache cycle, and
you must flush/invalidate the (write/read) cache when you deassert the
lock.
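The rule of asserting the lock for the whole cache cycle and flushing before deasserting it maps roughly onto POSIX byte-range locks, as sketched below. The counter-at-offset-0 file format and the helper names are invented. Where a lock manager carries such locks to the server, it is still the client, not lockd, that would have to discard cached blocks when the lock is granted. Portable code can only make sure its read happens after that point and its flush happens before the unlock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Set (F_WRLCK) or clear (F_UNLCK) an advisory lock on the whole file,
 * waiting if another holder has it. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                       /* 0 = through end of file */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical read-modify-write of a counter at offset 0, with the lock
 * asserted for the entire cycle: lock, read, update, flush, unlock. */
static int locked_increment(int fd, long *out)
{
    long v = 0;
    ssize_t n;

    if (whole_file_lock(fd, F_WRLCK) != 0)
        return -1;
    n = pread(fd, &v, sizeof v, 0);     /* read only after the lock is held */
    if (n != 0 && n != (ssize_t)sizeof v)
        goto fail;
    v++;
    if (pwrite(fd, &v, sizeof v, 0) != (ssize_t)sizeof v)
        goto fail;
    if (fsync(fd) != 0)                 /* flush before deasserting */
        goto fail;
    *out = v;
    return whole_file_lock(fd, F_UNLCK);
fail:
    whole_file_lock(fd, F_UNLCK);
    return -1;
}
```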

A local lock is insufficient.

The problem comes when some other client updates the same block
before you do.
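A deterministic toy model makes the lost update explicit. Everything in it is an assumption chosen to keep the example small: a single server block, the `client` struct, write-through writes, and function names standing in for the two locking styles. It is not real NFS.

```c
#include <assert.h>

/* Toy model: one block on the server, and clients that each keep a
 * private cached copy of it. */
static int server_block;

struct client {
    int cached;     /* this client's copy of the block */
    int valid;      /* 1 if the client believes its copy is current */
};

/* Read through the client cache; only a miss goes to the server. */
static int client_read(struct client *c)
{
    if (!c->valid) {
        c->cached = server_block;
        c->valid = 1;
    }
    return c->cached;               /* on a hit this may be stale */
}

/* Write-through, to keep the model small. */
static void client_write(struct client *c, int v)
{
    c->cached = v;
    server_block = v;
}

/* Increment guarded only by a client-local lock. Such a lock excludes other
 * processes on this client but does nothing about another client's
 * updates, so the read can return a stale copy. */
static void increment_local_lock(struct client *c)
{
    client_write(c, client_read(c) + 1);
}

/* Increment as if under a server-visible lock held for the whole cycle:
 * invalidate the cache when the lock is asserted, so the read is fresh.
 * Assumes callers are serialized by that lock. */
static void increment_global_lock(struct client *c)
{
    c->valid = 0;
    client_write(c, client_read(c) + 1);
}
```

In the test, both clients read 10 before either writes. With client-local locking, both write 11 and one increment is lost. Invalidating at lock time, with the lock held across the read-modify-write, yields 12.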


[ ... ]

> Of course it is simple and obvious, but what can you, Unix Wizards, say about
> it? Is it wrong?

Distributed cache coherency is a hard problem.  You can only partially
leverage lock state to implement a coherency mechanism.


The biggest pain in the rear is that we have the UNIX side source
code, but the DOS client source code is proprietary.

> > Basically, it fails because the client is stupid and NFS is not a
> > connection-oriented protocol.
> 
> The client can be made clever :-) and the connectionless nature of the NFS
> protocol should not prevent this buffering.

Distributed cache coherency, again.

> > How else are you going to support findfirst/findnext and short name
> > semantics?!?
> 
> I have experimented with short-named files :-) Really, it is not a big
> problem if you put only files with DOS-format names into the
> PCNFS-mounted directories. I don't know about the findfirst/findnext
> problem; Tsoft's PCNFS, which I experimented with, worked well with the
> "auth=none" option.

I've experimented with having the short name as an attribute of the
file in an attributed file system, though I did the storage in the
directory instead of the metadata proper.

You still need to know what kind of client you have to enforce the
semantics.

My personal favorite is a CD-ROM with Rock Ridge (RR) extensions that you
want turned off because the consumer is a DOS client.

A file server can be thought of as exporting file system interfaces
that are views onto a single file system.  The local users of the file
system, from that point of view, are just another client type.

It pays to really support naming and name-translation coherency
between multiple name spaces.  The PCNFSD does this with on-the-fly
generation of short names.  But limiting the names by convention
instead of by semantics is a poor substitute.  The first time you
drop a long file name into an exported directory, you are screwed.
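For illustration only, here is a hypothetical on-the-fly short-name generator. It is not PCNFSD's algorithm, nor VFAT's. The legal-character set, the "~N" tail, and short_name() itself are assumptions. The example also shows why the mapping is a coherency problem rather than a string function. The caller must pick `seq` so that the result is unique in the directory and stays stable for as long as any DOS client may hold the name, and that requires state the function does not have.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed set of characters legal in a DOS 8.3 name. */
static int legal_dos_char(unsigned char ch)
{
    return ch != '\0' &&
           (isalnum(ch) || strchr("$%'-_@~`!(){}^#&", ch) != NULL);
}

/* Map lfn to an uppercase 8.3 name in out (at least 13 bytes). Names that
 * already fit are only uppercased; otherwise the base is shortened to make
 * room for "~seq". seq must be 1..999999. */
static void short_name(const char *lfn, int seq, char out[13])
{
    const char *dot = strrchr(lfn, '.');
    size_t baselen = dot ? (size_t)(dot - lfn) : strlen(lfn);
    char base[9], ext[4];
    size_t nb = 0, ne = 0;
    int lossy = 0;

    assert(seq >= 1 && seq <= 999999);
    for (size_t i = 0; i < baselen; i++) {
        unsigned char ch = (unsigned char)lfn[i];
        if (legal_dos_char(ch) && nb < 8)
            base[nb++] = (char)toupper(ch);
        else
            lossy = 1;              /* illegal char or base longer than 8 */
    }
    for (const char *p = dot ? dot + 1 : ""; *p; p++) {
        unsigned char ch = (unsigned char)*p;
        if (legal_dos_char(ch) && ne < 3)
            ext[ne++] = (char)toupper(ch);
        else
            lossy = 1;              /* illegal char or extension longer than 3 */
    }
    if (lossy) {                    /* make room for "~N" in the base */
        char tail[8];
        int tl = snprintf(tail, sizeof tail, "~%d", seq);
        if (nb > 8 - (size_t)tl)
            nb = 8 - (size_t)tl;
        memcpy(base + nb, tail, (size_t)tl);
        nb += (size_t)tl;
    }
    base[nb] = '\0';
    ext[ne] = '\0';
    if (ne > 0)
        snprintf(out, 13, "%s.%s", base, ext);
    else
        snprintf(out, 13, "%s", base);
}
```

With this scheme, "Long File Name.html" becomes LONGFI~1.HTM, while "readme.txt" simply becomes README.TXT.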

> I have looked at pcnfsd.x, and most of the request types I saw are
> printer-related; the only other two are PCNFSD[2]_AUTH, which checks the
> user name and password and returns the uid, gid, and other related
> information, and PCNFSD2_MAPID, which translates between names and IDs.
> I see no need to send these requests every time we do some NFS operation.

You're misunderstanding.

They take the place of corresponding UNIX client requests, they are not
in addition to them.

I believe the Sun PCNFSD actually supports NFSv3-style multiple directory
entry+stat information per directory traversal request.  This is a big win
because of the way DOS uses directory lookups: findfirst/findnext return
attributes, size, and timestamps along with every name.
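A back-of-envelope count shows why per-entry attributes in the directory reply matter to such a client. The entries-per-reply figures are assumed parameters, since real values depend on name lengths and the transfer size. The count also ignores `.`/`..` and end-of-directory details.

```c
#include <assert.h>

/* Round-trip counts for listing n entries when the client needs every
 * entry's attributes. per_reply is an assumed batch size. */
static unsigned long ceil_div(unsigned long a, unsigned long b)
{
    return (a + b - 1) / b;
}

/* Names only in each directory reply, then one LOOKUP per entry to
 * fetch its attributes. */
static unsigned long rpcs_readdir_then_lookup(unsigned long n,
                                              unsigned long per_reply)
{
    return ceil_div(n, per_reply) + n;
}

/* Names and attributes together in each reply (NFSv3 READDIRPLUS style). */
static unsigned long rpcs_readdirplus(unsigned long n, unsigned long per_reply)
{
    return ceil_div(n, per_reply);
}
```

For 500 entries at 50 per reply, that is 510 round trips against 10. Even if the larger combined entries cut the batch to 20 per reply, it is 25.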

Actually, I need to talk to the two guys here (at Artisoft) who are doing
the Win95 NFS client code to ensure that it's optimal for a UNIX server
as well as an NT/Win95 server.  They might also be able to give me some
information on the Sun and B&W PCNFS client code.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.