From owner-freebsd-current Wed Nov 15 13:04:17 1995
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id NAA25923 for current-outgoing; Wed, 15 Nov 1995 13:04:17 -0800
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id NAA25857 for ; Wed, 15 Nov 1995 13:03:56 -0800
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id NAA01590; Wed, 15 Nov 1995 13:58:27 -0700
From: Terry Lambert
Message-Id: <199511152058.NAA01590@phaeton.artisoft.com>
Subject: Re: ISP state their FreeBSD concerns
To: babkin@hq.icb.chel.su (Serge A. Babkin)
Date: Wed, 15 Nov 1995 13:58:27 -0700 (MST)
Cc: terry@lambert.org, karl@mcs.com, current@FreeBSD.ORG
In-Reply-To: <199511150924.OAA05961@hq.icb.chel.su> from "Serge A. Babkin" at Nov 15, 95 02:24:57 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 6783
Sender: owner-current@FreeBSD.ORG
Precedence: bulk

> > > > Well, NFS lockd, for one.
> > >
> > > I'm sorry, I said unclear. I meant the file-based "implicit" locking methods.
> > > I think lockd must make synchronization anyway and it must flush the caches.
> >
> > No, no, evil, no!
>
> Even the _client_ cache (if it is present) of the file on which this lock
> is executed ? I think that file locking is the simplest example of
> [limited] transaction processing. IMHO when the file (or its part) gets
> unlocked everyone who tries to read from it must get the updated data,
> not old. And when the file (or its part) gets locked it means that the
> process wants to see the current state of locked data and change it
> without any intervention.

The lockd can't sync data in client cache. Only the client can do that.

> As I can understand they are even explicitly prohibited (your original
> "1)" paragraph). But why ?
> Is there some principial problem or just
> nobody had implemented async NFS client (or simply I never saw it) yet ?

The principle is that a server is like a disk: when the write returns,
you are guaranteed that a read several days later will return the same
data by virtue of the semantics.

Caching breaks this because cache commit order is not guaranteed to be
the same as write order, and a series of idempotent operations will not
result in the same ordering on cache commits.

Unless you put a lot of work into the cache code to make it so.

> > If the client did a "window" worth of async writes an did an fsync() before
> > letting go, then it would work.
>
> How about this algorithm :
>
> client_nfs_fsync():
>     If the file is marked as "write failed" return ERROR;
>     Make a local simple lock of file to prevent write()s during
>         fsync();
>     Wait until all outstanding write() requests are completed;
>     Unlock the file;
>     Return OK;

An "fsync()" implies a cache. What gets synced is the client cache
contents, not any server contents. The writes at fsync() time are as
synchronous as the writes without caching.

Like I said, you'd have to put a lot of work into the cache for this.

The problem is implied state in the update. Consider the case of a data
file and an index file for that data. The relationship between the files
is based on implied state in the application. For simplicity, we'll
assume a two stage commit so we can make the write ordering requirement
on the cache, and we can make the requirement that the client update not
be cached across the transaction.

Caching across the transaction will incorrectly allow the commit state
to advance in the client application. The application thinks it is OK to
do a write because it thinks the previous write in the staged transaction
has gone to permanent media.

DOS has an "fsync()" mechanism to handle this: INT 21, AH=0x0d.
And Win32 has a similar mechanism implemented at the IFS layer using
FS_CloseFile() with flags values of CLOSE_HANDLE or CLOSE_FOR_PROCESS,
neither of which is a real resource deallocation and both of which cause
the buffers to be flushed.

But most programs do not expect a cache and thus do not use these
functions. If they aren't called, hooking them does no good.

> client_nfs_lock/unlock():
>     If the file is marked as "write failed" return ERROR;
>     Make a local simple lock of file to prevent write()s during
>         fsync();
>     Wait until all outstanding write() requests are completed;
>     Issue an NFS lock/unlock request and wait until it completes;
>     Unlock the file;
>     Return OK;

I believe that if you are to use locking as the trigger and the guard,
you have to have the lock asserted during the entire cache cycle, and
you must flush/invalidate the (write/read) cache when you deassert the
lock.

A local lock is insufficient. The problem comes when some other client
updates the same block before you do.

[ ... ]

> Of course it is simple and obvious, but what can you, Unix Wizards, say about
> it ? Is it wrong ?

Distributed cache coherency is a hard problem. You can only partially
leverage lock state to implement a coherency mechanism.

The biggest pain in the rear is that we have the UNIX side source code,
but the DOS client source code is proprietary.

> > Basically, it fails because the client is stupid and NFS is not a connection
> > oriented protocol.
>
> Client can be made clever :-) and the connectionless nature of NFS prtocol
> should not disallow this buffering.

Distributed cache coherency, again.

> > How else are you going to support findfirst/findnext and short name
> > semantics?!?
>
> I have experimented with short-named files :-) Really it is not a big
> problem if you will put only files with dos-formatted names in the
> PCNFS-mounted directories. I don't know about findfirst/findnext problem,
> Tsofts's PCNFS with which I experimented worked well with "auth=none"
> option.

I've experimented with having the short name as an attribute of the file
in an attributed file system, though I did the storage in the directory
instead of the metadata proper.

You still need to know what kind of client you have to enforce the
semantics. My personal favorite is a CDROM with RR (Rock Ridge)
extensions that you want turned off because the consumer is a DOS client.

A file server can be considered as exporting file system interfaces that
are views on a single file system. The local users of the file system,
from that point of view, are just another client type.

It pays to really support the naming and name translation coherency
between multiple name spaces. The PCNFSD does this with on the fly
generation of short names. But limiting the names by convention instead
of by semantic is a poor substitute. The first time you drop a long file
name into an exported directory, you are screwed.

> I have looked at pcnfsd.x and most of request types I saw are printer-related,
> only two of them are PCNFSD[2]_AUTH that checks user name and password and
> returns uid, gid and other related information and PCNFSD2_MAPID that
> performs translations between names and IDs. I see no need to send these
> requests every time we do some NFS operation.

You're misunderstanding. They take the place of corresponding UNIX
client requests; they are not in addition to them.

I believe the Sun PCNFSD actually supports NFSv3 style multiple directory
entry+stat information per directory traversal request. This is a big
win because of the way DOS uses directory lookups.

Actually, I need to talk to the two guys here (at Artisoft) who are doing
the Win95 NFS client code to ensure that it's optimal for a UNIX server
as well as an NT/Win95 server. They might also be able to give me some
information on the Sun and B&W PCNFS client code.


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
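
A minimal sketch of the staged data/index update discussed in the
message above, assuming a local POSIX file system and Python's standard
os, fcntl and struct modules. The helper names (write_all, append_record,
read_record) and the 16-byte index entry layout are hypothetical, chosen
only for illustration; this is not NFS client code. It holds one lock
across the whole cycle and forces each stage out with fsync() before the
next stage starts, so the index never refers to data that is still
sitting only in the local cache.

```python
import fcntl
import os
import struct

# Hypothetical index entry layout: 8-byte offset + 8-byte length.
ENTRY = struct.Struct("<QQ")


def write_all(fd, data):
    """Write every byte of data, looping over short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def append_record(data_path, index_path, payload):
    """Append payload to the data file, then publish it in the index."""
    data_fd = os.open(data_path, os.O_RDWR | os.O_CREAT, 0o644)
    index_fd = os.open(index_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # The lock is asserted for the entire cycle, not just the flush.
        fcntl.lockf(index_fd, fcntl.LOCK_EX)
        try:
            # Stage 1: write the data and flush it before going on.
            offset = os.lseek(data_fd, 0, os.SEEK_END)
            write_all(data_fd, payload)
            os.fsync(data_fd)
            # Stage 2: only now may the index refer to the new data.
            os.lseek(index_fd, 0, os.SEEK_END)
            write_all(index_fd, ENTRY.pack(offset, len(payload)))
            os.fsync(index_fd)
        finally:
            # Everything was flushed above, so the lock can be dropped.
            fcntl.lockf(index_fd, fcntl.LOCK_UN)
    finally:
        os.close(index_fd)
        os.close(data_fd)
    return offset


def read_record(data_path, index_path, n):
    """Return the n-th record by following its index entry."""
    with open(index_path, "rb") as idx:
        idx.seek(ENTRY.size * n)
        offset, length = ENTRY.unpack(idx.read(ENTRY.size))
    with open(data_path, "rb") as dat:
        dat.seek(offset)
        return dat.read(length)
```

If the two fsync() calls were dropped, or the index write were issued
before the data flush, a crash or a second writer could see an index
entry whose data had not arrived, which is the ordering hazard the
message describes. Whether the lock and fsync() give the same guarantee
on an NFS mount depends on the client and on lockd.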