From owner-freebsd-hackers Mon Aug 21 19:57:53 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.FreeBSD.org (8.6.11/8.6.6) id TAA26898 for hackers-outgoing; Mon, 21 Aug 1995 19:57:53 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.FreeBSD.org (8.6.11/8.6.6) with SMTP id TAA26888 for ; Mon, 21 Aug 1995 19:57:47 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA28223; Mon, 21 Aug 95 20:59:18 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9508220259.AA28223@cs.weber.edu>
Subject: Re: Making a FreeBSD NFS server
To: peter@bonkers.taronga.com (Peter da Silva)
Date: Mon, 21 Aug 95 20:59:17 MDT
Cc: hackers@freebsd.org
In-Reply-To: <199508220109.UAA25345@bonkers.taronga.com> from "Peter da Silva" at Aug 21, 95 08:09:09 pm
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@freebsd.org
Precedence: bulk

> > Write me an NFS fsck. 8-).
>
> You do the fsck on the server.
>
> I've done fscks over OpenNet.

I'm talking about "client/server connection crash recovery" to recover
the state after a failure of a stateful protocol.

The problem is, of course, that one must maintain all non-derivable
state on both the client and the server -- that is, any state that
can't be derived from other state.  An "NFS fsck" would recover the
state, including causing any open files and locks to be asserted as
they were asserted before the crash.

Since I could have two clients when the server crashed, the state has
to be reinstantiated by both clients.  Consider client 1 with a lock
outstanding, and client 2 with a lock request outstanding and blocked
on client 1's lock.  If client 2 begins crash recovery prior to
client 1, it will assert its outstanding lock request -- which the
server will grant, not having client 1's context to use as a wakeup
address.

Basically, "the machine" becomes the network, and as a result MTBF
goes way, way down.  That was my point in the "Write me an NFS fsck"
statement.

> > Yeah, this isn't really a result of the statefulness or statelessness
> > of the transport.  It's the fact that NFS doesn't implement an ISO
> > layer 5 or 6.
>
> Devices are inherently stateful.  You can't resend a missing block to
> a remote tape drive, because you can't seek it.

This is an unrecoverable failure -- an EIO will be returned to the
caller in this case by the server when the server comes back up.  Lock
state recovery is never guaranteed in any case.

In the hypothetical case of devices exported as ordinary files by a
stateless NFS, the server will verify the existence of a file lock on
the device before permitting I/O.  After recovery, a lock will not
exist.  The momentary state of the device post-recovery is irrelevant.

This type of NFS extension is simple in the extreme if one has a lock
daemon and the ability to query lock state from the server.  It's even
transparent, as long as you don't need the device files locally --
i.e., a remotely mounted /dev directory.

But as you say, this is what devfs is for: a diskless client will
carry its own device instances, not necessarily exported as device
nodes into the file system name space, but rather as files without
delete permission and directories without create permission.

> > To combat that, you maintain the open instance -- by asserting an
> > NFS lock, which causes the lockd to convert the handle into an open
> > fd in the lockd process's address space -- an open instance held for
> > the duration of the lock on the remote system.
>
> In which case you now have a stateful interface.

Yes, you do, although we can afford to lose the state for the vast
majority of the items exported.  It's amazingly funny that the state
of a device on a remote machine is linked to the physical state of the
device... 8-).

By your argument, NFS itself is stateful, so you have no room to
complain about it by calling it "stateless".  A correct implementation
supports locking, which is stateful.

> And what do you do about named pipes?

Is that if you are the reader, or if you are the writer, or if you are
both and choose to do both over the network interface?

The correct implementation of named pipes probably does not involve
using the open(2) hang to implement semaphoring of one of the
processes using the pipe to communicate.  You can approximate this by
using an O_NDELAY open, and compensate for the O_NDELAY open by using
a protocol on the data pushed through the pipe.  In reality, one
should use sockets instead of named pipes in any case.
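A minimal sketch of that approximation from the writer's side, using
O_NONBLOCK (the POSIX spelling of the older O_NDELAY flag); the fifo
path and the "HELLO" handshake are illustrative assumptions, not part
of any existing implementation:

/*
 * Sketch only: open a named pipe without the open(2) hang, so the
 * open itself no longer serializes the reader and the writer.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int fd;

	fd = open("/tmp/example_fifo", O_WRONLY | O_NONBLOCK);
	if (fd == -1) {
		if (errno == ENXIO) {
			/*
			 * No reader has the fifo open yet.  A blocking
			 * open would have slept here; instead the
			 * writer must retry, or rendezvous some other
			 * way.
			 */
			fprintf(stderr, "no reader on the fifo yet\n");
			return 1;
		}
		perror("open");
		return 1;
	}

	/*
	 * Since the open no longer acts as a semaphore, a small
	 * protocol on the data itself (e.g., a hello/ack exchange)
	 * has to carry the synchronization the open hang used to
	 * provide.
	 */
	if (write(fd, "HELLO\n", 6) != 6)
		perror("write");
	(void) close(fd);
	return 0;
}

The point of the sketch is that the ENXIO case and the in-band
handshake take over the job the blocking open used to do.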
But, yes, there exists a possibility for data loss when a machine
crashes.  The directory containing the named pipe could have been in
the process of compaction or renaming an entry in the block containing
the named pipe's name on your OpenNet system.  What do you do when the
pipe ends up in lost+found?

For that matter, how do you implement a recovery mechanism for a named
pipe in any case?

I don't think this is a valid argument for implementing a fully
stateful protocol.  However, don't let me stand in your way if you
want to implement one.  8-).


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.