From owner-freebsd-hackers Mon Aug 21 19:57:53 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.FreeBSD.org (8.6.11/8.6.6) id TAA26898 for hackers-outgoing; Mon, 21 Aug 1995 19:57:53 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.FreeBSD.org (8.6.11/8.6.6) with SMTP id TAA26888 for ; Mon, 21 Aug 1995 19:57:47 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA28223; Mon, 21 Aug 95 20:59:18 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9508220259.AA28223@cs.weber.edu>
Subject: Re: Making a FreeBSD NFS server
To: peter@bonkers.taronga.com (Peter da Silva)
Date: Mon, 21 Aug 95 20:59:17 MDT
Cc: hackers@freebsd.org
In-Reply-To: <199508220109.UAA25345@bonkers.taronga.com> from "Peter da Silva" at Aug 21, 95 08:09:09 pm
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@freebsd.org
Precedence: bulk

> > Write me an NFS fsck. 8-).
>
> You do the fsck on the server.
>
> I've done fscks over OpenNet.

I'm talking about "client/server connection crash recovery" to recover
the state after a failure of a stateful protocol.

The problem is, of course, that one must maintain all non-derivable
state on both the client and the server -- that is, any state that
can't be derived from other state.  An "NFS fsck" would recover the
state, including causing any open files and locks to be asserted as
they were asserted before the crash.

Since I could have two clients when the server crashed, the state has
to be reinstantiated by both clients.  Consider client 1 with a lock
outstanding, and client 2 with a lock request outstanding and blocked
on client 1's lock.  If client 2 begins crash recovery prior to
client 1, it will assert its outstanding lock request -- which the
server will grant, not having client 1's context to use as a wakeup
address.

Basically, "the machine" becomes the network, and as a result MTBF
goes way, way down.  That was my point in the "Write me an NFS fsck"
statement.

> > Yeah, this isn't really a result of the statefulness or statelessness
> > of the transport.  It's the fact that NFS doesn't implement an ISO
> > layer 5 or 6.
>
> Devices are inherently stateful.  You can't resend a missing block to
> a remote tape drive, because you can't seek it.

This is an unrecoverable failure -- an EIO will be returned to the
caller in this case by the server when the server comes back up.  Lock
state recovery is never guaranteed in any case.

In the hypothetical case of devices exported as ordinary files by a
stateless NFS, the server will verify the existence of a file lock on
the device before permitting I/O.  After recovery, a lock will not
exist.  The momentary state of the device post-recovery is irrelevant.

This type of NFS extension is simple in the extreme if one has a lock
daemon and the ability to query lock state from the server.  It's even
transparent, as long as you don't need the device files locally --
i.e., a remotely mounted /dev directory.

But as you say, this is what devfs is for: a diskless client will
carry its own device instances, not necessarily exported as device
nodes into the file system name space, but rather as files without
delete permission and directories without create permission.

> > To combat that, you maintain the open instance -- by asserting an
> > NFS lock, which causes the lockd to convert the handle into an open
> > fd in the lockd process's address space -- an open instance held for
> > the duration of the lock on the remote system.
>
> In which case you now have a stateful interface.

Yes, you do, although we can afford to lose the state for the vast
majority of the items exported.  It's amazingly funny that the state
of a device on a remote machine is linked to the physical state of the
device... 8-).

By your argument, NFS itself is stateful, so you have no room to
complain about it by calling it "stateless".  A correct implementation
supports locking, which is stateful.

> And what do you do about named pipes?

Is that if you are the reader, or if you are the writer, or if you are
both and choose to do both over the network interface?

The correct implementation of named pipes probably does not involve
using the open(2) hang to implement semaphoring of one of the
processes using the pipe to communicate.  You can approximate this by
using an O_NDELAY open, and compensate for the O_NDELAY open by using
a protocol on the data pushed through the pipe.  In reality, one
should use sockets instead of named pipes in any case.
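A minimal sketch of that approximation from the writer's side, using
O_NONBLOCK (the POSIX spelling of the older O_NDELAY flag); the fifo
path and the "HELLO" handshake are illustrative assumptions, not part
of any existing implementation:

/*
 * Sketch only: open a named pipe without the open(2) hang, so the
 * open itself no longer serializes the reader and the writer.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int fd;

	fd = open("/tmp/example_fifo", O_WRONLY | O_NONBLOCK);
	if (fd == -1) {
		if (errno == ENXIO) {
			/*
			 * No reader has the fifo open yet.  A blocking
			 * open would have slept here; instead the
			 * writer must retry, or rendezvous some other
			 * way.
			 */
			fprintf(stderr, "no reader on the fifo yet\n");
			return 1;
		}
		perror("open");
		return 1;
	}

	/*
	 * Since the open no longer acts as a semaphore, a small
	 * protocol on the data itself (e.g., a hello/ack exchange)
	 * has to carry the synchronization the open hang used to
	 * provide.
	 */
	if (write(fd, "HELLO\n", 6) != 6)
		perror("write");
	(void) close(fd);
	return 0;
}

The point of the sketch is that the ENXIO case and the in-band
handshake take over the job the blocking open used to do.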
But, yes, there exists a possibility for data loss when a machine
crashes.  The directory containing the named pipe could have been in
the process of compaction or renaming an entry in the block containing
the named pipe's name on your OpenNet system.  What do you do when the
pipe ends up in lost+found?

For that matter, how do you implement a recovery mechanism for a named
pipe in any case?

I don't think this is a valid argument for implementing a fully
stateful protocol.  However, don't let me stand in your way if you
want to implement one.  8-).


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.