Date:      Sat, 15 Jun 96 9:53:24 PDT
From:      Jon Inouye <jinouye@cse.ogi.edu>
To:        hackers@freefall.freebsd.org
Subject:   Re: gated & pccard don't get along
Message-ID:  <9606151653.AA02302@indurain.cse.ogi.edu>
In-Reply-To: <199606131843.LAA11283@freefall.freebsd.org>; from "owner-hackers-digest@freefall.freebsd.org" at Jun 13, 96 11:43 am

Andrew McRae <amcrae@cisco.com> writes:
> Having thought long and hard about this, I have come to the
> conclusion that having hot-swappable resources and interfaces
> is a great idea in theory, but the kernel (and parts of the user-land
> and daemons) generally assumes that devices are not going to
> appear and disappear at random intervals.  It is pretty scary to
> think of the changes required to really make the system understand
> this concept fully. The net code is a good example; whilst the
> insert/remove scripts can already do some of these things (like
> add default routes etc.), we are really working with a bit
> of glue around the edges, and not tackling some of the core
> problems.  One issue is the way various bits get informed about
> changes [e.g. a card being pulled]. The need is for programs
> to be started or stopped, signals sent, kernel tables to be
> modified, daemons to be informed [e.g. gated] etc.

Hot swapping network cards without obtrusive side effects is the idea
behind what I call Physical Media Independence (PMI). The modular construction
of the 4.4BSD network stack (and great documentation thanks to Wright & Stevens)
makes it relatively straightforward to support PMI in FreeBSD. All bindings
between network data structures and interfaces need to be re-bound when an
interface becomes unavailable (such as when a card gets unplugged). This
includes ARP entries, route entries, multicast groups, BPF attachments, and
all the applications interacting with the network stack (such as routed/gated).
Connections that depend on invariant IP addresses are harmed if the
new interface does not have the same address as the old. Mobile IP helps out
in this case, allowing the available interface to assume the unavailable
interface's IP address. TCP's retransmission timer also causes problems
if you take too long to switch media (exponential backoff). The local
retransmission timer can be reset as part of the re-binding process, but
the remote retransmission timer is harder to access. One solution (published
by Ramon Caceres, IEEE Journal on Selected Areas in Communications, June '95)
is to send a triple duplicate ACK to take advantage of the long fat network
(LFN) support in contemporary networking stacks. I haven't had a chance to
implement this yet.
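
To make the exponential backoff problem above concrete, here is a minimal
sketch of the arithmetic. It assumes a base retransmission timeout of 1
second that doubles on every timeout and is capped at 64 seconds; those
constants, and the function names, are assumptions chosen for illustration
rather than values taken from any particular kernel.

```python
# Illustrative sketch only: exponential backoff of a TCP-style
# retransmission timer. The 1 s base, doubling, and 64 s cap are
# assumptions for this example.

def retransmit_times(base_rto=1.0, max_rto=64.0, attempts=8):
    """Return when each retransmission fires, in seconds after the
    original (lost) transmission."""
    times = []
    elapsed = 0.0
    rto = base_rto
    for _ in range(attempts):
        elapsed += rto                 # wait one full timeout, then resend
        times.append(elapsed)
        rto = min(rto * 2, max_rto)    # double the timeout, up to the cap
    return times


def idle_after_outage(outage, **kwargs):
    """Seconds the sender sits idle after the link comes back at time
    `outage`, i.e. the wait until the next scheduled retransmission.
    Returns None if the outage outlasts every scheduled attempt."""
    for t in retransmit_times(**kwargs):
        if t >= outage:
            return t - outage
    return None


if __name__ == "__main__":
    print(retransmit_times())
    for outage in (2.0, 8.0, 16.0, 40.0):
        print(outage, idle_after_outage(outage))
```

Under these assumptions a media switch that takes 8 seconds leaves the
connection idle for a further 7 seconds, because the next retransmission is
not due until 15 seconds after the first send. Resetting the local timer
during re-binding removes that wait on the local side only.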

I haven't thought a lot about a router (active rather than passive
routed/gated) with hot swappable interfaces. When you unplug an interface,
you can't send any messages to neighboring hosts/routers informing them of
your intention to disconnect! In this case, the user may have to inform the
system before disconnecting. In the worst case, the router timeouts will
notice the disconnection and take appropriate action.

> Berny Goodheart and I were talking about this, and his
> suggestion is to implement a registry scheme, I imagine with
> a graph of dependencies and some IPC etc. Tandem (Berny's
> employer) uses such a scheme to implement hot swap
> in their high availability architecture.  Having worked on such a scheme
> myself, I appreciate the complexity.  Unfortunately, you can't implement
> just a *little* bit of the scheme.  If you do *any* form of
> hot swap, you have to go the whole hog. Cisco also support
> hot-swap, and even when it's designed in from day one, it is
> still a significant effort to make it work.

I built such a dependency scheme using `grep' and lots of hand analysis.
A colleague is working on a better "guard tracking" tool using compilation
techniques such as dependency graphs and alias analysis. This only works for
the kernel sources, and we need another interface to allow applications to
`register' themselves for events of interest to them. For example, the
WinSock-2 API has a NetworkAvailability field in the flowspec used during
Connect(). When the kernel cannot maintain this guarantee, an event is sent
to the application. What to do with legacy applications that don't use such
a meta-interface is an open question.
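
The registration idea above can be sketched as a small user-level registry.
This is a hypothetical illustration only: the class name, the event names
("removed", "inserted"), the interface names, and the route table layout are
all invented for the example and do not correspond to any existing FreeBSD
or WinSock interface.

```python
# Hypothetical sketch of an event registry for interface changes.
# All names here are assumptions made for illustration.

class InterfaceEventRegistry:
    """Lets interested parties register callbacks for interface events."""

    def __init__(self):
        self._handlers = {}   # event name -> list of callbacks

    def register(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def post(self, event, ifname):
        """Deliver `event` for interface `ifname` to every registered
        handler, in registration order, and collect their results."""
        return [h(ifname) for h in self._handlers.get(event, [])]


def make_route_rebinder(routes, fallback):
    """Build a handler that re-binds routes from a removed interface to
    `fallback`. `routes` maps destination -> interface name."""
    def rebind(ifname):
        moved = [dst for dst, ifn in routes.items() if ifn == ifname]
        for dst in moved:
            routes[dst] = fallback
        return moved
    return rebind


if __name__ == "__main__":
    routes = {"default": "ed0", "10.0.0.0/8": "ed0", "127.0.0.1": "lo0"}
    registry = InterfaceEventRegistry()
    registry.register("removed", make_route_rebinder(routes, "sl0"))
    print(registry.post("removed", "ed0"))
    print(routes)
```

A legacy application that never calls register() simply receives nothing,
which is exactly the open question raised above.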

If you have a reference to the Tandem scheme I'd appreciate it. PMI is
a lot like fault tolerance, except the faults are "expected" and the
network media is often heterogeneous. The basic concepts of fault detection,
fault isolation, and recovery are the same.

Sidenote: Jim Binkley (http://www.cs.pdx.edu/~jrb), at Portland State,
	  is working on adding Mobile IP to FreeBSD.

--
Jon Inouye                           EMAIL: jinouye@cse.ogi.edu
Distributed Systems Research Group   WWW  : http://www.cse.ogi.edu/~jinouye/
Computer Science and Eng. Dept.      PHONE: (503) 690-1009, FAX: (503) 690-1553
Oregon Graduate Institute of Science & Technology (aka OGI)



