Date:      Fri, 14 Nov 2003 01:45:45 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        jos@catnook.com
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: non-root process and PID files
Message-ID:  <3FB4A449.7452A382@mindspring.com>
References:  <3F9CF3F6.8307.ABC1250@localhost> <20031111071944.GA5778@lizzy.catnook.com> <3FB360BE.779DB42F@mindspring.com> <20031113165647.GA80504@lizzy.catnook.com>

Jos Backus wrote:
> On Thu, Nov 13, 2003 at 02:45:18AM -0800, Terry Lambert wrote:
> > > Why use pid files at all if you could be using a process supervisor instead?
> >
> > Who supervises the supervisor?
> 
> Heh. The supervisor should be small and robust, like init. Has init died on
> you recently? Do you want to solve this problem or find Nirvana? If the
> latter, don't use computers.

OK.  We already have one of those.  We call it "init".  8-).


> > There are also the small issues of ordering (the reason you can't
> > just run everything out of /etc/ttys via init in the first place),
> 
> Sure. Hard to get right but not unsolvable. No reason you can't use process
> monitoring with something like rcNG.

We tried very hard to do this on the InterJet.  We still ended up
shooting most things in the head with large caliber bullets each
time the dial on demand interface went up or down because we did
not have the idea of hard and soft dependencies.  Even if we had
had them, though, we would still have been SOL, since many of the
Open Source programs we used cached information when they started.
Because of this, the data could get stale.

For example, say I ran sendmail and bound it to the external port
(or INADDR_ANY).  What is the host name that I should claim to the
remote host when I answer with the "220" greeting banner?  What
should I use for the argument to the "HELO" or "EHLO" for outbound
SMTP connections so that the name I use matches the name the remote
host gets on its crosscheck for the canonical name of the machine
contacting it via a gethostbyaddr(getpeername())?
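
To make the crosscheck concrete, here is a minimal sketch in Python of
what the remote side might do: take the address of the connecting peer,
reverse-resolve it, and compare the result with the name claimed in
HELO/EHLO.  The function name, the injectable resolver, and the example
names and addresses are all made up for illustration; a real server
would get peer_ip from conn.getpeername()[0].

```python
import socket


def helo_matches_peer(claimed_name, peer_ip, reverse_lookup=socket.gethostbyaddr):
    """Return True if claimed_name is among the names peer_ip resolves back to."""
    try:
        canonical, aliases, _addresses = reverse_lookup(peer_ip)
    except (socket.herror, socket.gaierror):
        return False  # no reverse mapping at all, so nothing can match

    def normalize(name):
        return name.lower().rstrip(".")

    known_names = {normalize(canonical)}
    known_names.update(normalize(alias) for alias in aliases)
    return normalize(claimed_name) in known_names


def fake_reverse_lookup(ip):
    """Hypothetical stand-in for the DNS, so the sketch runs without a network."""
    table = {"192.0.2.10": ("dialup-10.example.net", [], ["192.0.2.10"])}
    if ip not in table:
        raise socket.herror(1, "Unknown host")
    return table[ip]


if __name__ == "__main__":
    # A name cached at startup no longer matches the address the link came up with.
    print(helo_matches_peer("mail.example.com", "192.0.2.10", fake_reverse_lookup))
    print(helo_matches_peer("dialup-10.example.net", "192.0.2.10", fake_reverse_lookup))
```

The point is that the right answer depends on which address the link
currently has, so a name read once at startup can fail this check later.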

Basically, you end up with a system where you either can't cache
data, or where the cache has to be shared, or where you implement
a generic notification mechanism.  No matter how you slice it,
though, you're talking about rewriting millions of lines of code.
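
As a rough illustration of the third option, the sketch below contrasts
a daemon that reads a value once at startup with one that registers for
change notifications.  ConfigStore, CachingDaemon and NotifiedDaemon are
hypothetical names invented here, not anything that exists in FreeBSD or
in the programs mentioned above.

```python
class ConfigStore:
    """Hypothetical central store that tells subscribers when a value changes."""

    def __init__(self, **initial):
        self._values = dict(initial)
        self._subscribers = []

    def get(self, key):
        return self._values[key]

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set(self, key, value):
        self._values[key] = value
        # Notify everyone who asked to hear about changes.
        for callback in list(self._subscribers):
            callback(key, value)


class CachingDaemon:
    """Reads the hostname once at startup and never looks again."""

    def __init__(self, store):
        self.hostname = store.get("hostname")


class NotifiedDaemon:
    """Caches the hostname too, but refreshes it when the store says it changed."""

    def __init__(self, store):
        self.hostname = store.get("hostname")
        store.subscribe(self._on_change)

    def _on_change(self, key, value):
        if key == "hostname":
            self.hostname = value


if __name__ == "__main__":
    store = ConfigStore(hostname="old.example.net")
    stale, fresh = CachingDaemon(store), NotifiedDaemon(store)
    store.set("hostname", "new.example.net")  # e.g. the link came back up
    print(stale.hostname, fresh.hostname)
```

The store itself is trivial; the cost is that every existing program
would have to be changed to subscribe instead of reading its own files.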

"Caching Considered Harmful".



> > multiple instances,
> 
> /service/smtpd.{external,internal}

Yeah, we did this, so that we could shoot "only" half the processes
in the head on link up/down.

It sucked.  We almost shipped a product that wouldn't have worked
when we did the DNS split, because the dependency graph had to be
manually managed, and wasn't.
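
For comparison, here is a sketch of what declaring the dependencies and
deriving the rest from them could look like, using Python's standard
graphlib module (Python 3.9 or later).  The service names and the edges
between them are invented for illustration and are not the InterJet's
actual configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical services; each one maps to the set of things it depends on.
DEPENDS_ON = {
    "link": set(),
    "dns.internal": set(),
    "dns.external": {"link"},
    "smtpd.internal": {"dns.internal"},
    "smtpd.external": {"link", "dns.external"},
}


def start_order(depends_on):
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def affected_by(changed, depends_on):
    """Return every service that depends, directly or indirectly, on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for service, deps in depends_on.items():
            if current in deps and service not in affected:
                affected.add(service)
                frontier.append(service)
    return affected


if __name__ == "__main__":
    print(start_order(DEPENDS_ON))
    print(sorted(affected_by("link", DEPENDS_ON)))
```

With the graph written down once, adding a service (such as a second DNS
instance) changes one table entry instead of a hand-maintained ordering.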


> > and removing human error from adding and removing new things to be
> > monitored.
> 
> That's a generic problem with any type of change management.

Not really.  If your configuration changes all happened in a
centralized data repository, and nobody cached anything, but got
their information from that central repository, and the interface
to the repository was a system interface (so the system could
cache on your behalf so performance didn't degrade unbearably),
THEN you might have something.  After you rewrote millions of
lines of Open Source code to use your registry instead of working
the way it currently works, which is that everyone has their own poop
files.  If you are lucky, hitting them over the head with a
shovel (SIGHUP) works, and you don't have to kill and resurrect
them (you just have to wait a long time before the services become
usable again, e.g. DNS reading its config files).
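
The "shovel" case looks roughly like the sketch below: the process
keeps running and re-reads its configuration when it gets SIGHUP.  This
is a generic POSIX pattern written in Python for brevity, not code from
any of the daemons discussed; the Daemon class and the load_config
callable are hypothetical.

```python
import signal


class Daemon:
    """Re-reads its configuration on SIGHUP instead of being killed and restarted."""

    def __init__(self, load_config):
        self._load_config = load_config
        self.config = load_config()  # the startup-time read that can go stale
        self._reload_requested = False

    def on_sighup(self, signum, frame):
        # Only set a flag here; the real work happens outside the handler.
        self._reload_requested = True

    def install(self):
        # Must be called from the main thread; SIGHUP exists on POSIX systems only.
        signal.signal(signal.SIGHUP, self.on_sighup)

    def poll(self):
        """Call from the main loop; returns True if the config was re-read."""
        if not self._reload_requested:
            return False
        self._reload_requested = False
        self.config = self._load_config()
        return True
```

A main loop would call install() once and poll() on each pass.  How long
the reload takes is up to load_config, which is where the "wait a long
time" part comes from.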

Anyway, FreeBSD has steadfastly disliked the concept of a registry,
ever since Microsoft implemented it in Windows 95; it's one of the
biggest "NIH" items of all time.

-- Terry


