Date:      Mon, 25 Sep 2000 10:11:15 -0700 (PDT)
From:      "Jason C. Wells" <jcwells@nwlink.com>
To:        cjclark@alum.mit.edu
Cc:        Haikal Saadh <wyldephyre2@yahoo.com>, chat@FreeBSD.ORG
Subject:   Re: So what do (unix) sysadmins do anyway?
Message-ID:  <Pine.SOL.3.96.1000925093125.2335A-100000@utah>
In-Reply-To: <20000924224054.H59015@149.211.6.64.reflexcom.com>

On Sun, 24 Sep 2000, Crist J . Clark wrote:

> > Coming from the environment that I do, I state that there is no such thing
> > as a system failure.
> 
> Huh. NOT! The admin's at our office spend a _lot_ of time fixing
> systems that die from hardware failures. Sure, you have backups (data

Yes, things break.  I agree with everything you have said except the "NOT"
part. Allow me to explain my position.  My position is:  We must hold
_people_ culpable for systems failures and not fall into the trap that the
system is to blame. (I use system here as an abstraction.)

Did the hardware fail due to a design flaw?  Did the hardware fail due to
some error on the installer's part?  Did the software have a bug?  Did the
hardware have an advertised mean time between failures but go unreplaced
because an organization didn't implement a scheduled obsolescence plan?
There is, or could be (or even should be), a human factor behind each one
of these questions.

The environment that I came from consistently placed the burden of success
or failure on humans.  We always discussed incidents.  We examined lessons
learned.  Almost invariably, the conclusion one ended up drawing was that
some person or persons made error(s) which led to an incident.

Yes, spurious system failures occurred.  After they did, though, efforts
were made to ensure that they never happened again.  This mentality made safe
work of a dangerous business.  All of the lessons that produced this
mentality were written in blood. 

Not all systems are given this level of scrutiny.  I have even fallen prey
to my own value judgements in how I managed systems.  In the end, though,
it was my value judgement that was to blame.  It was not the system's
fault that I failed.

My point is that a human is ultimately responsible.  (At a minimum, we
end up cleaning up the mess.)  This must be the way it is.  If we get
to the point where the machine is the excuse, then why bother? 
 
Now I come back to our original poster's question.  A system administrator
is needed for all the reasons you described.  A system administrator
should also be making those needed value judgements to prevent system
failure.  I hope that we have both provided a variety of reasons that a
system administrator is important.  I hope we have answered the question,
"What does a system administrator do anyway?" 

OBTW. There is a double standard in this respect regarding computers.  We
do not accept the failure of the Corvair or the Audi TT or the Boeing 737.
When the Tacoma Narrows Bridge falls, we don't just say, "Sometimes bridges
crash.  Get over it."

People accept computer failures as a matter of course.  It doesn't have to
be that way.  A human value judgement somewhere along the line leads to
failure. 

Thank you,
Jason C. Wells







