Date: Mon, 25 Sep 2000 10:11:15 -0700 (PDT)
From: "Jason C. Wells" <jcwells@nwlink.com>
To: cjclark@alum.mit.edu
Cc: Haikal Saadh <wyldephyre2@yahoo.com>, chat@FreeBSD.ORG
Subject: Re: So what do (unix) sysadmins do anyway?
Message-ID: <Pine.SOL.3.96.1000925093125.2335A-100000@utah>
In-Reply-To: <20000924224054.H59015@149.211.6.64.reflexcom.com>

On Sun, 24 Sep 2000, Crist J . Clark wrote:

> > Coming from the environment that I do, I state that there is no such thing
> > as a system failure.
>
> Huh. NOT! The admin's at our office spend a _lot_ of time fixing
> systems that die from hardware failures. Sure, you have backups (data

Yes, things break. I agree with everything you have said except the "NOT" part. Allow me to explain my position.

My position is: We must hold _people_ culpable for system failures and not fall into the trap of thinking that the system is to blame. (I use system here as an abstraction.)

Did the hardware fail due to a design flaw? Did the hardware fail due to some error on the installer's part? Did the software have a bug? Did the hardware have an advertised mean time between failures, but go unreplaced because the organization didn't implement a scheduled obsolescence plan? There is, or could be, (should even?) a human factor behind each one of these questions.

The environment that I came from consistently placed the burden of success or failure on humans. We always discussed incidents. We examined lessons learned. Almost invariably, the conclusion one ended up drawing was that some person or persons made error(s) which led to an incident. Yes, spurious system failures occurred. After they did, though, efforts were made to ensure that they never happened again. This mentality made safe work of a dangerous business. All of the lessons that produced this mentality were written in blood.

Not all systems are given this level of scrutiny. I have even fallen prey to my own value judgements regarding my execution of managing systems. In the end, though, it was my value judgement that was to blame. It was not the system's fault that I failed.

My point is that a human is ultimately responsible. (At a minimum we end up cleaning up the mess.) This must be the way it is. If we get to the point where the machine is the excuse, then why bother?

Now I come back to our original poster's question.
A system administrator is needed for all the reasons you described. A system administrator should also be making those needed value judgements to prevent system failure. I hope that we have both provided a variety of reasons that a system administrator is important. I hope we have answered the question, "What does a system administrator do anyway?"

OBTW. There is a double standard in this respect regarding computers. We do not accept the failure of the Corvair or the Audi TT or the Boeing 737. When the Tacoma Narrows falls we don't just say, "Sometimes bridges crash. Get over it." People accept computer failures as a matter of course. It doesn't have to be that way. A human value judgement somewhere along the line leads to failure.

Thank you,
Jason C. Wells

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message