From owner-cvs-all@FreeBSD.ORG Wed Oct 17 13:42:50 2007
Delivered-To: cvs-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 405AB16A46B;
    Wed, 17 Oct 2007 13:42:50 +0000 (UTC)
    (envelope-from jhb@freebsd.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
    by mx1.freebsd.org (Postfix) with ESMTP id 1195813C49D;
    Wed, 17 Oct 2007 13:42:49 +0000 (UTC)
    (envelope-from jhb@freebsd.org)
Received: from zion.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162])
    by elvis.mu.org (Postfix) with ESMTP id CF4141A4D84;
    Wed, 17 Oct 2007 06:16:45 -0700 (PDT)
From: John Baldwin
To: "Constantine A. Murenin"
Date: Wed, 17 Oct 2007 09:07:06 -0400
User-Agent: KMail/1.9.7
References: <200710161702.00008.jhb@freebsd.org> <471537CA.9080807@FreeBSD.org>
In-Reply-To: <471537CA.9080807@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200710170907.07832.jhb@freebsd.org>
Cc: Scott Long, src-committers@freebsd.org, Alexander Leidinger,
    cvs-src@freebsd.org, cvs-all@freebsd.org, Poul-Henning Kamp, Wilko Bulte
Subject: Re: cvs commit: src/etc Makefile sensorsd.conf src/etc/defaults rc.conf
    src/etc/rc.d Makefile sensorsd src/lib/libc/gen sysctl.3 src/sbin/sysctl
    sysctl.8 sysctl.c src/share/man/man5 rc.conf.5 src/share/man/man9 Makefile
    sensor_attach.9 src/sys/conf f
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: CVS commit messages for the entire tree
X-List-Received-Date: Wed, 17 Oct 2007 13:42:50 -0000

On Tuesday 16 October 2007 06:14:34 pm Constantine A. Murenin wrote:
> On 16/10/2007 17:01, John Baldwin wrote:
>
> > On Monday 15 October 2007 10:57:48 pm Constantine A.
> > Murenin wrote:
> >
> >>On 15/10/2007, John Baldwin wrote:
> >>
> >>>On Monday 15 October 2007 09:43:21 am Alexander Leidinger wrote:
> >>>
> >>>>Quoting Scott Long (from Mon, 15 Oct 2007 01:47:59 -0600):
> >>>>
> >>>>>Alexander Leidinger wrote:
> >>>>>
> >>>>>>Quoting Poul-Henning Kamp (from Sun, 14 Oct
> >>>>>>2007 17:54:21 +0000):
> >>>>>>
> >>>>>>>listen to the various mumblings about putting RAID-controller status
> >>>>>>>under sensors framework.
> >>>>>>
> >>>>>>What's wrong with this? Currently each RAID driver has to come up
> >>>>>>with his own way of displaying the RAID status. It's like saying
> >>>>>>that each network driver has to implement/display the stuff you can
> >>>>>>see with ifconfig in its own way, instead of using the proper
> >>>>>>network driver interface for this.
> >>>>>>
> >>>>>
> >>>>>For the love of God, please don't use RAID as an example to support your
> >>>>>argument for the sensord framework. Representing RAID state is several
> >>>>>orders of magnitude more involved than representing network state.
> >>>>>There are also landmines in the OpenBSD bits of RAID support that are
> >>>>>best left out of FreeBSD, unless you like alienating vendors and risking
> >>>>>legal action. Leave it alone. Please. I don't care what you do with
> >>>>>lmsensors or cpu power settings or whatever. Leave RAID out of it.
> >>>>
> >>>>Talking about RAID status is not talking about alienating vendors. I
> >>>>don't talk about alienating vendors and I don't intend to do. You may
> >>>>not be able to display a full blown RAID status with the sensors
> >>>>framework, but it allows for a generic "works/works not" or
> >>>>"OK/degraded" status display in drivers we have the source for. This
> >>>>is enough for status monitoring (e.g., nagios).
> >>>
> >>>As I mentioned in the thread on arch@ where people brought up objections that
> >>>were apparently completely ignored, this is far from useful for RAID
> >>>monitoring. For example, if my RAID is down, which disk do I need to
> >>>replace? Again, all this was covered earlier and (apparently) ignored.
> >>>Also, what strikes me as odd is that I didn't see this patch posted again for
> >>>review this time around before it was committed.
> >>
> >>This has been addressed back in July. You'd use bioctl to see which
> >>exact disc needs to be replaced. Sensorsd is intended for an initial
> >>alert about something being wrong.
> >
> >
> > In July you actually said you weren't sure about bioctl(8). :) But also, this
> > model really isn't very sufficient since it doesn't handle things like drives
> > going away, etc. You really need to maintain a decent amount of state to
> > keep all that, and this is far easier done in userland rather than in the
> > kernel. However, you can choose to ignore real-world experience if you
> > choose.
> >
> > Basically, by having so little data in hw.sensors if I had to write a RAID
> > monitoring daemon I would just not use hw.sensors since it's easier for me to
> > figure out the simple status myself based on the other state I already have
> > to track (unless you write an event-driven daemon based on messages posted by
> > the firmware in which case again you wouldn't use hw.sensors for that either).
>
> There is no other daemon that you'd need, you'd simply use sensorsd for
> this. You could write a script that would be executed by sensorsd if a
> certain logical disc drive sensor changes state, and then this script
> would call the bio framework and give you additional details on why the
> state was changed.

That's actually not quite good enough as, for example, I want to keep yelling
about a busted volume on a periodic basis until it's fixed.
Also, having a volume change state doesn't tell me if a drive was pulled. On
at least one RAID controller firmware I am familiar with, the only way you can
figure this out is to keep track of which drives are currently present with a
generation count and use that to determine when a drive goes away. Even my
monitoring daemon for ata-raid has to do this since the ata(4) driver just
detaches and removes a drive when it fails and you have no way to figure out
which drive died as the kernel thinks that drive no longer exists.

--
John Baldwin
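
For readers following the thread, the generation-count bookkeeping described above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code of the monitoring daemon mentioned in the message: the class name DriveTracker, its methods, and the drive names "ad4" and "ad6" are all hypothetical, and the caller is assumed to supply the list of drives the controller currently reports on each poll.

```python
class DriveTracker:
    """Userland bookkeeping of drive presence using a generation count.

    Hypothetical sketch: each poll bumps a generation counter and stamps
    every drive that is currently visible. A drive whose stamp is older
    than the current generation has gone away, even though the kernel no
    longer reports it at all.
    """

    def __init__(self):
        self.generation = 0
        self.last_seen = {}  # drive id -> generation in which it was last seen

    def scan(self, present_ids):
        """Record one poll and return the drives that are currently missing.

        A missing drive is reported on every poll until it reappears or is
        explicitly forgotten, so the caller can keep complaining periodically.
        """
        self.generation += 1
        for drive_id in present_ids:
            self.last_seen[drive_id] = self.generation
        return sorted(
            drive_id
            for drive_id, gen in self.last_seen.items()
            if gen < self.generation
        )

    def forget(self, drive_id):
        """Stop tracking a drive, e.g. once the operator has retired it."""
        self.last_seen.pop(drive_id, None)


if __name__ == "__main__":
    tracker = DriveTracker()
    print(tracker.scan(["ad4", "ad6"]))  # both drives present
    print(tracker.scan(["ad4"]))         # ad6 was detached since the last poll
    print(tracker.scan(["ad4"]))         # still missing, so still reported
```

The point of the design is that the tracker, not the kernel, remembers that a drive ever existed, which is what makes it possible to name the drive that died after it has been detached.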