From owner-freebsd-current@FreeBSD.ORG Wed Mar 12 20:29:19 2008
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5E4A910656C0
	for ; Wed, 12 Mar 2008 20:29:19 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219])
	by mx1.freebsd.org (Postfix) with ESMTP id F14AA8FC1E
	for ; Wed, 12 Mar 2008 20:29:18 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (unverified [66.23.211.162])
	by speedfactory.net (SurgeMail 3.8s) with ESMTP id 235230165-1834499
	for multiple; Wed, 12 Mar 2008 16:27:08 -0400
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m2CKT0s9017578;
	Wed, 12 Mar 2008 16:29:00 -0400 (EDT)
	(envelope-from jhb@freebsd.org)
From: John Baldwin
To: freebsd-current@freebsd.org
Date: Wed, 12 Mar 2008 16:03:33 -0400
User-Agent: KMail/1.9.7
References: <47D82EDF.3090602@realtsp.com>
In-Reply-To: <47D82EDF.3090602@realtsp.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200803121603.33963.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
	Wed, 12 Mar 2008 16:29:01 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.91.2/6218/Wed Mar 12 15:07:21 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: Oliver Schonrock
Subject: Re: monitoring mpt raid arrays
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 12 Mar 2008 20:29:19 -0000

On Wednesday 12 March 2008 03:28:31 pm Oliver Schonrock wrote:
> Hi
>
> We have a Dell SC1435 with a SAS 5i raid controller (which is an OEM LSI
> Logic SAS1064, supported under FreeBSD by the mpt driver). The array
> works just fine, but monitoring the array is also important and there
> seems to be no support under mpt(4) right now:
>
> http://nico.schottelius.org/documentations/freebsd/freebsd-raid-monitoring
>
> I snooped around in the source code and found these snippets
>
> mpt_raid.c
>
> const char *
> mpt_disk_state(struct mpt_raid_disk *disk)
> {
>         switch (disk->config_page.PhysDiskStatus.State) {
>         case MPI_PHYSDISK0_STATUS_ONLINE:
>                 return ("Online");
>         case MPI_PHYSDISK0_STATUS_MISSING:
>                 return ("Missing");
>         case MPI_PHYSDISK0_STATUS_NOT_COMPATIBLE:
>                 return ("Incompatible");
>         case MPI_PHYSDISK0_STATUS_FAILED:
>                 return ("Failed");
>         case MPI_PHYSDISK0_STATUS_INITIALIZING:
>                 return ("Initializing");
>         case MPI_PHYSDISK0_STATUS_OFFLINE_REQUESTED:
>                 return ("Offline Requested");
>         case MPI_PHYSDISK0_STATUS_FAILED_REQUESTED:
>                 return ("Failed per Host Request");
>         case MPI_PHYSDISK0_STATUS_OTHER_OFFLINE:
>                 return ("Offline");
>         default:
>                 return ("Unknown");
>         }
> }
>
> /*
>  * Update in-core information about RAID support. We update any entries
>  * that didn't previously exists or have been marked as needing to
>  * be updated by our event handler. Interesting changes are displayed
>  * to the console.
>  */
> int
> mpt_refresh_raid_data(struct mpt_softc *mpt)
> {
>
>
> .....
>
>         mpt_disk_prt(mpt, mpt_disk, "%s\n", mpt_disk_state(mpt_disk));
>
> ....
>
> Which looks to me like the raid controller/driver would report when
> things go wrong and how it is dealing with it etc. The messages printed
> by the driver make it into dmesg output.
>
> Slight Aside:
> -------------
> We had a problem with:
>
> case MPI_EVENT_QUEUE_FULL:
> {
>         struct cam_sim *sim;
>         struct cam_path *tmppath;
>         struct ccb_relsim crs;
>         PTR_EVENT_DATA_QUEUE_FULL pqf =
>             (PTR_EVENT_DATA_QUEUE_FULL) msg->Data;
>         lun_id_t lun_id;
>
>         mpt_prt(mpt, "QUEUE FULL EVENT: Bus 0x%02x Target 0x%02x Depth "
>             "%d\n", pqf->Bus, pqf->TargetID, pqf->CurrentDepth);
>
>
> which we "fixed" by writing an rc script to run camcontrol like this:
> http://www.zulustips.com/2007/09/06/mpt0-queue-full-event-on-dell-sas-5ir.html
>
> (not sure what this actually does, but it works...)
>
> Anyway these QUEUE FULL EVENT message were appearing in dmesg output.
>
> So, my very simplistic question is:
>
> While there is no mpt cli management interface to query the state of the
> raid array, can I just write a cron driven script which checks dmesg
> output every 10min, say, and notifies the administrator if it finds any
> messages from mpt (some simple grepping and diff'ing against dmesg.boot,
> should be able to keep this quiet unless there is really something to
> report).
>
> Will this work? Is it a reasonable "work around" for the missing raid
> monitoring support mpt arrays?
>
> Thanks in advance.

The problem is that on your box the mpt_raid stuff isn't working because it
probes for the RAID metadata too early, so you don't get any of the raid
messages.

-- 
John Baldwin
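
For reference, the cron-driven check proposed in the quoted message could be
sketched roughly as below. This is only an illustration and not part of the
original thread: the function names are hypothetical, the "mpt" line prefix
is an assumption about how the driver's messages begin, and the boot log is
assumed to live at /var/run/dmesg.boot. As the reply above notes, it can only
report what the driver actually prints, so it would not catch RAID state
changes on a box where the mpt_raid messages never appear.

```python
import subprocess

# Assumed location of the boot-time message buffer on FreeBSD.
BOOT_LOG = "/var/run/dmesg.boot"


def new_mpt_lines(current, boot, prefix="mpt"):
    """Return lines of `current` that start with `prefix` and are not in `boot`.

    Both arguments are plain text (the output of dmesg and the contents of
    dmesg.boot). Filtering against the boot log keeps the normal probe
    messages quiet, as suggested in the quoted message.
    """
    seen = set(boot.splitlines())
    return [line for line in current.splitlines()
            if line.startswith(prefix) and line not in seen]


def main():
    """Hypothetical cron entry point: print new mpt messages, if any."""
    current = subprocess.run(["dmesg"], capture_output=True,
                             text=True, check=True).stdout
    with open(BOOT_LOG) as f:
        boot = f.read()
    fresh = new_mpt_lines(current, boot)
    if fresh:
        # cron mails any output of the job to its owner.
        print("\n".join(fresh))
    return 1 if fresh else 0
```

Calling main() from a script run by cron every 10 minutes would mail the
administrator whenever a new mpt line shows up. Note that this sketch keeps
no state between runs, so the same line is reported again on every run for
as long as it stays in the kernel message buffer.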