From owner-freebsd-hardware@FreeBSD.ORG Wed Mar 17 20:43:14 2010
Message-ID: <4BA13EDF.8040909@greatbaysoftware.com>
Date: Wed, 17 Mar 2010 16:43:11 -0400
From: Charles Owens
To: John Baldwin
Cc: freebsd-hardware@freebsd.org
Subject: Re: mptutil(8) segfault on IBM xSeries 3550
References: <4B75AB2D.2090306@greatbaysoftware.com> <201002191315.13796.jhb@freebsd.org> <4B9E928B.2070409@greatbaysoftware.com> <201003171114.11601.jhb@freebsd.org>
In-Reply-To: <201003171114.11601.jhb@freebsd.org>

On 3/17/2010 11:14 AM, John Baldwin wrote:
> On Monday 15 March 2010 4:03:23 pm Charles Owens wrote:
>> John Baldwin wrote:
>>> On Friday 19 February 2010 1:01:38 pm Charles Owens wrote:
>>>> John Baldwin wrote:
>>>>> On Monday 15 February 2010 5:25:15 pm Charles Owens wrote:
>>>>>> Charles Owens wrote:
>>>>>>> Howdy,
>>>>>>>
>>>>>>> We're working with IBM hardware (xSeries 3550) that has an
>>>>>>> mpt-based RAID controller...
>>>>>>> after initial success with testing the
>>>>>>> mptutil utility, now operations other than "show adapter" and "show
>>>>>>> volume" are resulting in segfaults.
>>>>>>>
>>>>>>> While it was working properly we created and removed volumes several
>>>>>>> times, force-failed drives, and just generally put it through its
>>>>>>> paces... and all seemed fine. Then, after a reboot, it suddenly started
>>>>>>> failing with segfault as described, and nothing we do has helped to get
>>>>>>> it out of this state (including trying to use the LSI in-BIOS manager to
>>>>>>> create/delete volumes -- which in and of itself works fine).
>>>>>>>
>>>>>>> We found recent thread
>>>>>>> http://docs.freebsd.org/cgi/mid.cgi?4B56CD4C.80503 and hoped that it
>>>>>>> might somehow relate... and even tried the patch that John Baldwin
>>>>>>> posted, but to no avail.
>>>>>>>
>>>>>>> Has anyone seen this behavior and/or have a suggested fix or workaround?
>>>>>>>
>>>>>>> Here's the output of "mptutil show adapter":
>>>>>>>
>>>>>>> mpt0 Adapter:
>>>>>>>        Board Name: SR-BR10i
>>>>>>>    Board Assembly: L3-25116-01H
>>>>>>>         Chip Name: C1068E
>>>>>>>     Chip Revision: UNUSED
>>>>>>>       RAID Levels: RAID0, RAID1, RAID1E
>>>>>>>     RAID0 Stripes: 64K
>>>>>>>    RAID1E Stripes: 64K
>>>>>>>  RAID0 Drives/Vol: 1-10
>>>>>>>  RAID1 Drives/Vol: 2
>>>>>>> RAID1E Drives/Vol: 3-10
>>>>>>>
>>>>>>> This work is being done using FreeBSD 8.0-RELEASE-p2 + PAE.
>>>>>>
>>>>>> I should add that the RAID controller in question is the IBM
>>>>>> ServeRAID-BR10i SAS/SATA Controller which is based on the LSI 1068E
>>>>>> processor, as described here:
>>>>>> http://www-01.ibm.com/common/ssi/rep_ca/4/872/ENUSAG09-0104/index.html
>>>>>
>>>>> Try this updated patch. It should fix the problems with 'mptutil show
>>>>> drives' displaying all daX devices in the system rather than just the
>>>>> ones for the mptX bus.
>>>>> I had incorrectly interpreted the XPT matches as being an AND
>>>>> rather than an OR. This changes the code to first do a lookup for the
>>>>> logical "path" (SCSI bus) for mptX devices and then do a second lookup
>>>>> to fetch any daX devices on that path. I tested it on a machine with an
>>>>> mpt controller and a USB disk. Unfortunately I wasn't able to test any
>>>>> of the RAID stuff, just 'show drives'. This mpt(4) controller doesn't
>>>>> support RAID either, so I was also able to verify the fix you had
>>>>> already tested for cleaning up 'show adapter' output in that case.
>>>>>
>>>>> [patch omitted]
>>>>
>>>> John,
>>>>
>>>> The patch appears to have resolved the problem. We're still banging on
>>>> it, but so far it looks very good!
>>>>
>>>> Thanks very much!
>>>
>>> Excellent, thanks! I've committed it to HEAD and will MFC it in a week or
>>> so. It is probably too late to make 7.3 however.
>>
>> Again, thanks for the patch... overall it is working well... we're now
>> able to successfully do what we need to do with RAID system. We are,
>> though, seeing some sort of error messages:
>>
>> # mptutil show volumes
>> mpt0 Volumes:
>>   Id     Size    Level   Stripe  State  Write-Cache  Name
>> mptutil: mpt_query_disk got 4 matches, expected 2
>>      0 (  279G) RAID-1          OPTIMAL   Disabled
>>
>> # mptutil show config
>> mpt0 Configuration: 1 volumes, 2 drives
>> mptutil: mpt_query_disk got 4 matches, expected 2
>>     volume 0 (279G) RAID-1 OPTIMAL spans:
>>         drive 1 (279G) ONLINE SATA
>>         drive 0 (279G) ONLINE SATA
>>     spare pools: 0
>
> Are you sure this is a fixed binary? The new binary doesn't print out that
> message anymore, it only says 'got %d matches, expected 1'. Also, the 4
> instead of 2 is consistent with the old bug in that the two Linux virtual
> floppies (da1 and da2) would be reported as extra for 'mptutil show drives' in
> this case I think.

You're right!
It appears on one of my two devel systems I misapplied the patch somehow. Much better now... thanks!
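
For anyone reading this thread in the archive: the AND-versus-OR mix-up John describes in his patch note can be illustrated with a small toy model. The sketch below is not the mptutil code and does not call the real CAM/XPT interface; the Periph record, the DEVICES table, and the helper names are all hypothetical, made up for illustration. It only assumes what the thread states: a list of match patterns is treated as an OR, so asking for "mptX" and "da" in one query returns every da device, while the two-step approach first finds the path (SCSI bus) for mptX and then asks for da devices on that path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Periph:
    """Hypothetical stand-in for one peripheral entry (not a real CAM struct)."""
    name: str     # driver name, e.g. "mpt" or "da"
    unit: int     # unit number, e.g. 0 for da0
    path_id: int  # logical path (SCSI bus) the device sits on


# Hypothetical device table: mpt0 with one volume (da0) on path 0,
# plus two unrelated da devices on other paths.
DEVICES = [
    Periph("mpt", 0, 0),
    Periph("da", 0, 0),
    Periph("da", 1, 1),
    Periph("da", 2, 2),
]


def match_any(devices, patterns):
    """Return devices matching ANY pattern (patterns are ORed together).

    Within one pattern, every listed field must match (fields are ANDed).
    """
    def matches(dev, pat):
        return all(getattr(dev, field) == want for field, want in pat.items())

    return [d for d in devices if any(matches(d, p) for p in patterns)]


def naive_lookup(devices, unit):
    """Single query written as if the two patterns were ANDed (the bug)."""
    return match_any(devices, [{"name": "mpt", "unit": unit}, {"name": "da"}])


def two_step_lookup(devices, unit):
    """First find the path(s) for mptX, then fetch da devices on those paths."""
    paths = {d.path_id for d in match_any(devices, [{"name": "mpt", "unit": unit}])}
    return match_any(devices, [{"name": "da", "path_id": p} for p in sorted(paths)])


if __name__ == "__main__":
    print("naive:   ", [f"{d.name}{d.unit}" for d in naive_lookup(DEVICES, 0)])
    print("two-step:", [f"{d.name}{d.unit}" for d in two_step_lookup(DEVICES, 0)])
```

In this toy table the naive query picks up the da devices on unrelated paths, which is the same kind of over-matching that produced the extra "matches" in the output quoted above; the two-step version returns only the disk behind mpt0.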