From: John Baldwin
To: Stephane LAPIE
Cc: freebsd-hardware@freebsd.org
Date: Fri, 22 Jan 2010 08:46:17 -0500
Subject: Re: DELL SAS5/E Controller bug
Message-Id: <201001220846.17419.jhb@freebsd.org>
In-Reply-To: <4B59057D.9000500@darkbsd.org>

On Thursday 21 January 2010 8:55:09 pm Stephane LAPIE wrote:
> John Baldwin wrote:
> > Gah, that should be the case that I ignore.  Can you replace the second
> > warnx() call I added with this:
> >
> > 	warnx("mpt_read_ioc_page(6): %s (%x)", mpt_ioc_status(IOCStatus),
> > 	    IOCStatus);
>
> I now get the following message :
> mptutil: mpt_read_ioc_page(6): Invalid configuration page (8022)
>
> (Though I guess this doesn't tell anything that we did not know initially)

Ah, I need to mask IOCStatus to only get the error code.  The patch below
should quiet the warning:

Index: mpt_show.c
===================================================================
--- mpt_show.c	(revision 202705)
+++ mpt_show.c	(working copy)
@@ -78,6 +78,7 @@
 	CONFIG_PAGE_MANUFACTURING_0 *man0;
 	CONFIG_PAGE_IOC_2 *ioc2;
 	CONFIG_PAGE_IOC_6 *ioc6;
+	U16 IOCStatus;
 	int fd, comma;

 	if (ac != 1) {
@@ -108,7 +109,7 @@

 	free(man0);

-	ioc2 = mpt_read_ioc_page(fd, 2, NULL);
+	ioc2 = mpt_read_ioc_page(fd, 2, &IOCStatus);
 	if (ioc2 != NULL) {
 		printf(" RAID Levels:");
 		comma = 0;
@@ -151,9 +152,11 @@
 			printf(" none");
 		printf("\n");
 		free(ioc2);
-	}
+	} else if ((IOCStatus & MPI_IOCSTATUS_MASK) !=
+	    MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
+		warnx("mpt_read_ioc_page(2): %s", mpt_ioc_status(IOCStatus));

-	ioc6 = mpt_read_ioc_page(fd, 6, NULL);
+	ioc6 = mpt_read_ioc_page(fd, 6, &IOCStatus);
 	if (ioc6 != NULL) {
 		display_stripe_map(" RAID0 Stripes",
 		    ioc6->SupportedStripeSizeMapIS);
@@ -172,7 +175,9 @@
 			printf("-%u", ioc6->MaxDrivesIME);
 		printf("\n");
 		free(ioc6);
-	}
+	} else if ((IOCStatus & MPI_IOCSTATUS_MASK) !=
+	    MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
+		warnx("mpt_read_ioc_page(6): %s", mpt_ioc_status(IOCStatus));

 	/* TODO: Add an ioctl to fetch IOC_FACTS and print firmware version. */

@@ -541,7 +546,8 @@
 	for (i = 0; i <= 0xff; i++) {
 		pinfo = mpt_pd_info(fd, i, &IOCStatus);
 		if (pinfo == NULL) {
-			if (IOCStatus != MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
+			if ((IOCStatus & MPI_IOCSTATUS_MASK) !=
+			    MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
 				warnx("mpt_pd_info(%d): %s", i,
 				    mpt_ioc_status(IOCStatus));
 			continue;

> > I know that the rescan after removing a device is a bit messy (lots of
> > messages before daX actually goes away), but I don't recall it taking such a
> > long time.
>
> Even without rescanning the bus, the device actually goes away on its
> own after the same delay of three minutes.
>
> > The documentation is not public.  The 0x12 and 0x16 messages are events that
> > I have seen.  You can try talking to scottl@ as he has access to the docs.
>
> I could contact Scott, and here are the relevant bits of his answer :
> > The basic problem is that FreeBSD still sees all of this as parallel SCSI,
> > subject to rescans and resets and timeouts.  It's fighting with the SAS
> > controller.  I'll explain more below.
>
> > I'm working on code that will make FreeBSD more aware of how SAS works.
> > It's several months from being done, though.
>
> Reposting here for reference the meaning of 0x12 and 0x16 events :
> 0x12 : SAS Link status changed
> 0x16 : SAS Discovery Event
>
> I was wondering if using an Areca SAS controller could be a better
> solution, but Scott's answer has me wondering if this is a common issue
> to all SAS controllers on FreeBSD.

I have no idea, I'd have to defer to Scott in that regard.

-- 
John Baldwin
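
As a small illustration of why the masking in the patch above matters, here is a minimal sketch in C. It shows how the status value 0x8022 from the reported message fails a plain equality test against the "invalid page" code but matches once the flag bit is masked off. The numeric values of the three constants are written from memory of the MPI headers and should be treated as assumptions to check against mpi.h. The helper name ioc_status_needs_warning() is hypothetical and is not part of mptutil.

```c
#include <stdint.h>

/*
 * Assumed values, recalled from the MPI headers (mpi.h); verify them
 * against the real header before relying on this sketch.
 */
#define MPI_IOCSTATUS_FLAG_LOG_INFO_AVAILABLE	0x8000
#define MPI_IOCSTATUS_MASK			0x7FFF
#define MPI_IOCSTATUS_CONFIG_INVALID_PAGE	0x0022

/*
 * Hypothetical helper: returns nonzero when a failed config page read
 * deserves a warning, i.e. when the masked error code is anything other
 * than "invalid configuration page".
 */
int
ioc_status_needs_warning(uint16_t IOCStatus)
{

	return ((IOCStatus & MPI_IOCSTATUS_MASK) !=
	    MPI_IOCSTATUS_CONFIG_INVALID_PAGE);
}
```

Under those assumed values, ioc_status_needs_warning(0x8022) returns 0, because 0x8022 & 0x7FFF is 0x0022, while the unmasked comparison 0x8022 != 0x0022 is true. That would explain why the warning was printed before the mask was added.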