From: John Baldwin
To: Stephane LAPIE
Cc: freebsd-hardware@freebsd.org
Subject: Re: DELL SAS5/E Controller bug
Date: Wed, 20 Jan 2010 11:05:26 -0500
Message-Id: <201001201105.26367.jhb@freebsd.org>
In-Reply-To: <4B571CB7.3020303@darkbsd.org>
References: <4B56CD4C.80503@darkbsd.org> <201001200848.16874.jhb@freebsd.org> <4B571CB7.3020303@darkbsd.org>

On Wednesday 20 January 2010 10:09:43 am Stephane LAPIE wrote:
> John Baldwin wrote:
> > On Wednesday 20 January 2010 4:30:52 am Stephane LAPIE wrote:
> >> Hello list,
> >>
> >> Basically I'm experiencing the same problem as described here:
> >> https://forums.freebsd.org/showthread.php?t=9407 (linking for reference)
> >>
> >> Drive disconnections are not recognized instantly, and instead I get
> >> the following dmesg entries:
> >> mpt0: mpt_cam_event: 0x16
> >> mpt0: mpt_cam_event: 0x16
> >>
> >> (Sometimes I also get "mpt0: mpt_cam_event: 0x12" events)
> >>
> >> This is really crippling as this literally paralyzes the ZFS pool until
> >> the controller finally comes to its senses (...or until a disk gets
> >> replugged in, which provokes a flush of all the buffered failed SCSI
> >> requests).
> >>
> >> Hardware is recognized as:
> >> mpt0@pci0:6:8:0: class=0x010000 card=0x1f041028 chip=0x00541000 rev=0x01 hdr=0x00
> >>     vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
> >>     device = 'SAS 3000 series, 8-port with 1068 -StorPort'
> >>     class = mass storage
> >>     subclass = SCSI
> >>
> >> Did anyone else experience this, or find a proper work-around?
> >
> > Invoke 'camcontrol rescan' after removing a drive. mptutil(8) does the
> > equivalent when adding and removing volumes to make up for the driver not
> > automatically rescanning.
>
> I already tried reset/rescan via camcontrol, but after removing a drive,
> the process freezes (process status "D", Ctrl+T in terminal shows it's
> in a "cbwait" state, it can't be bg'ed). I did not wait for a hardware
> timeout; I tried replugging the drive, which released the ZFS and
> camcontrol locks.
>
>
> Also, I tried poking around with mptutil and could obtain the following
> information, if it can be of any help:
>
> freebsd-r610# mptutil -u 0 show adapter
> mpt0 Adapter:
>     Board Name: SAS5e
>     Board Assembly:
>     Chip Name: C1068
>     Chip Revision: UNUSED
>     RAID Levels: none
> mptutil: Reading config page header failed: Invalid configuration page
>
> (The above error message should be normal since this is not a RAID
> controller, though a bit jarring)

This patch should fix that:

Index: mpt_show.c
===================================================================
--- mpt_show.c	(revision 202640)
+++ mpt_show.c	(working copy)
@@ -78,6 +78,7 @@
 	CONFIG_PAGE_MANUFACTURING_0 *man0;
 	CONFIG_PAGE_IOC_2 *ioc2;
 	CONFIG_PAGE_IOC_6 *ioc6;
+	U16 IOCStatus;
 	int fd, comma;
 
 	if (ac != 1) {
@@ -108,7 +109,7 @@
 
 	free(man0);
 
-	ioc2 = mpt_read_ioc_page(fd, 2, NULL);
+	ioc2 = mpt_read_ioc_page(fd, 2, &IOCStatus);
 	if (ioc2 != NULL) {
 		printf(" RAID Levels:");
 		comma = 0;
@@ -151,9 +152,10 @@
 			printf(" none");
 		printf("\n");
 		free(ioc2);
-	}
+	} else if (IOCStatus != MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
+		warnx("mpt_read_ioc_page(2): %s", mpt_ioc_status(IOCStatus));
 
-	ioc6 = mpt_read_ioc_page(fd, 6, NULL);
+	ioc6 = mpt_read_ioc_page(fd, 6, &IOCStatus);
 	if (ioc6 != NULL) {
 		display_stripe_map(" RAID0 Stripes",
 		    ioc6->SupportedStripeSizeMapIS);
@@ -172,7 +174,8 @@
 			printf("-%u", ioc6->MaxDrivesIME);
 		printf("\n");
 		free(ioc6);
-	}
+	} else if (IOCStatus != MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
+		warnx("mpt_read_ioc_page(6): %s", mpt_ioc_status(IOCStatus));
 
 	/* TODO: Add an ioctl to fetch IOC_FACTS and print firmware version. */
 

> However, the following is a bit disturbing:
>
> freebsd-r610# mptutil -u 0 show drives
> mpt0 Physical Drives:
>  da0 ( 932G) ONLINE SAS bus 0 id 0
>  da1 ( 932G) ONLINE SAS bus 0 id 1
>  da2 ( 932G) ONLINE SAS bus 0 id 2
>  da3 ( 932G) ONLINE SAS bus 0 id 3
>  da4 ( 932G) ONLINE SAS bus 0 id 4
>  da5 ( 932G) ONLINE SAS bus 0 id 5
>  da6 ( 932G) ONLINE SAS bus 0 id 6
>  da7 ( 932G) ONLINE SAS bus 0 id 7
>  da8 ( 932G) ONLINE SAS bus 0 id 8
>  da9 ( 932G) ONLINE SAS bus 0 id 9
> da10 ( 932G) ONLINE SAS bus 0 id 10
> da11 ( 932G) ONLINE SAS bus 0 id 11
> da12 ( 932G) ONLINE SAS bus 0 id 12
> da13 ( 932G) ONLINE SAS bus 0 id 13
> da14 ( 932G) ONLINE SAS bus 0 id 14
> da15 ( 136G) ONLINE SAS bus 0 id 0
>
> The above listing seems weird, as da15 should belong to mpt1.

Agreed. I specifically ask that CAM only return results for devices on bus 0
of mptX. When I debugged this before, I used gdb and set a breakpoint in
mpt_fetch_disks() so I could examine the structures that CAM returned. This
is the code that distinguishes mptX from the other mpt units:

	/* Match mptX bus 0. */
	ccb.cdm.patterns[0].type = DEV_MATCH_BUS;
	b = &ccb.cdm.patterns[0].pattern.bus_pattern;
	snprintf(b->dev_name, sizeof(b->dev_name), "mpt");
	b->unit_number = mpt_unit;
	b->bus_id = 0;
	b->flags = BUS_MATCH_NAME | BUS_MATCH_UNIT | BUS_MATCH_BUS_ID;

'mpt_unit' is a global variable that is set to the value of the '-u' option.

> freebsd-r610# mptutil -u 1 show drives
> mptutil: mpt_fetch_disks got wrong CAM matches
> mpt1 Physical Drives:
>  0 ( 137G) ONLINE SAS bus 0 id 1
>  1 ( 137G) ONLINE SAS bus 0 id 9

Similarly, I would use gdb to examine the reply from CAM here to see why it got
'wrong CAM matches'. The code expects the first match to be the bus and the
next N matches to be 'daX' devices.

-- 
John Baldwin
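
The expectation described above (first match is the mptX bus, every later match is a 'daX' device) can be sketched as standalone C, which may help when comparing it against what gdb shows. This is only an illustration under stated assumptions: struct sketch_match, sketch_first_bad_match() and sketch_dump_matches() are hypothetical names invented here, and the struct is a simplified stand-in, not the real CAM match result layout or the actual mptutil code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-ins for the CAM match results.  These are
 * NOT the real CAM structures; they only model the fields discussed above.
 */
enum sketch_match_type { SKETCH_MATCH_BUS, SKETCH_MATCH_PERIPH };

struct sketch_match {
	enum sketch_match_type type;
	char name[16];		/* "mpt" for a bus, "da" for a disk periph */
	unsigned int unit;	/* the X in mptX or daX */
};

/*
 * Return -1 if the list has the expected shape: entry 0 is the bus for
 * mpt<mpt_unit> and every later entry is a 'da' peripheral.  Otherwise
 * return the index of the first entry that breaks the expectation
 * (0 for an empty list, since the bus entry is missing).
 */
int
sketch_first_bad_match(const struct sketch_match *m, size_t n,
    unsigned int mpt_unit)
{
	size_t i;

	assert(m != NULL || n == 0);

	/* Entry 0 must be the bus for the requested mpt unit. */
	if (n == 0 || m[0].type != SKETCH_MATCH_BUS ||
	    strcmp(m[0].name, "mpt") != 0 || m[0].unit != mpt_unit)
		return (0);

	/* Every remaining entry must be a 'da' peripheral. */
	for (i = 1; i < n; i++) {
		if (m[i].type != SKETCH_MATCH_PERIPH ||
		    strcmp(m[i].name, "da") != 0)
			return ((int)i);
	}
	return (-1);
}

/* Print one line per entry, similar to walking the list by hand in gdb. */
void
sketch_dump_matches(FILE *out, const struct sketch_match *m, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		fprintf(out, "match[%zu]: %s %s%u\n", i,
		    m[i].type == SKETCH_MATCH_BUS ? "bus" : "periph",
		    m[i].name, m[i].unit);
}
```

Filling an array of struct sketch_match with the values seen at the mpt_fetch_disks() breakpoint and passing it to sketch_first_bad_match() with the same unit given to '-u' yields the index of the first entry that does not fit, which is the entry worth inspecting more closely.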