Date: Fri, 8 Jun 2012 15:53:26 -0600
From: "Kenneth D. Merry"
To: Dustin Wenz
Cc: freebsd-scsi@freebsd.org
Subject: Re: Marginal disks prevent boot with mps(4)
Message-ID: <20120608215326.GA83721@nargothrond.kdm.org>
In-Reply-To: <60F17E0E-EE4A-4F37-9925-055315B987B1@ebureau.com>

On Fri, Jun 08, 2012 at 16:25:31 -0500, Dustin Wenz wrote:
> I just installed a build of 9.0-STABLE in order to test the changes
> since release. I was hoping that some of the error-handling in mps
> would alter the behavior I've seen with some SATA disks (particularly,
> Seagate ST3000DM001 disks) connected through an LSI SAS 9201-16e HBA.
>

Are you using an expander, or are the disks connected directly to the HBA?

What firmware version are you using on the HBA?
Make sure you have the latest firmware version on the card.

> It is apparently possible for these disks to get in a state where
> their presence prevents the machine from booting. This problem has
> existed for some time, according to some archive-searching I've done,
> but there isn't much consensus on how to fix it.
>
> The disks are good enough that they can be probed at startup, but some
> part of initialization cannot complete. This is the message I see
> repeated forever upon boot (the probe number does change slightly):
>
> (probe14:mps0:0:14:0): INQUIRY. CDB: 12 0 0 0 24 0 length 36 SMID 215 terminated ioc 804b scsi 0 state c xfer 0
>
> There is a comment in mps_sas.c which suggests that this error is
> usually transient, but that seems not to be the case here. Can anyone
> suggest a modification that might permit booting in this state?
>

There is not a lot that the driver can do in this case. The command is
getting terminated by the firmware in the HBA, and we really don't have a
lot of information to indicate why.

You could change the status returned for
MPI2_IOCSTATUS_SCSI_IOC_TERMINATED to CAM_REQ_CMP_ERR, and that would just
mean that the probe for that disk would eventually fail and the kernel
would boot.

CAM_REQUEUE_REQ tells CAM to retry the command without decrementing the
retry count. That is why you aren't able to boot.

If upgrading the HBA firmware doesn't fix the problem, I would suggest
contacting LSI support, and see if they can get additional diagnostics off
the board to figure out what the problem is.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
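
Editorial illustration, not part of the original message: the status
change suggested above can be modelled in a few lines of standalone C.
This is only a sketch under stated assumptions. It is not the code from
mps_sas.c, every sketch_* / SKETCH_* name is hypothetical, the CAM status
enum uses stand-in values rather than the real ones from cam.h, and the
retry loop is a simplified model of CAM's retry handling. The 0x004B and
0x7FFF constants are assumed to correspond to the "ioc 804b" value in the
log line (0x8000 being a flag bit on top of the status code).

```c
#include <stdint.h>

/* Assumed IOCStatus layout: low 15 bits carry the status code. */
#define SKETCH_IOCSTATUS_MASK                0x7FFF
#define SKETCH_IOCSTATUS_SUCCESS             0x0000
#define SKETCH_IOCSTATUS_SCSI_IOC_TERMINATED 0x004B

/* Hypothetical stand-ins for the CAM status codes named in the mail. */
enum sketch_cam_status {
    SKETCH_CAM_REQ_CMP,      /* command completed */
    SKETCH_CAM_REQ_CMP_ERR,  /* completed with error, consumes a retry */
    SKETCH_CAM_REQUEUE_REQ   /* requeue, retry count left untouched */
};

/*
 * Map a firmware IOCStatus to a CAM-style status. With fail_terminated
 * set, a terminated command is reported as an error instead of being
 * requeued, which is the change suggested in the mail.
 */
enum sketch_cam_status
sketch_map_iocstatus(uint16_t iocstatus, int fail_terminated)
{
    switch (iocstatus & SKETCH_IOCSTATUS_MASK) {
    case SKETCH_IOCSTATUS_SUCCESS:
        return SKETCH_CAM_REQ_CMP;
    case SKETCH_IOCSTATUS_SCSI_IOC_TERMINATED:
        return fail_terminated ? SKETCH_CAM_REQ_CMP_ERR
                               : SKETCH_CAM_REQUEUE_REQ;
    default:
        return SKETCH_CAM_REQ_CMP_ERR;
    }
}

/*
 * Simplified probe loop for a disk that always returns iocstatus.
 * Returns the number of attempts made before the probe finished or
 * gave up, or -1 if the cap was reached (standing in for "forever").
 */
int
sketch_probe(uint16_t iocstatus, int fail_terminated, int retries, int cap)
{
    int attempts = 0;

    while (attempts < cap) {
        attempts++;
        enum sketch_cam_status st =
            sketch_map_iocstatus(iocstatus, fail_terminated);

        if (st == SKETCH_CAM_REQ_CMP)
            return attempts;
        if (st == SKETCH_CAM_REQUEUE_REQ)
            continue;            /* retry count is not decremented */
        if (retries-- == 0)
            return attempts;     /* retries exhausted, probe fails */
    }
    return -1;
}
```

In this model, a disk that always answers with 0x804b never leaves the
loop while the status is requeued, whereas with the error status the
probe gives up once its retries are used and boot could continue without
that disk.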