Date:      Fri, 8 Jun 2012 15:53:26 -0600
From:      "Kenneth D. Merry" <ken@freebsd.org>
To:        Dustin Wenz <dustinwenz@ebureau.com>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: Marginal disks prevent boot with mps(4)
Message-ID:  <20120608215326.GA83721@nargothrond.kdm.org>
In-Reply-To: <60F17E0E-EE4A-4F37-9925-055315B987B1@ebureau.com>
References:  <60F17E0E-EE4A-4F37-9925-055315B987B1@ebureau.com>

On Fri, Jun 08, 2012 at 16:25:31 -0500, Dustin Wenz wrote:
> I just installed a build of 9.0-STABLE in order to test the changes since release. I was hoping that some of the error-handling in mps would alter the behavior I've seen with some SATA disks (particularly, Seagate ST3000DM001 disks) connected through an LSI SAS 9201-16e HBA.
> 

Are you using an expander, or are the disks connected directly to the HBA?

What firmware version are you using on the HBA?  Make sure you have the
latest firmware version on the card.

> It is apparently possible for these disks to get in a state where their presence prevents the machine from booting. This problem has existed for some time, according to some archive-searching I've done, but there isn't much consensus on how to fix it.
> 
> The disks are good enough that they can be probed at startup, but some part of initialization cannot complete. This is the message I see repeated forever upon boot (the probe number does change slightly):
> 
> 	(probe14:mps0:0:14:0): INQUIRY. CDB: 12 0 0 0 24 0 length 36 SMID 215 terminated ioc 804b scsi 0 state c xfer 0
> 
> There is a comment in mps_sas.c which suggests that this error is usually transient, but that seems not to be the case here. Can anyone suggest a modification that might permit booting in this state?
> 

There is not a lot that the driver can do in this case.  The command is
getting terminated by the firmware in the HBA, and we really don't have a
lot of information to indicate why.

You could change the status returned for MPI2_IOCSTATUS_SCSI_IOC_TERMINATED
from CAM_REQUEUE_REQ to CAM_REQ_CMP_ERR.  The probe for that disk would
then eventually fail, and the kernel would boot.  CAM_REQUEUE_REQ, which
the driver returns now, tells CAM to retry the command without
decrementing the retry count, so the probe is retried forever.  That is
why you aren't able to boot.
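
To make the difference between the two status values concrete, here is a
small userland sketch.  It is not driver code: the enum values and the
function sketch_probe_attempts() are hypothetical stand-ins, and the retry
accounting is a simplified model of what CAM does, assuming the firmware
terminates the command on every attempt.

```c
/*
 * Hypothetical stand-ins for CAM status values.  These are NOT the
 * definitions from <cam/cam.h>; they exist only for this illustration.
 */
enum sketch_status {
	SKETCH_REQ_CMP,		/* command completed successfully */
	SKETCH_REQ_CMP_ERR,	/* completed with error; uses up a retry */
	SKETCH_REQUEUE_REQ	/* requeued; retry count is left alone */
};

/*
 * Model a probe command that gets the same status back on every attempt.
 *
 * Returns the number of attempts made before the probe finishes (either
 * by succeeding or by running out of retries), or -1 if max_attempts is
 * reached while the probe is still being retried.
 */
int
sketch_probe_attempts(enum sketch_status status, int retry_count,
    int max_attempts)
{
	int attempts = 0;

	while (attempts < max_attempts) {
		attempts++;

		if (status == SKETCH_REQ_CMP)
			return (attempts);

		if (status == SKETCH_REQ_CMP_ERR) {
			/* Retries exhausted: probe fails, boot can go on. */
			if (retry_count == 0)
				return (attempts);
			retry_count--;
		}

		/* SKETCH_REQUEUE_REQ: retry_count untouched, try again. */
	}

	return (-1);
}
```

In this model, an error status with a retry count of 4 gives up after 5
attempts, while the requeue status never finishes, however large
max_attempts is.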

If upgrading the HBA firmware doesn't fix the problem, I would suggest
contacting LSI support to see whether they can get additional diagnostics
off the board and figure out what the problem is.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
