Date:      Thu, 10 Jul 2014 13:42:56 -0500
From:      Graham Allan <allan@physics.umn.edu>
To:        freebsd-fs@freebsd.org
Subject:   Re: replaced da devices not being detected
Message-ID:  <20140710184256.GM18548@physics.umn.edu>
In-Reply-To: <53B5EA11.4060509@physics.umn.edu>
References:  <53B5B712.5050404@physics.umn.edu> <53B5EA11.4060509@physics.umn.edu>

On Thu, Jul 03, 2014 at 06:41:05PM -0500, Graham Allan wrote:
> On 7/3/2014 3:03 PM, Graham Allan wrote:
> >
> >It does seem to me like we get to replace some number of drives without
> >incident, then after some point no new da devices are detected.
> 
> I should have given some more info about the HBA etc in use - it's
> an LSI 9205-8e (SAS2308, using mps driver), and dmesg is telling me
> the HBA has (IT) firmware 14.00.00.00. Don't know if this is good or
> bad but it appears to match the mps driver version, if that means
> anything.
> 
> I can see LSI is up to firmware 19.00.00.00 for the card, and I know
> I've seen discussion here of the favored version, but can't find it
> now.
>
> However SAS2IRCU can see the added drive even when camcontrol fails
> to, so I'm not sure that it's related to the HBA as such - unless
> SAS2IRCU gets that information by a different path such as querying
> the enclosure controller.

Funnily enough, the "missing" drive showed up around the time I was
messing with sas2ircu, though I didn't notice at first.

The first time I ran "sas2ircu 0 display", it took a *really* long time
to respond - subsequent runs were instant. I see now in kern.log that
something issued a reinit to the HBA:

Jul  3 17:57:39 hostname kernel: mps0: Calling Reinit from mps_wait_command
Jul  3 17:57:39 hostname kernel: mps0: mps_reinit sc 0xffffff8002a77000
Jul  3 17:57:39 hostname kernel: mps0: mps_reinit mask interrupts
Jul  3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup
Jul  3 17:57:40 hostname kernel: mps0: mpssas_announce_reset code 1 target -1 lun -1
Jul  3 17:57:40 hostname kernel: mps0: mpssas_complete_all_commands
Jul  3 17:57:40 hostname kernel: (noperiph:mps0:0:4294967295:0): SMID 370 waking up cm 0xffffff8002aa7a10 state 1 ccb 0 for diag reset
Jul  3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup 0 tm 0 after command completion
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit doorbell 0x24000000
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit unmask interrupts post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit restarting post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit finished sc 0xffffff8002a77000 post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: Reinit success
Jul  3 17:57:40 hostname kernel: mps0: mps_user_pass_thru: invalid request: error 60

The drive showed up shortly after this, about two and a half minutes later:

Jul  3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0
Jul  3 18:00:15 hostname kernel: da91: <ATA ST3000DM001-1CH1 CC26> Fixed Direct Access SCSI-6 device
Jul  3 18:00:15 hostname kernel: da91: 600.000MB/s transfers
Jul  3 18:00:15 hostname kernel: da91: Command Queueing enabled
Jul  3 18:00:15 hostname kernel: da91: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
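
As a sanity check that da91 was probed at its full size, the figures in
the last probe line can be re-derived from the sector count. This is only
a minimal sketch of the arithmetic: the variable names are illustrative,
and it assumes the "MB" in the probe line means 2^20 bytes and that the
cylinder count is the sector count divided by heads times sectors per
track, rounded down.

```python
# Figures copied from the da91 probe line.
SECTORS = 5860533168
SECTOR_BYTES = 512
HEADS = 255
SECTORS_PER_TRACK = 63

total_bytes = SECTORS * SECTOR_BYTES

# Assumption: the reported "MB" is bytes / 2**20, rounded down.
reported_mb = total_bytes // (1024 * 1024)

# Assumption: cylinders = sectors // (heads * sectors per track).
cylinders = SECTORS // (HEADS * SECTORS_PER_TRACK)

print(total_bytes)   # 3000592982016
print(reported_mb)   # 2861588
print(cylinders)     # 364801
```

Both derived values match the probe line, and the byte count is what one
would expect from a nominal 3 TB drive.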

I suspect sas2ircu was responsible for this. The system was generally
unresponsive during that first sas2ircu run, but was normal before and
after.
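
One way to test that suspicion on future occurrences would be to measure
how long after a reinit each new da device attaches. The following is a
rough Python sketch, not a finished tool: the helper names (parse_ts,
attach_delays) are made up for illustration, the year is assumed to be
2014 because syslog timestamps omit it, and the sample lines are the two
relevant kern.log entries quoted above with the wrapped line joined.

```python
import re
from datetime import datetime

YEAR = 2014  # assumption: syslog timestamps carry no year

# Matches lines such as "... kernel: da91 at mps0 bus 0 ..."
ATTACH = re.compile(r"kernel: (da\d+) at mps\d+ ")


def parse_ts(line):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{YEAR} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


def attach_delays(lines):
    """Map each da device to seconds elapsed since the last reinit success."""
    last_reinit = None
    delays = {}
    for line in lines:
        if "Reinit success" in line:
            last_reinit = parse_ts(line)
            continue
        match = ATTACH.search(line)
        if match and last_reinit is not None:
            delta = parse_ts(line) - last_reinit
            delays[match.group(1)] = delta.total_seconds()
    return delays


sample = [
    "Jul  3 17:57:40 hostname kernel: mps0: Reinit success",
    "Jul  3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0",
]
print(attach_delays(sample))  # {'da91': 155.0}
```

A consistently short delay across several drive replacements would
support the idea that the reinit is what makes the drive appear; it would
not by itself show that sas2ircu triggered the reinit.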

Does this make any sense?

Is there a recommended firmware version (other than our current
14.00.00.00) for the 9205-8e which might help with this?

Thanks for any ideas,

Graham
-- 
-------------------------------------------------------------------------
Graham Allan
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------