Date: Thu, 10 Jul 2014 13:42:56 -0500
From: Graham Allan <allan@physics.umn.edu>
To: freebsd-fs@freebsd.org
Subject: Re: replaced da devices not being detected
Message-ID: <20140710184256.GM18548@physics.umn.edu>
In-Reply-To: <53B5EA11.4060509@physics.umn.edu>
References: <53B5B712.5050404@physics.umn.edu> <53B5EA11.4060509@physics.umn.edu>
On Thu, Jul 03, 2014 at 06:41:05PM -0500, Graham Allan wrote:
> On 7/3/2014 3:03 PM, Graham Allan wrote:
> >
> >It does seem to me like we get to replace some number of drives without
> >incident, then after some point no new da devices are detected.
>
> I should have given some more info about the HBA etc in use - it's
> an LSI 9205-8e (SAS2308, using mps driver), and dmesg is telling me
> the HBA has (IT) firmware 14.00.00.00. Don't know if this is good or
> bad but it appears to match the mps driver version, if that means
> anything.
>
> I can see LSI is up to firmware 19.00.00.00 for the card, and I know
> I've seen discussion here of the favored version, but can't find it
> now.

> However SAS2IRCU can see the added drive even when camcontrol fails
> to, so I'm not sure that it's related to the HBA as such - unless
> SAS2IRCU gets that information by a different path such as querying
> the enclosure controller.

Funnily enough the "missing" drive showed up round about the time I was
messing with sas2ircu - though I didn't notice at first. The first time
I ran "sas2ircu 0 display", it took a *really* long time to respond -
subsequent runs were instant.

I see now in kern.log that something issued a reinit to the HBA:

Jul  3 17:57:39 hostname kernel: mps0: Calling Reinit from mps_wait_command
Jul  3 17:57:39 hostname kernel: mps0: mps_reinit sc 0xffffff8002a77000
Jul  3 17:57:39 hostname kernel: mps0: mps_reinit mask interrupts
Jul  3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup
Jul  3 17:57:40 hostname kernel: mps0: mpssas_announce_reset code 1 target -1 lun -1
Jul  3 17:57:40 hostname kernel: mps0: mpssas_complete_all_commands
Jul  3 17:57:40 hostname kernel: (noperiph:mps0:0:4294967295:0): SMID 370 waking up cm 0xffffff8002aa7a10 state 1 ccb 0 for diag reset
Jul  3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup 0 tm 0 after command completion
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit doorbell 0x24000000
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit unmask interrupts post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit restarting post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: mps_reinit finished sc 0xffffff8002a77000 post 0 free 1055
Jul  3 17:57:40 hostname kernel: mps0: Reinit success
Jul  3 17:57:40 hostname kernel: mps0: mps_user_pass_thru: invalid request: error 60

the drive showed up right after this.

Jul  3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0
Jul  3 18:00:15 hostname kernel: da91: <ATA ST3000DM001-1CH1 CC26> Fixed Direct Access SCSI-6 device
Jul  3 18:00:15 hostname kernel: da91: 600.000MB/s transfers
Jul  3 18:00:15 hostname kernel: da91: Command Queueing enabled
Jul  3 18:00:15 hostname kernel: da91: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)

I suspect sas2ircu was probably responsible for this. The system was
generally unresponsive during that first sas2ircu run, but was normal
before and after.

Does this make any sense? Is there a recommended firmware version (other
than our current 14.00.00.00) for the 9205-8e which might help with
this?
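
In case it helps anyone check the same correlation on their own logs, here
is a rough Python sketch that measures how long after an mps "Reinit
success" message each new da device attaches. The function names, the
regular expressions and the assumed year (2014, taken from the date of this
message, since syslog timestamps carry none) are my own choices for
illustration; the sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime

# Three lines copied from the kern.log excerpt in this message.
SAMPLE_LOG = """\
Jul  3 17:57:39 hostname kernel: mps0: Calling Reinit from mps_wait_command
Jul  3 17:57:40 hostname kernel: mps0: Reinit success
Jul  3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0
"""

# "<Mon> <day> <HH:MM:SS> <host> kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+(.*)$")
# "da91 at mps0 bus 0 scbus0 target 218 lun 0"
ATTACH_RE = re.compile(r"^(da\d+) at mps\d+ bus \d+ scbus\d+ target (\d+) lun \d+")


def parse_time(stamp, year=2014):
    """Parse a syslog timestamp; the year is an assumption, not in the log."""
    normalized = " ".join(stamp.split())  # collapse the padded day field
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


def attach_delays(log_text):
    """Return (device, target, seconds since the last 'Reinit success')."""
    last_reinit = None
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        when = parse_time(match.group(1))
        message = match.group(2)
        if message.endswith("Reinit success"):
            last_reinit = when
            continue
        attach = ATTACH_RE.match(message)
        if attach and last_reinit is not None:
            delay = int((when - last_reinit).total_seconds())
            results.append((attach.group(1), int(attach.group(2)), delay))
    return results


if __name__ == "__main__":
    for device, target, delay in attach_delays(SAMPLE_LOG):
        print(f"{device} (target {target}) attached {delay}s after reinit")
```

Run against a full kern.log, this would only show that attaches tend to
follow a reinit; it says nothing about what triggered the reinit.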

Thanks for any ideas,

Graham
--
-------------------------------------------------------------------------
Graham Allan
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------