From owner-freebsd-fs@FreeBSD.ORG Thu Jul 10 18:43:04 2014
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by hub.freebsd.org (Postfix) with ESMTPS id 39B75D77
    for ; Thu, 10 Jul 2014 18:43:04 +0000 (UTC)
Received: from mail.physics.umn.edu (smtp.spa.umn.edu [128.101.220.4])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (Client did not present a certificate)
    by mx1.freebsd.org (Postfix) with ESMTPS id 10EA027E9
    for ; Thu, 10 Jul 2014 18:43:03 +0000 (UTC)
Received: from peevish.spa.umn.edu ([128.101.220.230])
    by mail.physics.umn.edu with esmtp (Exim 4.77 (FreeBSD))
    (envelope-from )
    id 1X5JJ2-00073Y-7e
    for freebsd-fs@freebsd.org; Thu, 10 Jul 2014 13:42:56 -0500
Received: by peevish.spa.umn.edu (Postfix, from userid 5000)
    id 2C0CB472; Thu, 10 Jul 2014 13:42:56 -0500 (CDT)
Date: Thu, 10 Jul 2014 13:42:56 -0500
From: Graham Allan
To: freebsd-fs@freebsd.org
Subject: Re: replaced da devices not being detected
Message-ID: <20140710184256.GM18548@physics.umn.edu>
References: <53B5B712.5050404@physics.umn.edu> <53B5EA11.4060509@physics.umn.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <53B5EA11.4060509@physics.umn.edu>
User-Agent: Mutt/1.5.20 (2009-12-10)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 10 Jul 2014 18:43:04 -0000

On Thu, Jul 03, 2014 at 06:41:05PM -0500, Graham Allan wrote:
> On 7/3/2014 3:03 PM, Graham Allan wrote:
> >
> >It does seem to me like we get to replace some number of drives without
> >incident, then after some point no new da devices are detected.
>
> I should have given some more info about the HBA etc in use - it's
> an LSI 9205-8e (SAS2308, using mps driver), and dmesg is telling me
> the HBA has (IT) firmware 14.00.00.00. Don't know if this is good or
> bad but it appears to match the mps driver version, if that means
> anything.
>
> I can see LSI is up to firmware 19.00.00.00 for the card, and I know
> I've seen discussion here of the favored version, but can't find it
> now.
>
> However SAS2IRCU can see the added drive even when camcontrol fails
> to, so I'm not sure that it's related to the HBA as such - unless
> SAS2IRCU gets that information by a different path such as querying
> the enclosure controller.

Funnily enough the "missing" drive showed up round about the time I was
messing with sas2ircu - though I didn't notice at first. The first time
I ran "sas2ircu 0 display", it took a *really* long time to respond -
subsequent runs were instant.

I see now in kern.log that something issued a reinit to the HBA:

Jul 3 17:57:39 hostname kernel: mps0: Calling Reinit from mps_wait_command
Jul 3 17:57:39 hostname kernel: mps0: mps_reinit sc 0xffffff8002a77000
Jul 3 17:57:39 hostname kernel: mps0: mps_reinit mask interrupts
Jul 3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup
Jul 3 17:57:40 hostname kernel: mps0: mpssas_announce_reset code 1 target -1 lun -1
Jul 3 17:57:40 hostname kernel: mps0: mpssas_complete_all_commands
Jul 3 17:57:40 hostname kernel: (noperiph:mps0:0:4294967295:0): SMID 370 waking up cm 0xffffff8002aa7a10 state 1 ccb 0 for diag reset
Jul 3 17:57:40 hostname kernel: mps0: mpssas_handle_reinit startup 0 tm 0 after command completion
Jul 3 17:57:40 hostname kernel: mps0: mps_reinit doorbell 0x24000000
Jul 3 17:57:40 hostname kernel: mps0: mps_reinit unmask interrupts post 0 free 1055
Jul 3 17:57:40 hostname kernel: mps0: mps_reinit restarting post 0 free 1055
Jul 3 17:57:40 hostname kernel: mps0: mps_reinit finished sc 0xffffff8002a77000 post 0 free 1055
Jul 3 17:57:40 hostname kernel: mps0: Reinit success
Jul 3 17:57:40 hostname kernel: mps0: mps_user_pass_thru: invalid request: error 60

The drive showed up shortly after this:

Jul 3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0
Jul 3 18:00:15 hostname kernel: da91: Fixed Direct Access SCSI-6 device
Jul 3 18:00:15 hostname kernel: da91: 600.000MB/s transfers
Jul 3 18:00:15 hostname kernel: da91: Command Queueing enabled
Jul 3 18:00:15 hostname kernel: da91: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)

I suspect sas2ircu was probably responsible for the reinit. The system
was generally unresponsive during that first sas2ircu run, but was
normal before and after.

Does this make any sense? Is there a recommended firmware version (other
than our current 14.00.00.00) for the 9205-8e which might help with
this?

Thanks for any ideas,

Graham
-- 
-------------------------------------------------------------------------
Graham Allan
School of Physics and Astronomy - University of Minnesota
-------------------------------------------------------------------------
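
For anyone wanting to check their own kern.log for the same pattern (an mps reinit followed by a da device attaching), here is a minimal sketch. It is not from the original message: the function names, the regular expressions, and the default year are all assumptions for illustration, and it only looks for the "Reinit success" and "daN at mpsN" lines quoted above.

```python
import re
from datetime import datetime

# Matches syslog-style kernel lines such as
# "Jul 3 17:57:40 hostname kernel: mps0: Reinit success".
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(.*)$")
REINIT_RE = re.compile(r"^(mps\d+): Reinit success")
ATTACH_RE = re.compile(r"^(da\d+) at (mps\d+) ")


def parse_time(stamp, year=2014):
    # syslog timestamps carry no year, so the default here is an assumption.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def attaches_after_reinit(lines, year=2014):
    """Return (device, controller, seconds since that controller's last reinit)."""
    last_reinit = {}  # controller name -> time of its most recent reinit
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        when, msg = parse_time(m.group(1), year), m.group(2)
        reinit = REINIT_RE.match(msg)
        if reinit:
            last_reinit[reinit.group(1)] = when
            continue
        attach = ATTACH_RE.match(msg)
        if attach and attach.group(2) in last_reinit:
            delta = (when - last_reinit[attach.group(2)]).total_seconds()
            results.append((attach.group(1), attach.group(2), delta))
    return results


# Two lines taken from the log excerpt in this message.
SAMPLE = [
    "Jul 3 17:57:40 hostname kernel: mps0: Reinit success",
    "Jul 3 18:00:15 hostname kernel: da91 at mps0 bus 0 scbus0 target 218 lun 0",
]
```

Calling attaches_after_reinit(SAMPLE) reports da91 attaching on mps0 155 seconds after the reinit. A short gap like this is only a correlation; it does not by itself show that the reinit caused the drive to be detected.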