From: mykel@mWare.ca
To: Alan Somers
Cc: freebsd-scsi@freebsd.org
Subject: Re: Informal(?) sesX messages
Date: Sat, 12 Dec 2015 01:48:45 -0500
Message-Id: <566BC34D.2020404@mware.ca>
References: <566B4F68.2040807@mWare.ca> <566B8E2A.8070404@mWare.ca>

On 2015/12/11 22:55, Alan Somers wrote:
> On Fri, Dec 11, 2015 at 8:02 PM, wrote:
>> On 15-12-11 17:44, Alan Somers wrote:
>>> On Fri, Dec 11, 2015 at 3:34 PM, wrote:
>>>> Hi all, please CC me on reply as I'm not subscribed to this list.
>>>>
>>>> I've got one of those Supermicro 72-drive monster machines, all ZFS'd up.
>>>> https://www.supermicro.com/products/system/4u/6048/SSG-6048R-E1CR72L.cfm
>>>>
>>>> And before & after replacing a faulty SAS Expander and a pair of cables
>>>> (gobs of WRITE/ABORT errors), I'm still occasionally seeing these kernel
>>>> messages (in groups), and I'm not sure if they're benign, or pointing to
>>>> a SAS expander event... or what. I admit, this is my first time dealing
>>>> with a machine with SAS expanders, so I'm a bit out of my depth in
>>>> diagnosis thereof.
>>>>
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: Element descriptor: 'Slot00'
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: SAS Device Slot Element: 1 Phys at Slot 0
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: SAS device type 1 id 0
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: protocols: Initiator( None ) Target( SSP )
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: parent 500304801ea2df3f addr 5000c500844bd449
>>>>
>>> These look like device arrival notifications. If you scroll up, do
>>> you see any departure notifications? They should look like this:
>>>
>>> mps0: mpssas_prepare_remove: Sending reset for target ID 10
>>> da0 at mps0 bus 0 scbus0 target 10 lun 0
>>> da0: s/n JPW930HQ15H26H detached
>>> mps0: Unfreezing devq for target ID 10
>>> xpt_release_devq(): requested 1 > present 0
>>> (da0:mps0:0:10:0): Periph destroyed
>>>
>>> Also, could you post your HBA and expander firmware versions?
>>>
>>> -Alan
>>
>> I can say, without doubt, that I do NOT have any preceding detachments...
>> which is why I'm so baffled by the messages. If the devices aren't
>> de/reattaching, what's the point of these informal/benign ones? I am
>> familiar with them from other hot-swap and disk failure scenarios in other
>> machines.
>>
>> Could this be a driver bug not logging the disconnection? But when I
>> hot-unplugged them, I do see that in dmesg.
>> Or does SAS do something where it might renegotiate or reconfigure the
>> lanes, and I'm just seeing it do that?
>>
>> Thanks,
>>
>> Myke
>>
>>
>> dev.mpr.0.driver_version: 09.255.01.00-fbsd
>> dev.mpr.0.firmware_version: 06.00.00.00
>> dev.mpr.1.driver_version: 09.255.01.00-fbsd
>> dev.mpr.1.firmware_version: 08.00.00.00
>> dev.mpr.2.driver_version: 09.255.01.00-fbsd
>> dev.mpr.2.firmware_version: 08.00.00.00
>>
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses0
>>  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20    ....3.@.LSI
>>  10     53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20    SAS3x48
>>  20     30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31    0701x48-66.7.1.1
>>  30     37 00 20 20 20 20 20 20                             7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses1
>>  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20    ....3.@.LSI
>>  10     53 41 53 33 78 33 36 20  20 20 20 20 20 20 20 20    SAS3x36
>>  20     30 37 30 31 78 33 36 2d  36 36 2e 37 2e 31 2e 31    0701x36-66.7.1.1
>>  30     37 00 20 20 20 20 20 20                             7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses2
>> SCSI INQUIRY failed on ses2, res=-1
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses3
>> SCSI INQUIRY failed on ses3, res=-1
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses4
>>  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20    ....3.@.LSI
>>  10     53 41 53 33 78 32 38 20  20 20 20 20 20 20 20 20    SAS3x28
>>  20     30 37 30 31 78 32 38 2d  36 36 2e 37 2e 31 2e 31    0701x28-66.7.1.1
>>  30     37 00 20 20 20 20 20 20                             7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses5
>>  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20    ....3.@.LSI
>>  10     53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20    SAS3x48
>>  20     30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31    0701x48-66.7.1.1
>>  30     37 00 20 20 20 20 20 20                             7.
>> [root@ZFS-AF ~]#
>>
>>
>> And here's dmesg after fresh reboot:
> Well, that's weird. Your firmware versions look OK, though you might
> want to upgrade mpr0 just to be consistent. The next thing I would
> check, if I were you, would be devctl messages.
> Edit /etc/syslog.conf and change devd's loglevel to INFO, then HUP
> syslogd. Now every devctl message should get logged in
> /var/log/devd.log. That will tell you more precisely than dmesg
> whether there are any arrival or departure events.
>
> -Alan

Huh, I never noticed the 6 vs. 8; curiously, mpr0 and mpr1 are the two
connected to the front expander... and that's where I've never seen an
issue. Though perhaps I scrambled which cards were serving which in my
testing - I also moved mpr2 to sit on the other CPU's PCI bus.

I've added the devd log, although I haven't been able to trigger the event
yet anyway. Tried to assert hw.mpr.2.debug_level, however it seems like
hw.mpr doesn't exist.

Finally, I haven't the slightest clue how to update the firmware; the Avago
site only has a product brochure for the 3008 anyway :(
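
For the "is there a departure before each arrival?" check discussed above, here is a minimal Python sketch. It assumes the log lines look like the kernel messages quoted in this thread (the ses "Element descriptor" line for an arrival, the "detached" and "Periph destroyed" lines for a departure). The function name and regexes are hypothetical helpers, not part of any FreeBSD tool, and the patterns may need adjusting for other drivers or releases.

```python
import re

# Patterns modelled on the kernel messages quoted in this thread (assumed
# formats; adjust if your dmesg/syslog output differs).
ARRIVAL = re.compile(r"ses\d+: (da\d+),pass\d+: Element descriptor")
DETACHED = re.compile(r"\b(da\d+): .*\bdetached")
DESTROYED = re.compile(r"\((da\d+):[^)]*\): Periph destroyed")


def unexplained_arrivals(lines):
    """Return devices that were announced by ses without an earlier departure."""
    departed = set()
    unexplained = []
    for line in lines:
        gone = DETACHED.search(line) or DESTROYED.search(line)
        if gone:
            departed.add(gone.group(1))
            continue
        arrived = ARRIVAL.search(line)
        if arrived:
            dev = arrived.group(1)
            if dev in departed:
                # A departure was logged first, so this arrival is expected.
                departed.discard(dev)
            else:
                unexplained.append(dev)
    return unexplained


if __name__ == "__main__":
    import sys

    # Usage: python check_ses.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for dev in unexplained_arrivals(log):
            print("arrival without logged departure:", dev)
```

Run against /var/log/messages (or a saved dmesg), it prints one line for each ses arrival that had no earlier detach logged for the same daN device.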