Date: Sat, 12 Dec 2015 01:48:45 -0500
From: mykel@mWare.ca
To: Alan Somers <asomers@freebsd.org>
Cc: freebsd-scsi@freebsd.org
Subject: Re: Informal(?) sesX messages
Message-ID: <566BC34D.2020404@mware.ca>
In-Reply-To: <CAOtMX2jQUQqDuW21grACVvYzdNcREdtMB55=2YR8TZ9V22FGqg@mail.gmail.com>
References: <566B4F68.2040807@mWare.ca> <CAOtMX2ibBUkS58EXfTc=Aznf_oc%2B_y4fC1xNAo=1F-yNSmTwSA@mail.gmail.com> <566B8E2A.8070404@mWare.ca> <CAOtMX2jQUQqDuW21grACVvYzdNcREdtMB55=2YR8TZ9V22FGqg@mail.gmail.com>
On 2015/12/11 22:55, Alan Somers wrote:
> On Fri, Dec 11, 2015 at 8:02 PM, <Mykel@mware.ca> wrote:
>> On 15-12-11 17:44, Alan Somers wrote:
>>> On Fri, Dec 11, 2015 at 3:34 PM, <Mykel@mware.ca> wrote:
>>>> Hi all, please CC me on reply as I'm not subscribed to this list.
>>>>
>>>> I've got one of those Supermicro 72-drive monster machines, all ZFS'd up.
>>>> https://www.supermicro.com/products/system/4u/6048/SSG-6048R-E1CR72L.cfm
>>>>
>>>> And before & after replacing a faulty SAS Expander and a pair of cables
>>>> (gobs of WRITE/ABORT errors), I'm still occasionally seeing these kernel
>>>> messages (in groups), and I'm not sure if they're benign, or pointing to a
>>>> SAS expander event... or what. I admit, this is my first time dealing with a
>>>> machine with SAS expanders, so I'm a bit out of my depth in diagnosis
>>>> thereof.
>>>>
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: Element descriptor: 'Slot00'
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: SAS Device Slot Element: 1 Phys at Slot 0
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: SAS device type 1 id 0
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: protocols: Initiator( None ) Target( SSP )
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: parent 500304801ea2df3f addr 5000c500844bd449
>>>>
>>> These look like device arrival notifications. If you scroll up, do
>>> you see any departure notifications? They should look like this:
>>>
>>> mps0: mpssas_prepare_remove: Sending reset for target ID 10
>>> da0 at mps0 bus 0 scbus0 target 10 lun 0
>>> da0: <ATA Hitachi HUA72201 A39C> s/n JPW930HQ15H26H detached
>>> mps0: Unfreezing devq for target ID 10
>>> xpt_release_devq(): requested 1 > present 0
>>> (da0:mps0:0:10:0): Periph destroyed
>>>
>>> Also, could you post your HBA and expander firmware versions?
>>>
>>> -Alan
>>
>> I can say, without doubt, that I do NOT have any preceding detachments...
>> which is why I'm so baffled by the messages. If the devices aren't
>> de/reattaching, what's the point of these informal/benign ones? I am
>> familiar with them from other hot-swap and disk failure scenarios in other
>> machines.
>>
>> Could this be a driver bug not logging the disconnection? But when I
>> hot-unplugged them, I do see that in dmesg.
>> Or does SAS do something where it might renegotiate or reconfigure the
>> lanes, and I'm just seeing it do that?
>>
>> Thanks,
>>
>> Myke
>>
>>
>> dev.mpr.0.driver_version: 09.255.01.00-fbsd
>> dev.mpr.0.firmware_version: 06.00.00.00
>> dev.mpr.1.driver_version: 09.255.01.00-fbsd
>> dev.mpr.1.firmware_version: 08.00.00.00
>> dev.mpr.2.driver_version: 09.255.01.00-fbsd
>> dev.mpr.2.firmware_version: 08.00.00.00
>>
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses0
>>  00   0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20   ....3.@.LSI
>>  10   53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20   SAS3x48
>>  20   30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31   0701x48-66.7.1.1
>>  30   37 00 20 20 20 20 20 20                            7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses1
>>  00   0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20   ....3.@.LSI
>>  10   53 41 53 33 78 33 36 20  20 20 20 20 20 20 20 20   SAS3x36
>>  20   30 37 30 31 78 33 36 2d  36 36 2e 37 2e 31 2e 31   0701x36-66.7.1.1
>>  30   37 00 20 20 20 20 20 20                            7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses2
>> SCSI INQUIRY failed on ses2, res=-1
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses3
>> SCSI INQUIRY failed on ses3, res=-1
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses4
>>  00   0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20   ....3.@.LSI
>>  10   53 41 53 33 78 32 38 20  20 20 20 20 20 20 20 20   SAS3x28
>>  20   30 37 30 31 78 32 38 2d  36 36 2e 37 2e 31 2e 31   0701x28-66.7.1.1
>>  30   37 00 20 20 20 20 20 20                            7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses5
>>  00   0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20   ....3.@.LSI
>>  10   53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20   SAS3x48
>>  20   30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31   0701x48-66.7.1.1
>>  30   37 00 20 20 20 20 20 20                            7.
>> [root@ZFS-AF ~]#
>>
>>
>> And here's dmesg after fresh reboot:
> Well, that's weird. Your firmware versions look OK, though you might
> want to upgrade mpr0 just to be consistent. The next thing I would
> check, if I were you, would be devctl messages. Edit /etc/syslog.conf
> and change devd's loglevel to INFO, then HUP syslogd. Now every
> devctl message should get logged in /var/log/devd.log. That will tell
> you more precisely than dmesg whether there are any arrival or
> departure events.
>
> -Alan

Huh, I never noticed the 6 vs. 8; curiously, mpr0 and mpr1 are the two
connected to the front expander... which is where I've never seen an issue.
Though perhaps I scrambled which card was which in my testing; I also
moved mpr2 to sit on the other CPU's PCI bus.

I've added the devd log, although I haven't been able to trigger the
event yet anyway.

I tried to set hw.mpr.2.debug_level, however it seems like hw.mpr
doesn't exist.

Finally, I haven't the slightest clue how to update the firmware; the
Avago site only has a product brochure for the 3008 anyway :(
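
A rough sketch of the dmesg check Alan suggests above: tally the ses descriptor groups against any detach lines in a saved log. The function name count_events and both grep patterns are illustrative assumptions, derived only from the sample messages quoted in this thread; other drivers or releases may word these lines differently.

```shell
#!/bin/sh
# count_events LOGFILE
# Hypothetical helper (not part of FreeBSD): counts two kinds of kernel lines.
#   ses_descriptor_lines: "sesN: ... Element descriptor" lines, one per group
#   departure_lines:      lines ending in "detached", as in Alan's example
count_events() {
    log="$1"
    # grep -c prints 0 but exits nonzero when nothing matches, hence "|| true"
    arrivals=$(grep -c 'ses[0-9]*: .*Element descriptor' "$log" || true)
    departures=$(grep -c 'detached[[:space:]]*$' "$log" || true)
    echo "ses_descriptor_lines=$arrivals departure_lines=$departures"
}
```

Called as `count_events /var/log/messages`, a nonzero first count alongside a zero second count would match the situation described here: descriptor groups with no logged detach.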