Date:      Fri, 11 Dec 2015 20:55:39 -0700
From:      Alan Somers <asomers@freebsd.org>
To:        Mykel@mware.ca
Cc:        FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject:   Re: Informal(?) sesX messages
Message-ID:  <CAOtMX2jQUQqDuW21grACVvYzdNcREdtMB55=2YR8TZ9V22FGqg@mail.gmail.com>
In-Reply-To: <566B8E2A.8070404@mWare.ca>
References:  <566B4F68.2040807@mWare.ca> <CAOtMX2ibBUkS58EXfTc=Aznf_oc%2B_y4fC1xNAo=1F-yNSmTwSA@mail.gmail.com> <566B8E2A.8070404@mWare.ca>

On Fri, Dec 11, 2015 at 8:02 PM,  <Mykel@mware.ca> wrote:
> On 15-12-11 17:44, Alan Somers wrote:
>>
>> On Fri, Dec 11, 2015 at 3:34 PM,  <Mykel@mware.ca> wrote:
>>>
>>> Hi all, please CC me on reply as I'm not subscribed to this list.
>>>
>>> I've got one of those Supermicro 72-drive monster machines, all ZFS'd up.
>>> https://www.supermicro.com/products/system/4u/6048/SSG-6048R-E1CR72L.cfm
>>>
>>> And before & after replacing a faulty SAS Expander and a pair of cables
>>> (gobs of WRITE/ABORT errors), I'm still occasionally seeing these kernel
>>> messages (in groups), and I'm not sure if they're benign, or pointing to
>>> a SAS expander event... or what. I admit, this is my first time dealing
>>> with a machine with SAS expanders, so I'm a bit out of my depth in
>>> diagnosis thereof.
>>>
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: Element descriptor: 'Slot00'
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: SAS Device Slot Element: 1 Phys at Slot 0
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5:  phy 0: SAS device type 1 id 0
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5:  phy 0: protocols: Initiator( None ) Target( SSP )
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5:  phy 0: parent 500304801ea2df3f addr 5000c500844bd449
>>>
>> These look like device arrival notifications.  If you scroll up, do
>> you see any departure notifications?  They should look like this:
>>
>> mps0: mpssas_prepare_remove: Sending reset for target ID 10
>> da0 at mps0 bus 0 scbus0 target 10 lun 0
>> da0: <ATA Hitachi HUA72201 A39C> s/n       JPW930HQ15H26H detached
>> mps0: Unfreezing devq for target ID 10
>> xpt_release_devq(): requested 1 > present 0
>> (da0:mps0:0:10:0): Periph destroyed
>>
>> Also, could you post your HBA and expander firmware versions?  For the
>> HBA, use "sysctl dev.mps.0.firmware_version".  For the expander,
>> install sg3_utils and do "sg_inq --hex --len=64 ses0".  The firmware
>> version is the dotted quad at the end.
>>
>> # sg_inq --hex --len=64 ses0
>>   00     0d 00 05 02 34 00 40 02  41 49 43 20 43 4f 52 50    ....4.@.AIC CORP
>>   10     53 41 53 20 36 47 20 45  78 70 61 6e 64 65 72 20    SAS 6G Expander
>>   20     30 62 30 31 78 33 36 2d  31 2e 31 31 2e 31 2e 31    0b01x36-1.11.1.1
>>   30     00 20 20 20 20 20 20 20
>>
>> -Alan
>
>
> I can say, without doubt, that I do NOT have any preceding detachments...
> which is why I'm so baffled by the messages. If the devices aren't
> de/reattaching, what's the point of these informal/benign ones? I am
> familiar with them from other hot-swap and disk failure scenarios in other
> machines.
>
> Could this be a driver bug not logging the disconnection? But when I
> hot-unplugged them, I do see that in dmesg.
> Or does SAS do something where it might renegotiate or reconfigure the
> lanes, and I'm just seeing it do that?
>
> Thanks,
>
> Myke
>
>
> dev.mpr.0.driver_version: 09.255.01.00-fbsd
> dev.mpr.0.firmware_version: 06.00.00.00
> dev.mpr.1.driver_version: 09.255.01.00-fbsd
> dev.mpr.1.firmware_version: 08.00.00.00
> dev.mpr.2.driver_version: 09.255.01.00-fbsd
> dev.mpr.2.firmware_version: 08.00.00.00
>
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses0
>  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
>  10     53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20 SAS3x48
>  20     30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31 0701x48-66.7.1.1
>  30     37 00 20 20 20 20 20 20 7.
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses1
>  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
>  10     53 41 53 33 78 33 36 20  20 20 20 20 20 20 20 20 SAS3x36
>  20     30 37 30 31 78 33 36 2d  36 36 2e 37 2e 31 2e 31 0701x36-66.7.1.1
>  30     37 00 20 20 20 20 20 20 7.
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses2
> SCSI INQUIRY failed on ses2, res=-1
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses3
> SCSI INQUIRY failed on ses3, res=-1
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses4
>  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
>  10     53 41 53 33 78 32 38 20  20 20 20 20 20 20 20 20 SAS3x28
>  20     30 37 30 31 78 32 38 2d  36 36 2e 37 2e 31 2e 31 0701x28-66.7.1.1
>  30     37 00 20 20 20 20 20 20 7.
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses5
>  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
>  10     53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20 SAS3x48
>  20     30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31 0701x48-66.7.1.1
>  30     37 00 20 20 20 20 20 20 7.
> [root@ZFS-AF ~]#
>
>
> And here's dmesg after fresh reboot:

Well, that's weird.  Your firmware versions look OK, though you might
want to upgrade mpr0 (firmware 06.00.00.00) to match mpr1 and mpr2
(08.00.00.00), just to be consistent.  The next thing I would check,
if I were you, is the devctl messages.  Edit /etc/syslog.conf and
change devd's log level to INFO, then send syslogd a SIGHUP so it
rereads its configuration.  After that, every devctl message should be
logged in /var/log/devd.log.  That will tell you more precisely than
dmesg whether there are any arrival or departure events.
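
Once devd.log is filling up, something like the following could pick
the arrival and departure events out of the noise.  This is only a
rough sketch with a few assumptions baked in: that devd writes each
raw devctl event inside a "Processing event '...'" wrapper, that
attach and detach events start with "+" and "-", and that disks show
up as "!system=DEVFS subsystem=CDEV" notifications with type=CREATE or
type=DESTROY.  The names classify() and scan() are made up for this
example; check a few real lines from your own log before trusting it.

```python
import re

# Assumed wrapper around each event in devd.log; adjust if your log differs.
EVENT_RE = re.compile(r"Processing event '(?P<event>.*)'")


def classify(event):
    """Return 'arrival', 'departure', or None for one raw devctl event."""
    if event.startswith("+"):      # device attach
        return "arrival"
    if event.startswith("-"):      # device detach
        return "departure"
    if event.startswith("!"):      # notification: space-separated key=value pairs
        fields = dict(kv.split("=", 1) for kv in event[1:].split() if "=" in kv)
        if fields.get("system") == "DEVFS" and fields.get("subsystem") == "CDEV":
            # Device nodes such as /dev/da7 being created or destroyed.
            return {"CREATE": "arrival", "DESTROY": "departure"}.get(fields.get("type"))
    return None


def scan(lines):
    """Yield (kind, event) for every arrival or departure found in the log lines."""
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        kind = classify(match.group("event"))
        if kind is not None:
            yield kind, match.group("event")
```

To use it, pass an open file to scan(), for example
"for kind, event in scan(open('/var/log/devd.log')): print(kind, event)".
If the ses messages show up in dmesg with no matching departure here,
the disks most likely never left.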

-Alan
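
P.S. To read the sg_inq dumps without squinting at hex, the fields
can be decoded with a few lines of Python.  The sketch below relies
on the standard INQUIRY layout (vendor in bytes 8-15, product in
16-31, revision in 32-35, vendor-specific data in 36-55); how a given
expander vendor spreads its firmware version across the last two
fields is a vendor convention, so treat the output as a hint.  The
function names are invented for this example, and the sample input is
the ses5 dump quoted above with the "> " prefix removed.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

# The ses5 dump from this thread, without the quoting prefix.
SES5_DUMP = """\
 00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
 10     53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20 SAS3x48
 20     30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31 0701x48-66.7.1.1
 30     37 00 20 20 20 20 20 20 7.
"""


def parse_hex_dump(text):
    """Collect the data bytes from 'offset, hex pairs, ASCII' dump lines."""
    data = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        # tokens[0] is the offset column; at most 16 byte columns follow it.
        # Caveat: on a short row, a two-character ASCII column made only of
        # hex digits would be mistaken for a data byte.
        for tok in tokens[1:17]:
            if len(tok) == 2 and set(tok) <= HEX_DIGITS:
                data.append(int(tok, 16))
            else:
                break  # reached the ASCII column
    return bytes(data)


def inquiry_fields(data):
    """Split standard INQUIRY data into its text fields."""
    def text(raw):
        # Cut at the first NUL, then drop the space padding.
        return raw.split(b"\x00", 1)[0].decode("ascii", "replace").strip()

    return {
        "vendor": text(data[8:16]),
        "product": text(data[16:32]),
        "revision": text(data[32:36]),
        "vendor_specific": text(data[36:56]),
    }


fields = inquiry_fields(parse_hex_dump(SES5_DUMP))
print(fields)
```

On the ses5 dump this gives vendor "LSI", product "SAS3x48", revision
"0701" and vendor-specific text "x48-66.7.1.17".  Note the trailing
"7": the byte at offset 0x30 carries one more character, which is easy
to miss when reading the ASCII column by eye.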


