Date: Fri, 11 Dec 2015 20:55:39 -0700
From: Alan Somers <asomers@freebsd.org>
To: Mykel@mware.ca
Cc: FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject: Re: Informal(?) sesX messages
Message-ID: <CAOtMX2jQUQqDuW21grACVvYzdNcREdtMB55=2YR8TZ9V22FGqg@mail.gmail.com>
In-Reply-To: <566B8E2A.8070404@mWare.ca>
References: <566B4F68.2040807@mWare.ca> <CAOtMX2ibBUkS58EXfTc=Aznf_oc%2B_y4fC1xNAo=1F-yNSmTwSA@mail.gmail.com> <566B8E2A.8070404@mWare.ca>

On Fri, Dec 11, 2015 at 8:02 PM, <Mykel@mware.ca> wrote:
> On 15-12-11 17:44, Alan Somers wrote:
>>
>> On Fri, Dec 11, 2015 at 3:34 PM, <Mykel@mware.ca> wrote:
>>>
>>> Hi all, please CC me on reply as I'm not subscribed to this list.
>>>
>>> I've got one of those Supermicro 72-drive monster machines, all ZFS'd up.
>>> https://www.supermicro.com/products/system/4u/6048/SSG-6048R-E1CR72L.cfm
>>>
>>> And before & after replacing a faulty SAS Expander and a pair of cables
>>> (gobs of WRITE/ABORT errors), I'm still occasionally seeing these kernel
>>> messages (in groups), and I'm not sure if they're benign, or pointing to a
>>> SAS expander event... or what. I admit, this is my first time dealing with a
>>> machine with SAS expanders, so I'm a bit out of my depth in diagnosis
>>> thereof.
>>>
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: Element descriptor: 'Slot00'
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: SAS Device Slot Element: 1 Phys at Slot 0
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: SAS device type 1 id 0
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: protocols: Initiator( None ) Target( SSP )
>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: phy 0: parent 500304801ea2df3f addr 5000c500844bd449
>>>
>> These look like device arrival notifications. If you scroll up, do
>> you see any departure notifications? They should look like this:
>>
>> mps0: mpssas_prepare_remove: Sending reset for target ID 10
>> da0 at mps0 bus 0 scbus0 target 10 lun 0
>> da0: <ATA Hitachi HUA72201 A39C> s/n JPW930HQ15H26H detached
>> mps0: Unfreezing devq for target ID 10
>> xpt_release_devq(): requested 1 > present 0
>> (da0:mps0:0:10:0): Periph destroyed
>>
>> Also, could you post your HBA and expander firmware versions? For the
>> HBA, use "sysctl dev.mps.0.firmware_version". For the expander,
>> install sg3_utils and do "sg_inq --hex --len=64 ses0". The firmware
>> version is the dotted quad at the end.
>>
>> # sg_inq --hex --len=64 ses0
>>  00  0d 00 05 02 34 00 40 02 41 49 43 20 43 4f 52 50  ....4.@.AIC CORP
>>  10  53 41 53 20 36 47 20 45 78 70 61 6e 64 65 72 20  SAS 6G Expander
>>  20  30 62 30 31 78 33 36 2d 31 2e 31 31 2e 31 2e 31  0b01x36-1.11.1.1
>>  30  00 20 20 20 20 20 20 20
>>
>> -Alan
>
>
> I can say, without doubt, that I do NOT have any preceding detachments...
> which is why I'm so baffled by the messages. If the devices aren't
> de/reattaching, what's the point of these informal/benign ones? I am
> familiar with them from other hot-swap and disk failure scenarios in other
> machines.
>
> Could this be a driver bug not logging the disconnection? But when I
> hot-unplugged them, I do see that in dmesg.
> Or does SAS do something where it might renegotiate or reconfigure the
> lanes, and I'm just seeing it do that?
>
> Thanks,
>
> Myke
>
>
> dev.mpr.0.driver_version: 09.255.01.00-fbsd
> dev.mpr.0.firmware_version: 06.00.00.00
> dev.mpr.1.driver_version: 09.255.01.00-fbsd
> dev.mpr.1.firmware_version: 08.00.00.00
> dev.mpr.2.driver_version: 09.255.01.00-fbsd
> dev.mpr.2.firmware_version: 08.00.00.00
>
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses0
>  00  0d 00 05 02 33 00 40 02 4c 53 49 20 20 20 20 20  ....3.@.LSI
>  10  53 41 53 33 78 34 38 20 20 20 20 20 20 20 20 20  SAS3x48
>  20  30 37 30 31 78 34 38 2d 36 36 2e 37 2e 31 2e 31  0701x48-66.7.1.1
>  30  37 00 20 20 20 20 20 20                          7.
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses1
>  00  0d 00 05 02 33 00 40 02 4c 53 49 20 20 20 20 20  ....3.@.LSI
>  10  53 41 53 33 78 33 36 20 20 20 20 20 20 20 20 20  SAS3x36
>  20  30 37 30 31 78 33 36 2d 36 36 2e 37 2e 31 2e 31  0701x36-66.7.1.1
>  30  37 00 20 20 20 20 20 20                          7.
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses2
> SCSI INQUIRY failed on ses2, res=-1
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses3
> SCSI INQUIRY failed on ses3, res=-1
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses4
>  00  0d 00 05 02 33 00 40 02 4c 53 49 20 20 20 20 20  ....3.@.LSI
>  10  53 41 53 33 78 32 38 20 20 20 20 20 20 20 20 20  SAS3x28
>  20  30 37 30 31 78 32 38 2d 36 36 2e 37 2e 31 2e 31  0701x28-66.7.1.1
>  30  37 00 20 20 20 20 20 20                          7.
> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses5
>  00  0d 00 05 02 33 00 40 02 4c 53 49 20 20 20 20 20  ....3.@.LSI
>  10  53 41 53 33 78 34 38 20 20 20 20 20 20 20 20 20  SAS3x48
>  20  30 37 30 31 78 34 38 2d 36 36 2e 37 2e 31 2e 31  0701x48-66.7.1.1
>  30  37 00 20 20 20 20 20 20                          7.
> [root@ZFS-AF ~]#
>
>
> And here's dmesg after fresh reboot:

Well, that's weird. Your firmware versions look OK, though you might
want to upgrade mpr0 just to be consistent. The next thing I would
check, if I were you, would be devctl messages. Edit /etc/syslog.conf
and change devd's loglevel to INFO, then HUP syslogd. Now every
devctl message should get logged in /var/log/devd.log. That will tell
you more precisely than dmesg whether there are any arrival or
departure events.

-Alan
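
The "dotted quad at the end" of the sg_inq dumps quoted above is easy to misread, because it can wrap across two rows of the hex listing. The following is a minimal Python sketch, not part of the original thread, that reassembles the bytes and pulls the fields out. It assumes the quote markers have already been stripped, that each row is an offset followed by at most 16 hex bytes, and that the vendor, product, and revision fields sit at the standard INQUIRY offsets (bytes 8-15, 16-31, and 32-35). The function names parse_hex_dump and inquiry_fields are hypothetical helpers introduced here for illustration.

```python
import re

# ses0 dump from the message above, with the "> " quote markers removed.
SES0_DUMP = """\
00 0d 00 05 02 33 00 40 02 4c 53 49 20 20 20 20 20 ....3.@.LSI
10 53 41 53 33 78 34 38 20 20 20 20 20 20 20 20 20 SAS3x48
20 30 37 30 31 78 34 38 2d 36 36 2e 37 2e 31 2e 31 0701x48-66.7.1.1
30 37 00 20 20 20 20 20 20 7.
"""

HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")
HEX_OFFSET = re.compile(r"[0-9a-fA-F]+")


def parse_hex_dump(text):
    """Rebuild the raw INQUIRY bytes from 'offset byte byte ... ascii' rows."""
    data = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and anything that does not start with a hex offset
        # (shell prompts, error messages).
        if len(tokens) < 2 or not HEX_OFFSET.fullmatch(tokens[0]):
            continue
        # At most 16 byte values follow the offset; stop at the first token
        # that is not a two-digit hex value (the ASCII column on short rows).
        for tok in tokens[1:17]:
            if not HEX_BYTE.fullmatch(tok):
                break
            data.append(int(tok, 16))
    return bytes(data)


def inquiry_fields(data):
    """Slice the standard INQUIRY fields and look for a dotted quad after them."""
    # One character per byte, so string indices match byte offsets.
    text = data.decode("ascii", errors="replace")
    quad = re.search(r"\d+\.\d+\.\d+\.\d+", text[32:])
    return {
        "vendor": text[8:16].strip(),
        "product": text[16:32].strip(),
        "revision": text[32:36].strip(),
        "firmware": quad.group(0) if quad else None,
    }


if __name__ == "__main__":
    for name, value in inquiry_fields(parse_hex_dump(SES0_DUMP)).items():
        print(f"{name}: {value}")
```

Note that for this dump the quad spans the row boundary: the last row starts with byte 37 ("7"), so the reassembled string reads 66.7.1.17 rather than the 66.7.1.1 that the third row's ASCII column suggests on its own.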