Date:      Wed, 11 Jul 2018 16:35:23 -0400
From:      Ken Merry <ken@freebsd.org>
To:        Stephen Mcconnell <stephen.mcconnell@broadcom.com>, Oliver Sech <crimsonthunder@gmx.net>
Cc:        FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject:   Re: problems with SAS JBODs 2
Message-ID:  <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>
In-Reply-To: <0af047d477d15ec364140653bd967c89@mail.gmail.com>
References:  <trinity-14d18077-ea73-40f6-9e87-d2d4000b1f7e-1530620937871@3c-app-gmx-bs01> <CAOtMX2h8r31AeNCKyckK2P0VLn1CKFogo9bWom2So1x2ngpa4A@mail.gmail.com> <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <b785fe02-9242-c95f-56cb-2130f90e17f5@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com>

Yes, I agree, Oliver’s problem looks different.

Oliver, for your second set of files (freebsd_sas2.zip) it looks like
you may have devices that aren’t completely going away, even from a
SAS standpoint.

Here are the 25 target IDs that show up in 2_shelf_connected_dmesg.txt:

mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 469.
mpr0: mprsas_add_device: Target ID for added device is 470.
mpr0: mprsas_add_device: Target ID for added device is 471.
mpr0: mprsas_add_device: Target ID for added device is 472.
mpr0: mprsas_add_device: Target ID for added device is 473.
mpr0: mprsas_add_device: Target ID for added device is 474.
mpr0: mprsas_add_device: Target ID for added device is 475.
mpr0: mprsas_add_device: Target ID for added device is 476.
mpr0: mprsas_add_device: Target ID for added device is 477.
mpr0: mprsas_add_device: Target ID for added device is 478.
mpr0: mprsas_add_device: Target ID for added device is 479.
mpr0: mprsas_add_device: Target ID for added device is 480.
mpr0: mprsas_add_device: Target ID for added device is 481.
mpr0: mprsas_add_device: Target ID for added device is 482.
mpr0: mprsas_add_device: Target ID for added device is 483.
mpr0: mprsas_add_device: Target ID for added device is 484.
mpr0: mprsas_add_device: Target ID for added device is 485.
mpr0: mprsas_add_device: Target ID for added device is 486.
mpr0: mprsas_add_device: Target ID for added device is 487.
mpr0: mprsas_add_device: Target ID for added device is 488.
mpr0: mprsas_add_device: Target ID for added device is 489.
mpr0: mprsas_add_device: Target ID for added device is 490.
mpr0: mprsas_add_device: Target ID for added device is 503.

Here are the 8 target IDs that disappear in 3_shelf_disconnected_dmesg.txt:

mpr0: mprsas_prepare_remove: Sending reset for target ID 467
mpr0: mprsas_prepare_remove: Sending reset for target ID 468
mpr0: mprsas_prepare_remove: Sending reset for target ID 469
mpr0: mprsas_prepare_remove: Sending reset for target ID 470
mpr0: mprsas_prepare_remove: Sending reset for target ID 471
mpr0: mprsas_prepare_remove: Sending reset for target ID 472
mpr0: mprsas_prepare_remove: Sending reset for target ID 473
mpr0: mprsas_prepare_remove: Sending reset for target ID 474

And here are the same 8 target IDs getting added in 4_shelf_reconnected_dmesg.txt:

mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 469.
mpr0: mprsas_add_device: Target ID for added device is 470.
mpr0: mprsas_add_device: Target ID for added device is 471.
mpr0: mprsas_add_device: Target ID for added device is 472.
mpr0: mprsas_add_device: Target ID for added device is 473.
mpr0: mprsas_add_device: Target ID for added device is 474.
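
The comparison above can be reproduced mechanically. The following is only a rough Python sketch: it assumes the dmesg captures are available as plain text and that the two message formats are exactly as quoted above. The function names (target_ids, lingering_targets) are made up for illustration and are not part of the mpr(4) driver or any tool.

```python
import re

# Message formats copied from the dmesg excerpts quoted above.
ADD_RE = re.compile(r"mprsas_add_device: Target ID for added device is (\d+)")
REMOVE_RE = re.compile(r"mprsas_prepare_remove: Sending reset for target ID (\d+)")


def target_ids(pattern, text):
    """Return the set of target IDs that `pattern` matches in `text`."""
    return {int(m.group(1)) for m in pattern.finditer(text)}


def lingering_targets(connected_dmesg, disconnected_dmesg):
    """Targets added while connected that saw no removal after the disconnect."""
    added = target_ids(ADD_RE, connected_dmesg)
    removed = target_ids(REMOVE_RE, disconnected_dmesg)
    return sorted(added - removed)
```

Fed the contents of 2_shelf_connected_dmesg.txt and 3_shelf_disconnected_dmesg.txt, this would list the targets for which the driver never logged a removal. With the IDs quoted above that is 475 through 490 plus 503, 17 targets in all.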

Oliver, what happens when you try to do I/O to the devices that don’t
go away after you pull the cable?  Does that cause the devices to go
away?

Looking at the mprutil output, it also shows the devices sticking around
from the adapter’s standpoint.

You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’
(where N is the scbus number shown by ‘camcontrol devlist -v’).  That
will do some basic probes for each of the devices and should in theory
cause them to go away if they aren’t accessible.
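
If it helps, here is a small Python sketch of picking out N for the mpr buses and building the per-bus rescan commands. It assumes the bus header lines in the 'camcontrol devlist -v' output look like "scbus8 on mpr0 bus 0:"; check that against your own output first. The helper name rescan_commands is hypothetical, and the sketch only builds the argument lists, it does not run anything.

```python
import re

# Assumed header format in `camcontrol devlist -v` output, e.g.
#   scbus8 on mpr0 bus 0:
SCBUS_RE = re.compile(r"^scbus(\d+) on (\w+) bus \d+", re.MULTILINE)


def rescan_commands(devlist_output, driver_prefix="mpr"):
    """Build one `camcontrol rescan N` argument list per bus on the given driver."""
    commands = []
    for bus, sim in SCBUS_RE.findall(devlist_output):
        if sim.startswith(driver_prefix):
            commands.append(["camcontrol", "rescan", bus])
    return commands
```

Each returned list could be handed to subprocess.run() as root. A plain 'camcontrol rescan all' is simpler if you do not mind probing every bus.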

It seems like the adapter may not be recognizing that the devices in
question have gone.

Steve, do you have any ideas what could be going on?

Ken
-- 
Ken Merry
ken@FreeBSD.ORG



> On Jul 10, 2018, at 11:48 AM, Stephen Mcconnell via freebsd-scsi <freebsd-scsi@freebsd.org> wrote:
>
> Ken, I looked at the logs and I don't see anything in them that suggests
> that the driver is not adding any of the devices. In fact, I don't see
> anything that looks strange at all. This looks like a different problem than
> the other one you mentioned. What do you think?
>
> Steve
>
>> -----Original Message-----
>> From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com]
>> Sent: Tuesday, July 10, 2018 9:28 AM
>> To: 'Oliver Sech'; 'FreeBSD-scsi'
>> Subject: RE: problems with SAS JBODs 2
>>
>> Hi Oliver, I can't get to your links. Can you try to send the logs in
>> another way?
>>
>> Steve
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Oliver Sech
>>> Sent: Tuesday, July 10, 2018 9:14 AM
>>> To: FreeBSD-scsi
>>> Subject: Re: problems with SAS JBODs 2
>>>
>>> I tested a few additional things. I don't think this is a multipath,
>>> daisy chain nor a SAS wide ports problem.
>>> I can reproduce the problem with just a single connection to an
>>> Expander/JBOD.
>>>
>>> Test:
>>> * physically disconnect all shelves
>>> * reboot system
>>> * connect one shelf via SAS cable
>>> * check number of disks (after a reboot everything always shows up)
>>> * disconnect the shelf and wait (geom disk list still shows most disks.)
>>> * connect the shelf (missing disks)
>>>
>>> Tested Hardware:
>>> * Supermicro SAS3 847E2C-R1K28JBOD     + SAS3 LSI 9305-16e (internal daisy chain + wide links)
>>> * Supermicro SAS3 847E2C-R1K28JBOD     + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection. (no wide links, no daisy chain))
>>> * Supermicro SAS2 SC847E26-RJBOD1      + SAS3 LSI 9305-16e (internal daisy chain)
>>> * Promise    SAS2 VTrak 830            + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection.)
>>>
>>>
>>>
>>> On 07/04/2018 12:15 PM, Oliver Sech wrote:
>>>>> 1) Are the expanders daisy chained?  Some SAS expanders don't work reliably
>>>>> when daisy chained.   Best to direct connect each one to the server.
>>>> At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4 lanes?).
>>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back and,
>>>> those are connected via a internal SAS daisy chaining.
>>>> I could rewire and connect each backplane directly to the server, but
>>>> unfortunately I do not have enough ports..
>>>>
>>>> JBOD Model: Supermicro 847E2C-R1K28JBOD
>>>>
>>>>> 2) Are the expanders connected in multipath or single path?  You need
>>>>> geom_multipath if you're going to do that.
>>>> See answer 1. There is a single path from the host to the first expander.
>>>>
>>>>> 3) Are you attempting to use wide ports (two SAS cables connecting each
>>>>> expander to the HBA).  If do, you'll need to make sure that each pair of
>>>>> SAS cables goes to the same HBA chip (not merely the same card, as some
>>>>> cards contain two HBA chips).
>>>> see 1. The last time I opened one of those JBODs there were 8 SAS cables
>>>> between the Front and Back expander. I assume that wide ports are being used.
>>>> (2 expanders per backplane as well)
>>>>
>>>>> 4) Are you trying to remove an expander while ZFS is active on that
>>>>> expander?  That will suspend your pool, and ZFS doesn't always recover from
>>>>> a suspended state.
>>>> I'm testing with a new unused disk shelf that was never part of the ZFS
>>>> pool. There were
>>>> _______________________________________________
>>>> freebsd-scsi@freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"