Date: Wed, 11 Jul 2018 16:35:23 -0400
From: Ken Merry <ken@freebsd.org>
To: Stephen Mcconnell <stephen.mcconnell@broadcom.com>, Oliver Sech <crimsonthunder@gmx.net>
Cc: FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject: Re: problems with SAS JBODs 2
Message-ID: <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>
In-Reply-To: <0af047d477d15ec364140653bd967c89@mail.gmail.com>
References: <trinity-14d18077-ea73-40f6-9e87-d2d4000b1f7e-1530620937871@3c-app-gmx-bs01>
 <CAOtMX2h8r31AeNCKyckK2P0VLn1CKFogo9bWom2So1x2ngpa4A@mail.gmail.com>
 <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net>
 <b785fe02-9242-c95f-56cb-2130f90e17f5@gmx.net>
 <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com>
 <0af047d477d15ec364140653bd967c89@mail.gmail.com>

Yes, I agree, Oliver’s problem looks different.

Oliver, for your second set of files (freebsd_sas2.zip) it looks like you may have devices that aren’t completely going away, even from a SAS standpoint.

Here are the 25 target IDs that show up in 2_shelf_connected_dmesg.txt:

mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 469.
mpr0: mprsas_add_device: Target ID for added device is 470.
mpr0: mprsas_add_device: Target ID for added device is 471.
mpr0: mprsas_add_device: Target ID for added device is 472.
mpr0: mprsas_add_device: Target ID for added device is 473.
mpr0: mprsas_add_device: Target ID for added device is 474.
mpr0: mprsas_add_device: Target ID for added device is 475.
mpr0: mprsas_add_device: Target ID for added device is 476.
mpr0: mprsas_add_device: Target ID for added device is 477.
mpr0: mprsas_add_device: Target ID for added device is 478.
mpr0: mprsas_add_device: Target ID for added device is 479.
mpr0: mprsas_add_device: Target ID for added device is 480.
mpr0: mprsas_add_device: Target ID for added device is 481.
mpr0: mprsas_add_device: Target ID for added device is 482.
mpr0: mprsas_add_device: Target ID for added device is 483.
mpr0: mprsas_add_device: Target ID for added device is 484.
mpr0: mprsas_add_device: Target ID for added device is 485.
mpr0: mprsas_add_device: Target ID for added device is 486.
mpr0: mprsas_add_device: Target ID for added device is 487.
mpr0: mprsas_add_device: Target ID for added device is 488.
mpr0: mprsas_add_device: Target ID for added device is 489.
mpr0: mprsas_add_device: Target ID for added device is 490.
mpr0: mprsas_add_device: Target ID for added device is 503.

Here are the 8 target IDs that disappear in 3_shelf_disconnected_dmesg.txt:

mpr0: mprsas_prepare_remove: Sending reset for target ID 467
mpr0: mprsas_prepare_remove: Sending reset for target ID 468
mpr0: mprsas_prepare_remove: Sending reset for target ID 469
mpr0: mprsas_prepare_remove: Sending reset for target ID 470
mpr0: mprsas_prepare_remove: Sending reset for target ID 471
mpr0: mprsas_prepare_remove: Sending reset for target ID 472
mpr0: mprsas_prepare_remove: Sending reset for target ID 473
mpr0: mprsas_prepare_remove: Sending reset for target ID 474

And here are the same 8 target IDs getting added in 4_shelf_reconnected_dmesg.txt:

mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 469.
mpr0: mprsas_add_device: Target ID for added device is 470.
mpr0: mprsas_add_device: Target ID for added device is 471.
mpr0: mprsas_add_device: Target ID for added device is 472.
mpr0: mprsas_add_device: Target ID for added device is 473.
mpr0: mprsas_add_device: Target ID for added device is 474.

Oliver, what happens when you try to do I/O to the devices that don’t go away after you pull the cable? Does that cause the devices to go away?

Looking at the mprutil output, it also shows the devices sticking around from the adapter’s standpoint.

You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ (where N is the scbus number shown by ‘camcontrol devlist -v’). That will do some basic probes for each of the devices and should in theory cause them to go away if they aren’t accessible.

It seems like the adapter may not be recognizing that the devices in question have gone.

Steve, do you have any ideas what could be going on?
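
For anyone who wants to repeat the comparison above on their own captures, here is a minimal sketch in Python. It assumes the three dmesg captures are available as plain text and that the mpr(4) debug messages look exactly like the lines quoted above; the function and key names (target_ids, summarize, "stuck", and so on) are made up for illustration and are not part of any FreeBSD tool.

```python
import re

# Patterns copied from the mpr0 log lines quoted in this message.
ADD_RE = re.compile(r"mprsas_add_device: Target ID for added device is (\d+)")
REMOVE_RE = re.compile(r"mprsas_prepare_remove: Sending reset for target ID (\d+)")


def target_ids(dmesg_text, pattern):
    """Return the set of target IDs that `pattern` matches in a dmesg dump."""
    return {int(m.group(1)) for m in pattern.finditer(dmesg_text)}


def summarize(connected, disconnected, reconnected):
    """Compare the three captures (hypothetical helper).

    connected    -- dmesg text after the shelf was first attached
    disconnected -- dmesg text after the cable was pulled
    reconnected  -- dmesg text after the cable was plugged back in
    """
    added = target_ids(connected, ADD_RE)
    removed = target_ids(disconnected, REMOVE_RE)
    readded = target_ids(reconnected, ADD_RE)
    return {
        # Added at connect time but never reset at disconnect time.
        "stuck": sorted(added - removed),
        # Targets the driver did tear down.
        "removed": sorted(removed),
        # Re-added on reconnect without having been removed first.
        "readded_not_removed": sorted(readded - removed),
    }


if __name__ == "__main__":
    # File names as used in freebsd_sas2.zip; adjust the paths as needed.
    names = ("2_shelf_connected_dmesg.txt",
             "3_shelf_disconnected_dmesg.txt",
             "4_shelf_reconnected_dmesg.txt")
    texts = []
    for name in names:
        with open(name, errors="replace") as f:
            texts.append(f.read())
    for key, ids in summarize(*texts).items():
        print(f"{key}: {len(ids)} {ids}")
```

With the ID lists quoted above, the "stuck" set would be the 17 targets (475 through 490, plus 503) that were added but never received a reset when the shelf was disconnected.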

Ken
--
Ken Merry
ken@FreeBSD.ORG

> On Jul 10, 2018, at 11:48 AM, Stephen Mcconnell via freebsd-scsi <freebsd-scsi@freebsd.org> wrote:
>
> Ken, I looked at the logs and I don't see anything in them that suggests
> that the driver is not adding any of the devices. In fact, I don't see
> anything that looks strange at all. This looks like a different problem than
> the other one you mentioned. What do you think?
>
> Steve
>
>> -----Original Message-----
>> From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com]
>> Sent: Tuesday, July 10, 2018 9:28 AM
>> To: 'Oliver Sech'; 'FreeBSD-scsi'
>> Subject: RE: problems with SAS JBODs 2
>>
>> Hi Oliver, I can't get to your links. Can you try to send the logs in
>> another way?
>>
>> Steve
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Oliver Sech
>>> Sent: Tuesday, July 10, 2018 9:14 AM
>>> To: FreeBSD-scsi
>>> Subject: Re: problems with SAS JBODs 2
>>>
>>> I tested a few additional things. I don't think this is a multipath,
>>> daisy chain nor a SAS wide ports problem.
>>> I can reproduce the problem with just a single connection to an
>>> Expander/JBOD.
>>>
>>> Test:
>>> * physically disconnect all shelves
>>> * reboot system
>>> * connect one shelf via SAS cable
>>> * check number of disks (after a reboot everything always shows up)
>>> * disconnect the shelf and wait (geom disk list still shows most disks.)
>>> * connect the shelf (missing disks)
>>>
>>> Tested Hardware:
>>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (internal daisy chain + wide links)
>>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection. (no wide links, no daisy chain))
>>> * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal daisy chain)
>>> * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection.)
>>>
>>> On 07/04/2018 12:15 PM, Oliver Sech wrote:
>>>>> 1) Are the expanders daisy chained? Some SAS expanders don't work reliably
>>>>> when daisy chained. Best to direct connect each one to the server.
>>>> At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4 lanes?).
>>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back, and
>>>> those are connected via an internal SAS daisy chain.
>>>> I could rewire and connect each backplane directly to the server, but
>>>> unfortunately I do not have enough ports.
>>>>
>>>> JBOD Model: Supermicro 847E2C-R1K28JBOD
>>>>
>>>>> 2) Are the expanders connected in multipath or single path? You need
>>>>> geom_multipath if you're going to do that.
>>>> See answer 1. There is a single path from the host to the first expander.
>>>>
>>>>> 3) Are you attempting to use wide ports (two SAS cables connecting each
>>>>> expander to the HBA). If so, you'll need to make sure that each pair of
>>>>> SAS cables goes to the same HBA chip (not merely the same card, as some
>>>>> cards contain two HBA chips).
>>>> See 1. The last time I opened one of those JBODs there were 8 SAS cables
>>>> between the Front and Back expander. I assume that wide ports are being used.
>>>> (2 expanders per backplane as well)
>>>>
>>>>> 4) Are you trying to remove an expander while ZFS is active on that
>>>>> expander? That will suspend your pool, and ZFS doesn't always recover from
>>>>> a suspended state.
>>>> I'm testing with a new unused disk shelf that was never part of the ZFS
>>>> pool. There were
>>>> _______________________________________________
>>>> freebsd-scsi@freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"