Date: Wed, 11 Jul 2018 14:50:02 -0600
From: slm@freebsd.org
To: Ken Merry <ken@freebsd.org>, Oliver Sech <crimsonthunder@gmx.net>
Cc: FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject: RE: problems with SAS JBODs 2
Message-ID: <6bc79bf80dbfbba8e77bb40d5b1a0512@mail.gmail.com>
In-Reply-To: <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>
References: <trinity-14d18077-ea73-40f6-9e87-d2d4000b1f7e-1530620937871@3c-app-gmx-bs01>
 <CAOtMX2h8r31AeNCKyckK2P0VLn1CKFogo9bWom2So1x2ngpa4A@mail.gmail.com>
 <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net>
 <b785fe02-9242-c95f-56cb-2130f90e17f5@gmx.net>
 <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com>
 <0af047d477d15ec364140653bd967c89@mail.gmail.com>
 <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>
I think this is a mapping table problem or the use_phy_num problem. I'm
having Oliver change the use_phy_num sysctl values to 0 and then use your
script to clear out the controller mapping entries to see what happens.

Steve

> -----Original Message-----
> From: Ken Merry [mailto:ken@freebsd.org]
> Sent: Wednesday, July 11, 2018 2:35 PM
> To: Stephen Mcconnell; Oliver Sech
> Cc: FreeBSD-scsi
> Subject: Re: problems with SAS JBODs 2
>
> Yes, I agree, Oliver’s problem looks different.
>
> Oliver, for your second set of files (freebsd_sas2.zip) it looks like you
> may have devices that aren’t completely going away, even from a SAS
> standpoint.
>
> Here are the 25 target IDs that show up in 2_shelf_connected_dmesg.txt:
>
> mpr0: mprsas_add_device: Target ID for added device is 467.
> mpr0: mprsas_add_device: Target ID for added device is 468.
> mpr0: mprsas_add_device: Target ID for added device is 469.
> mpr0: mprsas_add_device: Target ID for added device is 470.
> mpr0: mprsas_add_device: Target ID for added device is 471.
> mpr0: mprsas_add_device: Target ID for added device is 472.
> mpr0: mprsas_add_device: Target ID for added device is 473.
> mpr0: mprsas_add_device: Target ID for added device is 474.
> mpr0: mprsas_add_device: Target ID for added device is 475.
> mpr0: mprsas_add_device: Target ID for added device is 476.
> mpr0: mprsas_add_device: Target ID for added device is 477.
> mpr0: mprsas_add_device: Target ID for added device is 478.
> mpr0: mprsas_add_device: Target ID for added device is 479.
> mpr0: mprsas_add_device: Target ID for added device is 480.
> mpr0: mprsas_add_device: Target ID for added device is 481.
> mpr0: mprsas_add_device: Target ID for added device is 482.
> mpr0: mprsas_add_device: Target ID for added device is 483.
> mpr0: mprsas_add_device: Target ID for added device is 484.
> mpr0: mprsas_add_device: Target ID for added device is 485.
> mpr0: mprsas_add_device: Target ID for added device is 486.
> mpr0: mprsas_add_device: Target ID for added device is 487.
> mpr0: mprsas_add_device: Target ID for added device is 488.
> mpr0: mprsas_add_device: Target ID for added device is 489.
> mpr0: mprsas_add_device: Target ID for added device is 490.
> mpr0: mprsas_add_device: Target ID for added device is 503.
>
> Here are the 8 target IDs that disappear in
> 3_shelf_disconnected_dmesg.txt:
>
> mpr0: mprsas_prepare_remove: Sending reset for target ID 467
> mpr0: mprsas_prepare_remove: Sending reset for target ID 468
> mpr0: mprsas_prepare_remove: Sending reset for target ID 469
> mpr0: mprsas_prepare_remove: Sending reset for target ID 470
> mpr0: mprsas_prepare_remove: Sending reset for target ID 471
> mpr0: mprsas_prepare_remove: Sending reset for target ID 472
> mpr0: mprsas_prepare_remove: Sending reset for target ID 473
> mpr0: mprsas_prepare_remove: Sending reset for target ID 474
>
> And here are the same 8 target IDs getting added in
> 4_shelf_reconnected_dmesg.txt:
>
> mpr0: mprsas_add_device: Target ID for added device is 467.
> mpr0: mprsas_add_device: Target ID for added device is 468.
> mpr0: mprsas_add_device: Target ID for added device is 469.
> mpr0: mprsas_add_device: Target ID for added device is 470.
> mpr0: mprsas_add_device: Target ID for added device is 471.
> mpr0: mprsas_add_device: Target ID for added device is 472.
> mpr0: mprsas_add_device: Target ID for added device is 473.
> mpr0: mprsas_add_device: Target ID for added device is 474.
>
> Oliver, what happens when you try to do I/O to the devices that don’t go
> away after you pull the cable? Does that cause the devices to go away?
>
> Looking at the mprutil output, it also shows the devices sticking around
> from the adapter’s standpoint.
>
> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’
> (where N is the scbus number shown by ‘camcontrol devlist -v’).
> That will do some basic probes for each of the devices and should in
> theory cause them to go away if they aren’t accessible.
>
> It seems like the adapter may not be recognizing that the devices in
> question have gone.
>
> Steve, do you have any ideas what could be going on?
>
> Ken
> --
> Ken Merry
> ken@FreeBSD.ORG
>
> > On Jul 10, 2018, at 11:48 AM, Stephen Mcconnell via freebsd-scsi
> > <freebsd-scsi@freebsd.org> wrote:
> >
> > Ken, I looked at the logs and I don't see anything in them that
> > suggests that the driver is not adding any of the devices. In fact, I
> > don't see anything that looks strange at all. This looks like a
> > different problem than the other one you mentioned. What do you think?
> >
> > Steve
> >
> >> -----Original Message-----
> >> From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com]
> >> Sent: Tuesday, July 10, 2018 9:28 AM
> >> To: 'Oliver Sech'; 'FreeBSD-scsi'
> >> Subject: RE: problems with SAS JBODs 2
> >>
> >> Hi Oliver, I can't get to your links. Can you try to send the logs in
> >> another way?
> >>
> >> Steve
> >>
> >>> -----Original Message-----
> >>> From: owner-freebsd-scsi@freebsd.org
> >>> [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Oliver Sech
> >>> Sent: Tuesday, July 10, 2018 9:14 AM
> >>> To: FreeBSD-scsi
> >>> Subject: Re: problems with SAS JBODs 2
> >>>
> >>> I tested a few additional things. I don't think this is a multipath,
> >>> daisy chain nor a SAS wide ports problem.
> >>> I can reproduce the problem with just a single connection to an
> >>> Expander/JBOD.
> >>>
> >>> Test:
> >>> * physically disconnect all shelves
> >>> * reboot system
> >>> * connect one shelf via SAS cable
> >>> * check number of disks (after a reboot everything always shows up)
> >>> * disconnect the shelf and wait (geom disk list still shows most
> >>>   disks.)
> >>> * connect the shelf (missing disks)
> >>>
> >>> Tested Hardware:
> >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (internal
> >>>   daisy chain + wide links)
> >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight
> >>>   HBA <-> EXPANDER connection. (no wide links, no daisy chain))
> >>> * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal
> >>>   daisy chain)
> >>> * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA <->
> >>>   EXPANDER connection.)
> >>>
> >>> On 07/04/2018 12:15 PM, Oliver Sech wrote:
> >>>>> 1) Are the expanders daisy chained? Some SAS expanders don't work
> >>>>> reliably when daisy chained. Best to direct connect each one to
> >>>>> the server.
> >>>> At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4
> >>>> lanes?).
> >>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back,
> >>>> and those are connected via an internal SAS daisy chain.
> >>>> I could rewire and connect each backplane directly to the server,
> >>>> but unfortunately I do not have enough ports.
> >>>>
> >>>> JBOD Model: Supermicro 847E2C-R1K28JBOD
> >>>>
> >>>>> 2) Are the expanders connected in multipath or single path? You
> >>>>> need geom_multipath if you're going to do that.
> >>>> See answer 1. There is a single path from the host to the first
> >>>> expander.
> >>>>
> >>>>> 3) Are you attempting to use wide ports (two SAS cables connecting
> >>>>> each expander to the HBA)? If so, you'll need to make sure that
> >>>>> each pair of SAS cables goes to the same HBA chip (not merely the
> >>>>> same card, as some cards contain two HBA chips).
> >>>> See 1. The last time I opened one of those JBODs there were 8 SAS
> >>>> cables between the Front and Back expander. I assume that wide
> >>>> ports are being used.
> >>>> (2 expanders per backplane as well)
> >>>>
> >>>>> 4) Are you trying to remove an expander while ZFS is active on
> >>>>> that expander? That will suspend your pool, and ZFS doesn't always
> >>>>> recover from a suspended state.
> >>>> I'm testing with a new unused disk shelf that was never part of the
> >>>> ZFS pool. There were
> >>>> _______________________________________________
> >>>> freebsd-scsi@freebsd.org mailing list
> >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> >>>> To unsubscribe, send any mail to
> >>>> "freebsd-scsi-unsubscribe@freebsd.org"
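
Ken's comparison of the dmesg captures (25 target IDs added, only 8 reset
after the cable pull) can be repeated on other captures with a short script.
What follows is a minimal sketch in Python, not something posted in the
thread. It assumes the log lines use exactly the mprsas_add_device and
mprsas_prepare_remove wording quoted above. The names target_ids and
lingering_targets, and the sample strings, are hypothetical and exist only
for illustration.

```python
import re

# Patterns copied from the mpr(4) debug lines quoted in the thread.
# Assumption: other driver versions may word these messages differently.
ADD_RE = re.compile(r"mprsas_add_device: Target ID for added device is (\d+)")
REMOVE_RE = re.compile(r"mprsas_prepare_remove: Sending reset for target ID (\d+)")


def target_ids(pattern, text):
    """Return the set of target IDs that `pattern` matches in `text`."""
    return {int(match.group(1)) for match in pattern.finditer(text)}


def lingering_targets(connected_log, disconnected_log):
    """Target IDs that were added but never reset after the disconnect."""
    added = target_ids(ADD_RE, connected_log)
    removed = target_ids(REMOVE_RE, disconnected_log)
    return sorted(added - removed)


# Hypothetical sample input, shortened from the kind of lines quoted above.
sample_connected = """\
mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 475.
mpr0: mprsas_add_device: Target ID for added device is 503.
"""
sample_disconnected = """\
mpr0: mprsas_prepare_remove: Sending reset for target ID 467
mpr0: mprsas_prepare_remove: Sending reset for target ID 468
"""

print(lingering_targets(sample_connected, sample_disconnected))
```

To use it on real captures, read 2_shelf_connected_dmesg.txt and
3_shelf_disconnected_dmesg.txt into strings and pass them in that order. For
the IDs quoted above, the set difference would be the 17 targets 475 through
490 plus 503, which are the devices that appear to stick around.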