From owner-freebsd-scsi@freebsd.org Wed Jul 11 20:56:32 2018
From: slm@freebsd.org
Date: Wed, 11 Jul 2018 14:50:02 -0600
Subject: RE: problems with SAS JBODs 2
To: Ken Merry, Oliver Sech
Cc: FreeBSD-scsi
Message-ID: <6bc79bf80dbfbba8e77bb40d5b1a0512@mail.gmail.com>
In-Reply-To: <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>
References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net>
 <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com>
 <0af047d477d15ec364140653bd967c89@mail.gmail.com>
 <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>
List-Id: SCSI subsystem

I think this is a mapping table problem or the use_phy_num problem. I'm having
Oliver change the use_phy_num sysctl values to 0 and then use your script to
clear out the controller mapping entries to see what happens.

Steve

> -----Original Message-----
> From: Ken Merry [mailto:ken@freebsd.org]
> Sent: Wednesday, July 11, 2018 2:35 PM
> To: Stephen Mcconnell; Oliver Sech
> Cc: FreeBSD-scsi
> Subject: Re: problems with SAS JBODs 2
>
> Yes, I agree, Oliver’s problem looks different.
>
> Oliver, for your second set of files (freebsd_sas2.zip) it looks like you
> may have devices that aren’t completely going away, even from a SAS
> standpoint.
>
> Here are the 25 target IDs that show up in 2_shelf_connected_dmesg.txt:
>
> mpr0: mprsas_add_device: Target ID for added device is 467.
> mpr0: mprsas_add_device: Target ID for added device is 468.
> mpr0: mprsas_add_device: Target ID for added device is 469.
> mpr0: mprsas_add_device: Target ID for added device is 470.
> mpr0: mprsas_add_device: Target ID for added device is 471.
> mpr0: mprsas_add_device: Target ID for added device is 472.
> mpr0: mprsas_add_device: Target ID for added device is 473.
> mpr0: mprsas_add_device: Target ID for added device is 474.
> mpr0: mprsas_add_device: Target ID for added device is 475.
> mpr0: mprsas_add_device: Target ID for added device is 476.
> mpr0: mprsas_add_device: Target ID for added device is 477.
> mpr0: mprsas_add_device: Target ID for added device is 478.
> mpr0: mprsas_add_device: Target ID for added device is 479.
> mpr0: mprsas_add_device: Target ID for added device is 480.
> mpr0: mprsas_add_device: Target ID for added device is 481.
> mpr0: mprsas_add_device: Target ID for added device is 482.
> mpr0: mprsas_add_device: Target ID for added device is 483.
> mpr0: mprsas_add_device: Target ID for added device is 484.
> mpr0: mprsas_add_device: Target ID for added device is 485.
> mpr0: mprsas_add_device: Target ID for added device is 486.
> mpr0: mprsas_add_device: Target ID for added device is 487.
> mpr0: mprsas_add_device: Target ID for added device is 488.
> mpr0: mprsas_add_device: Target ID for added device is 489.
> mpr0: mprsas_add_device: Target ID for added device is 490.
> mpr0: mprsas_add_device: Target ID for added device is 503.
>
> Here are the 8 target IDs that disappear in
> 3_shelf_disconnected_dmesg.txt:
>
> mpr0: mprsas_prepare_remove: Sending reset for target ID 467
> mpr0: mprsas_prepare_remove: Sending reset for target ID 468
> mpr0: mprsas_prepare_remove: Sending reset for target ID 469
> mpr0: mprsas_prepare_remove: Sending reset for target ID 470
> mpr0: mprsas_prepare_remove: Sending reset for target ID 471
> mpr0: mprsas_prepare_remove: Sending reset for target ID 472
> mpr0: mprsas_prepare_remove: Sending reset for target ID 473
> mpr0: mprsas_prepare_remove: Sending reset for target ID 474
>
> And here are the same 8 target IDs getting added in
> 4_shelf_reconnected_dmesg.txt:
>
> mpr0: mprsas_add_device: Target ID for added device is 467.
> mpr0: mprsas_add_device: Target ID for added device is 468.
> mpr0: mprsas_add_device: Target ID for added device is 469.
> mpr0: mprsas_add_device: Target ID for added device is 470.
> mpr0: mprsas_add_device: Target ID for added device is 471.
> mpr0: mprsas_add_device: Target ID for added device is 472.
> mpr0: mprsas_add_device: Target ID for added device is 473.
> mpr0: mprsas_add_device: Target ID for added device is 474.
>
> Oliver, what happens when you try to do I/O to the devices that don’t go
> away after you pull the cable? Does that cause the devices to go away?
>
> Looking at the mprutil output, it also shows the devices sticking around
> from the adapter’s standpoint.
>
> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’
> (where N is the scbus number shown by ‘camcontrol devlist -v’). That will
> do some basic probes for each of the devices and should in theory cause
> them to go away if they aren’t accessible.
>
> It seems like the adapter may not be recognizing that the devices in
> question have gone.
>
> Steve, do you have any ideas what could be going on?
>
> Ken
> --
> Ken Merry
> ken@FreeBSD.ORG
>
> > On Jul 10, 2018, at 11:48 AM, Stephen Mcconnell via freebsd-scsi
> > <freebsd-scsi@freebsd.org> wrote:
> >
> > Ken, I looked at the logs and I don't see anything in them that suggests
> > that the driver is not adding any of the devices. In fact, I don't see
> > anything that looks strange at all. This looks like a different problem
> > than the other one you mentioned. What do you think?
> >
> > Steve
> >
> >> -----Original Message-----
> >> From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com]
> >> Sent: Tuesday, July 10, 2018 9:28 AM
> >> To: 'Oliver Sech'; 'FreeBSD-scsi'
> >> Subject: RE: problems with SAS JBODs 2
> >>
> >> Hi Oliver, I can't get to your links. Can you try to send the logs in
> >> another way?
> >>
> >> Steve
> >>
> >>> -----Original Message-----
> >>> From: owner-freebsd-scsi@freebsd.org
> >>> [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Oliver Sech
> >>> Sent: Tuesday, July 10, 2018 9:14 AM
> >>> To: FreeBSD-scsi
> >>> Subject: Re: problems with SAS JBODs 2
> >>>
> >>> I tested a few additional things. I don't think this is a multipath,
> >>> daisy chain, or SAS wide ports problem.
> >>> I can reproduce the problem with just a single connection to an
> >>> Expander/JBOD.
> >>>
> >>> Test:
> >>> * physically disconnect all shelves
> >>> * reboot system
> >>> * connect one shelf via SAS cable
> >>> * check number of disks (after a reboot everything always shows up)
> >>> * disconnect the shelf and wait (geom disk list still shows most
> >>>   disks)
> >>> * connect the shelf (missing disks)
> >>>
> >>> Tested Hardware:
> >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (internal
> >>>   daisy chain + wide links)
> >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight
> >>>   HBA <-> EXPANDER connection; no wide links, no daisy chain)
> >>> * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal
> >>>   daisy chain)
> >>> * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight
> >>>   HBA <-> EXPANDER connection)
> >>>
> >>> On 07/04/2018 12:15 PM, Oliver Sech wrote:
> >>>>> 1) Are the expanders daisy chained? Some SAS expanders don't work
> >>>>> reliably when daisy chained. Best to direct connect each one to
> >>>>> the server.
> >>>> At the moment I have 1 JBOD connected to 1 HBA port with 1 cable
> >>>> (4 lanes?).
> >>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back,
> >>>> and those are connected via an internal SAS daisy chain.
> >>>> I could rewire and connect each backplane directly to the server, but
> >>>> unfortunately I do not have enough ports.
> >>>>
> >>>> JBOD Model: Supermicro 847E2C-R1K28JBOD
> >>>>
> >>>>> 2) Are the expanders connected in multipath or single path? You
> >>>>> need geom_multipath if you're going to do that.
> >>>> See answer 1. There is a single path from the host to the first
> >>>> expander.
> >>>>
> >>>>> 3) Are you attempting to use wide ports (two SAS cables connecting
> >>>>> each expander to the HBA)? If so, you'll need to make sure that
> >>>>> each pair of SAS cables goes to the same HBA chip (not merely the
> >>>>> same card, as some cards contain two HBA chips).
> >>>> See 1. The last time I opened one of those JBODs there were 8 SAS
> >>>> cables between the front and back expander. I assume that wide ports
> >>>> are being used.
> >>>> (2 expanders per backplane as well)
> >>>>
> >>>>> 4) Are you trying to remove an expander while ZFS is active on that
> >>>>> expander? That will suspend your pool, and ZFS doesn't always
> >>>>> recover from a suspended state.
> >>>> I'm testing with a new unused disk shelf that was never part of the
> >>>> ZFS pool.
> >>>> There were
> >>>> _______________________________________________
> >>>> freebsd-scsi@freebsd.org mailing list
> >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> >>>> To unsubscribe, send any mail to
> >>>> "freebsd-scsi-unsubscribe@freebsd.org"
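
The target ID comparison Ken does by hand in the quoted message (25 IDs added when the shelf is connected, 8 reset when it is disconnected, the same 8 added again on reconnect) could be scripted for larger captures. What follows is only a minimal sketch, not something posted in the thread. It assumes the dmesg captures contain the two mpr message formats quoted above, and the names target_ids and summarize are hypothetical helpers introduced for illustration.

```python
import re

# Message formats copied from the dmesg excerpts quoted in the thread.
ADD_RE = re.compile(r"mprsas_add_device: Target ID for added device is (\d+)")
REMOVE_RE = re.compile(r"mprsas_prepare_remove: Sending reset for target ID (\d+)")


def target_ids(dmesg_text, pattern):
    """Return the set of target IDs that `pattern` matches in a dmesg capture."""
    return {int(m.group(1)) for m in pattern.finditer(dmesg_text)}


def summarize(connected, disconnected, reconnected):
    """Compare three dmesg captures (hypothetical helper, not part of any tool).

    connected    -- text captured after the shelf was first attached
    disconnected -- text captured after the cable was pulled
    reconnected  -- text captured after the cable was plugged back in
    """
    added = target_ids(connected, ADD_RE)
    removed = target_ids(disconnected, REMOVE_RE)
    readded = target_ids(reconnected, ADD_RE)
    return {
        "added": added,
        "removed": removed,
        # IDs that were added but never reset on disconnect.
        "never_removed": added - removed,
        "readded": readded,
    }


if __name__ == "__main__":
    # Synthetic input rebuilt from the ID ranges quoted in the thread.
    add_fmt = "mpr0: mprsas_add_device: Target ID for added device is %d."
    rem_fmt = "mpr0: mprsas_prepare_remove: Sending reset for target ID %d"
    connected = "\n".join(add_fmt % t for t in list(range(467, 491)) + [503])
    disconnected = "\n".join(rem_fmt % t for t in range(467, 475))
    reconnected = "\n".join(add_fmt % t for t in range(467, 475))

    result = summarize(connected, disconnected, reconnected)
    for name, ids in result.items():
        print(name, len(ids), sorted(ids))
```

With real data, the three arguments would be the contents of 2_shelf_connected_dmesg.txt, 3_shelf_disconnected_dmesg.txt and 4_shelf_reconnected_dmesg.txt. The never_removed set then lists the devices that stay visible after the cable is pulled.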