From owner-freebsd-scsi@freebsd.org Sun Jul 22 21:01:09 2018
Message-Id: <201807222101.w6ML17vq034283@kenobi.freebsd.org>
From: bugzilla-noreply@FreeBSD.org
To: scsi@FreeBSD.org
Subject: Problem reports for scsi@FreeBSD.org that need special attention
Date: Sun, 22 Jul 2018 21:01:07 +0000

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
New         |    221952 | cam iosched: Fix trim statistics

1 problem total for which you should take action.

From owner-freebsd-scsi@freebsd.org Mon Jul 23 14:14:37 2018
Subject: Re: problems with SAS JBODs 2
From: Oliver Sech
To: FreeBSD-scsi
Message-ID: <6e0b8652-f227-271e-aeb4-a868ba6b90e2@gmx.net>
Date: Mon, 23 Jul 2018 16:14:31 +0200
In-Reply-To: <7C1E630B-65AD-4FE8-BFDF-F13068070B5E@freebsd.org>

Sorry for the delay. I moved to a different office and could not focus on this issue last week.

I tested all of the hardware with different drivers and firmware on Linux to make sure this is not a hardware problem:
* Firmware 09.00.101.00 + Driver 26.000.00.00 (compiled) -> GOOD
* Firmware 09.00.101.00 + Driver 12.100.00.00 (default kernel) -> GOOD
* Firmware 16.00.01.00 + Driver 26.000.00.00 -> BAD (42 out of 44 disks after reconnect)
* Firmware 16.00.01.00 + Driver 12.100.00.00 -> BAD (42 out of 44 disks after reconnect)

I tested a different HBA with an old firmware as well and there were no issues. Only with the latest FW are disks missing after a reconnect, with the error "mpt3sas_cm0: device is not present handle".
I don't know yet how the firmware versions between 09.00.000.00 and 16 behave.

Additional Info/Changes:
* Upgraded the test system to 11.2 as suggested on the mailing list. -> No change
* "camcontrol rescan all" removes the devices that are still present after the cable has been removed. "camcontrol devlist -v" does not show them anymore.

Setting the driver tunable "use_phy_num" to 0 and using the clearDPM script between connects does not help. In fact, I do not see any difference in behavior at all.
I reflashed the controller multiple times and erased everything except the "manufacturing" area to make sure that no previous settings are kept.
The only thing I know of that "fixes" the missing drives is rebooting the server.

A (similar?) problem also occurs once I start the server with all 6 disk shelves (11 backplanes, 17 expanders, 200+ disks). Everything comes up properly with 5 shelves; once I connect the 6th shelf offline, some random disks are missing and I can no longer import the ZFS pool.

The following logs were collected with the very old FW 09.00.101.00 that worked on Linux.
Logs: https://www.dropbox.com/s/6nw88rt6ajh713s/freebsd_sas3.zip?dl=0

best regards,
Oliver

On 07/12/2018 03:38 PM, Ken Merry wrote:
>
>> On Jul 12, 2018, at 6:00 AM, Oliver Sech wrote:
>>
>> On 07/11/2018 10:35 PM, Ken Merry wrote:
>>> Oliver, what happens when you try to do I/O to the devices that don’t go away after you pull the cable? Does that cause the devices to go away?
>>
>> I tried 'dd if=/dev/daX of=/dev/null bs=1k count=1' and at least the "da" device disappears.
>
> Ok, that’s good. Can you send the dmesg output and check with ‘camcontrol devlist -v’ to make sure the device has fully gone away?
>
> The reason I ask is that I have spent lots of time over the years debugging device arrival and departure problems in CAM, GEOM and devfs, and I want to make sure we aren’t running into any non-SAS related problems.
>
>>
>>> Looking at the mprutil output, it also shows the devices sticking around from the adapter’s standpoint.
>>>
>>> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ (where N is the scbus number shown by ‘camcontrol devlist -v’). That will do some basic probes for each of the devices and should in theory cause them to go away if they aren’t accessible.
>>>
>>> It seems like the adapter may not be recognizing that the devices in question have gone.
>>
>>
>> I'm pretty sure that I tried this 'camcontrol rescan all' a few times. While I'm not sure anymore whether that cleans up the non-working devices, I'm sure that no new devices were added.
>
> If doing a read from the device with dd makes it go away, ‘camcontrol rescan all’ should make it go away as well. It sends a command to every device, and if the mpr(4) driver tells CAM the drive is no longer there, it’ll get removed.
>
> If it doesn’t cause the device to get removed (and the rescan doesn’t hang), it means that you’re getting a response from a device that is no longer physically connected to the machine, which is impossible with SAS.
>
>>
>> Unfortunately I haven't gotten to Steve's 'clear controller mapping' script yet, but I did a few other things:
>
> Steve’s email made it sound like he was going to send it. I just sent it to you separately.
>
>> * The last time I tried to upgrade the firmware I had all sorts of problems. "sas3flash" reported bad checksums while flashing some of the files.
>> So I reflashed both controllers with the DOS version of sas3flash. This was basically a challenge in itself because the DOS version of this utility does not seem to run on computers of this decade. (ERROR: Failed to initialize PAL. Exiting program.)
>> The equivalent sas3flash.EFI version seems to be out of date and caused the checksum problems described before.
>> (This time I wiped them before flashing with "sas3flash -o -e 6".)
>
> That is unfortunate…perhaps Steve has some insight.
>
>>
>> * I tried to change the mpr tunable "use_phy_num" after that but this has not improved the situation. I will retry and collect logs with Steve's script.
>
> Changed it to what? I think it defaults to 1. Did you try 0?
>
>> * I retried with the latest "mpr.ko" from the Broadcom download page. (Same problems, no "use_phy_num" tunable.)
>>
>> * I retested this hardware with Linux (4.15 and 4.17)
>> ** Some shelves could be replugged reliably (i.e. 45 disks show up, 45 disks disappear, 45 disks reappear)
>> ** On the newest shelf, 2 disks were missing after replugging (i.e. 44 disks show up, 44 disks disappear, 42 disks reappear) (kernel log: "mpt3sas_cm0: device is not present handle")
>>
>> * I tried a different controller
>> ** So far I used a Broadcom LSI SAS 9305-16e (Controller: SAS3216) (Firmware 16.00.01.00 or 15.00.00.00)
>> ** Yesterday I switched to a fresh out-of-the-box Broadcom LSI 9305-24i (Controller: SAS3224) (Firmware 09.00.00.00 (or something similar with 09*))
>> With the new controller everything seems to work on Linux. It might be the old firmware?...
>> It is better with the new controller on FreeBSD in the sense that I at least get one out of two /dev/sesX devices back. But disks are still missing and are not getting completely cleaned up…
>
> It does sound a bit like a mapping table problem. Clearing it might help, we’ll see.
>
>> This whole thing is a bit frustrating, especially since up until now I thought that HBAs were kind of "connect and forget" devices. The next step is to set up a separate test environment and try to get it to work there. I will keep you updated and try to provide logs for all FreeBSD related problems.
>
> Thanks for debugging this. Unfortunately there are a number of ways it can go wrong. The mapping code has been the source of some problems, sometimes enclosure vendors do the wrong thing, and sometimes there are other bugs.
>
> Ken
>

From owner-freebsd-scsi@freebsd.org Tue Jul 24 18:22:49 2018
Subject: Re: problems with SAS JBODs 2
From: Oliver Sech
To: FreeBSD-scsi
Message-ID: <530b3e8e-4d76-e601-dd74-0ab6a06ebe25@gmx.net>
Date: Tue, 24 Jul 2018 20:22:43 +0200
In-Reply-To: <6e0b8652-f227-271e-aeb4-a868ba6b90e2@gmx.net>

update 2: I continued to test with more and different hardware.

tested with an LSI SAS9207-8e HBA:
* after disconnect, all devices (/dev/daX, /dev/ses) properly disappear; no rescans or writes necessary
* no more targets in mpsutil (not mprutil)
* after reconnect all disks and all ses devs appear!

tested with a hardware RAID LSI SAS 9286CV-8e:
* no problems with the shelf/SAS in different configurations
* switching the controller and importing the configuration works reliably

So far I think there is a problem with the mpr driver and I'm quite confident that it does affect other people.
With a simple configuration it is probably not immediately noticeable, as everything seems to work after the first connect/boot.
It probably gets scarier for people with multipathing and big SAS chains, I guess...

I will downgrade to SAS2 HBAs shortly as I'm running out of space.
If there is anything I can help with while I still have hardware in the lab, let me know.

Oliver

From owner-freebsd-scsi@freebsd.org Tue Jul 24 20:22:31 2018
From: Stephen Mcconnell
Date: Tue, 24 Jul 2018 14:22:28 -0600
Message-ID: <0f26466617df38fd998dc87948b27273@mail.gmail.com>
In-Reply-To: <530b3e8e-4d76-e601-dd74-0ab6a06ebe25@gmx.net>
Subject: RE: problems with SAS JBODs 2
To: Oliver Sech, FreeBSD-scsi

Oliver, can you try changing the mapping mode on the controller? I think you're using Enclosure/Slot Mapping and I want to see what happens with Device Persistent Mapping. To do that, follow these steps:

1. Run Ken’s script to clear the DPM entries.
2. Use LSIUtil to change the mapping mode in IOC Page 8: Command 9, Page Type 1, Page Number 8. If you see 0000002 at offset 0x0C, you're using Enclosure/Slot Mapping and I'd like you to change this. You will be asked if you want to make changes. Select ‘yes’ and then change offset 0x0C to 00000001 (you might have to type C instead of 0x0C for the offset). Just use the default setting to change NVRAM.
3. Reboot and see what happens and let me know how it goes.

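For anyone following along, the check in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two values are the ones quoted in the steps above (read by hand from LSIUtil at IOC Page 8, offset 0x0C), the function names are made up for this example, and any other value is reported as unknown rather than interpreted. The script does not talk to the controller.

```python
# Sketch only. The two known values come from the steps above, not from the
# MPI specification; the names below are hypothetical helpers.
ENCLOSURE_SLOT = 0x00000002      # value that indicates Enclosure/Slot Mapping
REQUESTED_VALUE = 0x00000001     # value step 2 asks to write to try Device Persistent Mapping

MAPPING_MODES = {
    ENCLOSURE_SLOT: "Enclosure/Slot Mapping",
    REQUESTED_VALUE: "Device Persistent Mapping (value requested in step 2)",
}


def describe_mapping_mode(raw: str) -> str:
    """Translate the hex string shown at offset 0x0C into a mode name."""
    value = int(raw, 16)
    return MAPPING_MODES.get(value, f"unknown (0x{value:08X})")


def needs_change(raw: str) -> bool:
    """True when the value still indicates Enclosure/Slot Mapping."""
    return int(raw, 16) == ENCLOSURE_SLOT


if __name__ == "__main__":
    # Type in whatever LSIUtil displays at offset 0x0C.
    for shown in ("0000002", "00000001"):
        print(shown, "->", describe_mapping_mode(shown),
              "| change needed:", needs_change(shown))
```

Leading zeros do not matter to the parser, so "0000002" and "00000002" are treated the same.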
Steve > -----Original Message----- > From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- > scsi@freebsd.org] On Behalf Of Oliver Sech > Sent: Tuesday, July 24, 2018 12:23 PM > To: FreeBSD-scsi > Subject: Re: problems with SAS JBODs 2 > > update 2: I continued to test with more and different hardware. > > tested with a LSI SAS9207-8e HBA: > * after disconnect all devices properly disappear /dev/daX /dev/ses > no rescans or writing necessary > * no more targets in mpsutil (not mprutil) > * after reconnect all disks and all ses devs appear! > > tested with hardware raid LSI SAS 9286CV-8e > * no problems with the shelf/sas in different configurations > * switching the controller and importing configuration works reliably > > So far I think there is a problem with the mpr driver and I'm quite > confident > that it does affect other people. > With a simple configuration is probably not immediately noticeable as > everything seems to work after the first connect/boot. > It probably gets scarier for people with multipathing and big SAS chains = I > guess... > > I will downgrade to SAS2 HBAs shortly as I'm running out of space. If > there is > anything I can help with while I still have hardware in the lab let me > know. > > Oliver > > On 07/23/2018 04:14 PM, Oliver Sech wrote: > > Sorry for the delay. I moved to a different office and could not focus > > on > this issue last week. > > > > I tested all of the hardware with different drivers and firmware on > > Linux to > make sure this is not a hardware problem: > > * Firmware 09.00.101.00 + Driver 26.000.00.00 (compiled) -> GOOD > > * Firmware 09.00.101.00 + Driver 12.100.00.00 (default kernel) -> GOOD > > * Firmware 16.00.01.00 + Driver 26.000.00.00 -> BAD (42 out of 44 disk= s > after reconnect) > > * Firmware 16.00.01.00 + Driver 12.100.00.00 -> BAD (42 out of 44 disk= s > after reconnect) > > > > I tested a different HBA with an old firmware as well and there were no > issues. 
Only with the latest FW disks are missing after a reconnect with > the > error "mpt3sas_cm0: "device is not present handle" > > I don't know yet how different Firmware behaves between version > 09.00.000.00 and 16... > > > > Additional Info/Changes: > > * Upgraded testsystem to 11.2 as suggested in the mailing list. -> No > Change > > * "camcontrol rescan all" removes the devices that are still present > > after > the cable has been removed. "camcontrol devlist -v" does not show them > anymore > > > > > > Setting the driver "use_phy_num" to 0 and using the clearDPM script > between connects does not help. In fact I do not see a different behavior > at > all? > > I reflashed the controller multiple times and erased everything except > > the > "manufacturing" area to make sure that no previous settings are kept. > > The only thing I know that "fixes" the missing drives is to reboot the > > server. > > > > A (similar?) problem also occurs once I start the server with all 6 dis= k > shelves (11 backplanes, 17 expanders, 200+ disks). Everything comes up > properly with 5 shelves, once I offline connect the 6th shelve, then some > random disks are missing and I cannot longer import the ZFS pool. > > > > The following logs were collected with the very old FW 09.00.101.00 tha= t > worked on Linux. > > Logs: https://www.dropbox.com/s/6nw88rt6ajh713s/freebsd_sas3.zip?dl=3D0 > > > > best regards, > > Oliver > > > > On 07/12/2018 03:38 PM, Ken Merry wrote: > >> > >>> On Jul 12, 2018, at 6:00 AM, Oliver Sech > wrote: > >>> > >>> On 07/11/2018 10:35 PM, Ken Merry wrote: > >>>> Oliver, what happens when you try to do I/O to the devices that don= =E2=80=99t > go away after you pull the cable? Does that cause the devices to go away= ? > >>> > >>> I tried to 'dd if=3D/dev/daX of=3D/dev/null bs=3D1k count=3D1' and at= least > >>> the > "da" device disappears. > >> > >> Ok, that=E2=80=99s good. 
Can you send the dmesg output and check with > =E2=80=98camcontrol devlist -v=E2=80=99 to make sure the device has fully= gone away? > >> > >> The reason I ask is that I have spent lots of time over the years > >> debugging > device arrival and departure problems in CAM, GEOM and devfs, and I want > to make sure we aren=E2=80=99t running into any non-SAS related problems. > >> > >>> > >>>> Looking at the mprutil output, it also shows the devices sticking > >>>> around > from the adapter=E2=80=99s standpoint. > >>>> > >>>> You can also try a =E2=80=98camcontrol rescan all=E2=80=99 or a =E2= =80=98camcontrol rescan N=E2=80=99 > (where N is the scbus number shown by =E2=80=98camcontrol devlist -v=E2= =80=99). That will > do > some basic probes for each of the devices and should in theory cause them > to go away if they aren=E2=80=99t accessible. > >>>> > >>>> It seems like the adapter may not be recognizing that the devices in > question have gone. > >>> > >>> > >>> I'm pretty sure that I tried this 'camcontrol rescan all' a few times= . > >>> While > I not sure anymore if that cleans up the non-working devices, I'm sure > that > no new devices were added. > >> > >> If doing a read from the device with dd makes it go away, =E2=80=98cam= control > rescan all=E2=80=99 should make it go away as well. It sends command to = every > device, and if the mpr(4) driver tells CAM the drive is no longer there, > it=E2=80=99ll get > removed. > >> > >> If it doesn=E2=80=99t cause the device to get removed (and the rescan = doesn=E2=80=99t > hang), it means that you=E2=80=99re getting a response from a device that= is no > longer physically connected to the machine, which is impossible with SAS. > >> > >>> > >>> Unfortunately I haven't gotten yet to Steves 'clear controller > >>> mapping' > script but I did a few other things: > >> > >> Steve=E2=80=99s email made it sound like he was going to send it. I j= ust sent > >> it to > you separately. 
> >> > >>> * The last time I tried to upgrade the firmware I had all sorts of > problems. "sas3flash" reported bad checksums while flashing some of the > files. > >>> So I reflashed both controllers with the DOS version of sas3flash. > >>> This > was basically a challenge in itself because the DOS version of this > utility does > not seem to run on computers of this decade. (ERROR: Failed to initialize > PAL. Exiting program.) > >>> The equivalent sas3flash.EFI version seems to be out of date and > >>> caused > the checksum problems described before. > >>> (This time I wiped them before flashing with "sas3flash -o -e 6”.) > >> > >> That is unfortunate…perhaps Steve has some insight. > >> > >>> > >>> * I tried to change mpr tuneable "use_phy_num" after that but this has > not improved the situation. I will retry and collect logs with Steves > script. > >> > >> Changed it to what? I think it defaults to 1. Did you try 0? > >> > >>> * I retried with the latest "mpr.ko" from the broadcom download page. > (Same problems, no "use_phy_num" tuneable.) > >>> > >>> * I retested this hardware with Linux (4.15 and 4.17) > >>> ** Some shelves could be replugged reliably (ie: 45 disks show up, 45 > disks disappear, 45 disks reappear) > >>> ** The newest shelf 2 disks were missing after the replugging (ie: 44 > disks show up, 44 disks disappear, 42 disks reappear) (kernel log > mpt3sas_cm0: "device is not present handle) > >>> > >>> * I tired a different controller > >>> ** So far I used a Broadcom LSI SAS 9305-16e (Controller: SAS3216) > (Firmware 16.00.01.00 or 15.00.00.00) > >>> ** Yesterday I switched to a new fresh out-of-the-box Broadcom LSI > 9305-24i (Controller: SAS3224) (Firmware 09.00.00.00 (or something similar > with 09*)) > >>> With the new controller everything seems work on Linux. It might be > >>> the > old Firmware?...
> >>> It is better with the new controller on FreeBSD in that sense that I > >>> at > least get one out of two /dev/sesX devices back. But disks are still > missing > and are not getting completely cleaned up… > >> > >> It does sound a bit like a mapping table problem. Clearing it might > >> help, > we’ll see. > >> > >>> This whole thing is a bit frustrating, especially since up until now I > thought that HBAs are kind of "connect and forget" devices. Next step is > to > set up a separate test environment and try to get it to work there. I will > keep > you updated and try provide log for all FreeBSD related problems. > >> > >> Thanks for debugging this. Unfortunately there are a number of ways it > can go wrong. The mapping code has been the source of some problems, > sometimes enclosure vendors do the wrong thing, and sometimes there are > other bugs. > >> > >> Ken > >> > > _______________________________________________ > > freebsd-scsi@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" > > > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Wed Jul 25 10:24:27 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2B6531046D5E for ; Wed, 25 Jul 2018 10:24:27 +0000 (UTC) (envelope-from crimsonthunder@gmx.net) Received: from mout.gmx.net (mout.gmx.net [212.227.17.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 842387860F for ; Wed, 25 Jul 2018
10:24:26 +0000 (UTC) (envelope-from crimsonthunder@gmx.net) Received: from [10.12.22.246] ([193.170.152.64]) by mail.gmx.com (mrgmx101 [212.227.17.168]) with ESMTPSA (Nemesis) id 0M0cs6-1fzNHP3oCs-00unjr; Wed, 25 Jul 2018 12:24:21 +0200 Subject: Re: problems with SAS JBODs 2 To: Stephen Mcconnell , FreeBSD-scsi References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com> <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> <9e0bf18f-0689-b2a0-1da4-b70c497b2f14@gmx.net> <7C1E630B-65AD-4FE8-BFDF-F13068070B5E@freebsd.org> <6e0b8652-f227-271e-aeb4-a868ba6b90e2@gmx.net> <530b3e8e-4d76-e601-dd74-0ab6a06ebe25@gmx.net> <0f26466617df38fd998dc87948b27273@mail.gmail.com> From: Oliver Sech Message-ID: <77b55ca6-25ce-3b26-e2f6-b0702a49ab28@gmx.net> Date: Wed, 25 Jul 2018 12:24:21 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <0f26466617df38fd998dc87948b27273@mail.gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Provags-ID: V03:K1:OE1khuBCBjHPQT9lCP+GzBbd7KNBWvUyWcrVuvvcBmhKn//rg4A 8BJyOTe9jjwhA/P4fkZVUNLdtu7GFKxKF1Qket/1+F15uvBXpk+CdwcxtmRctm39d3JZGTd 7h5g+V2ljh8+AnPQPzl12taCJ1KtMR3hxfbqDiyz+bziODSm59tXad6Kzh34PH6stdJ85KY ouz3gksY9kZQs4CxfaWGQ== X-UI-Out-Filterresults: notjunk:1;V01:K0:QQWzNYxvuqc=:tF7xFftDOa7/K7zESDxlyw PvH703pQ3Ei89ZQZsBL+IwRTuiowkdEW9bwvSmsVQS9x+f/5RZ5ubk2crUbLdRLZuiq9Ov2X6 ULOyRU0j6XQCIuEiQmbHDiBFWgSl9MAVMQNlyopXFCP/bHWKhWAke+bSPQNeKI9U1CdY2+zPg J1Doi/xQa98QOoLOy8s2SGFZBzaIRX8hAVSDTUqYsm7o3ew492Bc9q1r0THn/6G/zBN6YZ2Af k9NY7QZPeT3Tb3HKFzAioua5Q9uQH3UJHsJwo4u21+4PQZ9N1nSKgQwxHHRu+vNHyrnb5w6R7 jcPZuAuccH5ABGVRJgzvltiME1+Yf9LKVwXM9k2tfPqY50D92G/ges4xfG/bspm6KG8D7EZ4t TW+vKIxAWlkyVABUP6ktDPNFaV+As36ElKaZcyb8/BBZd4FLdN8TvNoIht2uDhboc5S24tw0o brZR+cpb4YCNR9BUjuX1XlL+U6ePAivuyKcIyaofhEMreLKe4fF5LfIoLliVVvFJwBt0eM/uO 
elroXxyUDeprsZ12+iGcP4iYOMGmP/txAkibDCXeC3NpEEAIcx/3F1hEyzneNomt2qc+U1BfW 5w6xadFQt/4Tv+yq572KzgpAXlxhub+NPmsANIIiFDUNToioWoMKWoEDa1n3VSTJ+TgDzBA5D g1/YwAoNCdYy8dZdgHta+zWcTnQt9ypFAoUnlpPlUWjkLcBKAh97bv/azSRa2S2VVJY/iOkZx sO9IhTvpB+Hfwn/RaCJWCGA6xnrnOnqI/OChsbx/XCmOkeFQw2ZeYqPi1qjjEa3u7DL/mW7yH HLIjJYr X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2018 10:24:27 -0000 I ran the clear_dpm.sh script and changed the value you suggested. Rebooted and retested. As far as I can tell there is no difference. I tried the menu option (99. Reset port) in lsiutil and this helps with missing devices. After reseting the port I get all my disks and ses devs again. Read NVRAM or current values? [0=NVRAM, 1=Current, default is 0] 0000 : 21080600 0004 : 00000001 0008 : 00180080 000c : 00000001 0010 : 00000000 0014 : 00000000 On 07/24/2018 10:22 PM, Stephen Mcconnell wrote: > Oliver, can you try changing the mapping mode on the controller? I think > you're using Enclosure/Slot Mapping and I want to see what happens with > Device Persistent Mapping. To do that, follow these steps: > 1. Run Ken’s script to clear the DPM entries > 2. Use LSIUtil to change the mapping mode in IOC Page 8. Command 9, Page > Type 1, Page Number 8. If you see 0000002 at offset 0x0C you're using > Enclosure/Slot Mapping and I'd like you to change this. You will be asked if > you want to make changes. Select ‘yes’ and then change offset 0x0C to > 00000001 (you might have to type C instead of 0x0C for the offset). Just use > the default setting to change NVRAM. > 3. Reboot and see what happens and let me know how it goes. 
> > > Steve > >> -----Original Message----- >> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- >> scsi@freebsd.org] On Behalf Of Oliver Sech >> Sent: Tuesday, July 24, 2018 12:23 PM >> To: FreeBSD-scsi >> Subject: Re: problems with SAS JBODs 2 >> >> update 2: I continued to test with more and different hardware. >> >> tested with a LSI SAS9207-8e HBA: >> * after disconnect all devices properly disappear /dev/daX /dev/ses >> no rescans or writing necessary >> * no more targets in mpsutil (not mprutil) >> * after reconnect all disks and all ses devs appear! >> >> tested with hardware raid LSI SAS 9286CV-8e >> * no problems with the shelf/sas in different configurations >> * switching the controller and importing configuration works reliably >> >> So far I think there is a problem with the mpr driver and I'm quite >> confident >> that it does affect other people. >> With a simple configuration is probably not immediately noticeable as >> everything seems to work after the first connect/boot. >> It probably gets scarier for people with multipathing and big SAS chains I >> guess... >> >> I will downgrade to SAS2 HBAs shortly as I'm running out of space. If >> there is >> anything I can help with while I still have hardware in the lab let me >> know. >> >> Oliver >> >> On 07/23/2018 04:14 PM, Oliver Sech wrote: >>> Sorry for the delay. I moved to a different office and could not focus >>> on >> this issue last week. 
>>> >>> I tested all of the hardware with different drivers and firmware on >>> Linux to >> make sure this is not a hardware problem: >>> * Firmware 09.00.101.00 + Driver 26.000.00.00 (compiled) -> GOOD >>> * Firmware 09.00.101.00 + Driver 12.100.00.00 (default kernel) -> GOOD >>> * Firmware 16.00.01.00 + Driver 26.000.00.00 -> BAD (42 out of 44 disks >> after reconnect) >>> * Firmware 16.00.01.00 + Driver 12.100.00.00 -> BAD (42 out of 44 disks >> after reconnect) >>> >>> I tested a different HBA with an old firmware as well and there were no >> issues. Only with the latest FW disks are missing after a reconnect with >> the >> error "mpt3sas_cm0: "device is not present handle" >>> I don't know yet how different Firmware behaves between version >> 09.00.000.00 and 16... >>> >>> Additional Info/Changes: >>> * Upgraded testsystem to 11.2 as suggested in the mailing list. -> No >> Change >>> * "camcontrol rescan all" removes the devices that are still present >>> after >> the cable has been removed. "camcontrol devlist -v" does not show them >> anymore >>> >>> >>> Setting the driver "use_phy_num" to 0 and using the clearDPM script >> between connects does not help. In fact I do not see a different behavior >> at >> all? >>> I reflashed the controller multiple times and erased everything except >>> the >> "manufacturing" area to make sure that no previous settings are kept. >>> The only thing I know that "fixes" the missing drives is to reboot the >>> server. >>> >>> A (similar?) problem also occurs once I start the server with all 6 disk >> shelves (11 backplanes, 17 expanders, 200+ disks). Everything comes up >> properly with 5 shelves, once I offline connect the 6th shelve, then some >> random disks are missing and I cannot longer import the ZFS pool. >>> >>> The following logs were collected with the very old FW 09.00.101.00 that >> worked on Linux. 
>>> Logs: https://www.dropbox.com/s/6nw88rt6ajh713s/freebsd_sas3.zip?dl=0 >>> >>> best regards, >>> Oliver >>> >>> On 07/12/2018 03:38 PM, Ken Merry wrote: >>>> >>>>> On Jul 12, 2018, at 6:00 AM, Oliver Sech >> wrote: >>>>> >>>>> On 07/11/2018 10:35 PM, Ken Merry wrote: >>>>>> Oliver, what happens when you try to do I/O to the devices that don’t >> go away after you pull the cable? Does that cause the devices to go away? >>>>> >>>>> I tried to 'dd if=/dev/daX of=/dev/null bs=1k count=1' and at least >>>>> the >> "da" device disappears. >>>> >>>> Ok, that’s good. Can you send the dmesg output and check with >> ‘camcontrol devlist -v’ to make sure the device has fully gone away? >>>> >>>> The reason I ask is that I have spent lots of time over the years >>>> debugging >> device arrival and departure problems in CAM, GEOM and devfs, and I want >> to make sure we aren’t running into any non-SAS related problems. >>>> >>>>> >>>>>> Looking at the mprutil output, it also shows the devices sticking >>>>>> around >> from the adapter’s standpoint. >>>>>> >>>>>> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ >> (where N is the scbus number shown by ‘camcontrol devlist -v’). That will >> do >> some basic probes for each of the devices and should in theory cause them >> to go away if they aren’t accessible. >>>>>> >>>>>> It seems like the adapter may not be recognizing that the devices in >> question have gone. >>>>> >>>>> >>>>> I'm pretty sure that I tried this 'camcontrol rescan all' a few times. >>>>> While >> I not sure anymore if that cleans up the non-working devices, I'm sure >> that >> no new devices were added. >>>> >>>> If doing a read from the device with dd makes it go away, ‘camcontrol >> rescan all’ should make it go away as well. It sends command to every >> device, and if the mpr(4) driver tells CAM the drive is no longer there, >> it’ll get >> removed. 
>>>> >>>> If it doesn’t cause the device to get removed (and the rescan doesn’t >> hang), it means that you’re getting a response from a device that is no >> longer physically connected to the machine, which is impossible with SAS. >>>> >>>>> >>>>> Unfortunately I haven't gotten yet to Steves 'clear controller >>>>> mapping' >> script but I did a few other things: >>>> >>>> Steve’s email made it sound like he was going to send it. I just sent >>>> it to >> you separately. >>>> >>>>> * The last time I tried to upgrade the firmware I had all sorts of >> problems. "sas3flash" reported bad checksums while flashing some of the >> files. >>>>> So I reflashed both controllers with the DOS version of sas3flash. >>>>> This >> was basically a challenge in itself because the DOS version of this >> utility does >> not seem to run on computers of this decade. (ERROR: Failed to initialize >> PAL. Exiting program.) >>>>> The equivalent sas3flash.EFI version seems to be out of date and >>>>> caused >> the checksum problems described before. >>>>> (This time I wiped them before flashing with "sas3flash -o -e 6”.) >>>> >>>> That is unfortunate…perhaps Steve has some insight. >>>> >>>>> >>>>> * I tried to change mpr tuneable "use_phy_num" after that but this has >> not improved the situation. I will retry and collect logs with Steves >> script. >>>> >>>> Changed it to what? I think it defaults to 1. Did you try 0? >>>> >>>>> * I retried with the latest "mpr.ko" from the broadcom download page. >> (Same problems, no "use_phy_num" tuneable.) 
>>>>> >>>>> * I retested this hardware with Linux (4.15 and 4.17) >>>>> ** Some shelves could be replugged reliably (ie: 45 disks show up, 45 >> disks disappear, 45 disks reappear) >>>>> ** The newest shelf 2 disks were missing after the replugging (ie: 44 >> disks show up, 44 disks disappear, 42 disks reappear) (kernel log >> mpt3sas_cm0: "device is not present handle) >>>>> >>>>> * I tired a different controller >>>>> ** So far I used a Broadcom LSI SAS 9305-16e (Controller: SAS3216) >> (Firmware 16.00.01.00 or 15.00.00.00) >>>>> ** Yesterday I switched to a new fresh out-of-the-box Broadcom LSI >> 9305-24i (Controller: SAS3224) (Firmware 09.00.00.00 (or something similar >> with 09*)) >>>>> With the new controller everything seems work on Linux. It might be >>>>> the >> old Firmware?... >>>>> It is better with the new controller on FreeBSD in that sense that I >>>>> at >> least get one out of two /dev/sesX devices back. But disks are still >> missing >> and are not getting completely cleaned up… >>>> >>>> It does sound a bit like a mapping table problem. Clearing it might >>>> help, >> we’ll see. >>>> >>>>> This whole thing is a bit frustrating, especially since up until now I >> thought that HBAs are kind of "connect and forget" devices. Next step is >> to >> set up a separate test environment and try to get it to work there. I will >> keep >> you updated and try provide log for all FreeBSD related problems. >>>> >>>> Thanks for debugging this. Unfortunately there are a number of ways it >> can go wrong. The mapping code has been the source of some problems, >> sometimes enclosure vendors do the wrong thing, and sometimes there are >> other bugs. 
>>>> >>>> Ken >>>> >>> _______________________________________________ >>> freebsd-scsi@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi >>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" >>> >> _______________________________________________ >> freebsd-scsi@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi >> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Wed Jul 25 19:08:59 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8FF27105224A for ; Wed, 25 Jul 2018 19:08:59 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x22a.google.com (mail-it0-x22a.google.com [IPv6:2607:f8b0:4001:c0b::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 106458A5E2 for ; Wed, 25 Jul 2018 19:08:58 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x22a.google.com with SMTP id w16-v6so10301726ita.0 for ; Wed, 25 Jul 2018 12:08:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:content-transfer-encoding; bh=k2D4RIM5frPEQcKEIbaT5LKzUMm7+ZELm6zjXUvluE0=; b=eLlhaDIQ7c/kMlVWKHuk78LWrKyh0PXMQACPBlfl2V/Y+qrP/1DsyU4+VwQ0twFbn7 +C4up3DgWBOqTt1mbvFirUHKc6ROCn4K4Buh2pM4Rzl438WNFtBLXczgWTihArSaurnW ZAzE4FhKW62uSniqe20ppBDWv2cNH9+hLpRU1rYiViCDWVqIiBfif3wN6FFuBiDEG/Ug UA1pfuJ6dUlW3nTn9LbPUH8PJthVOk2C316NAHFn3us9tkulXGEX0QYTGqM0qG3JRBR3 /Kq0nt4vSI/p6o6AYyS1vMh48n1aSV4FEY2sBIsFJmBkGT7h86+fI/dp12yWoAXD7m8I fgSA== X-Gm-Message-State: 
AOUpUlGPUilyv32t7PV2RNynguqlY5wGuxsPNhZz/q8LGg6KeKHNaqjZ ouAGJRAiBPUoHGBFY0qTsWztdJV8T1Z6KVkB2UPCEZB/ X-Google-Smtp-Source: AAOMgpf/7g9AB0c/kDb/isjPvBueNiJns/RyG1ctELDD/11xYneYq//TJO+aDm+MhnGbQL+1wPkL0MSCprLf6Kp8uQo= X-Received: by 2002:a24:4612:: with SMTP id j18-v6mr7461168itb.65.1532545738047; Wed, 25 Jul 2018 12:08:58 -0700 (PDT) From: Stephen Mcconnell References: <237f77ab-89e2-188b-b2b1-84c6d88609b0@gmx.net> <3caf8ccd6fde8cfc4db25bae5327c46b@mail.gmail.com> <0af047d477d15ec364140653bd967c89@mail.gmail.com> <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org> <9e0bf18f-0689-b2a0-1da4-b70c497b2f14@gmx.net> <7C1E630B-65AD-4FE8-BFDF-F13068070B5E@freebsd.org> <6e0b8652-f227-271e-aeb4-a868ba6b90e2@gmx.net> <530b3e8e-4d76-e601-dd74-0ab6a06ebe25@gmx.net> <0f26466617df38fd998dc87948b27273@mail.gmail.com> <77b55ca6-25ce-3b26-e2f6-b0702a49ab28@gmx.net> In-Reply-To: <77b55ca6-25ce-3b26-e2f6-b0702a49ab28@gmx.net> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQHJZ/UmTT9Y1rodqvzH7TRwbPT2YALnpLW+Ap4aqgkCOUyURQHo0+HHAeMuJHcCKy6uYAGFltueAuXIYEwBeEFIMgGGNgQBAWwwR+cCkWi7A6Ps1YCw Date: Wed, 25 Jul 2018 13:08:56 -0600 Message-ID: Subject: RE: problems with SAS JBODs 2 To: Oliver Sech , FreeBSD-scsi Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2018 19:09:00 -0000 Can you enable Mapping Debugging, then do these steps again and send the logs? If I don't see anything interesting in the logs I might have you turn more debug bits on. So, first set the debug_level to 0x203. What I'm looking for is some indication that the driver is dropping a device or not adding it. If that's not happening at the driver level, something else is causing the problem.
You can try setting the Event Debug flag as well, but that might be too overwhelming to capture (debug_level = 0x207). Steve > -----Original Message----- > From: Oliver Sech [mailto:crimsonthunder@gmx.net] > Sent: Wednesday, July 25, 2018 4:24 AM > To: Stephen Mcconnell; FreeBSD-scsi > Subject: Re: problems with SAS JBODs 2 > > I ran the clear_dpm.sh script and changed the value you suggested. > Rebooted and retested. As far as I can tell there is no difference. > > I tried the menu option (99. Reset port) in lsiutil and this helps with > missing > devices. After reseting the port I get all my disks and ses devs again. > > Read NVRAM or current values? [0=NVRAM, 1=Current, default is 0] > > 0000 : 21080600 > 0004 : 00000001 > 0008 : 00180080 > 000c : 00000001 > 0010 : 00000000 > 0014 : 00000000 > > On 07/24/2018 10:22 PM, Stephen Mcconnell wrote: > > Oliver, can you try changing the mapping mode on the controller? I think > > you're using Enclosure/Slot Mapping and I want to see what happens with > > Device Persistent Mapping. To do that, follow these steps: > > 1. Run Ken’s script to clear the DPM entries > > 2. Use LSIUtil to change the mapping mode in IOC Page 8. Command 9, > Page > > Type 1, Page Number 8. If you see 0000002 at offset 0x0C you're using > > Enclosure/Slot Mapping and I'd like you to change this. You will be > > asked if > > you want to make changes. Select ‘yes’ and then change offset 0x0C to > > 00000001 (you might have to type C instead of 0x0C for the offset). Just > use > > the default setting to change NVRAM. > > 3. Reboot and see what happens and let me know how it goes.
> > > > > > Steve > > > >> -----Original Message----- > >> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- > >> scsi@freebsd.org] On Behalf Of Oliver Sech > >> Sent: Tuesday, July 24, 2018 12:23 PM > >> To: FreeBSD-scsi > >> Subject: Re: problems with SAS JBODs 2 > >> > >> update 2: I continued to test with more and different hardware. > >> > >> tested with a LSI SAS9207-8e HBA: > >> * after disconnect all devices properly disappear /dev/daX /dev/ses > >> no rescans or writing necessary > >> * no more targets in mpsutil (not mprutil) > >> * after reconnect all disks and all ses devs appear! > >> > >> tested with hardware raid LSI SAS 9286CV-8e > >> * no problems with the shelf/sas in different configurations > >> * switching the controller and importing configuration works reliably > >> > >> So far I think there is a problem with the mpr driver and I'm quite > >> confident > >> that it does affect other people. > >> With a simple configuration is probably not immediately noticeable as > >> everything seems to work after the first connect/boot. > >> It probably gets scarier for people with multipathing and big SAS > >> chains I > >> guess... > >> > >> I will downgrade to SAS2 HBAs shortly as I'm running out of space. If > >> there is > >> anything I can help with while I still have hardware in the lab let me > >> know. > >> > >> Oliver > >> > >> On 07/23/2018 04:14 PM, Oliver Sech wrote: > >>> Sorry for the delay. I moved to a different office and could not focus > >>> on > >> this issue last week.
> >>> > >>> I tested all of the hardware with different drivers and firmware on > >>> Linux to > >> make sure this is not a hardware problem: > >>> * Firmware 09.00.101.00 + Driver 26.000.00.00 (compiled) -> GOOD > >>> * Firmware 09.00.101.00 + Driver 12.100.00.00 (default kernel) -> GOOD > >>> * Firmware 16.00.01.00 + Driver 26.000.00.00 -> BAD (42 out of 44 > >>> disks > >> after reconnect) > >>> * Firmware 16.00.01.00 + Driver 12.100.00.00 -> BAD (42 out of 44 > >>> disks > >> after reconnect) > >>> > >>> I tested a different HBA with an old firmware as well and there were > >>> no > >> issues. Only with the latest FW disks are missing after a reconnect > >> with > >> the > >> error "mpt3sas_cm0: "device is not present handle" > >>> I don't know yet how different Firmware behaves between version > >> 09.00.000.00 and 16... > >>> > >>> Additional Info/Changes: > >>> * Upgraded testsystem to 11.2 as suggested in the mailing list. -> No > >> Change > >>> * "camcontrol rescan all" removes the devices that are still present > >>> after > >> the cable has been removed. "camcontrol devlist -v" does not show them > >> anymore > >>> > >>> > >>> Setting the driver "use_phy_num" to 0 and using the clearDPM script > >> between connects does not help. In fact I do not see a different > >> behavior > >> at > >> all? > >>> I reflashed the controller multiple times and erased everything except > >>> the > >> "manufacturing" area to make sure that no previous settings are kept. > >>> The only thing I know that "fixes" the missing drives is to reboot the > >>> server. > >>> > >>> A (similar?) problem also occurs once I start the server with all 6 > >>> disk > >> shelves (11 backplanes, 17 expanders, 200+ disks). Everything comes up > >> properly with 5 shelves, once I offline connect the 6th shelve, then > >> some > >> random disks are missing and I cannot longer import the ZFS pool.
> >>> > >>> The following logs were collected with the very old FW 09.00.101.00 > >>> that > >> worked on Linux. > >>> Logs: > https://www.dropbox.com/s/6nw88rt6ajh713s/freebsd_sas3.zip?dl=0 > >>> > >>> best regards, > >>> Oliver > >>> > >>> On 07/12/2018 03:38 PM, Ken Merry wrote: > >>>> > >>>>> On Jul 12, 2018, at 6:00 AM, Oliver Sech > >> wrote: > >>>>> > >>>>> On 07/11/2018 10:35 PM, Ken Merry wrote: > >>>>>> Oliver, what happens when you try to do I/O to the devices that > don’t > >> go away after you pull the cable? Does that cause the devices to go > away? > >>>>> > >>>>> I tried to 'dd if=/dev/daX of=/dev/null bs=1k count=1' and at least > >>>>> the > >> "da" device disappears. > >>>> > >>>> Ok, that’s good. Can you send the dmesg output and check with > >> ‘camcontrol devlist -v’ to make sure the device has fully gone away? > >>>> > >>>> The reason I ask is that I have spent lots of time over the years > >>>> debugging > >> device arrival and departure problems in CAM, GEOM and devfs, and I > want > >> to make sure we aren’t running into any non-SAS related problems. > >>>> > >>>>> > >>>>>> Looking at the mprutil output, it also shows the devices sticking > >>>>>> around > >> from the adapter’s standpoint. > >>>>>> > >>>>>> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan > >>>>>> N’ > >> (where N is the scbus number shown by ‘camcontrol devlist -v’). That > >> will > >> do > >> some basic probes for each of the devices and should in theory cause > them > >> to go away if they aren’t accessible. > >>>>>> > >>>>>> It seems like the adapter may not be recognizing that the devices > >>>>>> in > >> question have gone. > >>>>> > >>>>> > >>>>> I'm pretty sure that I tried this 'camcontrol rescan all' a few > >>>>> times.
> >>>>> While > >> I not sure anymore if that cleans up the non-working devices, I'm sure > >> that > >> no new devices were added. > >>>> > >>>> If doing a read from the device with dd makes it go away, ‘camcontrol > >> rescan all’ should make it go away as well. It sends command to every > >> device, and if the mpr(4) driver tells CAM the drive is no longer > >> there, > >> it’ll get > >> removed. > >>>> > >>>> If it doesn’t cause the device to get removed (and the rescan doesn’t > >> hang), it means that you’re getting a response from a device that is no > >> longer physically connected to the machine, which is impossible with > >> SAS. > >>>> > >>>>> > >>>>> Unfortunately I haven't gotten yet to Steves 'clear controller > >>>>> mapping' > >> script but I did a few other things: > >>>> > >>>> Steve’s email made it sound like he was going to send it. I just > >>>> sent > >>>> it to > >> you separately. > >>>> > >>>>> * The last time I tried to upgrade the firmware I had all sorts of > >> problems. "sas3flash" reported bad checksums while flashing some of the > >> files. > >>>>> So I reflashed both controllers with the DOS version of sas3flash. > >>>>> This > >> was basically a challenge in itself because the DOS version of this > >> utility does > >> not seem to run on computers of this decade. (ERROR: Failed to > >> initialize > >> PAL. Exiting program.) > >>>>> The equivalent sas3flash.EFI version seems to be out of date and > >>>>> caused > >> the checksum problems described before. > >>>>> (This time I wiped them before flashing with "sas3flash -o -e 6”.) > >>>> > >>>> That is unfortunate…perhaps Steve has some insight. > >>>> > >>>>> > >>>>> * I tried to change mpr tuneable "use_phy_num" after that but this > has > >> not improved the situation. I will retry and collect logs with Steves > >> script. > >>>> > >>>> Changed it to what? I think it defaults to 1. Did you try 0?
> >>>> > >>>>> * I retried with the latest "mpr.ko" from the broadcom download > page. > >> (Same problems, no "use_phy_num" tuneable.) > >>>>> > >>>>> * I retested this hardware with Linux (4.15 and 4.17) > >>>>> ** Some shelves could be replugged reliably (ie: 45 disks show up, > >>>>> 45 > >> disks disappear, 45 disks reappear) > >>>>> ** The newest shelf 2 disks were missing after the replugging (ie: > >>>>> 44 > >> disks show up, 44 disks disappear, 42 disks reappear) (kernel log > >> mpt3sas_cm0: "device is not present handle) > >>>>> > >>>>> * I tired a different controller > >>>>> ** So far I used a Broadcom LSI SAS 9305-16e (Controller: SAS3216) > >> (Firmware 16.00.01.00 or 15.00.00.00) > >>>>> ** Yesterday I switched to a new fresh out-of-the-box Broadcom LSI > >> 9305-24i (Controller: SAS3224) (Firmware 09.00.00.00 (or something > similar > >> with 09*)) > >>>>> With the new controller everything seems work on Linux. It might be > >>>>> the > >> old Firmware?... > >>>>> It is better with the new controller on FreeBSD in that sense that I > >>>>> at > >> least get one out of two /dev/sesX devices back. But disks are still > >> missing > >> and are not getting completely cleaned up… > >>>> > >>>> It does sound a bit like a mapping table problem. Clearing it might > >>>> help, > >> we’ll see. > >>>> > >>>>> This whole thing is a bit frustrating, especially since up until now > >>>>> I > >> thought that HBAs are kind of "connect and forget" devices. Next step > >> is > >> to > >> set up a separate test environment and try to get it to work there. I > >> will > >> keep > >> you updated and try provide log for all FreeBSD related problems. > >>>> > >>>> Thanks for debugging this. Unfortunately there are a number of ways > >>>> it > >> can go wrong. The mapping code has been the source of some problems, > >> sometimes enclosure vendors do the wrong thing, and sometimes there > are > >> other bugs.
> >>>> > >>>> Ken > >>>> > >>> _______________________________________________ > >>> freebsd-scsi@freebsd.org mailing list > >>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > >>> To unsubscribe, send any mail to "freebsd-scsi- > unsubscribe@freebsd.org" > >>> > >> _______________________________________________ > >> freebsd-scsi@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > >> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Thu Jul 26 12:16:08 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 18733104D8B1 for ; Thu, 26 Jul 2018 12:16:08 +0000 (UTC) (envelope-from jwd@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [96.47.72.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "freefall.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C167F8C2A6 for ; Thu, 26 Jul 2018 12:16:07 +0000 (UTC) (envelope-from jwd@freebsd.org) Received: by freefall.freebsd.org (Postfix, from userid 821) id BA9AD1A16F; Thu, 26 Jul 2018 12:16:07 +0000 (UTC) Date: Thu, 26 Jul 2018 12:16:07 +0000 From: John To: FreeBSD-scsi Subject: SmartPQI utility support?
Message-ID: <20180726121607.GA75366@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="zhXaljGHf11kAtnf" Content-Disposition: inline User-Agent: Mutt/1.9.5 (2018-04-13) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jul 2018 12:16:08 -0000 --zhXaljGHf11kAtnf Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Folks, I have some new HP systems with SmartPQI cards in them. The smartpqi driver pushed to the tree recently does seem to work correctly. However, I cannot find any sign of a configuration utility like ARCCONF to query/create volumes, etc. Am I missing something obvious? Thanks, John --zhXaljGHf11kAtnf Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQF8BAEBCgBmBQJbWbuFXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQwNDBGOTgxNzM0NzQ3OEFBNDYyODNGQzVC NjI0OTlBMTQyNEY3RjgxAAoJELYkmaFCT3+BQgcIAODD7NTCS3dDyEMLgiGbnG+5 +9opsbaUJJxF8acr1IRn6cyXzIfSMrIvwCG/FjssdfflVG/gAowv/VdFbD3jq8r/ oY2tBZyGQirMhKsGq1mmhA9Jn2cGF172MXrx9ntG7vZmvEKE3lZC7I3nMlBAbwDT q4+U5AiuGGh8DjXeWfpRHd/OMzR8a3jWQptREgUTYoKK+3hWecl5PFu6JmqxHpLQ 6w6OKtPbBFclg7rKLDvE36omF5yH6Wq4eQcWtnLPQARkk+t0ZtPsPZr+e3aaV6vl hKgZ1I+rJdxab3w4oSg6pR93+3fqAbQEnoG4d8G0drTpVMb2T609dbXZHZHkNXg= =Xv1b -----END PGP SIGNATURE----- --zhXaljGHf11kAtnf-- From owner-freebsd-scsi@freebsd.org Thu Jul 26 15:13:08 2018 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1D70610522F6 for ; Thu, 26 Jul 2018 15:13:08 +0000 (UTC) (envelope-from Scott.Benesh@microchip.com) Received: from esa1.microchip.iphmx.com (esa1.microchip.iphmx.com [68.232.147.91]) (using TLSv1.2 
with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.microchip.iphmx.com", Issuer "*.microchip.iphmx.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4FB6D940D0; Thu, 26 Jul 2018 15:13:07 +0000 (UTC) (envelope-from Scott.Benesh@microchip.com) X-IronPort-AV: E=Sophos;i="5.51,405,1526367600"; d="scan'208";a="17559505" Received: from smtpout.microchip.com (HELO email.microchip.com) ([198.175.253.82]) by esa1.microchip.iphmx.com with ESMTP/TLS/DHE-RSA-AES256-SHA; 26 Jul 2018 08:11:56 -0700 Received: from NAM01-BY2-obe.outbound.protection.outlook.com (10.10.215.89) by email.microchip.com (10.10.76.107) with Microsoft SMTP Server (TLS) id 14.3.352.0; Thu, 26 Jul 2018 08:11:56 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=microchiptechnology.onmicrosoft.com; s=selector1-microchiptechnology-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=GK2FCjGdkHtsiAoaQgxts+7OVLYQX4zP43GIlhZxClQ=; b=RPZnLoatD7S0YNGS1ltOdB1FtoxUckW6mHZBjDe0XTK/7el5BifXKVROcHeILizssL7SCIwmxVCvd+sMXZ8nExwyD/MXgNtQmsJssYQ45Nxd7ZIv7UzWM6yjERrzS06azvv+0WCryidPDGG6m7aIoqe9kQBMDVCIm/TH+G4FqdQ= Received: from BN7PR11MB2819.namprd11.prod.outlook.com (52.135.246.146) by BN7PR11MB2771.namprd11.prod.outlook.com (52.135.246.21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.973.21; Thu, 26 Jul 2018 15:11:54 +0000 Received: from BN7PR11MB2819.namprd11.prod.outlook.com ([fe80::c852:cec3:d83b:61a3]) by BN7PR11MB2819.namprd11.prod.outlook.com ([fe80::c852:cec3:d83b:61a3%2]) with mapi id 15.20.0973.022; Thu, 26 Jul 2018 15:11:54 +0000 From: To: , Subject: RE: SmartPQI utility support? Thread-Topic: SmartPQI utility support? 
Thread-Index: AQHUJNqwY3Mklbgh10uZGzRGYMIGCqShnEOQ Date: Thu, 26 Jul 2018 15:11:54 +0000 Message-ID: References: <20180726121607.GA75366@FreeBSD.org> In-Reply-To: <20180726121607.GA75366@FreeBSD.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Scott.Benesh@microchip.com; x-originating-ip: [216.54.225.58] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; BN7PR11MB2771; 6:e9q2Jmzn1MnXggrkQ4/zM4V5wek/dD4itCNbohKrtWKyyAg7li8JhUPKhHAkvugaZv3r6b5V+3FDqQvC7dTpeb88VaGSR+0D7CDmY2P4YzXzo49Bnycntxp3clfdsBIPuMMWLWh2ptwtmyxaBzIxmDqzHykhmQCBbbDYCeK2xvcMIj+nEGff1i8YFBahSgzt37vRTIYbMeYrrCw5xZM/ce3XDfHp2AbnMiNsp0KW7pXCHoLrUWiYdPIuYZ586znEAAvU+5Wqu8Gs6QAevIQlBDVj8/4YvXWIo+jqpaAepD/3JiNXiNxVZ4qP+B1PZs1HD+DpUALdcFv8JpE4iUt5kIdOGihy7AEkiPkDmhoBioJRIbe+VZkIGXX4kU/Fm2H+2YNs46Ehxxbgh11Em2gXB+Cqr+5sYBdqhUmbwps17MCG49zMrxKizx/34d6erMNYIZ5MtcnV61m/dH37JNdmUw==; 5:5ikGIybBzLb5r0rZwlUELfcWSs5oVaTf8gSV4g4wUC19k/QHgKVz7cAnbImz4zVS4vqWjEHp64zzuGKY3IMQTCyhpmXZjLQFyfi87v4wT+AQKQwb/fS09H24ZCH4+ONBw+4e7YYruWgPUJys6rAya1EWz2M1zK8PnpLnR5b0YVM=; 7:I++smTRzC56A2LCge+lmFpEDcqOyUu0xqCUjfCk/a/hdexyutfdlasrP3XBAijLbDwO1IiI33IgS2GB3mSGLYF9evarQgepe6iqR9YZveemE5nsYtuFeosxOxxpVZnhWVbnX08JzClVrsrRa8XqPy2y4vZySPlXsPYmEpkiYmuuhcGUm26Ms3EZNMtev5Kypxyas8vcbCmxU2sWAxJoL7lMtUUJaCgLS/GlIBJKq7tkjQtSpFMj2VVOd+UJ2pKGc x-ms-exchange-antispam-srfa-diagnostics: SOS; x-ms-office365-filtering-correlation-id: b5d77fe5-78fe-4255-b9a7-08d5f30a1f97 x-microsoft-antispam: BCL:0; PCL:0; RULEID:(7020095)(4652040)(8989117)(4534165)(4627221)(201703031133081)(201702281549075)(8990107)(5600073)(711020)(2017052603328)(7153060)(7193020); SRVR:BN7PR11MB2771; x-ms-traffictypediagnostic: BN7PR11MB2771: x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(72170198267865); x-ms-exchange-senderadcheck: 1 x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; 
RULEID:(6040522)(2401047)(5005006)(8121501046)(3002001)(10201501046)(93006095)(93001095)(3231311)(944501410)(52105095)(149027)(150027)(6041310)(20161123558120)(20161123562045)(20161123564045)(20161123560045)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(6072148)(201708071742011)(7699016); SRVR:BN7PR11MB2771; BCL:0; PCL:0; RULEID:; SRVR:BN7PR11MB2771; x-forefront-prvs: 07459438AA x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(136003)(376002)(346002)(396003)(366004)(39850400004)(13464003)(199004)(189003)(316002)(72206003)(110136005)(81166006)(5250100002)(81156014)(8676002)(8936002)(6116002)(74316002)(99286004)(66066001)(7736002)(305945005)(3846002)(2906002)(5660300001)(186003)(11346002)(446003)(7116003)(76176011)(6506007)(102836004)(53546011)(7696005)(25786009)(450100002)(14454004)(966005)(478600001)(26005)(6246003)(53936002)(229853002)(55016002)(6306002)(9686003)(105586002)(106356001)(86362001)(2900100001)(97736004)(3480700004)(68736007)(33656002)(476003)(486006)(14444005)(19627235002)(6436002)(256004); DIR:OUT; SFP:1101; SCL:1; SRVR:BN7PR11MB2771; H:BN7PR11MB2819.namprd11.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: microchip.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: yhnFUPQu07bjqZvJw/pDhbHcUUsfUJYTkzKT+3StMXt/aHDg6nVbX0lXAoYdhdCguuHU6OTszFVjasaHLsdnhE2fbFw1lVmcOybPVCO+78zhfPfuVHbpTFepyzGzcsGy+HFFkn7HfGfJaZZXdt6Fq2wuJrYPHJclXmQq37Y1ybggbXp1xLvLuIp5f9y3NA8loUB6qprUK1shuk4qV2DztwRtMPQh66Hq0UUEOqisjSlGdVw7wv2QyECdqviZ72tR8fyhfqCDO9PZ9PDvES8b/QAM/fRK9bL9DpA5OMH8Dbl/8SM9wCMZp4ngteyUfZ18iD0SmAQps8cDLJENTPOJwWvyD5jGNSp3sJZJqp0d6BU= spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-Network-Message-Id: b5d77fe5-78fe-4255-b9a7-08d5f30a1f97 
X-MS-Exchange-CrossTenant-originalarrivaltime: 26 Jul 2018 15:11:54.6177 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 3f4057f3-b418-4d4e-ba84-d55b4e897d88 X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN7PR11MB2771 X-OriginatorOrg: microchip.com X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Jul 2018 15:13:08 -0000 Nothing obvious, you just need to know where to look. Under the following support link https://storage.microsemi.com/en-us/suppor= t/ choose the card and go to the support download page. For example, this link should be the download page for a SmartRAID 3154-8i. https://storage.microsemi.com/en-us/support/raid/sas_raid/asr-3154-8i/ Go to the Storage Manager Download link and choose the "Microsemi Adaptec A= RCCONF Command Line Utility v2.06.23167" . While the description says Micro= semi Adaptec CLI for Windows and Linux, the FreeBSD binaries are included i= n the zip package as well. The same utility will work on all of the HBA1100/SmartHBA2100/SmartRAID 310= 0 cards. HTH, Scott -----Original Message----- From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org= ] On Behalf Of John Sent: Thursday, July 26, 2018 7:16 AM To: FreeBSD-scsi Subject: SmartPQI utility support? Hi Folks, I have some new HP systems with SmartPQI cards in them. The smartpqi dri= ver pushed to the tree recently does seem to work correctly. However, I cannot find any sign of a configuration utility like ARCCONF = to query/create volumes, etc. Am I missing something obvious? Thanks, John