From: Oliver Sech
Date: Thu, 12 Jul 2018 12:00:41 +0200
Subject: Re: problems with SAS JBODs 2
To: Ken Merry, Stephen Mcconnell
Cc: FreeBSD-scsi
Message-ID: <9e0bf18f-0689-b2a0-1da4-b70c497b2f14@gmx.net>
In-Reply-To: <54B10B7C-CDCE-4428-B584-59CE8F38B120@freebsd.org>

On 07/11/2018 10:35 PM, Ken Merry wrote:
> Oliver, what happens when you try to do I/O to the devices that don’t go away after you pull the cable? Does that cause the devices to go away?

I tried 'dd if=/dev/daX of=/dev/null bs=1k count=1', and at least the "da" device disappears.

> Looking at the mprutil output, it also shows the devices sticking around from the adapter’s standpoint.
>
> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ (where N is the scbus number shown by ‘camcontrol devlist -v’). That will do some basic probes for each of the devices and should in theory cause them to go away if they aren’t accessible.
>
> It seems like the adapter may not be recognizing that the devices in question have gone.

I'm pretty sure that I tried 'camcontrol rescan all' a few times. While I am no longer sure whether that cleaned up the non-working devices, I am sure that no new devices were added.

Unfortunately I haven't gotten to Steve's 'clear controller mapping' script yet, but I did a few other things:

* The last time I tried to upgrade the firmware I had all sorts of problems. "sas3flash" reported bad checksums while flashing some of the files. So I reflashed both controllers with the DOS version of sas3flash. This was basically a challenge in itself because the DOS version of this utility does not seem to run on computers of this decade. (ERROR: Failed to initialize PAL. Exiting program.) The equivalent sas3flash.EFI version seems to be out of date and caused the checksum problems described before. (This time I wiped them before flashing with "sas3flash -o -e 6".)
* I tried to change the mpr tunable "use_phy_num" after that, but this has not improved the situation. I will retry and collect logs with Steve's script.
* I retried with the latest "mpr.ko" from the Broadcom download page. (Same problems, no "use_phy_num" tunable.)
* I retested this hardware with Linux (4.15 and 4.17).
** Some shelves could be replugged reliably (i.e. 45 disks show up, 45 disks disappear, 45 disks reappear).
** On the newest shelf, 2 disks were missing after replugging (i.e. 44 disks show up, 44 disks disappear, 42 disks reappear) (kernel log: mpt3sas_cm0: "device is not present handle").
* I tried a different controller.
** So far I had used a Broadcom LSI SAS 9305-16e (Controller: SAS3216) (Firmware 16.00.01.00 or 15.00.00.00).
** Yesterday I switched to a fresh out-of-the-box Broadcom LSI 9305-24i (Controller: SAS3224) (Firmware 09.00.00.00 (or something similar with 09*)).

With the new controller everything seems to work on Linux. It might be the old firmware? On FreeBSD it is better with the new controller in the sense that I at least get one of the two /dev/sesX devices back. But disks are still missing and are not getting completely cleaned up.

This whole thing is a bit frustrating, especially since up until now I thought that HBAs were kind of "connect and forget" devices.

The next step is to set up a separate test environment and try to get it to work there. I will keep you updated and try to provide logs for all FreeBSD-related problems.
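
The replug tests described above (N disks show up, N disappear, fewer reappear) could be checked mechanically by comparing two saved 'camcontrol devlist' dumps. The following is only a minimal sketch, not something that was run in this thread. It assumes each devlist line ends with a parenthesised peripheral list such as "(pass0,da0)"; the function names and the idea of saving the output to text first are hypothetical. It compares by peripheral name only, so disks that come back under a different daN number would show up as both missing and new.

```python
import re

# Assumption: each `camcontrol devlist` line ends with the attached
# peripherals in parentheses, e.g. "(pass0,da0)" or "(ses0,pass2)".
PERIPH_RE = re.compile(r"\(([^()]*)\)\s*$")


def peripherals(devlist_text, prefix="da"):
    """Return the set of peripheral names with the given prefix (da0, da1, ...)."""
    wanted = re.compile(re.escape(prefix) + r"\d+")
    names = set()
    for line in devlist_text.splitlines():
        match = PERIPH_RE.search(line)
        if not match:
            continue  # not a device line
        for name in match.group(1).split(","):
            name = name.strip()
            if wanted.fullmatch(name):
                names.add(name)
    return names


def replug_report(before, after, prefix="da"):
    """Compare two devlist dumps taken before unplugging and after replugging."""
    old = peripherals(before, prefix)
    new = peripherals(after, prefix)
    return {"missing": sorted(old - new), "new": sorted(new - old)}
```

Usage would be along the lines of saving 'camcontrol devlist' to a file before pulling the cable and again after replugging, then passing both file contents to replug_report(); calling it with prefix="ses" would do the same check for the enclosure devices.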