From: "Kenneth D. Merry"
To: Oliver Sech
Cc: freebsd-scsi@freebsd.org, slm@freebsd.org
Date: Tue, 3 Jul 2018 10:26:29 -0400
Subject: Re: problems with SAS JBODs 2
Message-ID: <20180703142629.GF26046@mithlond.kdm.org>

On Tue, Jul 03, 2018 at 14:28:58 +0200, Oliver Sech wrote:
> Hi!
>
> I use FreeBSD for a large ZFS pool (over 1 PB), and I recently ran into a lot of problems with the JBODs. Generally everything works fine until I replug the shelves.
>
> When I start with a clean system and attach a single shelf, everything seems fine:
> -> 44 disks show up, I can use the enclosure services (sesutil), and the system continues to run without problems.
> Once I disconnect the SAS cable, wait until all devices disappear, and reconnect, I get all sorts of problems:
> -> a random number of disks show up, and the enclosure ("ses") devices do not show up.
> Once I restart the system, I can start over again.
>
> On the server with the large pool, there are only certain ports on the HBA that I can use; otherwise disks are missing after a reboot and my ZFS pool won't come online.
> I tried different firmware on the HBA. I tried the mpr.ko module from the Broadcom site. (I replaced the one in /boot/kernel?)
> I tested all of the above with Linux as the OS, and everything seems to work.
>
> Is there anything I'm missing? A command that can reset the SAS components?
>
> FreeBSD version: 11.1-RELEASE-p11
> HBA: Broadcom LSI 9305-16e (latest firmware)
> JBOD: SC847E2C-R1K28JBOD (two expanders, internally daisy-chained)

Steve McConnell (CCed) and I have been corresponding with someone else who has a problem very similar to yours. The most likely issue is that the mapping table stored on the card is messed up.

Can you send dmesg output with the following loader tunable set:

    hw.mpr.debug_level=0x203

That will turn on debugging for the mapping code and may show the problem. If you see messages like this:

    mpr0: Attempting to reuse target id 63 handle 0x000b
    mpr0: Attempting to reuse target id 64 handle 0x000c
    mpr0: Attempting to reuse target id 65 handle 0x000d
    mpr0: Attempting to reuse target id 66 handle 0x000e
    mpr0: Attempting to reuse target id 67 handle 0x000f
    mpr0: Attempting to reuse target id 68 handle 0x0010
    mpr0: Attempting to reuse target id 69 handle 0x0011
    mpr0: Attempting to reuse target id 70 handle 0x0012
    mpr0: Attempting to reuse target id 66 handle 0x000e

it indicates that the mapping code is preventing some of the drives from fully probing because there are collisions in the table.

Unfortunately, we have not yet fixed the problem in the other situation. (He is running with multipathing, which could be contributing to the problem.) I have a script and utility that will clear the mapping table on the card, but that hasn't been enough to fix the other situation.

If you do have a mapping problem, I can give you the script/utility to clear the table and we can see whether it fixes your problem. If not, it'll probably have to wait until Steve gets back from vacation.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
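As a rough aid for spotting the mapping messages in a long debug log, here is a minimal Python sketch that scans saved dmesg output (for example, after booting with hw.mpr.debug_level=0x203 set in /boot/loader.conf and running `dmesg > dmesg.txt`) for the "Attempting to reuse target id" lines quoted above. It groups them by controller and target id and flags ids that appear more than once, as target id 66 with handle 0x000e does in the sample. The function names (scan_reuse, report), the script name mpr_reuse.py, and the assumption that the message wording matches the sample exactly are all hypothetical choices for this sketch; it only reads the log and does not inspect or clear the mapping table on the card.

```python
import re
import sys
from collections import defaultdict

# Matches the message format quoted in the email, e.g.
#   "mpr0: Attempting to reuse target id 66 handle 0x000e"
# re.search (not re.match) also tolerates a syslog-style prefix such as
# "Jul  3 10:00:00 host kernel: ". Other driver versions may word the
# message differently; that is an assumption of this sketch.
REUSE_RE = re.compile(
    r"(?P<ctrl>mpr\d+): Attempting to reuse target id (?P<tid>\d+) "
    r"handle (?P<handle>0x[0-9a-fA-F]+)"
)


def scan_reuse(lines):
    """Group reuse messages as {(controller, target_id): [handle, ...]}."""
    hits = defaultdict(list)
    for line in lines:
        m = REUSE_RE.search(line)
        if m:
            key = (m.group("ctrl"), int(m.group("tid")))
            # int(..., 16) accepts the "0x" prefix printed by the driver.
            hits[key].append(int(m.group("handle"), 16))
    return dict(hits)


def report(hits):
    """Print one line per (controller, target id), flagging repeats."""
    if not hits:
        print("no 'Attempting to reuse target id' messages found")
        return
    for (ctrl, tid), handles in sorted(hits.items()):
        shown = ", ".join(f"0x{h:04x}" for h in handles)
        note = "  <- seen more than once" if len(handles) > 1 else ""
        print(f"{ctrl}: target id {tid}: handle(s) {shown}{note}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            report(scan_reuse(f))
    else:
        print("usage: python3 mpr_reuse.py DMESG_FILE", file=sys.stderr)
```

Run it as `python3 mpr_reuse.py dmesg.txt`. The script only summarizes what the driver logged; whether the table actually needs clearing is still a call for the mpr developers.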