From: Alan Somers
To: Jason Wolfe
Cc: FreeBSD-scsi
Date: Mon, 8 Dec 2014 14:54:16 -0700
Subject: Re: LSI SAS 3008 card - 35 out of 36 disks detected
References: <54822835.3080800@crystal.harvard.edu>

On Mon, Dec 8, 2014 at 1:44 PM, Jason Wolfe wrote:
> On Mon, Dec 8, 2014 at 11:20 AM, Alan Somers wrote:
>> On Mon, Dec 8, 2014 at 8:58 AM, Justin O'Conor wrote:
>>> Hi All,
>>> Thanks, this is encouraging. smp_discover ses0|1 see 36 sata disks.
>>> This is from the 10.1 install.
>>>
>>
>> There are certainly some inconsistencies in the smp_discover
>> responses. For example, the SEP on ses0 (phy identifier: 36) has
>> "connector type: SAS virtual connector" and "connector element index:
>> 24" But the SEP on ses1 (phy identifier: 28) has "connector type: No
>> information" and "connector element index: 0". Also note that "phy
>> identifier: 12" on ses1 has "connector element index: 0". That would
>> be the first slot on the rear expander, if the slots and phys are
>> numbered the same way. My best guess is that phy 12 and phy 28 mapped
>> to the same map_idx in mpr_mapping.c:1168. So the information for the
>> SEP overwrote the information for the first disk slot. If my guess is
>> true, then recabling your chassis as you suggested wouldn't help.
>> However, you might try the attached but untested patch. It will
>> prevent the SEP from being added to the mapping table while printing a
>> useful error message. If I'm correct, then the patch will let you use
>> all 36 disk slots, but you won't have ses1 anymore.
>>
>> In the meantime, I'll try to reproduce your problem. I have all the
>> required equipment in my lab.
>>
>> -Alan
>>
>
> I believe this is the same issue we ran into a few years ago on the
> LSI2008, where the ses0 device would map over the boot disk.
> It's long and spans over multiple months, so just the relevant bits:
>
> Initial report:
> https://lists.freebsd.org/pipermail/freebsd-scsi/2012-February/005243.html
>
> The issue seems to be a shortcoming in the detection method where it
> has no problem assigning the ses over an already mapped disk, LSI's
> initial response was to use the LSI config utility and map the drives
> manually:
> https://lists.freebsd.org/pipermail/freebsd-scsi/2012-February/005243.html
> https://lists.freebsd.org/pipermail/freebsd-scsi/2012-February/005267.html
> https://lists.freebsd.org/pipermail/freebsd-scsi/2012-March/005343.html
>
> This was not an option for us as we had 2000 of these devices in the
> field, and entering the LSI BIOS would be a large undertaking. In the
> end after an internal dialogue with LSI guys, Kashyap was kind enough
> to write a one off for us, that never made it upstream. It simply
> assigns the ses device to max target + 1 when a conflict is found.
> The core issue seems to be with the way LSI detects and assigns
> devices on FreeBSD, so this is by no means 'proper', but it's sound
> enough so resolve the issue for us on the LSI2008. In case it's
> interesting to anyone:
>
> http://nitrology.com/mps_fix-fbsd10stable.diff
>
> Jason

I've reproduced the problem, and it's the same one that I saw before.
It's also the same one that Jason described, but the full problem is a
little more general than just the SEP device's mapping.

First a little background: LSI controllers have two methods for mapping
phys to SCSI Bus and Target IDs. One method is called Device
Persistence mapping. It is based on the SAS WWN attached to each phy.
The other method is called Enclosure/Slot mapping. That method uses
the Connector Element Index or the Device Slot Number field of the
expander's SMP DISCOVER response for the given phy. It seems that all
of LSI's SAS2 HBAs used Device Persistence mapping by default, but the
SAS3 HBAs use Enclosure/Slot mapping.
That's why this problem rarely or never shows up with SAS2 HBAs.

The SAS Protocol Layer 3 rev 6g spec, section 9.4.3.11, says that the
CONNECTOR ELEMENT INDEX field shall be ignored if the CONNECTOR TYPE
field is set to 0. Clearly, the HBA firmware isn't ignoring that
field. That's a bug in the HBA firmware. But the expander firmware
could be doing better. If it reported a unique CONNECTOR ELEMENT INDEX
for the SEP phy, then we wouldn't have this problem. I'll take that up
with the expander vendor.

In the meantime, there is a workaround. Don't use the patch I sent
you; it doesn't work. The workaround is to configure your HBA to use
Device Persistence mapping. You can do that from FreeBSD using a tool
called lsiutil. Unfortunately, it isn't publicly distributed, but you
can ask Steve McConnell (cc'ed) for a copy. Here are the instructions:

1) Ensure that the HBA of concern is named "mpr0".
2) Start lsiutil
3) Select "mps0" [sic]
4) (Optionally) enter e for expert mode
5) Enter 9 for "Read/change configuration pages"
6) Enter 1 for Page Type (that means the IOC pages, FYI)
7) Enter 8 for Page Number
8) Enter 0 for NVRAM values
9) Enter "yes" to make changes
10) Offset is "c"
11) Change "00000002" to "00000001".
12) Enter "yes" to save changes.
13) Either reboot, or unload and reload mpr(4).

That change will put you in Device Persistence mapping but with
persistent mapping disabled. All of the slots should work again. At
least I hope so.

-Alan
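
For anyone following along, the collision described above can be illustrated with a toy model. This is only a sketch of the idea, not the mpr(4) driver's or the HBA firmware's actual logic. The Phy class, build_slot_map() and its honor_spec flag are hypothetical names made up for this example, and the connector type used for the disk phy (0x20) is an assumed placeholder, since the thread only reports its connector element index.

```python
from dataclasses import dataclass

# "connector type: No information" in the smp_discover output
CONNECTOR_TYPE_NO_INFO = 0


@dataclass
class Phy:
    phy_id: int
    connector_type: int
    connector_element_index: int
    description: str


def build_slot_map(phys, honor_spec):
    """Return (slot index -> device description, list of skipped devices).

    honor_spec=False always trusts the connector element index, which is
    the behavior this thread attributes to the HBA firmware.
    honor_spec=True leaves a phy out of the slot map when its connector
    type is 0, following the spec rule quoted above.
    """
    slot_map = {}
    skipped = []
    for phy in phys:
        if honor_spec and phy.connector_type == CONNECTOR_TYPE_NO_INFO:
            skipped.append(phy.description)
            continue
        # On a collision the later entry silently replaces the earlier one.
        slot_map[phy.connector_element_index] = phy.description
    return slot_map, skipped


# Phy 12 and phy 28 on ses1 both report connector element index 0.
# The disk phy's connector type (0x20) is an assumption for illustration.
phys = [
    Phy(12, 0x20, 0, "disk in first rear slot"),
    Phy(28, CONNECTOR_TYPE_NO_INFO, 0, "SEP"),
]

buggy_map, _ = build_slot_map(phys, honor_spec=False)
fixed_map, skipped = build_slot_map(phys, honor_spec=True)
print(buggy_map)
print(fixed_map, skipped)
```

In the first case the SEP ends up occupying the slot of the first rear disk, which matches the 35-of-36 symptom. In the second case the disk keeps its slot and the SEP would need to be placed some other way, for example by WWN as Device Persistence mapping does.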