From: Jason Keltz
Date: Wed, 01 Apr 2015 14:32:35 -0400
To: freebsd-drivers@freebsd.org
Subject: issue with hot swap and mps driver
Message-ID: <551C39C3.10309@cse.yorku.ca>
I have an LSI 9205-8e card in a system running FreeBSD 10.1-RELEASE-p5.

> mps0: port 0x4000-0x40ff mem 0xc1440000-0xc144ffff,0xc1400000-0xc143ffff irq 16 at device 0.0 on pci1
> mps0: Firmware: 20.00.02.00, Driver: 19.00.00.00-fbsd
> mps0: IOCCapabilities: 5285c

The card is connected to an AIC 24-disk JBOD (AIC SSG-JBSA21-4243-A1 SAS 6G) with hot swap capability.

I'm having some interesting hot swap issues under mps with 2 TB Western Digital SATA disks.

1) If I hot swap an older Western Digital disk, model WD2002FAEX-007BA0 with firmware 1D05, the disk hot swaps perfectly under FreeBSD. That is, when I remove the disk, the device entry in /dev is removed, and when I re-insert the disk, it returns. This is the behaviour I expect.
> (da21:mps0:0:31:0): Periph destroyed
> da21 at mps0 bus 0 scbus0 target 31 lun 0
> da21: Fixed Direct Access SCSI-6 device
> da21: Serial Number WD-XXXXXXXXXXXXX
> da21: 600.000MB/s transfers
> da21: Command Queueing enabled
> da21: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)

2) If I hot swap a slightly newer Western Digital disk, model WD2002FAEX-00MJRA0 with firmware 1L01, then when I re-insert the disk, the device entry does not return, and I instead see this:

> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> _mapping_get_dev_info: failed to compute the hashed SAS Address for SATA device with handle 0x0019
> failure at /usr/src/sys/dev/mps/mps_sas_lsi.c:670/mpssas_add_device()!
> Could not get ID for device with handle 0x0019
> mpssas_fw_work: failed to add device with handle 0x19

3) If I hot swap a much newer RE disk, WD2000FYYZ-0 with 1K03 firmware, the problem is the same as in 2).

Worse, if I run a "sas2ircu 0 display" command to list the enclosure after the above error occurs, the kernel dumps.

I needed to resolve this issue under both Red Hat Enterprise Linux and FreeBSD, because I am using one set of these disks in each system. I've been in contact with AIC, Western Digital, LSI/Avago, and Red Hat, and have spent countless hours sending debugging details, etc. Through RHEL, I was able to get a patch indirectly from Avago which "solves" the driver problem for RHEL. (The patch description reads: "In this patch driver won't block the device if the device state is "SDEV_CREATED" (i.e. driver won't block the drive when drive is still in the device add process at SCSI MID Layer). So that SCSI MID Layer can send the Inquiry commands.")
The patch is slated for internal review at LSI. It would be nice to see a similar patch in the FreeBSD version of the mps driver, one that prevents the driver from hanging when the disk is inserted.

There's no question that the disks take a little extra time to respond. It's not really clear why they can't be given that bit of extra time. In discussing the issue with other people, I'm also told that this problem occurs with other vendors' hard disks as well.

Jason.

ps: While I'm running firmware 20.00.02.00 and driver 19.00.00.00-fbsd, I've tested with firmware 19 as well, and this doesn't change anything.