From owner-freebsd-scsi@FreeBSD.ORG Mon Oct 31 11:07:12 2011 Return-Path: Delivered-To: freebsd-scsi@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 09A90106574B for ; Mon, 31 Oct 2011 11:07:12 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id E41EF8FC28 for ; Mon, 31 Oct 2011 11:07:11 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9VB7BTU056865 for ; Mon, 31 Oct 2011 11:07:11 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9VB7Bf4056863 for freebsd-scsi@FreeBSD.org; Mon, 31 Oct 2011 11:07:11 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 31 Oct 2011 11:07:11 GMT Message-Id: <201110311107.p9VB7Bf4056863@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-scsi@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Oct 2011 11:07:12 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/161809 scsi [cam] [patch] set kern.cam.boot_delay via build option o kern/159412 scsi [ciss] 7.3 RELEASE: ciss0 ADAPTER HEARTBEAT FAILED err o kern/157770 scsi [iscsi] [panic] iscsi_initiator panic o kern/154432 scsi [xpt] run_interrupt_driven_hooks: still waiting after o kern/153514 scsi [cam] [panic] CAM related panic o kern/153361 scsi [ciss] Smart Array 5300 boot/detect drive problem o kern/152250 scsi [ciss] [patch] Kernel panic when hw.ciss.expose_hidden o kern/151564 scsi [ciss] ciss(4) should increase CISS_MAX_LOGICAL to 10 o docs/151336 scsi Missing documentation of scsi_ and ata_ functions in c s kern/149927 scsi [cam] hard drive not stopped before removing power dur o kern/148083 scsi [aac] Strange device reporting o kern/147704 scsi [mpt] sys/dev/mpt: new chip revision, partially unsupp o kern/146287 scsi [ciss] ciss(4) cannot see more than one SmartArray con o kern/145768 scsi [mpt] can't perform I/O on SAS based SAN disk in freeb o kern/144648 scsi [aac] Strange values of speed and bus width in dmesg o kern/144301 scsi [ciss] [hang] HP proliant server locks when using ciss o kern/142351 scsi [mpt] LSILogic driver performance problems o kern/141934 scsi [cam] [patch] add support for SEAGATE DAT Scopion 130 o kern/134488 scsi [mpt] MPT SCSI driver probes max. 
8 LUNs per device o kern/132250 scsi [ciss] ciss driver does not support more then 15 drive o kern/132206 scsi [mpt] system panics on boot when mirroring and 2nd dri o kern/130621 scsi [mpt] tranfer rate is inscrutable slow when use lsi213 o kern/129602 scsi [ahd] ahd(4) gets confused and wedges SCSI bus o kern/128452 scsi [sa] [panic] Accessing SCSI tape drive randomly crashe o kern/128245 scsi [scsi] "inquiry data fails comparison at DV1 step" [re o kern/127927 scsi [isp] isp(4) target driver crashes kernel when set up o kern/127717 scsi [ata] [patch] [request] - support write cache toggling o kern/124667 scsi [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi o kern/123674 scsi [ahc] ahc driver dumping o kern/123520 scsi [ahd] unable to boot from net while using ahd o sparc/121676 scsi [iscsi] iscontrol do not connect iscsi-target on sparc o kern/120487 scsi [sg] scsi_sg incompatible with scanners o kern/120247 scsi [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s o kern/114597 scsi [sym] System hangs at SCSI bus reset with dual HBAs o kern/110847 scsi [ahd] Tyan U320 onboard problem with more than 3 disks o kern/99954 scsi [ahc] reading from DVD failes on 6.x [regression] o kern/92798 scsi [ahc] SCSI problem with timeouts o kern/90282 scsi [sym] SCSI bus resets cause loss of ch device o kern/76178 scsi [ahd] Problem with ahd and large SCSI Raid system o kern/74627 scsi [ahc] [hang] Adaptec 2940U2W Can't boot 5.3 s kern/61165 scsi [panic] kernel page fault after calling cam_send_ccb o kern/60641 scsi [sym] Sporadic SCSI bus resets with 53C810 under load o kern/60598 scsi wire down of scsi devices conflicts with config s kern/57398 scsi [mly] Current fails to install on mly(4) based RAID di o bin/57088 scsi [cam] [patch] for a possible fd leak in libcam.c o kern/52638 scsi [panic] SCSI U320 on SMP server won't run faster than o kern/44587 scsi dev/dpt/dpt.h is missing defines required for DPT_HAND o kern/39388 scsi ncr/sym drivers fail with 53c810 and 
more than 256MB m o kern/35234 scsi World access to /dev/pass? (for scanner) requires acce 49 problems total. From owner-freebsd-scsi@FreeBSD.ORG Tue Nov 1 18:42:03 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7879D106564A for ; Tue, 1 Nov 2011 18:42:03 +0000 (UTC) (envelope-from nitroboost@gmail.com) Received: from mail-dy0-f54.google.com (mail-dy0-f54.google.com [209.85.220.54]) by mx1.freebsd.org (Postfix) with ESMTP id EE6068FC16 for ; Tue, 1 Nov 2011 18:42:02 +0000 (UTC) Received: by dye36 with SMTP id 36so397915dye.13 for ; Tue, 01 Nov 2011 11:42:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; bh=3trq6LFWsnHWqs3tuc6y7zF9rP+zyikr5hs04y/L/rs=; b=Fugmkxctt2nkyllbtSUgysM7z1Fu7JrL3e6rypghXRVVmM+Hs5UuUuFuYODecwqyzM GRetBFz+3GoAL3pRibYtB2RN1dbc+fsEZSOIJCxgJmL0HdN8j4OJgyJ+U8g9GvF+qvxW s2Y1hkd8bYvz2r17AHIWWO0Y6y7XNIWxp6r+M= MIME-Version: 1.0 Received: by 10.182.115.40 with SMTP id jl8mr157403obb.8.1320171197190; Tue, 01 Nov 2011 11:13:17 -0700 (PDT) Received: by 10.182.35.193 with HTTP; Tue, 1 Nov 2011 11:13:17 -0700 (PDT) Date: Tue, 1 Nov 2011 11:13:17 -0700 Message-ID: From: Jason Wolfe To: freebsd-scsi@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Nov 2011 18:42:03 -0000 Hello, I have an issue with the mps driver on 8.2 where running 'smartctl -a' rarely causes the controller to freak out when disk tags are > 2. 
I've confirmed that setting the tags to 1 resolves this crash, so that surely is a clue in the right direction. I'm using Seagate 1TB SAS drives - ST91000640SS, and these are SuperMicro X8DTT-H chassis. This happens across over a thousand servers, so it is surely not flaky hardware. It could obviously be some interoperability issue between this drive model and the mps controller, but unfortunately I don't have any other drives deployed on these cards to test that theory out :/

Luckily remote syslogging is enabled, so while nothing is kept locally, we see messages similar to these transmitted before the server hangs, requiring a power cycle:

(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 510
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 713
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 942
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 356
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 492
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 976
(da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID 339
(da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID 746
(da5:mps0:0:6:0): SCSI command timeout on device handle 0x000f SMID 74
(da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID 613
(da2:mps0:0:3:0): SCSI command timeout on device handle 0x000c SMID 16
(da10:mps0:0:11:0): SCSI command timeout on device handle 0x0014 SMID 305
(da1:mps0:0:2:0): SCSI command timeout on device handle 0x000b SMID 74
(da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID 594

In some cases that would be followed by this, which would usually be the last transmission, though we don't see this in all cases.
It may just be the system isn't always alive long enough to transmit:

kernel: mps0: IOC Fault 0x40006003, Resetting

I'm able to reproduce fairly easily within a minute or two by heavily loading the disks up by whatever means, and running smartctl -a in a loop:

#!/bin/sh -x

disks=`sysctl -n kern.disks|xargs -n1|grep ^da`

for disk in $disks; do
    camcontrol tags $disk -N 4
done

for z in `yes|head -100`; do
    for disk in $disks; do
        smartctl -s on -a /dev/$disk
    done
done

mps0: port 0xe000-0xe0ff mem 0xfbd3c000-0xfbd3ffff,0xfbd40000-0xfbd7ffff irq 26 at device 0.0 on pci4
mps0: Firmware: 07.00.00.00
mps0: IOCCapabilities: 1285c
mps0: [ITHREAD]
da0 at mps0 bus 0 scbus0 target 1 lun 0
da1 at mps0 bus 0 scbus0 target 2 lun 0
da2 at mps0 bus 0 scbus0 target 3 lun 0
da3 at mps0 bus 0 scbus0 target 4 lun 0
da4 at mps0 bus 0 scbus0 target 5 lun 0
da5 at mps0 bus 0 scbus0 target 6 lun 0
da6 at mps0 bus 0 scbus0 target 7 lun 0
da7 at mps0 bus 0 scbus0 target 8 lun 0
da8 at mps0 bus 0 scbus0 target 9 lun 0
da9 at mps0 bus 0 scbus0 target 10 lun 0
da10 at mps0 bus 0 scbus0 target 11 lun 0
da11 at mps0 bus 0 scbus0 target 12 lun 0
ses0 at mps0 bus 0 scbus0 target 13 lun 0

mps0@pci0:4:0:0: class=0x010700 card=0x040015d9 chip=0x00721000 rev=0x02 hdr=0x00
    vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
    class = mass storage
    subclass = SAS

at scbus0 target 1 lun 0 (pass0,da0)
at scbus0 target 2 lun 0 (pass1,da1)
at scbus0 target 3 lun 0 (pass2,da2)
at scbus0 target 4 lun 0 (pass3,da3)
at scbus0 target 5 lun 0 (pass4,da4)
at scbus0 target 6 lun 0 (pass5,da5)
at scbus0 target 7 lun 0 (pass6,da6)
at scbus0 target 8 lun 0 (pass7,da7)
at scbus0 target 9 lun 0 (pass8,da8)
at scbus0 target 10 lun 0 (pass9,da9)
at scbus0 target 11 lun 0 (pass10,da10)
at scbus0 target 12 lun 0 (pass11,da11)
at scbus0 target 13 lun 0 (ses0,pass12)

Thank you sirs,

Jason Wolfe

From owner-freebsd-scsi@FreeBSD.ORG Tue Nov 1 19:18:07 2011
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org
(mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C5846106566C for ; Tue, 1 Nov 2011 19:18:07 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from mo-p05-ob6.rzone.de (mo-p05-ob6.rzone.de [IPv6:2a01:238:20a:202:53f5::1]) by mx1.freebsd.org (Postfix) with ESMTP id C91908FC18 for ; Tue, 1 Nov 2011 19:18:06 +0000 (UTC) X-RZG-AUTH: :LWIKdA2leu0bPbLmhzXgqn0MTG6qiKEwQRWfNxSw4HzYIwjsnvdDt2oX8drk23mpKMZH7NA= X-RZG-CLASS-ID: mo05 Received: from [192.168.179.42] (hmbg-5f7606d1.pool.mediaWays.net [95.118.6.209]) by post.strato.de (mrclete mo57) (RZmta 26.10 AUTH) with (DHE-RSA-AES128-SHA encrypted) ESMTPA id w01ed2nA1IP5pQ for ; Tue, 1 Nov 2011 20:17:45 +0100 (MET) Message-ID: <4EAEF431.7090108@brockmann-consult.de> Date: Mon, 31 Oct 2011 20:17:05 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Nov 2011 19:18:07 -0000 Dear Jason, Am 01.11.2011 19:13, schrieb Jason Wolfe: > Hello, > > I have an issue with the mps driver on 8.2 where running 'smartctl -a' > rarely causes the controller to freak out when disk tags are > 2. I've > confirmed settings the tags to 1 resolves this crash, so that surely is a > clue in the right direction.. I'm using Seagate 1TB SAS drives - > ST91000640SS, and these are SuperMicro X8DTT-H chasis. This happens across > over a thousand servers, so it surely not flaky hardware. 
It could
> obviously be some interoperability with these model drives and the mps
> controller, but unfortunately I don't have any other drives deployed on
> these cards to test that theory out :/

I get a similar problem on a system with an LSI 9211-8i with 20 SATA disks attached (2 SSDs and 18 spinning disks). My system doesn't hang, panic, or reset though. I just lose access to one disk, which is then considered FAULTED in my zpool status (with the ZFS file system). If I physically remove the FAULTED disk and run "gpart recover da0", I get a panic. Otherwise, the system keeps running in a degraded state. When I reboot and resilver, some data is found damaged and repaired, not just refreshed with the latest state. The server has 1 HBA and 2 backplanes, and I have the 2 mirrored root disks on different backplanes. Maybe that is why mine runs degraded and yours hangs.

This has happened twice so far (in around a month or two), and both times it was one of the mirrored root disks (SSDs) that faulted.

My tags are set to 255. I will try reproducing it as you said, and then if it fails, rebooting and trying again setting tags to 2 as you suggested.

And *thank you very much for this information*. This is the last outstanding issue with this server. I hope this workaround helps.
# camcontrol tags /dev/da0 (pass0:mps0:0:7:0): device openings: 255 > > Luckily remote syslogging is enabled, so while nothing is kept locally, we > see these messages similar to these transmitted before the server hangs, > requiring a power cycle: > > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 510 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 713 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 942 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 356 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 492 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 976 > (da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID > 339 > (da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID > 746 > (da5:mps0:0:6:0): SCSI command timeout on device handle 0x000f SMID 74 > (da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID > 613 > (da2:mps0:0:3:0): SCSI command timeout on device handle 0x000c SMID 16 > (da10:mps0:0:11:0): SCSI command timeout on device handle 0x0014 SMID > 305 > (da1:mps0:0:2:0): SCSI command timeout on device handle 0x000b SMID 74 > (da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID > 594 > > In some cases that would be followed by this, which would usually be the > last transmission, though we don't see this in all cases. 
It may just be > the system isn't always alive long enough to transmit: > > kernel: mps0: IOC Fault 0x40006003, Resetting > > > I'm able to reproduce fairly easily within a minute or two by heavily > loading the disks up by whatever means, and running smartctl -a in a loop: > > #!/bin/sh -x > > disks=`sysctl -n kern.disks|xargs -n1|grep ^da` > > for disk in $disks; do > camcontrol tags $disk -N 4 > done > > for z in `yes|head -100`; do > for disk in $disks; do > smartctl -s on -a /dev/$disk > done > done > > mps0: port 0xe000-0xe0ff mem > 0xfbd3c000-0xfbd3ffff,0xfbd40000-0xfbd7ffff irq 26 at device 0.0 on pci4 > mps0: Firmware: 07.00.00.00 > mps0: IOCCapabilities: > 1285c > mps0: [ITHREAD] > da0 at mps0 bus 0 scbus0 target 1 lun 0 > da1 at mps0 bus 0 scbus0 target 2 lun 0 > da2 at mps0 bus 0 scbus0 target 3 lun 0 > da3 at mps0 bus 0 scbus0 target 4 lun 0 > da4 at mps0 bus 0 scbus0 target 5 lun 0 > da5 at mps0 bus 0 scbus0 target 6 lun 0 > da6 at mps0 bus 0 scbus0 target 7 lun 0 > da7 at mps0 bus 0 scbus0 target 8 lun 0 > da8 at mps0 bus 0 scbus0 target 9 lun 0 > da9 at mps0 bus 0 scbus0 target 10 lun 0 > da10 at mps0 bus 0 scbus0 target 11 lun 0 > da11 at mps0 bus 0 scbus0 target 12 lun 0 > ses0 at mps0 bus 0 scbus0 target 13 lun 0 > > mps0@pci0:4:0:0: class=0x010700 card=0x040015d9 chip=0x00721000 rev=0x02 > hdr=0x00 > vendor = 'LSI Logic (Was: Symbios Logic, NCR)' > class = mass storage > subclass = SAS > > at scbus0 target 1 lun 0 (pass0,da0) > at scbus0 target 2 lun 0 (pass1,da1) > at scbus0 target 3 lun 0 (pass2,da2) > at scbus0 target 4 lun 0 (pass3,da3) > at scbus0 target 5 lun 0 (pass4,da4) > at scbus0 target 6 lun 0 (pass5,da5) > at scbus0 target 7 lun 0 (pass6,da6) > at scbus0 target 8 lun 0 (pass7,da7) > at scbus0 target 9 lun 0 (pass8,da8) > at scbus0 target 10 lun 0 (pass9,da9) > at scbus0 target 11 lun 0 (pass10,da10) > at scbus0 target 12 lun 0 (pass11,da11) > at scbus0 target 13 lun 0 (ses0,pass12) > > Thank you sirs, > > Jason Wolfe > 
And my logs to compare. (Note: my root, swap, ZFS cache, and ZFS log are on the disk that fails.)

root@bcnas1:/var/log# swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/gpt/swap0     524288     5840   518448     1%
/dev/gpt/swap1     524288     5640   518648     1%
Total             1048576    11480  1037096     1%

When it starts happening, it looks like this:

Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 220
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 87
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 795
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 423
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 338
Oct 29 02:19:12 bcnas1 kernel: :9:0): SCSI command timeout on device handle 0x0016 SMID 170
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 637
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 335
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 798
Oct 29 02:19:12 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 29 02:19:12 bcnas1 last message repeated 14 times
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 991 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 4
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 4 complete
Oct 29 02:19:12 bcnas1 kernel: mps0:
mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 227
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 227 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 652
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 652 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 125
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 125 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 101
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 1017 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 100
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 1004 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 487
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 487 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 279
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 279 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 929
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 929 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 346
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 346 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 817
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 817 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 170
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 170 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 637
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 637 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 335
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 335 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 798
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 798 complete
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 757
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 833
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 804
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 464
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 144
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 912
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on
device handle 0x0016 SMID 753
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 422
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 241

And then, just before I rebooted, it basically looked the same, with the different messages mixed together:

Oct 31 07:52:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 807
Oct 31 07:52:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 807 complete
Oct 31 07:53:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 1006
Oct 31 07:53:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 111
Oct 31 07:53:20 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 31 07:53:20 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 1006 complete
Oct 31 07:53:20 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 111
Oct 31 07:53:20 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 111 complete
Oct 31 07:54:20 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 669
Oct 31 07:54:20 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 912
Oct 31 07:54:28 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 31 07:54:28 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 669 complete
Oct 31 07:54:28 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 912
Oct 31 07:54:28 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 912 complete
Oct 31 07:55:29 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 804
Oct 31 07:55:29 bcnas1 kernel:
(da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 1001
Oct 31 07:55:36 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 31 07:55:36 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 804 complete
Oct 31 07:55:36 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 1001
Oct 31 07:55:36 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 1001 complete
Oct 31 07:56:36 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 389
Oct 31 07:56:36 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 885
Oct 31 07:56:44 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 31 07:56:44 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 389 complete
Oct 31 07:56:44 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 885
Oct 31 07:56:44 bcnas1 kernel: swap_pager: I/O error - pageout failed; blkno 131393,size 65536, error 5
Oct 31 07:56:44 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 885 complete
Oct 31 07:57:45 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 442
Oct 31 07:57:48 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 442 complete
Oct 31 07:58:49 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 413
Oct 31 07:58:52 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 413 complete
Oct 31 07:59:53 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 90
Oct 31 07:59:56 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 90 complete
Oct 31 08:00:56 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 504
Oct 31 08:01:00 bcnas1 kernel:
mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 504 complete
Oct 31 08:02:01 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 861
Oct 31 08:02:04 bcnas1 kernel: mps0: swap_pager: I/O error - pageout failed; blkno 131409,size 49152, error 5mpssas_abort_complete: abort request on handle 0x16 SMID 861 complete

From owner-freebsd-scsi@FreeBSD.ORG Tue Nov 1 20:32:04 2011
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76867106566B for ; Tue, 1 Nov 2011 20:32:04 +0000 (UTC) (envelope-from nitroboost@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 004EC8FC16 for ; Tue, 1 Nov 2011 20:32:03 +0000 (UTC)
Received: by bkbzs2 with SMTP id zs2so5225899bkb.13 for ; Tue, 01 Nov 2011 13:32:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=+Fn/8sZCdbJlV92LzMXlrggIA4HOQJixyp4tB0xttk8=; b=YJSajKwityjlkE+LWXxvjnx90odRVY0+VAOF5/nzH8MS2SIe6pqAzX4xcAhEm/xpOB hEXVsJJHVCgv/4CV+Vsrs/jH7EB9OHmHwjdcdur9tt0Y8HTH9Yzx34Bed6sAAHYSgI8v OhKwBlFyocOQIW6nDVF3D4PaatHOgMuEesiTc=
MIME-Version: 1.0
Received: by 10.182.74.41 with SMTP id q9mr257137obv.28.1320179522178; Tue, 01 Nov 2011 13:32:02 -0700 (PDT)
Received: by 10.182.35.193 with HTTP; Tue, 1 Nov 2011 13:32:01 -0700 (PDT)
In-Reply-To: <4EAEF431.7090108@brockmann-consult.de>
References: <4EAEF431.7090108@brockmann-consult.de>
Date: Tue, 1 Nov 2011 13:32:01 -0700
Message-ID:
From: Jason Wolfe
To: Peter Maloney
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-scsi@freebsd.org
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 01 Nov 2011 20:32:04 -0000

On Mon, Oct 31, 2011 at 12:17 PM, Peter Maloney <
peter.maloney@brockmann-consult.de> wrote:

> Dear Jason,
>
> I get a simlar problem on a system with an LSI 9211-8i with 20 SATA
> disks attached (2 SSDs and 18 spnning disks). My system doesn't hang,
> panic, or reset though. I just lose access to one disk, which is then
> considered FAULTED in my zpool status (with the ZFS file system). If I
> physically remove the FAULTED disk and run "gpart recover da0", I get a
> panic. Otherwise, the system keeps running in a degraded state. When I
> reboot and resilver, some data is found damaged and repaired, not just
> refreshed with the latest state. The server has 1 HBA and 2 backplanes,
> and I have the 2 mirrored root disks on different backplanes. Maybe that
> is why mine runs degraded and yours hang.
>
> This happened twice so far (in around a month or two), and both times it
> was one of the mirrored root disks (SSDs) that faulted.
>
> My tags are set to 255. I will try reproducing it as you said, and then
> if it fails, rebooting and trying again setting tags to 2 as you suggested.
>
> And *thank you very much for this information*. This is the last
> outstanding issue with this server. I hope this workaround helps.
>
> # camcontrol tags /dev/da0
> (pass0:mps0:0:7:0): device openings: 255
>

Peter,

Does this happen 'randomly' for you, or do you have some automated process running smartctl that trips the drives up occasionally? The way I'm getting around it currently is to move /usr/local/sbin/smartctl elsewhere and replace it with a wrapper that simply drops the tags to 1, executes smartctl at its new location with the options passed, then sets the tags back to whatever you prefer.
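
For illustration only, a wrapper along those lines might look roughly like the sketch below. It is not the script actually in use here. The location of the relocated binary (REAL_SMARTCTL), the CAMCONTROL override and the tag count to restore (RESTORE_TAGS) are all assumptions, and it assumes the device is the last argument on the smartctl command line.

```shell
#!/bin/sh
# Sketch of a smartctl wrapper that lowers the CAM tag count for one run.
# Assumptions (none of these names come from the thread):
#   REAL_SMARTCTL - hypothetical path the real smartctl binary was moved to
#   CAMCONTROL    - camcontrol command to invoke
#   RESTORE_TAGS  - tag count to put back afterwards (255 is only an example)
smartctl_wrapper() {
    real=${REAL_SMARTCTL:-/usr/local/libexec/smartctl.real}
    cam=${CAMCONTROL:-camcontrol}
    restore=${RESTORE_TAGS:-255}

    # With no arguments there is no device to throttle; just run smartctl.
    if [ "$#" -eq 0 ]; then
        "$real"
        return $?
    fi

    # Take the last argument as the device and strip /dev/ for camcontrol.
    for dev in "$@"; do :; done
    dev=${dev#/dev/}

    "$cam" tags "$dev" -N 1 >/dev/null            # drop to a single tag
    "$real" "$@"                                  # run the real smartctl unchanged
    rc=$?
    "$cam" tags "$dev" -N "$restore" >/dev/null   # put the preferred tag count back
    return "$rc"
}
```

Installed in place of /usr/local/sbin/smartctl, the script would end with a line such as `smartctl_wrapper "$@"`. The exit status of the real smartctl is preserved, so anything that checks smartctl's return code should behave as before.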
There will obviously be a small detriment here, but it should be fairly quick and hopefully not even noticeable in your case. If smartctl is not triggering these events for you, any idea what is? Jason From owner-freebsd-scsi@FreeBSD.ORG Tue Nov 1 21:30:15 2011 Return-Path: Delivered-To: freebsd-scsi@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9FCC9106566C for ; Tue, 1 Nov 2011 21:30:15 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8E71F8FC0C for ; Tue, 1 Nov 2011 21:30:15 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id pA1LUFkF011387 for ; Tue, 1 Nov 2011 21:30:15 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id pA1LUFel011384; Tue, 1 Nov 2011 21:30:15 GMT (envelope-from gnats) Date: Tue, 1 Nov 2011 21:30:15 GMT Message-Id: <201111012130.pA1LUFel011384@freefall.freebsd.org> To: freebsd-scsi@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/124667: commit references a PR X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Nov 2011 21:30:15 -0000 The following reply was made to PR kern/124667; it has been noted by GNATS. 
From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/124667: commit references a PR Date: Tue, 1 Nov 2011 21:27:08 +0000 (UTC) Author: marius Date: Tue Nov 1 21:26:57 2011 New Revision: 227006 URL: http://svn.freebsd.org/changeset/base/227006 Log: Add a PCI front-end to esp(4) allowing it to support AMD Am53C974 and replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel configuration files. Besides duplicating functionality, amd(4), which previously also supported the AMD Am53C974, unlike esp(4) is no longer maintained and has accumulated enough bit rot over time to always cause a panic during boot as long as at least one target is attached to it (see PR 124667). PR: 124667 Obtained from: NetBSD (based on) MFC after: 3 days Added: head/sys/dev/esp/am53c974reg.h (contents, props changed) head/sys/dev/esp/esp_pci.c (contents, props changed) Modified: head/UPDATING head/sys/amd64/conf/GENERIC head/sys/conf/NOTES head/sys/conf/files head/sys/i386/conf/GENERIC head/sys/modules/Makefile head/sys/modules/esp/Makefile head/sys/pc98/conf/GENERIC head/sys/sparc64/conf/GENERIC Modified: head/UPDATING ============================================================================== --- head/UPDATING Tue Nov 1 21:21:36 2011 (r227005) +++ head/UPDATING Tue Nov 1 21:26:57 2011 (r227006) @@ -22,6 +22,10 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 10 machines to maximize performance. (To disable malloc debugging, run ln -s aj /etc/malloc.conf.) +20111101: + The broken amd(4) driver has been replaced with esp(4) in the amd64, + i386 and pc98 GENERIC kernel configuration files. 
+ 20110930: sysinstall has been removed Modified: head/sys/amd64/conf/GENERIC ============================================================================== --- head/sys/amd64/conf/GENERIC Tue Nov 1 21:21:36 2011 (r227005) +++ head/sys/amd64/conf/GENERIC Tue Nov 1 21:26:57 2011 (r227006) @@ -107,7 +107,7 @@ options AHC_REG_PRETTY_PRINT # Print re device ahd # AHA39320/29320 and onboard AIC79xx devices options AHD_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~215k to driver. -device amd # AMD 53C974 (Tekram DC-390(T)) +device esp # AMD Am53C974 (Tekram DC-390(T)) device hptiop # Highpoint RocketRaid 3xxx series device isp # Qlogic family #device ispfw # Firmware for QLogic HBAs- normally a module Modified: head/sys/conf/NOTES ============================================================================== --- head/sys/conf/NOTES Tue Nov 1 21:21:36 2011 (r227005) +++ head/sys/conf/NOTES Tue Nov 1 21:26:57 2011 (r227006) @@ -1459,7 +1459,9 @@ options TEKEN_UTF8 # UTF-8 output hand # such as the Tekram DC-390(T). # bt: Most Buslogic controllers: including BT-445, BT-54x, BT-64x, BT-74x, # BT-75x, BT-946, BT-948, BT-956, BT-958, SDC3211B, SDC3211F, SDC3222F -# esp: NCR53c9x. Only for SBUS hardware right now. 
+# esp: Emulex ESP, NCR 53C9x and QLogic FAS families based controllers +# including the AMD Am53C974 (found on devices such as the Tekram +# DC-390(T)) and the Sun ESP and FAS families of controllers # isp: Qlogic ISP 1020, 1040 and 1040B PCI SCSI host adapters, # ISP 1240 Dual Ultra SCSI, ISP 1080 and 1280 (Dual) Ultra2, # ISP 12160 Ultra3 SCSI, Modified: head/sys/conf/files ============================================================================== --- head/sys/conf/files Tue Nov 1 21:21:36 2011 (r227005) +++ head/sys/conf/files Tue Nov 1 21:26:57 2011 (r227006) @@ -1064,6 +1064,7 @@ dev/ep/if_ep_eisa.c optional ep eisa dev/ep/if_ep_isa.c optional ep isa dev/ep/if_ep_mca.c optional ep mca dev/ep/if_ep_pccard.c optional ep pccard +dev/esp/esp_pci.c optional esp pci dev/esp/ncr53c9x.c optional esp dev/ex/if_ex.c optional ex dev/ex/if_ex_isa.c optional ex isa Added: head/sys/dev/esp/am53c974reg.h ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/sys/dev/esp/am53c974reg.h Tue Nov 1 21:26:57 2011 (r227006) @@ -0,0 +1,72 @@ +/* $NetBSD: pcscpreg.h,v 1.2 2008/04/28 20:23:55 martin Exp $ */ + +/*- + * Copyright (c) 1998 The NetBSD Foundation, Inc. + * All rights reserved. + * + * This code is derived from software contributed to The NetBSD Foundation + * by Izumi Tsutsui. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. 
AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +/* $FreeBSD$ */ + +#ifndef _AM53C974_H_ +#define _AM53C974_H_ + +/* + * Am53c974 DMA engine registers + */ + +#define DMA_CMD 0x40 /* Command */ +#define DMACMD_RSVD 0xFFFFFF28 /* reserved */ +#define DMACMD_DIR 0x00000080 /* Transfer Direction (read:1) */ +#define DMACMD_INTE 0x00000040 /* DMA Interrupt Enable */ +#define DMACMD_MDL 0x00000010 /* Map to Memory Description List */ +#define DMACMD_DIAG 0x00000004 /* Diagnostic */ +#define DMACMD_CMD 0x00000003 /* Command Code Bit */ +#define DMACMD_IDLE 0x00000000 /* Idle */ +#define DMACMD_BLAST 0x00000001 /* Blast */ +#define DMACMD_ABORT 0x00000002 /* Abort */ +#define DMACMD_START 0x00000003 /* Start */ + +#define DMA_STC 0x44 /* Start Transfer Count */ +#define DMA_SPA 0x48 /* Start Physical Address */ +#define DMA_WBC 0x4C /* Working Byte Counter */ +#define DMA_WAC 0x50 /* Working Address Counter */ + +#define DMA_STAT 0x54 /* Status Register */ +#define DMASTAT_RSVD 0xFFFFFF80 /* reserved */ +#define DMASTAT_PABT 0x00000040 /* PCI master/target Abort */ +#define DMASTAT_BCMP 0x00000020 /* BLAST Complete */ +#define DMASTAT_SINT 0x00000010 /* SCSI Interrupt */ +#define DMASTAT_DONE 0x00000008 /* DMA Transfer Terminated */ +#define DMASTAT_ABT 0x00000004 /* DMA Transfer 
Aborted */ +#define DMASTAT_ERR 0x00000002 /* DMA Transfer Error */ +#define DMASTAT_PWDN 0x00000001 /* Power Down Indicator */ + +#define DMA_SMDLA 0x58 /* Starting Memory Descpritor List Address */ +#define DMA_WMAC 0x5C /* Working MDL Counter */ +#define DMA_SBAC 0x70 /* SCSI Bus and Control */ + +#endif /* _AM53C974_H_ */ Added: head/sys/dev/esp/esp_pci.c ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/sys/dev/esp/esp_pci.c Tue Nov 1 21:26:57 2011 (r227006) @@ -0,0 +1,654 @@ +/*- + * Copyright (c) 2011 Marius Strobl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +/* $NetBSD: pcscp.c,v 1.45 2010/11/13 13:52:08 uebayasi Exp $ */ + +/*- + * Copyright (c) 1997, 1998, 1999 The NetBSD Foundation, Inc. + * All rights reserved. + * + * This code is derived from software contributed to The NetBSD Foundation + * by Jason R. Thorpe of the Numerical Aerospace Simulation Facility, + * NASA Ames Research Center; Izumi Tsutsui. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +/* + * esp_pci.c: device dependent code for AMD Am53c974 (PCscsi-PCI) + * written by Izumi Tsutsui + * + * Technical manual available at + * http://www.amd.com/files/connectivitysolutions/networking/archivednetworking/19113.pdf + */ + +#include +__FBSDID("$FreeBSD$"); + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include +#include +#include + +#include +#include + +#include +#include + +#include + +#define PCI_DEVICE_ID_AMD53C974 0x20201022 + +struct esp_pci_softc { + struct ncr53c9x_softc sc_ncr53c9x; /* glue to MI code */ + struct device *sc_dev; + + struct resource *sc_res[2]; +#define ESP_PCI_RES_INTR 0 +#define ESP_PCI_RES_IO 1 + + bus_dma_tag_t sc_pdmat; + + bus_dma_tag_t sc_xferdmat; /* DMA tag for transfers */ + bus_dmamap_t sc_xferdmam; /* DMA map for transfers */ + + void *sc_ih; /* interrupt handler */ + + size_t sc_dmasize; /* DMA size */ + void **sc_dmaaddr; /* DMA address */ + size_t *sc_dmalen; /* DMA length */ + int sc_active; /* DMA state */ + int sc_datain; /* DMA Data Direction */ +}; + +static struct resource_spec esp_pci_res_spec[] = { + { SYS_RES_IRQ, 0, RF_SHAREABLE | RF_ACTIVE }, /* ESP_PCI_RES_INTR */ + { SYS_RES_IOPORT, PCIR_BAR(0), RF_ACTIVE }, /* ESP_PCI_RES_IO */ + { -1, 0 } +}; + +#define READ_DMAREG(sc, reg) \ + bus_read_4((sc)->sc_res[ESP_PCI_RES_IO], (reg)) +#define WRITE_DMAREG(sc, reg, var) \ + bus_write_4((sc)->sc_res[ESP_PCI_RES_IO], (reg), (var)) + +#define READ_ESPREG(sc, reg) \ + bus_read_1((sc)->sc_res[ESP_PCI_RES_IO], (reg) << 2) +#define WRITE_ESPREG(sc, reg, val) \ + bus_write_1((sc)->sc_res[ESP_PCI_RES_IO], (reg) << 2, (val)) + +static int esp_pci_probe(device_t); +static int esp_pci_attach(device_t); +static int esp_pci_detach(device_t); +static int esp_pci_suspend(device_t); +static int esp_pci_resume(device_t); + +static device_method_t esp_pci_methods[] = { + DEVMETHOD(device_probe, esp_pci_probe), + DEVMETHOD(device_attach, 
esp_pci_attach), + DEVMETHOD(device_detach, esp_pci_detach), + DEVMETHOD(device_suspend, esp_pci_suspend), + DEVMETHOD(device_resume, esp_pci_resume), + + KOBJMETHOD_END +}; + +static driver_t esp_pci_driver = { + "esp", + esp_pci_methods, + sizeof(struct esp_pci_softc) +}; + +DRIVER_MODULE(esp, pci, esp_pci_driver, esp_devclass, 0, 0); +MODULE_DEPEND(esp, pci, 1, 1, 1); + +/* + * Functions and the switch for the MI code + */ +static void esp_pci_dma_go(struct ncr53c9x_softc *); +static int esp_pci_dma_intr(struct ncr53c9x_softc *); +static int esp_pci_dma_isactive(struct ncr53c9x_softc *); + +static int esp_pci_dma_isintr(struct ncr53c9x_softc *); +static void esp_pci_dma_reset(struct ncr53c9x_softc *); +static int esp_pci_dma_setup(struct ncr53c9x_softc *, void **, size_t *, + int, size_t *); +static void esp_pci_dma_stop(struct ncr53c9x_softc *); +static void esp_pci_write_reg(struct ncr53c9x_softc *, int, uint8_t); +static uint8_t esp_pci_read_reg(struct ncr53c9x_softc *, int); +static void esp_pci_xfermap(void *arg, bus_dma_segment_t *segs, int nseg, + int error); + +static struct ncr53c9x_glue esp_pci_glue = { + esp_pci_read_reg, + esp_pci_write_reg, + esp_pci_dma_isintr, + esp_pci_dma_reset, + esp_pci_dma_intr, + esp_pci_dma_setup, + esp_pci_dma_go, + esp_pci_dma_stop, + esp_pci_dma_isactive, +}; + +static int +esp_pci_probe(device_t dev) +{ + + if (pci_get_devid(dev) == PCI_DEVICE_ID_AMD53C974) { + device_set_desc(dev, "AMD Am53C974 Fast-SCSI"); + return (BUS_PROBE_DEFAULT); + } + + return (ENXIO); +} + +/* + * Attach this instance, and then all the sub-devices + */ +static int +esp_pci_attach(device_t dev) +{ + struct esp_pci_softc *esc; + struct ncr53c9x_softc *sc; + int error; + + esc = device_get_softc(dev); + sc = &esc->sc_ncr53c9x; + + NCR_LOCK_INIT(sc); + + esc->sc_dev = dev; + sc->sc_glue = &esp_pci_glue; + + pci_enable_busmaster(dev); + + error = bus_alloc_resources(dev, esp_pci_res_spec, esc->sc_res); + if (error != 0) { + device_printf(dev, 
"failed to allocate resources\n"); + bus_release_resources(dev, esp_pci_res_spec, esc->sc_res); + return (error); + } + + error = bus_dma_tag_create(bus_get_dma_tag(dev), 1, 0, + BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL, + BUS_SPACE_MAXSIZE_32BIT, BUS_SPACE_UNRESTRICTED, + BUS_SPACE_MAXSIZE_32BIT, 0, NULL, NULL, &esc->sc_pdmat); + if (error != 0) { + device_printf(dev, "cannot create parent DMA tag\n"); + goto fail_res; + } + + /* + * XXX More of this should be in ncr53c9x_attach(), but + * XXX should we really poke around the chip that much in + * XXX the MI code? Think about this more... + */ + + /* + * Set up static configuration info. + * + * XXX we should read the configuration from the EEPROM. + */ + sc->sc_id = 7; + sc->sc_cfg1 = sc->sc_id | NCRCFG1_PARENB; + sc->sc_cfg2 = NCRCFG2_SCSI2 | NCRCFG2_FE; + sc->sc_cfg3 = NCRAMDCFG3_IDM | NCRAMDCFG3_FCLK; + sc->sc_cfg4 = NCRAMDCFG4_GE12NS | NCRAMDCFG4_RADE; + sc->sc_rev = NCR_VARIANT_AM53C974; + sc->sc_features = NCR_F_FASTSCSI | NCR_F_DMASELECT; + sc->sc_cfg3_fscsi = NCRAMDCFG3_FSCSI; + sc->sc_freq = 40; /* MHz */ + + /* + * This is the value used to start sync negotiations + * Note that the NCR register "SYNCTP" is programmed + * in "clocks per byte", and has a minimum value of 4. + * The SCSI period used in negotiation is one-fourth + * of the time (in nanoseconds) needed to transfer one byte. + * Since the chip's clock is given in MHz, we have the following + * formula: 4 * period = (1000 / freq) * 4 + */ + sc->sc_minsync = 1000 / sc->sc_freq; + + sc->sc_maxxfer = DFLTPHYS; /* see below */ + sc->sc_maxoffset = 15; + sc->sc_extended_geom = 1; + +#define MDL_SEG_SIZE 0x1000 /* 4kbyte per segment */ + + /* + * Create the DMA tag and map for the data transfers. + * + * Note: given that bus_dma(9) only adheres to the requested alignment + * for the first segment (and that also only for bus_dmamem_alloc()ed + * DMA maps) we can't use the Memory Descriptor List. 
However, also + * when not using the MDL, the maximum transfer size apparently is + * limited to 4k so we have to split transfers up, which plain sucks. + */ + error = bus_dma_tag_create(esc->sc_pdmat, PAGE_SIZE, 0, + BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL, + MDL_SEG_SIZE, 1, MDL_SEG_SIZE, BUS_DMA_ALLOCNOW, + busdma_lock_mutex, &sc->sc_lock, &esc->sc_xferdmat); + if (error != 0) { + device_printf(dev, "cannot create transfer DMA tag\n"); + goto fail_pdmat; + } + error = bus_dmamap_create(esc->sc_xferdmat, 0, &esc->sc_xferdmam); + if (error != 0) { + device_printf(dev, "cannnot create transfer DMA map\n"); + goto fail_xferdmat; + } + + error = bus_setup_intr(dev, esc->sc_res[ESP_PCI_RES_INTR], + INTR_MPSAFE | INTR_TYPE_CAM, NULL, ncr53c9x_intr, sc, + &esc->sc_ih); + if (error != 0) { + device_printf(dev, "cannot set up interrupt\n"); + goto fail_xferdmam; + } + + /* Do the common parts of attachment. */ + sc->sc_dev = esc->sc_dev; + error = ncr53c9x_attach(sc); + if (error != 0) { + device_printf(esc->sc_dev, "ncr53c9x_attach failed\n"); + goto fail_intr; + } + + return (0); + + fail_intr: + bus_teardown_intr(esc->sc_dev, esc->sc_res[ESP_PCI_RES_INTR], + esc->sc_ih); + fail_xferdmam: + bus_dmamap_destroy(esc->sc_xferdmat, esc->sc_xferdmam); + fail_xferdmat: + bus_dma_tag_destroy(esc->sc_xferdmat); + fail_pdmat: + bus_dma_tag_destroy(esc->sc_pdmat); + fail_res: + bus_release_resources(dev, esp_pci_res_spec, esc->sc_res); + NCR_LOCK_DESTROY(sc); + + return (error); +} + +static int +esp_pci_detach(device_t dev) +{ + struct ncr53c9x_softc *sc; + struct esp_pci_softc *esc; + int error; + + esc = device_get_softc(dev); + sc = &esc->sc_ncr53c9x; + + bus_teardown_intr(esc->sc_dev, esc->sc_res[ESP_PCI_RES_INTR], + esc->sc_ih); + error = ncr53c9x_detach(sc); + if (error != 0) + return (error); + bus_dmamap_destroy(esc->sc_xferdmat, esc->sc_xferdmam); + bus_dma_tag_destroy(esc->sc_xferdmat); + bus_dma_tag_destroy(esc->sc_pdmat); + bus_release_resources(dev, 
esp_pci_res_spec, esc->sc_res); + NCR_LOCK_DESTROY(sc); + + return (0); +} + +static int +esp_pci_suspend(device_t dev) +{ + + return (ENXIO); +} + +static int +esp_pci_resume(device_t dev) +{ + + return (ENXIO); +} + +static void +esp_pci_xfermap(void *arg, bus_dma_segment_t *segs, int nsegs, int error) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)arg; + + if (error != 0) + return; + + KASSERT(nsegs == 1, ("%s: bad transfer segment count %d", __func__, + nsegs)); + KASSERT(segs[0].ds_len <= MDL_SEG_SIZE, + ("%s: bad transfer segment length %ld", __func__, + (long)segs[0].ds_len)); + + /* Program the DMA Starting Physical Address. */ + WRITE_DMAREG(esc, DMA_SPA, segs[0].ds_addr); +} + +/* + * Glue functions + */ + +static uint8_t +esp_pci_read_reg(struct ncr53c9x_softc *sc, int reg) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + + return (READ_ESPREG(esc, reg)); +} + +static void +esp_pci_write_reg(struct ncr53c9x_softc *sc, int reg, uint8_t v) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + + WRITE_ESPREG(esc, reg, v); +} + +static int +esp_pci_dma_isintr(struct ncr53c9x_softc *sc) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + + return (READ_ESPREG(esc, NCR_STAT) & NCRSTAT_INT) != 0; +} + +static void +esp_pci_dma_reset(struct ncr53c9x_softc *sc) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + + WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE); + + esc->sc_active = 0; +} + +static int +esp_pci_dma_intr(struct ncr53c9x_softc *sc) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + bus_dma_tag_t xferdmat; + bus_dmamap_t xferdmam; + size_t dmasize; + int datain, i, resid, trans; + uint32_t dmastat; + char *p = NULL; + + xferdmat = esc->sc_xferdmat; + xferdmam = esc->sc_xferdmam; + datain = esc->sc_datain; + + dmastat = READ_DMAREG(esc, DMA_STAT); + + if ((dmastat & DMASTAT_ERR) != 0) { + /* XXX not tested... */ + WRITE_DMAREG(esc, DMA_CMD, DMACMD_ABORT | (datain != 0 ? 
+ DMACMD_DIR : 0)); + + device_printf(esc->sc_dev, "DMA error detected; Aborting.\n"); + bus_dmamap_sync(xferdmat, xferdmam, datain != 0 ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); + bus_dmamap_unload(xferdmat, xferdmam); + return (-1); + } + + if ((dmastat & DMASTAT_ABT) != 0) { + /* XXX what should be done? */ + device_printf(esc->sc_dev, "DMA aborted.\n"); + WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE | (datain != 0 ? + DMACMD_DIR : 0)); + esc->sc_active = 0; + return (0); + } + + KASSERT(esc->sc_active != 0, ("%s: DMA wasn't active", __func__)); + + /* DMA has stopped. */ + + esc->sc_active = 0; + + dmasize = esc->sc_dmasize; + if (dmasize == 0) { + /* A "Transfer Pad" operation completed. */ + NCR_DMA(("%s: discarded %d bytes (tcl=%d, tcm=%d)\n", + __func__, READ_ESPREG(esc, NCR_TCL) | + (READ_ESPREG(esc, NCR_TCM) << 8), + READ_ESPREG(esc, NCR_TCL), READ_ESPREG(esc, NCR_TCM))); + return (0); + } + + resid = 0; + /* + * If a transfer onto the SCSI bus gets interrupted by the device + * (e.g. for a SAVEPOINTER message), the data in the FIFO counts + * as residual since the ESP counter registers get decremented as + * bytes are clocked into the FIFO. + */ + if (datain == 0 && + (resid = (READ_ESPREG(esc, NCR_FFLAG) & NCRFIFO_FF)) != 0) + NCR_DMA(("%s: empty esp FIFO of %d ", __func__, resid)); + + if ((sc->sc_espstat & NCRSTAT_TC) == 0) { + /* + * "Terminal count" is off, so read the residue + * out of the ESP counter registers. + */ + if (datain != 0) { + resid = READ_ESPREG(esc, NCR_FFLAG) & NCRFIFO_FF; + while (resid > 1) + resid = + READ_ESPREG(esc, NCR_FFLAG) & NCRFIFO_FF; + WRITE_DMAREG(esc, DMA_CMD, DMACMD_BLAST | DMACMD_DIR); + + for (i = 0; i < 0x8000; i++) /* XXX 0x8000 ? */ + if ((READ_DMAREG(esc, DMA_STAT) & + DMASTAT_BCMP) != 0) + break; + + /* See the below comments... 
*/ + if (resid != 0) + p = *esc->sc_dmaaddr; + } + + resid += READ_ESPREG(esc, NCR_TCL) | + (READ_ESPREG(esc, NCR_TCM) << 8) | + (READ_ESPREG(esc, NCR_TCH) << 16); + } else + while ((dmastat & DMASTAT_DONE) == 0) + dmastat = READ_DMAREG(esc, DMA_STAT); + + WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE | (datain != 0 ? + DMACMD_DIR : 0)); + + /* Sync the transfer buffer. */ + bus_dmamap_sync(xferdmat, xferdmam, datain != 0 ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); + bus_dmamap_unload(xferdmat, xferdmam); + + trans = dmasize - resid; + + /* + * From the technical manual notes: + * + * "In some odd byte conditions, one residual byte will be left + * in the SCSI FIFO, and the FIFO flags will never count to 0. + * When this happens, the residual byte should be retrieved + * via PIO following completion of the BLAST operation." + */ + if (p != NULL) { + p += trans; + *p = READ_ESPREG(esc, NCR_FIFO); + trans++; + } + + if (trans < 0) { /* transferred < 0 ? */ +#if 0 + /* + * This situation can happen in perfectly normal operation + * if the ESP is reselected while using DMA to select + * another target. As such, don't print the warning. + */ + device_printf(dev, "xfer (%d) > req (%d)\n", trans, dmasize); +#endif + trans = dmasize; + } + + NCR_DMA(("%s: tcl=%d, tcm=%d, tch=%d; trans=%d, resid=%d\n", __func__, + READ_ESPREG(esc, NCR_TCL), READ_ESPREG(esc, NCR_TCM), + READ_ESPREG(esc, NCR_TCH), trans, resid)); + + *esc->sc_dmalen -= trans; + *esc->sc_dmaaddr = (char *)*esc->sc_dmaaddr + trans; + + return (0); +} + +static int +esp_pci_dma_setup(struct ncr53c9x_softc *sc, void **addr, size_t *len, + int datain, size_t *dmasize) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + int error; + + WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE | (datain != 0 ? 
DMACMD_DIR : + 0)); + + *dmasize = esc->sc_dmasize = ulmin(*dmasize, MDL_SEG_SIZE); + esc->sc_dmaaddr = addr; + esc->sc_dmalen = len; + esc->sc_datain = datain; + + /* + * There's no need to set up DMA for a "Transfer Pad" operation. + */ + if (*dmasize == 0) + return (0); + + /* Set the transfer length. */ + WRITE_DMAREG(esc, DMA_STC, *dmasize); + + /* + * Load the transfer buffer and program the DMA address. + * Note that the NCR53C9x core can't handle EINPROGRESS so we set + * BUS_DMA_NOWAIT. + */ + error = bus_dmamap_load(esc->sc_xferdmat, esc->sc_xferdmam, + *esc->sc_dmaaddr, *dmasize, esp_pci_xfermap, sc, BUS_DMA_NOWAIT); + + return (error); +} + +static void +esp_pci_dma_go(struct ncr53c9x_softc *sc) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + int datain; + + datain = esc->sc_datain; + + /* No DMA transfer for a "Transfer Pad" operation */ + if (esc->sc_dmasize == 0) + return; + + /* Sync the transfer buffer. */ + bus_dmamap_sync(esc->sc_xferdmat, esc->sc_xferdmam, datain != 0 ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + + /* Set the DMA engine to the IDLE state. */ + /* XXX DMA Transfer Interrupt Enable bit is broken? */ + WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE | /* DMACMD_INTE | */ + (datain != 0 ? DMACMD_DIR : 0)); + + /* Issue a DMA start command. */ + WRITE_DMAREG(esc, DMA_CMD, DMACMD_START | /* DMACMD_INTE | */ + (datain != 0 ? DMACMD_DIR : 0)); + + esc->sc_active = 1; +} + +static void +esp_pci_dma_stop(struct ncr53c9x_softc *sc) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + + /* DMA stop */ + /* XXX what should we do here ? */ + WRITE_DMAREG(esc, DMA_CMD, + DMACMD_ABORT | (esc->sc_datain != 0 ? DMACMD_DIR : 0)); + bus_dmamap_unload(esc->sc_xferdmat, esc->sc_xferdmam); + + esc->sc_active = 0; +} + +static int +esp_pci_dma_isactive(struct ncr53c9x_softc *sc) +{ + struct esp_pci_softc *esc = (struct esp_pci_softc *)sc; + + /* XXX should we check esc->sc_active? 
*/ + if ((READ_DMAREG(esc, DMA_CMD) & DMACMD_CMD) != DMACMD_IDLE) + return (1); + + return (0); +} Modified: head/sys/i386/conf/GENERIC ============================================================================== --- head/sys/i386/conf/GENERIC Tue Nov 1 21:21:36 2011 (r227005) +++ head/sys/i386/conf/GENERIC Tue Nov 1 21:26:57 2011 (r227006) @@ -110,7 +110,7 @@ options AHC_REG_PRETTY_PRINT # Print re device ahd # AHA39320/29320 and onboard AIC79xx devices options AHD_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~215k to driver. -device amd # AMD 53C974 (Tekram DC-390(T)) +device esp # AMD Am53C974 (Tekram DC-390(T)) device hptiop # Highpoint RocketRaid 3xxx series device isp # Qlogic family #device ispfw # Firmware for QLogic HBAs- normally a module Modified: head/sys/modules/Makefile ============================================================================== --- head/sys/modules/Makefile Tue Nov 1 21:21:36 2011 (r227005) +++ head/sys/modules/Makefile Tue Nov 1 21:26:57 2011 (r227006) @@ -89,6 +89,7 @@ SUBDIR= ${_3dfx} \ en \ ${_ep} \ ${_epic} \ + esp \ ${_et} \ ${_ex} \ ${_exca} \ Modified: head/sys/modules/esp/Makefile ============================================================================== --- head/sys/modules/esp/Makefile Tue Nov 1 21:21:36 2011 (r227005) +++ head/sys/modules/esp/Makefile Tue Nov 1 21:26:57 2011 (r227006) @@ -3,7 +3,8 @@ .PATH: ${.CURDIR}/../../dev/esp KMOD= esp -SRCS= device_if.h ${esp_sbus} bus_if.h ncr53c9x.c ${ofw_bus_if} opt_cam.h +SRCS= device_if.h esp_pci.c ${esp_sbus} bus_if.h ncr53c9x.c ${ofw_bus_if} +SRCS+= opt_cam.h pci_if.h .if ${MACHINE} == "sparc64" ofw_bus_if= ofw_bus_if.h Modified: head/sys/pc98/conf/GENERIC ============================================================================== --- head/sys/pc98/conf/GENERIC Tue Nov 1 21:21:36 2011 (r227005) +++ head/sys/pc98/conf/GENERIC Tue Nov 1 21:26:57 2011 (r227006) @@ -101,7 +101,7 @@ device siis # SiliconImage SiI3124/SiI # SCSI Controllers device 
adv # Advansys SCSI adapters device ahc # AHA2940 and onboard AIC7xxx devices -device amd # AMD 53C974 (Tekram DC-390(T)) +device esp # AMD Am53C974 (Tekram DC-390(T)) device isp # Qlogic family #device ncr # NCR/Symbios Logic device sym # NCR/Symbios Logic (newer chipsets + those of `ncr') Modified: head/sys/sparc64/conf/GENERIC ============================================================================== --- head/sys/sparc64/conf/GENERIC Tue Nov 1 21:21:36 2011 (r227005) +++ head/sys/sparc64/conf/GENERIC Tue Nov 1 21:26:57 2011 (r227006) @@ -103,11 +103,11 @@ device ahc # AHA2940 and onboard AIC7x options AHC_ALLOW_MEMIO # Attempt to use memory mapped I/O options AHC_REG_PRETTY_PRINT # Print register bitfields in debug # output. Adds ~128k to driver. +device esp # AMD Am53C974, Sun ESP and FAS families device isp # Qlogic family device ispfw # Firmware module for Qlogic host adapters device mpt # LSI-Logic MPT-Fusion device sym # NCR/Symbios/LSI Logic 53C8XX/53C1010/53C1510D -device esp # NCR53c9x (FEPS/FAS366) # ATA/SCSI peripherals device scbus # SCSI bus (required for ATA/SCSI) _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-scsi@FreeBSD.ORG Wed Nov 2 08:56:30 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 05000106564A for ; Wed, 2 Nov 2011 08:56:30 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.171]) by mx1.freebsd.org (Postfix) with ESMTP id A41808FC0C for ; Wed, 2 Nov 2011 08:56:29 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mrbap0) with ESMTP (Nemesis) id 0MWhTP-1RSWZ91Dsx-00XIsw; Wed, 02 Nov 2011 09:43:52 +0100 
Message-ID: <4EB102C7.8080401@brockmann-consult.de> Date: Wed, 02 Nov 2011 09:43:51 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11 MIME-Version: 1.0 To: Jason Wolfe References: <4EAEF431.7090108@brockmann-consult.de> In-Reply-To: X-Enigmail-Version: 1.1.2 X-Provags-ID: V02:K0:l9N7rDkQkC+AsK40qVaA1cTE/ku/nKfJ0okSl1Qynrs Ka2sNOCjWC1hyonoMbaQpXymtJ2LtwiwMBSuVq7vs921YGoT26 Z8ys2XphzaR+0Liq/4uHWdt16gvXMCYlUm/6fHjoMrl7he8Cbk vgTIF77H3yUDH/0PRDhgxmIaUTbxWkdwCu8uyVIpST82509kWG 0oiLWhcvNao78rhX3f+dynv4tmFKOAQJw5p1zrnnIIc0aSGFll 5Rmh6LdrRTJv4xlwReOI2fFU4vXY3tznUq4L5uj+jVcarzejFp jJDa0fpZGNefckAoAs2ny1Lb7ST9xpafxr1Mc4q0f0WaTKU1Ik 4wHAQsIyyhaGb9fbuVmbMqgH+3DhIcGlGuP1xQfsL Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-scsi@freebsd.org Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Nov 2011 08:56:30 -0000

On 11/01/2011 09:32 PM, Jason Wolfe wrote:

> On Mon, Oct 31, 2011 at 12:17 PM, Peter Maloney > > wrote: > > Dear Jason, > > I get a similar problem on a system with an LSI 9211-8i with 20 SATA > disks attached (2 SSDs and 18 spinning disks). My system doesn't hang, > panic, or reset though. I just lose access to one disk, which is then > considered FAULTED in my zpool status (with the ZFS file system). If I > physically remove the FAULTED disk and run "gpart recover da0", I > get a > panic. Otherwise, the system keeps running in a degraded state. > When I > reboot and resilver, some data is found damaged and repaired, not just > refreshed with the latest state. The server has 1 HBA and 2 > backplanes, > and I have the 2 mirrored root disks on different backplanes.
> Maybe that > is why mine runs degraded and yours hangs. > > This happened twice so far (in around a month or two), and both > times it > was one of the mirrored root disks (SSDs) that faulted. > > My tags are set to 255. I will try reproducing it as you said, and > then > if it fails, rebooting and trying again setting tags to 2 as you > suggested. > > And *thank you very much for this information*. This is the last > outstanding issue with this server. I hope this workaround helps. > > # camcontrol tags /dev/da0 > (pass0:mps0:0:7:0): device openings: 255 > >
> Peter,
>
> This happens 'randomly' for you, or do you have some automated process
> running smartctl that trips the drives up occasionally?

It appears to be completely random, but it could be something specific going on that I just didn't think of. I don't know how to trigger it. I once wrote a script that looped over the disks once with smartctl (which I installed from ports) and recorded the device ID, disk size, etc., but it didn't cause a crash, and I didn't try looping it constantly to force one.

The system uses "zfs send" to send the whole pool to another machine. It uses rsync to back up some servers onto it. It serves a bunch of data over NFS and also has Samba online, though it is not in use. The primary user of the NFS shares is VMware ESXi, which has a terrible problem with synchronous writes, which might put a heavier load on the system.

> The way I'm getting around it currently is to just move
> /usr/local/sbin/smartctl elsewhere and replace it with a wrapper
> that simply drops the tags to 1, runs smartctl from its new location
> with the options passed, then moves the tags back to whatever you
> prefer. There will obviously be a small detriment here, but it should
> be fairly quick and hopefully not even noticeable in your case.
In my reading, I found that people think that reducing the I/O queues (via kernel parameters) for ZFS actually improves performance (moving the queue to the OS, I guess), so if the tags setting behaves similarly, I would not expect much of a drop. Luckily, this system of mine is not a performance machine, just a huge file server. So if it is slower but more stable that way, I will leave tags set to 2 forever. > > If smartctl is not triggering these events for you, any idea what is? I have no real clue, but my guess is that some NFS shares are using the ZIL (ZFS log device) a lot, and since that device is horribly inefficient (scoring around 1500 IOPS during ZIL use on a disk that scores 50-140k on other tests), it overloads the I/O system and triggers the failure, purely based on load rather than something particular like smartctl. So for now, I disabled my ZIL to see if it still crashes. Also on my list of things to try: - change to the IT firmware instead of IR, since ZFS prefers to have no RAID in there at all. - change the tags to 2 - try the LSI driver for the 9210-8i http://www.lsi.com/products/storagec...AS9210-8i.aspx Here is my forum thread about it: http://forums.freebsd.org/showthread.php?t=26656 Are you using ZFS? Is your root volume in hardware RAID or software RAID? I am curious because you say your systems hang, and mine just runs degraded. > > Jason Peter -- -------------------------------------------- Peter Maloney Brockmann Consult Max-Planck-Str. 
2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de -------------------------------------------- From owner-freebsd-scsi@FreeBSD.ORG Wed Nov 2 16:47:57 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 97504106564A for ; Wed, 2 Nov 2011 16:47:57 +0000 (UTC) (envelope-from nitroboost@gmail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 4DA3B8FC1C for ; Wed, 2 Nov 2011 16:47:56 +0000 (UTC) Received: by vcbfk26 with SMTP id fk26so516273vcb.13 for ; Wed, 02 Nov 2011 09:47:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=0+4Fu6u6ezyfLm6dY/R8fLWiwLgF7i3LKqnWpE5Tzt4=; b=ouR0+9TPNC18HcKMs6XtJlPaRJTiLtWtTaTn2/RimveYNIlhAFa599445UbrvvYb/S 1pcvXOL2sn4qK67g9eAC6ETNXwFmwQ0ddYsE+scpHiWKYwO9SvowqEkfJPCDPxyFd/Sz jGeuyIVyugWv4+Mi2cT7TvtNDQJbYd4T8Qs58= MIME-Version: 1.0 Received: by 10.182.59.5 with SMTP id v5mr1032159obq.78.1320252476286; Wed, 02 Nov 2011 09:47:56 -0700 (PDT) Received: by 10.182.35.193 with HTTP; Wed, 2 Nov 2011 09:47:56 -0700 (PDT) In-Reply-To: <4EB102C7.8080401@brockmann-consult.de> References: <4EAEF431.7090108@brockmann-consult.de> <4EB102C7.8080401@brockmann-consult.de> Date: Wed, 2 Nov 2011 09:47:56 -0700 Message-ID: From: Jason Wolfe To: Peter Maloney Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-scsi@freebsd.org Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Wed, 02 Nov 2011 16:47:57 -0000 On Wed, Nov 2, 2011 at 1:43 AM, Peter Maloney < peter.maloney@brockmann-consult.de> wrote: > > Are you using ZFS? Is your root volume in hardware RAID or software RAID? > I am curious because you say your systems hang, and mine just runs degraded. > Peter, I'm running UFS and no RAID, so yes, that likely explains why my system hangs: it loses its boot disk. The controller itself resets on some less common occasions, so if you ever see that, I'll bet your system would hang too, as it loses all root devices. I have the official 8.2-RELEASE driver from LSI, which I'll be testing today to see if I can reproduce the hangs. Jason From owner-freebsd-scsi@FreeBSD.ORG Wed Nov 2 18:05:47 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 052B910657BF for ; Wed, 2 Nov 2011 18:05:47 +0000 (UTC) (envelope-from nitroboost@gmail.com) Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com [209.85.215.182]) by mx1.freebsd.org (Postfix) with ESMTP id 81EE18FC3C for ; Wed, 2 Nov 2011 18:05:46 +0000 (UTC) Received: by eyd10 with SMTP id 10so564610eyd.13 for ; Wed, 02 Nov 2011 11:05:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=YAh7VKceP+hZP6peyXqr0OXPQM7Jg3Sw6qQlfJzITNo=; b=efjfBifr+2k0APqYap7yT3wkbhCXzMCKZGfrGSf8a2jWt7RFP5mlXG3PMWGtLCSgqh 4igXugaPKf0Fw7/4VoJAeTkDHgxOshjPjunAGpQ1e2d1ph7x6lgS3BtLAQbd0TwCSC5U d4su8Ck+SRCXwCmLfP0wImLWco4CugsdDj+UU= MIME-Version: 1.0 Received: by 10.182.17.103 with SMTP id n7mr1100067obd.68.1320257145101; Wed, 02 Nov 2011 11:05:45 -0700 (PDT) Received: by 10.182.35.193 with HTTP; Wed, 2 Nov 2011 11:05:44 -0700 (PDT) In-Reply-To: References: Date: Wed, 2 Nov 2011 11:05:44 -0700 Message-ID: From: Jason Wolfe To: 
freebsd-scsi@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Nov 2011 18:05:47 -0000 On Tue, Nov 1, 2011 at 11:13 AM, Jason Wolfe wrote: > Luckily remote syslogging is enabled, so while nothing is kept locally, we > see these messages similar to these transmitted before the server hangs, > requiring a power cycle: > > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 510 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 713 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 942 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 356 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 492 > (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID > 976 > (da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID > 339 > (da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID > 746 > (da5:mps0:0:6:0): SCSI command timeout on device handle 0x000f SMID 74 > (da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID > 613 > (da2:mps0:0:3:0): SCSI command timeout on device handle 0x000c SMID 16 > (da10:mps0:0:11:0): SCSI command timeout on device handle 0x0014 SMID > 305 > (da1:mps0:0:2:0): SCSI command timeout on device handle 0x000b SMID 74 > (da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID > 594 > > In some cases that would be followed by this, which would usually be the > last transmission, though we don't see this in all cases. 
It may just be > the system isn't always alive long enough to transmit: > > kernel: mps0: IOC Fault 0x40006003, Resetting > > Hello, Testing with the LSI supplied driver, it appears they have a code path for this condition that causes our driver to crash. Here are 2 sets of messages: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm 0xffffff800040bdf8 (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072 SMID 97 command timeout cm 0xffffff800040bdf8 ccb 0xffffff00 mpslsi0: mpssas_alloc_tm freezing simq mpslsi0: timedout cm 0xffffff800040bdf8 allocated tm 0xffffff8000409070 (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 55 96 48 7f 0 0 80 0 length 65536 SMID 171 completed cm 0xffffff80004105a8 ccb 0xffffff03c3443y (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 54 f8 a4 3f 0 0 80 0 length 65536 SMID 762 completed cm 0xffffff8000434230 ccb 0xffffff001317ay (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072 SMID 97 completed timedout cm 0xffffff800040bdf8 ccb 0xffff1 (noperiph:mpslsi0:0:8:0): SMID 50 finished recovery after aborting TaskMID 97 mpslsi0: mpssas_free_tm releasing simq mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm 0xffffff8000441e18 (da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length 131072 SMID 989 command timeout cm 0xffffff8000441e18 ccb 0xfffff0 mpslsi0: mpssas_alloc_tm freezing simq mpslsi0: timedout cm 0xffffff8000441e18 allocated tm 0xffffff80004063e0 (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 14 a1 4f 0 1 0 0 length 131072 SMID 857 completed cm 0xffffff8000439e38 ccb 0xffffff001316y (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 e4 98 57 0 0 80 0 length 65536 SMID 300 completed cm 0xffffff80004182a0 ccb 0xffffff0392f0y (da7:mpslsi0:0:15:0): WRITE(10). 
CDB: 2a 0 33 76 29 ef 0 1 0 0 length 131072 SMID 989 completed timedout cm 0xffffff8000441e18 ccb 0xff1 (noperiph:mpslsi0:0:15:0): SMID 4 finished recovery after aborting TaskMID 989 mpslsi0: mpssas_free_tm releasing simq The server ran for 10 minutes with these happening every 10-30 seconds. With our community driver, the first instance of commands timing out during this smartctl storm would cause the server to hang and sometimes the controller to reset. Hopefully this is helpful to someone. Jason From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 3 10:31:42 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AF028106564A for ; Thu, 3 Nov 2011 10:31:42 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.17.9]) by mx1.freebsd.org (Postfix) with ESMTP id 5CFD08FC16 for ; Thu, 3 Nov 2011 10:31:42 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) id 0MaE2a-1RfdK904c4-00K1wW; Thu, 03 Nov 2011 11:31:41 +0100 Message-ID: <4EB26D8B.1090804@brockmann-consult.de> Date: Thu, 03 Nov 2011 11:31:39 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:j9R91YIvAcdO5yG4kFGHNpADIxxh0zjDlwLoz5RyZDt dxykdEygGq3v0xYZfDRgMvFPg51uo7sbjDaWP1U6DmDqaaOMPD qJnUooAJ+1l/k5H7bV+fWx0osCv0fRfGLnCePnbCpGjQjjjqQu thnnZAnqpviM0YNgXrAk40Dg8lfIhG+xQdbOoKpzWy1VR2w/o3 WEoa66QCW2XdA8BpZ8YyuOMbf21UJWHBJ5CESbaKCy/kc+bT3w UAScDXs+6F77BQhC0mBsHKDaFGRlmpbxxGDlckgYFULcdwbGH7 8XUY7PqF+KP/vrPjMOHbe+FzPWY8d4O1cwGNGbyg3yV6L3GH5H v3nltVs+zF8EooMysUOjJdA66Rr9GtFCG7S+AAVwd Subject: Re: 
mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Nov 2011 10:31:42 -0000 Dear Jason, On 11/02/2011 07:05 PM, Jason Wolfe wrote: > Hello, > Testing with the LSI supplied driver, it appears they have a code path for > this condition that causes our driver to crash. Here are 2 sets of > messages: > > mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm > 0xffffff800040bdf8 > (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072 > SMID 97 command timeout cm 0xffffff800040bdf8 ccb 0xffffff00 > mpslsi0: mpssas_alloc_tm freezing simq > mpslsi0: timedout cm 0xffffff800040bdf8 allocated tm 0xffffff8000409070 > (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 55 96 48 7f 0 0 80 0 length 65536 > SMID 171 completed cm 0xffffff80004105a8 ccb 0xffffff03c3443y > (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 54 f8 a4 3f 0 0 80 0 length 65536 > SMID 762 completed cm 0xffffff8000434230 ccb 0xffffff001317ay > (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072 > SMID 97 completed timedout cm 0xffffff800040bdf8 ccb 0xffff1 > (noperiph:mpslsi0:0:8:0): SMID 50 finished recovery after aborting TaskMID > 97 > mpslsi0: mpssas_free_tm releasing simq > > > mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm > 0xffffff8000441e18 > (da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length > 131072 SMID 989 command timeout cm 0xffffff8000441e18 ccb 0xfffff0 > mpslsi0: mpssas_alloc_tm freezing simq > mpslsi0: timedout cm 0xffffff8000441e18 allocated tm 0xffffff80004063e0 > (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 14 a1 4f 0 1 0 0 length 131072 > SMID 857 completed cm 0xffffff8000439e38 ccb 0xffffff001316y > (da7:mpslsi0:0:15:0): READ(10). 
CDB: 28 0 71 e4 98 57 0 0 80 0 length 65536 > SMID 300 completed cm 0xffffff80004182a0 ccb 0xffffff0392f0y > (da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length > 131072 SMID 989 completed timedout cm 0xffffff8000441e18 ccb 0xff1 > (noperiph:mpslsi0:0:15:0): SMID 4 finished recovery after aborting TaskMID > 989 > mpslsi0: mpssas_free_tm releasing simq > > The server ran for 10 minutes with these happening every 10-30 seconds, > with our community driver the first instance of commands timing out during > this smartctl storm would cause the server to hang and sometimes the > controller to reset. Hopefully this is helpful to someone. > Does this mean it didn't hang? Or did it run your smartctl -a test for 10 minutes before a hang? I am also trying the mpslsi driver now, but I couldn't reproduce the problem using "smartctl -a" (also tried -A, -h and -i) with the mps driver. Tags were set to 255 on all disks. I only tried it on the backup server, which never crashed randomly on its own either. So I will just have to assume the new driver works if the problem does not recur within a month or two. However, with the mpslsi driver, during a scrub on the backup server (probably during smartctl -a), I got these messages (including what looks like a controller reset). No disks were lost, and no read errors were reported in zpool status. But I can't get it to happen a second time. So I hope that means our problems are over. Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f65f698 Nov 3 09:17:10 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). 
CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 717 command timeout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800 Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_alloc_tm freezing simq Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: timedout cm 0xffffff800f65f698 allocated tm 0xffffff800f6340f8 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 completed cm 0xffffff800f643cd8 ccb 0xffffff0026bd1000 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 completed cm 0xffffff800f654550 ccb 0xffffff0026b96000 during recovery i oc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 completed cm 0xffffff800f664510 ccb 0xffffff003d438000 during recovery i oc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 completed cm 0xffffff800f657b90 ccb 0xffffff00314ce800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). 
CDB: 28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 completed cm 0xffffff800f65a630 ccb 0xffffff0026ba1800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 completed cm 0xffffff800f65ece8 ccb 0xffffff0026bb1800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 717 completed timedout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800 during recov(da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 1c dc 68 73 0 0 2b 0 length 22016 SMID 690 completed cm 0xffffff800f65dc70 ccb 0xffffff0026bea800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 1c dc 68 73 0 0 2b 0 length 22016 SMID 690 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 completed cm 0xffffff800f66d568 ccb 0xffffff0026bf9000 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 completed cm 0xffffff800f65d5a8 ccb 0xffffff003d47f800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). 
CDB: 28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 completed cm 0xffffff800f641428 ccb 0xffffff0031536000 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 completed cm 0xffffff800f63e3b8 ccb 0xffffff00314ec800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 139 completed cm 0xffffff800f63c6a8 ccb 0xffffff0026a89000 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 139 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 876 completed cm 0xffffff800f6690a0 ccb 0xffffff00314c8800 during recovery ioc 8(pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 876 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 661 completed cm 0xffffff800f65c058 ccb 0xffffff0026b7d000 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). 
CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 661 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 471 completed cm 0xffffff800f650848 ccb 0xffffff0026be7800 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 471 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 215 completed cm 0xffffff800f641048 ccb 0xffffff0026bef800 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 215 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 203 completed cm 0xffffff800f6404a8 ccb 0xffffff0026bb6000 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 203 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 546 completed cm 0xffffff800f6550f0 ccb 0xffffff003d447800 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 546 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 513 completed cm 0xffffff800f6530f8 ccb 0xffffff0026bcb800 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). 
CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 513 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1 abort TaskMID 717 status 0x0 code 0x0 count 20 Nov 3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1 finished recovery after aborting TaskMID 717 Nov 3 09:17:11 bcnas1bak kernel: mpslsi0: mpssas_free_tm releasing simq Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): CAM status: SCSI Status Error Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI status: Check Condition Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) Peter > Jason > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" -- -------------------------------------------- Peter Maloney Brockmann Consult Max-Planck-Str. 
2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de -------------------------------------------- From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 3 11:22:05 2011 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB8C1106564A; Thu, 3 Nov 2011 11:22:05 +0000 (UTC) (envelope-from Karli.Sjoberg@slu.se) Received: from Edge1-3.slu.se (edge1-3.slu.se [193.10.100.98]) by mx1.freebsd.org (Postfix) with ESMTP id AE9F18FC18; Thu, 3 Nov 2011 11:22:04 +0000 (UTC) Received: from Exchange2.ad.slu.se (193.10.100.95) by Edge1-3.slu.se (193.10.100.98) with Microsoft SMTP Server (TLS) id 8.3.213.0; Thu, 3 Nov 2011 12:22:01 +0100 Received: from exmbx3.ad.slu.se ([193.10.100.93]) by Exchange2.ad.slu.se ([193.10.100.95]) with mapi; Thu, 3 Nov 2011 12:22:01 +0100 From: =?iso-8859-1?Q?Karli_Sj=F6berg?= To: "Kenneth D. 
Merry" Date: Thu, 3 Nov 2011 12:21:59 +0100 Thread-Topic: AOC-USAS2-L8i zfs panics and SCSI errors in messages Thread-Index: AcyaGs4yiVLBqLRGSh6I0qMEfXlyoQ== Message-ID: <666756B5-218E-48D6-99A7-56C7FB0D2E33@slu.se> References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se> <20111025193302.GA30409@nargothrond.kdm.org> In-Reply-To: <20111025193302.GA30409@nargothrond.kdm.org> Accept-Language: sv-SE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: sv-SE, en-US MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "freebsd-scsi@freebsd.org" , "fs@freebsd.org" Subject: Re: AOC-USAS2-L8i zfs panics and SCSI errors in messages X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Nov 2011 11:22:05 -0000 Hi, I'm not alone! By complete chance I was reading another thread on the forum, and it turns out that peetaur also has the exact same problem as me with timeouts and sometimes losing disks. His hardware is very different from mine, except that we both have LSI controllers and are running 8.2-STABLE. He has tried both the mps driver in FreeBSD and the mps driver that LSI provides (phase 11), and still gets these timeouts. peetaur's system: 4U chassis from Supermicro 847E16-R1400LPB with 2x 1400 Watt redundant power supplies and 36x HotSwap for SAS or SATA Motherboard from Supermicro - Intel® 5520 (Tylersburg) Chipset - 12 DIMM memory slots (max. 
192GB DDR3) - 2x 100/1000Base TX Gigabit Ethernet Port (Dual Intel® 82576 Gigabit Ethernet) - 6x SATA (3 Gbps) Ports via ICH10R Controller - PCI Slots: 7x (x8) PCI-E 2.0 (in x16 slots) - Integrated IPMI 2.0 with Dedicated LAN - Integrated Matrox G200eW Graphics CPU - 2x E5620 Intel Xeon (Westmere) Quad Core CPU, (80W) 2.40 GHz, 12 MB L3 Cache RAM - 48 GB (6x 8GB) DDR3 1333 DIMM, REG, ECC SAS HBA - 9211-8i Network - 10G Card with Dual-port Intel® 82598EB (CX4) Disks - 9x HDD 3TB SATA from Hitachi, 7.2k RPM, 64 MB Cache - 9x HDD 3TB SATA from Seagate, 7.2k RPM, 64 MB Cache - 2x consumer SSDs (boot, root, zil, cache) #uname -a FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 15:06:03 CEST 2011 root@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC amd64 and an extract from /var/log/messages when using FreeBSD's mps: Oct 4 08:57:05 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 568 Oct 4 08:57:05 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 998 Oct 4 08:57:13 bcnas1 kernel: mps0: (0:0:0) terminated ioc 804b scsi 0 state c xfer 0 Oct 4 08:57:13 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 568 complete Oct 4 08:57:13 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 998 Oct 4 08:57:13 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 998 complete Oct 4 08:58:13 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 973 Oct 4 08:58:13 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 981 Oct 4 08:58:21 bcnas1 kernel: mps0: (0:0:0) terminated ioc 804b scsi 0 state c xfer 0 Oct 4 08:58:21 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 973 complete Oct 4 08:58:21 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management 
request for handle 0x0a SMID 981 Oct 4 08:58:21 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 981 complete Oct 4 08:58:24 bcnas1 kernel: (da3:mps0:0:0:0): READ(6). CDB: 8 0 0 0 80 0 Oct 4 08:58:24 bcnas1 kernel: (da3:mps0:0:0:0): CAM status: SCSI Status Error Oct 4 08:58:24 bcnas1 kernel: (da3:mps0:0:0:0): SCSI status: Check Condition Oct 4 08:58:24 bcnas1 kernel: (da3:mps0:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) Oct 4 09:00:14 bcnas1 kernel: mps0: mpssas_remove_complete on target 0x0000, IOCStatus= 0x0 Oct 4 09:00:14 bcnas1 kernel: (da3:mps0:0:0:0): lost device and an extract from /var/log/messages when using LSI's mps: Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f65f698 Nov 3 09:17:10 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 717 command timeout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800 Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_alloc_tm freezing simq Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: timedout cm 0xffffff800f65f698 allocated tm 0xffffff800f6340f8 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 completed cm 0xffffff800f643cd8 ccb 0xffffff0026bd1000 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 completed cm 0xffffff800f654550 ccb 0xffffff0026b96000 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). 
CDB: 28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 completed cm 0xffffff800f664510 ccb 0xffffff003d438000 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 completed cm 0xffffff800f657b90 ccb 0xffffff00314ce800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 completed cm 0xffffff800f65a630 ccb 0xffffff0026ba1800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 completed cm 0xffffff800f65ece8 ccb 0xffffff0026bb1800 during recovery ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 terminated ioc 804b scsi 0 state c xfer 0 Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 717 completed timedout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800 during recov(da0:mpslsi0:0:10:0): R$ Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). 
CDB: 28 0 1c dc 68 73 0 0 2b 0 length 22016 SMID 690 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 completed cm 0xffffff800f66d568 ccb 0xffffff0026bf9000 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 completed cm 0xffffff800f65d5a8 ccb 0xffffff003d47f800 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 completed cm 0xffffff800f641428 ccb 0xffffff0031536000 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 completed cm 0xffffff800f63e3b8 ccb 0xffffff00314ec800 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 139 completed cm 0xffffff800f63c6a8 ccb 0xffffff0026a89000 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16).
CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 876 completed cm 0xffffff800f6690a0 ccb 0xffffff00314c8800 during recovery ioc 8(pass0:mpslsi0:0:10:0):$
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 661 completed cm 0xffffff800f65c058 ccb 0xffffff0026b7d000 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 471 completed cm 0xffffff800f650848 ccb 0xffffff0026be7800 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 215 completed cm 0xffffff800f641048 ccb 0xffffff0026bef800 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 203 completed cm 0xffffff800f6404a8 ccb 0xffffff0026bb6000 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 546 completed cm 0xffffff800f6550f0 ccb 0xffffff003d447800 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16).
CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 513 completed cm 0xffffff800f6530f8 ccb 0xffffff0026bcb800 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov 3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1 abort TaskMID 717 status 0x0 code 0x0 count 20
Nov 3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1 finished recovery after aborting TaskMID 717
Nov 3 09:17:11 bcnas1bak kernel: mpslsi0: mpssas_free_tm releasing simq
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): CAM status: SCSI Status Error
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI status: Check Condition
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)

/Karli

On 25 Oct 2011, at 21:33, Kenneth D. Merry wrote:

On Thu, Oct 20, 2011 at 13:28:17 +0200, Karli Sjöberg wrote:

Hi,

I'm in the process of vacating a Sun/Oracle system to another Supermicro/FreeBSD system, doing zfs send/recv between. Two times now, the system has panicked while not doing anything at all, and it's throwing a lot of SCSI/CAM-related errors while doing IO-intensive operations, like send/recv, resilver, and zpool has sometimes reported read/write errors on the hard drives. Best part is that the errors in messages are about all hard drives at one time or another, and they are connected with separate cables, controllers and caddies.
Specs:

HW:
1x Supermicro X8SIL-F
2x Supermicro AOC-USAS2-L8i
2x Supermicro CSE-M35T-1B
1x Intel Core i5 650 3,2GHz
4x 2GB 1333MHZ DDR3 ECC UDIMM
10x SAMSUNG HD204UI (in a raidz2 zpool)
1x OCZ Vertex 3 240GB (L2ARC)

SW:
# uname -a
FreeBSD server 8.2-STABLE FreeBSD 8.2-STABLE #0: Mon Oct 10 09:12:25 UTC 2011 root@server:/usr/obj/usr/src/sys/GENERIC amd64
# zpool get version pool1
NAME   PROPERTY  VALUE  SOURCE
pool1  version   28     default

I got the panic from the IPMI KVM:
http://i55.tinypic.com/synpzk.png

In looking at the panic, this is a ZFS panic. Nothing the disks do should be able to cause ZFS to panic. ZFS is panicking in avl_add():

        /*
         * This is unfortunate. We want to call panic() here, even for
         * non-DEBUG kernels. In userland, however, we can't depend on anything
         * in libc or else the rtld build process gets confused. So, all we can
         * do in userland is resort to a normal ASSERT().
         */
        if (avl_find(tree, new_node, &where) != NULL)
#ifdef _KERNEL
                panic("avl_find() succeeded inside avl_add()");
#else
                ASSERT(0);
#endif

There are certainly timeouts and two terminated IOCs in the log below. That does suggest a hardware or driver problem, but it isn't very obvious what it might be.

I have seen bad behavior with SATA drives behind 3Gb Maxim expanders talking to 6Gb LSI controllers, but your particular configuration does not involve any expanders, and therefore is not that particular STP issue.

My best guess, and it is a guess, is that either the drives are misbehaving (i.e. firmware type problem) or you've got a cabling issue.

If you have more hardware available, you might try swapping out the cables and/or drives to see if you can reproduce the drive errors with a different setup. If you swap the drives, I would use a different brand if you've got them available.

I'm CCing the fs list, perhaps someone there can look at the stack trace above and figure out what ZFS might be doing.
Again, ZFS should survive any errors from the drives, and the panic above looks like ZFS is flagging a logic bug somewhere.

And an extract from /var/log/messages:

Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): WRITE(10). CDB: 2a 0 6 13 66 f 0 0 f 0
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): CAM status: SCSI Status Error
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI status: Check Condition
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): WRITE(6). CDB: a 0 1 b2 2 0
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): CAM status: SCSI Status Error
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI status: Check Condition
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 859
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 495
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 725
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 722
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 438
Oct 19 17:40:38 fs2-7 kernel: mps1: (1:4:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 19 17:40:38 fs2-7 last message repeated 3 times
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 859 complete
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 495
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 495 complete
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 725
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 725 complete
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 722
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 722 complete
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 438
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 438 complete
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): WRITE(10). CDB: 2a 0 6 25 4f 75 0 0 b 0
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): CAM status: SCSI Status Error
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI status: Check Condition
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): WRITE(10).
CDB: 2a 0 2d a5 10 ca 0 0 80 0
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): CAM status: SCSI Status Error
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI status: Check Condition
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:45:40 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 976
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 636
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 888
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 983
Oct 19 17:45:41 fs2-7 kernel: mps0: (0:1:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 19 17:45:41 fs2-7 last message repeated 2 times
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 976 complete
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 636
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 636 complete
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 888
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 888 complete
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 983
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 983 complete
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): WRITE(10).
CDB: 2a 0 6 40 a7 2 0 0 3 0
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): CAM status: SCSI Status Error
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI status: Check Condition
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): WRITE(10). CDB: 2a 0 6 40 b0 9 0 0 9 0
Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): CAM status: SCSI Status Error
Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): SCSI status: Check Condition
Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)

What's going on?

Regards
Karli Sjöberg
_______________________________________________
freebsd-scsi@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

Ken
--
Kenneth Merry
ken@FreeBSD.ORG

Best regards
-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences
Box 7079 (Visiting Address Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone: +46-(0)18-67 15 66
karli.sjoberg@slu.se

From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 3 23:53:27 2011
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB91B106564A for ; Thu, 3 Nov 2011 23:53:27 +0000 (UTC) (envelope-from chuck@tuffli.net)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id A7B8B8FC12 for ; Thu, 3 Nov 2011 23:53:27 +0000 (UTC)
Received: by qadb12 with SMTP id b12so185330qad.13 for ; Thu, 03 Nov 2011 16:53:27 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.182.7.66 with SMTP id h2mr483342oba.14.1320362947450; Thu, 03 Nov 2011 16:29:07 -0700 (PDT)
Received: by 10.182.116.102 with HTTP; Thu, 3 Nov 2011 16:29:07 -0700 (PDT)
Date: Thu, 3 Nov 2011 16:29:07 -0700
From: Chuck Tuffli
To: freebsd-scsi
Content-Type: text/plain; charset=ISO-8859-1
Subject: how to abort an ATIO/INOT
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
X-List-Received-Date: Thu, 03 Nov 2011 23:53:27 -0000

Hi -

I'm implementing a target mode driver using the scsi_target as a back-end, and am seeing scsi_target hang sometimes when exiting. When it hangs, the call stack appears to be

    abort_all_pending
    targdisable
    targioctl(TARGIOCDISABLE)

with the "hang" due to the msleep on the pending_ccb_queue.

If I understand the code correctly (which I may not), the msleep is to wait asynchronously for CCBs to abort. But what about cases where the CCB completes prior to the msleep? For example, some drivers call xpt_done on ATIO/INOT CCBs and then return CAM_REQ_CMP for the abort (I copied this in my driver). I believe this results in the hang as the abort request completes (status == CAM_REQ_CMP) triggering the msleep, but the xpt_done that could wake up anything sleeping on the pending_ccb_queue has already run.

So, should target drivers not return CAM_REQ_CMP unless a CCB needs to be asynchronously aborted? What about CTIO? Does that have a potential race?

---chuck
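
The ordering described in the message above (completion runs first, the sleep starts afterwards) can be replayed in a small model. What follows is a minimal sketch in plain C, not scsi_target or CAM code: struct model, complete_one, wait_unconditional, wait_on_predicate and stuck_after_early_completion are all hypothetical names, and the sleep channel is reduced to a single flag on the assumption that a wakeup is not remembered for a thread that sleeps later.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of a sleep channel (an assumption for illustration, not
 * scsi_target code): a wakeup only helps a waiter that is already
 * asleep and is not remembered for anyone who sleeps later.
 */
struct model {
	int  pending;  /* CCBs still outstanding */
	bool asleep;   /* waiter is blocked on the channel */
};

/* Completion path: drop one CCB and wake the waiter once drained. */
static void
complete_one(struct model *m)
{
	assert(m->pending > 0);
	m->pending--;
	if (m->pending == 0)
		m->asleep = false;  /* no effect if nobody sleeps yet */
}

/* Variant A: sleep whenever the abort reported success. */
static void
wait_unconditional(struct model *m, bool abort_ok)
{
	if (abort_ok)
		m->asleep = true;
}

/* Variant B: sleep only while something is still outstanding. */
static void
wait_on_predicate(struct model *m, bool abort_ok)
{
	(void)abort_ok;
	if (m->pending > 0)
		m->asleep = true;
}

/*
 * Replay the ordering from the message: the CCB completes first, then
 * the caller decides whether to sleep. Returns true if the waiter ends
 * up asleep with nothing left that could wake it.
 */
static bool
stuck_after_early_completion(void (*waitfn)(struct model *, bool))
{
	struct model m = { .pending = 1, .asleep = false };

	complete_one(&m);   /* completion runs inside the abort */
	waitfn(&m, true);   /* abort came back as successful */
	return (m.asleep && m.pending == 0);
}
```

In this model stuck_after_early_completion(wait_unconditional) returns true and stuck_after_early_completion(wait_on_predicate) returns false. In a real kernel the predicate check and the sleep would presumably have to happen under the same mutex that the completion path takes, otherwise the same window reopens between the check and the sleep.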