From owner-freebsd-scsi@FreeBSD.ORG  Mon Oct 31 11:07:12 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 09A90106574B
	for <freebsd-scsi@FreeBSD.org>; Mon, 31 Oct 2011 11:07:12 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id E41EF8FC28
	for <freebsd-scsi@FreeBSD.org>; Mon, 31 Oct 2011 11:07:11 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9VB7BTU056865
	for <freebsd-scsi@FreeBSD.org>; Mon, 31 Oct 2011 11:07:11 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9VB7Bf4056863
	for freebsd-scsi@FreeBSD.org; Mon, 31 Oct 2011 11:07:11 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 31 Oct 2011 11:07:11 GMT
Message-Id: <201110311107.p9VB7Bf4056863@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-scsi@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Oct 2011 11:07:12 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/161809  scsi       [cam] [patch] set kern.cam.boot_delay via build option
o kern/159412  scsi       [ciss] 7.3 RELEASE: ciss0 ADAPTER HEARTBEAT FAILED err
o kern/157770  scsi       [iscsi] [panic] iscsi_initiator panic
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after 
o kern/153514  scsi       [cam] [panic] CAM related panic
o kern/153361  scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250  scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564  scsi       [ciss] ciss(4) should increase  CISS_MAX_LOGICAL to 10
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287  scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301  scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/141934  scsi       [cam] [patch] add support for SEAGATE DAT Scopion 130
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250  scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up 
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s 
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o bin/57088    scsi       [cam] [patch] for a possible fd leak in libcam.c
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than 
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

49 problems total.


From owner-freebsd-scsi@FreeBSD.ORG  Tue Nov  1 18:42:03 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7879D106564A
	for <freebsd-scsi@freebsd.org>; Tue,  1 Nov 2011 18:42:03 +0000 (UTC)
	(envelope-from nitroboost@gmail.com)
Received: from mail-dy0-f54.google.com (mail-dy0-f54.google.com
	[209.85.220.54])
	by mx1.freebsd.org (Postfix) with ESMTP id EE6068FC16
	for <freebsd-scsi@freebsd.org>; Tue,  1 Nov 2011 18:42:02 +0000 (UTC)
Received: by dye36 with SMTP id 36so397915dye.13
	for <freebsd-scsi@freebsd.org>; Tue, 01 Nov 2011 11:42:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	bh=3trq6LFWsnHWqs3tuc6y7zF9rP+zyikr5hs04y/L/rs=;
	b=Fugmkxctt2nkyllbtSUgysM7z1Fu7JrL3e6rypghXRVVmM+Hs5UuUuFuYODecwqyzM
	GRetBFz+3GoAL3pRibYtB2RN1dbc+fsEZSOIJCxgJmL0HdN8j4OJgyJ+U8g9GvF+qvxW
	s2Y1hkd8bYvz2r17AHIWWO0Y6y7XNIWxp6r+M=
MIME-Version: 1.0
Received: by 10.182.115.40 with SMTP id jl8mr157403obb.8.1320171197190; Tue,
	01 Nov 2011 11:13:17 -0700 (PDT)
Received: by 10.182.35.193 with HTTP; Tue, 1 Nov 2011 11:13:17 -0700 (PDT)
Date: Tue, 1 Nov 2011 11:13:17 -0700
Message-ID: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>
From: Jason Wolfe <nitroboost@gmail.com>
To: freebsd-scsi@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: mps/LSI SAS2008 controller crashes when smartctl is run with upped
 disk tags
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Nov 2011 18:42:03 -0000

Hello,

I have an issue with the mps driver on FreeBSD 8.2 where running
'smartctl -a' rarely causes the controller to freak out when the
disk tags are > 2.  I've confirmed that setting the tags to 1 resolves
this crash, so that is surely a clue in the right direction.  I'm using
Seagate 1TB SAS drives (ST91000640SS) in SuperMicro X8DTT-H chassis.
This happens across more than a thousand servers, so it's surely not
flaky hardware.  It could be an interoperability issue between these
drive models and the mps controller, but unfortunately I don't have any
other drives deployed on these cards to test that theory :/
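A minimal sketch of applying the tags workaround to every disk at once (not part of the original report). The helper names da_disks and set_tags_all are hypothetical, the input is assumed to be a space-separated kern.disks-style list, and the CAMCONTROL override exists only so the commands can be dry-run with CAMCONTROL=echo.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): set the number of
# tagged openings on every da(4) disk, e.g. to 1 as a workaround.
# CAMCONTROL can be overridden (e.g. CAMCONTROL=echo) for a dry run.
CAMCONTROL=${CAMCONTROL:-camcontrol}

# da_disks: split a space-separated kern.disks-style list into one
# name per line, keeping only da(4) disks.
da_disks() {
	printf '%s\n' $1 | grep '^da'
}

# set_tags_all NTAGS DISKLIST: run "camcontrol tags <disk> -N NTAGS"
# for each da disk in DISKLIST; stop at the first failure.
set_tags_all() {
	ntags=$1
	for disk in $(da_disks "$2"); do
		$CAMCONTROL tags "$disk" -N "$ntags" || return 1
	done
}
```

On a live system it would be called as `set_tags_all 1 "$(sysctl -n kern.disks)"`.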

Luckily remote syslogging is enabled, so while nothing is kept locally, we
see messages similar to these transmitted before the server hangs,
requiring a power cycle:

(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 510
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 713
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 942
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 356
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 492
(da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 976
(da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID 339
(da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID 746
(da5:mps0:0:6:0): SCSI command timeout on device handle 0x000f SMID 74
(da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID 613
(da2:mps0:0:3:0): SCSI command timeout on device handle 0x000c SMID 16
(da10:mps0:0:11:0): SCSI command timeout on device handle 0x0014 SMID 305
(da1:mps0:0:2:0): SCSI command timeout on device handle 0x000b SMID 74
(da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID 594

In some cases that is followed by the message below, which is usually
the last one transmitted, though we don't see it in every case.  It may
just be that the system isn't always alive long enough to transmit it:

kernel: mps0: IOC Fault 0x40006003, Resetting
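A rough way to see which disks are affected is to count the timeout messages per CAM peripheral in the syslog output. This is an illustration, not from the original report: the function name count_timeouts is hypothetical, and it assumes each message sits on one line (as in the raw syslog file, before any mail wrapping).

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print, for each CAM
# peripheral such as "da0:mps0:0:1:0", how many "SCSI command timeout"
# messages it logged, as "<count> <peripheral>", highest count first.
count_timeouts() {
	awk '/SCSI command timeout on device handle/ {
		# The peripheral is printed as "(daN:mpsX:bus:target:lun):".
		if (match($0, /\(da[0-9]+:mps[0-9]+:[0-9]+:[0-9]+:[0-9]+\)/)) {
			periph = substr($0, RSTART + 1, RLENGTH - 2)
			count[periph]++
		}
	}
	END { for (p in count) print count[p], p }' | sort -rn
}
```

For example, `count_timeouts < /var/log/messages` on the syslog host.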


I'm able to reproduce this fairly easily, within a minute or two, by
heavily loading the disks by whatever means and running smartctl -a in
a loop:

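#!/bin/sh -x

# Collect the names of all da(4) disks known to the kernel.
disks=`sysctl -n kern.disks|xargs -n1|grep ^da`

# Raise the number of tagged openings on every disk to 4.
for disk in $disks; do
	camcontrol tags $disk -N 4
done

# Enable SMART and dump all SMART data for every disk, 100 passes
# in a row (`yes|head -100` just yields 100 words to iterate over).
for z in `yes|head -100`; do
	for disk in $disks; do
		smartctl -s on -a /dev/$disk
	done
done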

mps0: <LSI SAS2008> port 0xe000-0xe0ff mem 0xfbd3c000-0xfbd3ffff,0xfbd40000-0xfbd7ffff irq 26 at device 0.0 on pci4
mps0: Firmware: 07.00.00.00
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps0: [ITHREAD]
da0 at mps0 bus 0 scbus0 target 1 lun 0
da1 at mps0 bus 0 scbus0 target 2 lun 0
da2 at mps0 bus 0 scbus0 target 3 lun 0
da3 at mps0 bus 0 scbus0 target 4 lun 0
da4 at mps0 bus 0 scbus0 target 5 lun 0
da5 at mps0 bus 0 scbus0 target 6 lun 0
da6 at mps0 bus 0 scbus0 target 7 lun 0
da7 at mps0 bus 0 scbus0 target 8 lun 0
da8 at mps0 bus 0 scbus0 target 9 lun 0
da9 at mps0 bus 0 scbus0 target 10 lun 0
da10 at mps0 bus 0 scbus0 target 11 lun 0
da11 at mps0 bus 0 scbus0 target 12 lun 0
ses0 at mps0 bus 0 scbus0 target 13 lun 0

mps0@pci0:4:0:0: class=0x010700 card=0x040015d9 chip=0x00721000 rev=0x02 hdr=0x00
vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
class = mass storage
subclass = SAS

<SEAGATE ST91000640SS 0001> at scbus0 target 1 lun 0 (pass0,da0)
<SEAGATE ST91000640SS 0001> at scbus0 target 2 lun 0 (pass1,da1)
<SEAGATE ST91000640SS 0001> at scbus0 target 3 lun 0 (pass2,da2)
<SEAGATE ST91000640SS 0001> at scbus0 target 4 lun 0 (pass3,da3)
<SEAGATE ST91000640SS 0001> at scbus0 target 5 lun 0 (pass4,da4)
<SEAGATE ST91000640SS 0001> at scbus0 target 6 lun 0 (pass5,da5)
<SEAGATE ST91000640SS 0001> at scbus0 target 7 lun 0 (pass6,da6)
<SEAGATE ST91000640SS 0001> at scbus0 target 8 lun 0 (pass7,da7)
<SEAGATE ST91000640SS 0001> at scbus0 target 9 lun 0 (pass8,da8)
<SEAGATE ST91000640SS 0001> at scbus0 target 10 lun 0 (pass9,da9)
<SEAGATE ST91000640SS 0001> at scbus0 target 11 lun 0 (pass10,da10)
<SEAGATE ST91000640SS 0001> at scbus0 target 12 lun 0 (pass11,da11)
<LSI CORP SAS2X28 0717> at scbus0 target 13 lun 0 (ses0,pass12)

Thank you sirs,

Jason Wolfe

From owner-freebsd-scsi@FreeBSD.ORG  Tue Nov  1 19:18:07 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C5846106566C
	for <freebsd-scsi@freebsd.org>; Tue,  1 Nov 2011 19:18:07 +0000 (UTC)
	(envelope-from peter.maloney@brockmann-consult.de)
Received: from mo-p05-ob6.rzone.de (mo-p05-ob6.rzone.de
	[IPv6:2a01:238:20a:202:53f5::1])
	by mx1.freebsd.org (Postfix) with ESMTP id C91908FC18
	for <freebsd-scsi@freebsd.org>; Tue,  1 Nov 2011 19:18:06 +0000 (UTC)
X-RZG-AUTH: :LWIKdA2leu0bPbLmhzXgqn0MTG6qiKEwQRWfNxSw4HzYIwjsnvdDt2oX8drk23mpKMZH7NA=
X-RZG-CLASS-ID: mo05
Received: from [192.168.179.42]
	(hmbg-5f7606d1.pool.mediaWays.net [95.118.6.209])
	by post.strato.de (mrclete mo57) (RZmta 26.10 AUTH)
	with (DHE-RSA-AES128-SHA encrypted) ESMTPA id w01ed2nA1IP5pQ
	for <freebsd-scsi@freebsd.org>; Tue, 1 Nov 2011 20:17:45 +0100 (MET)
Message-ID: <4EAEF431.7090108@brockmann-consult.de>
Date: Mon, 31 Oct 2011 20:17:05 +0100
From: Peter Maloney <peter.maloney@brockmann-consult.de>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
	rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
References: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>
In-Reply-To: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with
 upped disk tags
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Nov 2011 19:18:07 -0000

Dear Jason,

On 01.11.2011 19:13, Jason Wolfe wrote:
> Hello,
>
> I have an issue with the mps driver on FreeBSD 8.2 where running
> 'smartctl -a' rarely causes the controller to freak out when the
> disk tags are > 2.  I've confirmed that setting the tags to 1 resolves
> this crash, so that is surely a clue in the right direction.  I'm using
> Seagate 1TB SAS drives (ST91000640SS) in SuperMicro X8DTT-H chassis.
> This happens across more than a thousand servers, so it's surely not
> flaky hardware.  It could be an interoperability issue between these
> drive models and the mps controller, but unfortunately I don't have any
> other drives deployed on these cards to test that theory :/
I get a similar problem on a system with an LSI 9211-8i with 20 SATA
disks attached (2 SSDs and 18 spinning disks). My system doesn't hang,
panic, or reset, though. I just lose access to one disk, which then
shows up as FAULTED in zpool status (I'm using ZFS). If I physically
remove the FAULTED disk and run "gpart recover da0", I get a panic.
Otherwise, the system keeps running in a degraded state. When I reboot
and resilver, some data is found damaged and repaired, not just
refreshed with the latest state. The server has 1 HBA and 2 backplanes,
and the 2 mirrored root disks are on different backplanes. Maybe that
is why mine runs degraded while yours hangs.

This has happened twice so far (in about a month or two), and both times
it was one of the mirrored root disks (SSDs) that faulted.

My tags are set to 255. I will try reproducing it as you described, and
then, if it fails, reboot and try again with tags set to 2 as you suggested.

And *thank you very much for this information*. This is the last
outstanding issue with this server. I hope this workaround helps.

# camcontrol tags /dev/da0
(pass0:mps0:0:7:0): device openings: 255
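To check the setting on every disk rather than only da0, a small loop over camcontrol's output could be used. This is a sketch, not from the original mail: report_tags and openings_from are hypothetical names, the parsing assumes the "device openings: N" line format shown above, and CAMCONTROL can be overridden for testing.

```shell
#!/bin/sh
# Hypothetical helper: report the current tagged-openings setting of
# each disk, e.g. to confirm a workaround value took effect everywhere.
# CAMCONTROL may be overridden for testing; it defaults to camcontrol(8).
CAMCONTROL=${CAMCONTROL:-camcontrol}

# openings_from: extract N from a "... device openings: N" line on stdin.
openings_from() {
	sed -n 's/.*device openings: \([0-9][0-9]*\).*/\1/p'
}

# report_tags DISK...: print "<disk> <openings>" for each disk given,
# or "<disk> unknown" if no openings line could be parsed.
report_tags() {
	for disk in "$@"; do
		n=$($CAMCONTROL tags "$disk" | openings_from)
		printf '%s %s\n' "$disk" "${n:-unknown}"
	done
}
```

For example, `report_tags $(sysctl -n kern.disks)` lists every disk with its current value.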


And my logs to compare:

(Note: my root, swap, ZFS cache, and ZFS log are on the disk that fails.)

root@bcnas1:/var/log# swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/gpt/swap0     524288     5840   518448     1%
/dev/gpt/swap1     524288     5640   518648     1%
Total             1048576    11480  1037096     1%

When it starts happening, it looks like this:
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 220
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 87
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 795
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 423
Oct 29 00:02:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 338
Oct 29 02:19:12 bcnas1 kernel: :9:0): SCSI command timeout on device handle 0x0016 SMID 170
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 637
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 335
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 798
Oct 29 02:19:12 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 29 02:19:12 bcnas1 last message repeated 14 times
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 991 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 4
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 4 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 227
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 227 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 652
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 652 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 125
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 125 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 101
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 1017 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 100
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 1004 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 487
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 487 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 279
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 279 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 929
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 929 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 346
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 346 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 817
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 817 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 170
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 170 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 637
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 637 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 335
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 335 complete
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 798
Oct 29 02:19:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 798 complete
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 757
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 833
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 804
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 464
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 144
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 912
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 753
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 422
Oct 29 02:19:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 241

And then, just before I rebooted it, it looked basically the same, with
the different messages mixed together:

Oct 31 07:52:12 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 807
Oct 31 07:52:12 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 807 complete
Oct 31 07:53:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 1006
Oct 31 07:53:12 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 111
Oct 31 07:53:20 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 31 07:53:20 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 1006 complete
Oct 31 07:53:20 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 111
Oct 31 07:53:20 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 111 complete
Oct 31 07:54:20 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 669
Oct 31 07:54:20 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 912
Oct 31 07:54:28 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 31 07:54:28 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 669 complete
Oct 31 07:54:28 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 912
Oct 31 07:54:28 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 912 complete
Oct 31 07:55:29 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 804
Oct 31 07:55:29 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 1001
Oct 31 07:55:36 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 31 07:55:36 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 804 complete
Oct 31 07:55:36 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 1001
Oct 31 07:55:36 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 1001 complete
Oct 31 07:56:36 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 389
Oct 31 07:56:36 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 885
Oct 31 07:56:44 bcnas1 kernel: mps0: (0:9:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 31 07:56:44 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 389 complete
Oct 31 07:56:44 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x16 SMID 885
Oct 31 07:56:44 bcnas1 kernel: swap_pager: I/O error - pageout failed; blkno 131393,size 65536, error 5
Oct 31 07:56:44 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 885 complete
Oct 31 07:57:45 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 442
Oct 31 07:57:48 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 442 complete
Oct 31 07:58:49 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 413
Oct 31 07:58:52 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 413 complete
Oct 31 07:59:53 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 90
Oct 31 07:59:56 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 90 complete
Oct 31 08:00:56 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 504
Oct 31 08:01:00 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x16 SMID 504 complete
Oct 31 08:02:01 bcnas1 kernel: (da10:mps0:0:9:0): SCSI command timeout on device handle 0x0016 SMID 861
Oct 31 08:02:04 bcnas1 kernel: mps0: swap_pager: I/O error - pageout failed; blkno 131409,size 49152, error 5mpssas_abort_complete: abort request on handle 0x16 SMID 861 complete
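Since each timeout in these logs seems to be followed, sooner or later, by an "abort request ... complete" message for the same handle and SMID, the two can be cross-checked. The sketch below is not from the original thread: the name unaborted_smids is hypothetical, it assumes one message per line, and it normalizes the handle because the timeout messages print it as 0x0016 while the abort messages print 0x16.

```shell
#!/bin/sh
# Hypothetical helper: read mps(4) syslog messages on stdin and print,
# in order of first timeout, each "handle:SMID" pair that logged a
# "SCSI command timeout" but no later "abort request ... complete".
unaborted_smids() {
	awk '
	# key: build "handle:SMID" from the current line, with the handle
	# normalized so that 0x0016 and 0x16 both become "16".
	function key(   i, h, s) {
		h = ""; s = ""
		for (i = 1; i < NF; i++) {
			if ($i == "handle") h = $(i + 1)
			else if ($i == "SMID") s = $(i + 1)
		}
		sub(/^0x0*/, "", h)
		return h ":" s
	}
	/SCSI command timeout on device handle/ {
		k = key()
		if (!(k in seen)) { seen[k] = 1; order[++n] = k }
		pending[k] = 1
	}
	/mpssas_abort_complete: abort request on handle/ { delete pending[key()] }
	END { for (i = 1; i <= n; i++) if (order[i] in pending) print order[i] }'
}
```

For example, `unaborted_smids < /var/log/messages`; an empty result means every logged timeout got a matching abort completion.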


From owner-freebsd-scsi@FreeBSD.ORG  Tue Nov  1 20:32:04 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 76867106566B
	for <freebsd-scsi@freebsd.org>; Tue,  1 Nov 2011 20:32:04 +0000 (UTC)
	(envelope-from nitroboost@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 004EC8FC16
	for <freebsd-scsi@freebsd.org>; Tue,  1 Nov 2011 20:32:03 +0000 (UTC)
Received: by bkbzs2 with SMTP id zs2so5225899bkb.13
	for <freebsd-scsi@freebsd.org>; Tue, 01 Nov 2011 13:32:02 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.182.74.41 with SMTP id q9mr257137obv.28.1320179522178; Tue, 01
	Nov 2011 13:32:02 -0700 (PDT)
Received: by 10.182.35.193 with HTTP; Tue, 1 Nov 2011 13:32:01 -0700 (PDT)
In-Reply-To: <4EAEF431.7090108@brockmann-consult.de>
References: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>
	<4EAEF431.7090108@brockmann-consult.de>
Date: Tue, 1 Nov 2011 13:32:01 -0700
Message-ID: <CAAAm0r1T1ifTQt5A5O+jwUoKoGjzcbho606wCt4SpM3AQ-WM3Q@mail.gmail.com>
From: Jason Wolfe <nitroboost@gmail.com>
To: Peter Maloney <peter.maloney@brockmann-consult.de>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-scsi@freebsd.org
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with
 upped disk tags
X-List-Received-Date: Tue, 01 Nov 2011 20:32:04 -0000

On Mon, Oct 31, 2011 at 12:17 PM, Peter Maloney <
peter.maloney@brockmann-consult.de> wrote:

> Dear Jason,
>
> I have a similar problem on a system with an LSI 9211-8i with 20 SATA
> disks attached (2 SSDs and 18 spinning disks). My system doesn't hang,
> panic, or reset though. I just lose access to one disk, which is then
> considered FAULTED in my zpool status (with the ZFS file system). If I
> physically remove the FAULTED disk and run "gpart recover da0", I get a
> panic. Otherwise, the system keeps running in a degraded state.  When I
> reboot and resilver, some data is found damaged and repaired, not just
> refreshed with the latest state. The server has 1 HBA and 2 backplanes,
> and I have the 2 mirrored root disks on different backplanes. Maybe that
> is why mine runs degraded and yours hangs.
>
> This happened twice so far (in around a month or two), and both times it
> was one of the mirrored root disks (SSDs) that faulted.
>
> My tags are set to 255. I will try reproducing it as you said, and then
> if it fails, rebooting and trying again setting tags to 2 as you suggested.
>
> And *thank you very much for this information*. This is the last
> outstanding issue with this server. I hope this workaround helps.
>
> # camcontrol tags /dev/da0
> (pass0:mps0:0:7:0): device openings: 255
>

Peter,

Does this happen 'randomly' for you, or do you have some automated process
running smartctl that trips the drives up occasionally? The way I'm getting
around it currently is to move /usr/local/sbin/smartctl elsewhere and
replace it with a wrapper that drops the tags to 1, runs the real smartctl
from its new location with the options passed, then sets the tags back to
whatever you prefer. There will obviously be a small performance penalty
while smartctl runs, but it should be brief and hopefully not even
noticeable in your case.
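
A minimal C sketch of the wrapper described above, for illustration only: it assumes the real binary was moved to the hypothetical path /usr/local/sbin/smartctl.real, that camcontrol(8) lives at /sbin/camcontrol, and that the device (e.g. /dev/da10) is the last smartctl argument. The helper names are made up; the tag change itself is the usual `camcontrol tags <dev> -N <count>` invocation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical path the real smartctl binary was moved to. */
#define	REAL_SMARTCTL	"/usr/local/sbin/smartctl.real"
#define	CAMCONTROL	"/sbin/camcontrol"

/*
 * Return the CAM device name ("da10") for this smartctl invocation,
 * assuming the device path ("/dev/da10") is the last argument.
 */
const char *
wrapper_devname(int argc, char *argv[])
{
	const char *dev;

	assert(argc >= 2);
	dev = argv[argc - 1];
	if (strncmp(dev, "/dev/", 5) == 0)
		dev += 5;
	return (dev);
}

/* Run args[0] via fork(2)/execv(3); return its exit status or -1. */
int
wrapper_run(char *const args[])
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		execv(args[0], args);
		_exit(127);		/* exec failed */
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}

/* Equivalent to "camcontrol tags <dev> -N <ntags>". */
int
wrapper_set_tags(const char *dev, const char *ntags)
{
	char *args[] = { CAMCONTROL, "tags", (char *)dev, "-N",
	    (char *)ntags, NULL };

	return (wrapper_run(args));
}

/*
 * Drop the tags to 1, run the relocated smartctl with the original
 * arguments, then restore the preferred number of tags.
 */
int
smartctl_wrapper(int argc, char *argv[], const char *restore_tags)
{
	const char *dev;
	int rv;

	dev = wrapper_devname(argc, argv);
	if (wrapper_set_tags(dev, "1") != 0)
		fprintf(stderr, "warning: could not drop tags on %s\n", dev);
	argv[0] = REAL_SMARTCTL;	/* argv[argc] is NULL, as execv needs */
	rv = wrapper_run(argv);
	if (wrapper_set_tags(dev, restore_tags) != 0)
		fprintf(stderr, "warning: could not restore tags on %s\n", dev);
	return (rv);
}
```

The installed wrapper would then be a one-line main such as `return (smartctl_wrapper(argc, argv, "255"));`, restoring the 255 openings Peter reports. Invocations without a trailing device argument (for example `smartctl --scan`) would need to skip the tag juggling.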

If smartctl is not triggering these events for you, any idea what is?

Jason

From owner-freebsd-scsi@FreeBSD.ORG  Tue Nov  1 21:30:15 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9FCC9106566C
	for <freebsd-scsi@hub.freebsd.org>;
	Tue,  1 Nov 2011 21:30:15 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 8E71F8FC0C
	for <freebsd-scsi@hub.freebsd.org>;
	Tue,  1 Nov 2011 21:30:15 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id pA1LUFkF011387
	for <freebsd-scsi@freefall.freebsd.org>; Tue, 1 Nov 2011 21:30:15 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id pA1LUFel011384;
	Tue, 1 Nov 2011 21:30:15 GMT (envelope-from gnats)
Date: Tue, 1 Nov 2011 21:30:15 GMT
Message-Id: <201111012130.pA1LUFel011384@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: dfilter@FreeBSD.ORG (dfilter service)
Cc: 
Subject: Re: kern/124667: commit references a PR
Reply-To: dfilter service <dfilter@FreeBSD.ORG>
X-List-Received-Date: Tue, 01 Nov 2011 21:30:15 -0000

The following reply was made to PR kern/124667; it has been noted by GNATS.

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/124667: commit references a PR
Date: Tue,  1 Nov 2011 21:27:08 +0000 (UTC)

 Author: marius
 Date: Tue Nov  1 21:26:57 2011
 New Revision: 227006
 URL: http://svn.freebsd.org/changeset/base/227006
 
 Log:
   Add a PCI front-end to esp(4) allowing it to support AMD Am53C974 and
   replace amd(4) with the former in the amd64, i386 and pc98 GENERIC kernel
   configuration files. Besides duplicating functionality, amd(4), which
   previously also supported the AMD Am53C974, unlike esp(4) is no longer
   maintained and has accumulated enough bit rot over time to always cause
   a panic during boot as long as at least one target is attached to it
   (see PR 124667).
   
   PR:		124667
   Obtained from:	NetBSD (based on)
   MFC after:	3 days
 
 Added:
   head/sys/dev/esp/am53c974reg.h   (contents, props changed)
   head/sys/dev/esp/esp_pci.c   (contents, props changed)
 Modified:
   head/UPDATING
   head/sys/amd64/conf/GENERIC
   head/sys/conf/NOTES
   head/sys/conf/files
   head/sys/i386/conf/GENERIC
   head/sys/modules/Makefile
   head/sys/modules/esp/Makefile
   head/sys/pc98/conf/GENERIC
   head/sys/sparc64/conf/GENERIC
 
 Modified: head/UPDATING
 ==============================================================================
 --- head/UPDATING	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/UPDATING	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -22,6 +22,10 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 10
  	machines to maximize performance.  (To disable malloc debugging, run
  	ln -s aj /etc/malloc.conf.)
  
 +20111101:
 +	The broken amd(4) driver has been replaced with esp(4) in the amd64,
 +	i386 and pc98 GENERIC kernel configuration files.
 +
  20110930:
  	sysinstall has been removed
  
 
 Modified: head/sys/amd64/conf/GENERIC
 ==============================================================================
 --- head/sys/amd64/conf/GENERIC	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/sys/amd64/conf/GENERIC	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -107,7 +107,7 @@ options 	AHC_REG_PRETTY_PRINT	# Print re
  device		ahd		# AHA39320/29320 and onboard AIC79xx devices
  options 	AHD_REG_PRETTY_PRINT	# Print register bitfields in debug
  					# output.  Adds ~215k to driver.
 -device		amd		# AMD 53C974 (Tekram DC-390(T))
 +device		esp		# AMD Am53C974 (Tekram DC-390(T))
  device		hptiop		# Highpoint RocketRaid 3xxx series
  device		isp		# Qlogic family
  #device		ispfw		# Firmware for QLogic HBAs- normally a module
 
 Modified: head/sys/conf/NOTES
 ==============================================================================
 --- head/sys/conf/NOTES	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/sys/conf/NOTES	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -1459,7 +1459,9 @@ options 	TEKEN_UTF8		# UTF-8 output hand
  #      such as the Tekram DC-390(T).
  # bt:  Most Buslogic controllers: including BT-445, BT-54x, BT-64x, BT-74x,
  #      BT-75x, BT-946, BT-948, BT-956, BT-958, SDC3211B, SDC3211F, SDC3222F
 -# esp: NCR53c9x.  Only for SBUS hardware right now.
 +# esp: Emulex ESP, NCR 53C9x and QLogic FAS families based controllers
 +#      including the AMD Am53C974 (found on devices such as the Tekram
 +#      DC-390(T)) and the Sun ESP and FAS families of controllers
  # isp: Qlogic ISP 1020, 1040 and 1040B PCI SCSI host adapters,
  #      ISP 1240 Dual Ultra SCSI, ISP 1080 and 1280 (Dual) Ultra2,
  #      ISP 12160 Ultra3 SCSI,
 
 Modified: head/sys/conf/files
 ==============================================================================
 --- head/sys/conf/files	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/sys/conf/files	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -1064,6 +1064,7 @@ dev/ep/if_ep_eisa.c		optional ep eisa
  dev/ep/if_ep_isa.c		optional ep isa
  dev/ep/if_ep_mca.c		optional ep mca
  dev/ep/if_ep_pccard.c		optional ep pccard
 +dev/esp/esp_pci.c		optional esp pci
  dev/esp/ncr53c9x.c		optional esp
  dev/ex/if_ex.c			optional ex
  dev/ex/if_ex_isa.c		optional ex isa
 
 Added: head/sys/dev/esp/am53c974reg.h
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
 +++ head/sys/dev/esp/am53c974reg.h	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -0,0 +1,72 @@
 +/*	$NetBSD: pcscpreg.h,v 1.2 2008/04/28 20:23:55 martin Exp $	*/
 +
 +/*-
 + * Copyright (c) 1998 The NetBSD Foundation, Inc.
 + * All rights reserved.
 + *
 + * This code is derived from software contributed to The NetBSD Foundation
 + * by Izumi Tsutsui.
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + * 1. Redistributions of source code must retain the above copyright
 + *    notice, this list of conditions and the following disclaimer.
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *    notice, this list of conditions and the following disclaimer in the
 + *    documentation and/or other materials provided with the distribution.
 + *
 + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 + * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 + * POSSIBILITY OF SUCH DAMAGE.
 + */
 +
 +/* $FreeBSD$ */
 +
 +#ifndef _AM53C974_H_
 +#define	_AM53C974_H_
 +
 +/*
 + * Am53c974 DMA engine registers
 + */
 +
 +#define	DMA_CMD		0x40 		/* Command */
 +#define	 DMACMD_RSVD	0xFFFFFF28	/* reserved */
 +#define	 DMACMD_DIR	0x00000080	/* Transfer Direction (read:1) */
 +#define	 DMACMD_INTE	0x00000040	/* DMA Interrupt Enable	*/
 +#define	 DMACMD_MDL	0x00000010	/* Map to Memory Description List */
 +#define	 DMACMD_DIAG	0x00000004	/* Diagnostic */
 +#define	 DMACMD_CMD	0x00000003	/* Command Code Bit */
 +#define	  DMACMD_IDLE	0x00000000	/*  Idle */
 +#define	  DMACMD_BLAST	0x00000001	/*  Blast */
 +#define	  DMACMD_ABORT	0x00000002	/*  Abort */
 +#define	  DMACMD_START	0x00000003	/*  Start */
 +
 +#define	DMA_STC		0x44		/* Start Transfer Count */
 +#define	DMA_SPA		0x48		/* Start Physical Address */
 +#define	DMA_WBC		0x4C		/* Working Byte Counter */
 +#define	DMA_WAC		0x50		/* Working Address Counter */
 +
 +#define	DMA_STAT	0x54		/* Status Register */
 +#define	 DMASTAT_RSVD	0xFFFFFF80	/* reserved */
 +#define	 DMASTAT_PABT	0x00000040	/* PCI master/target Abort */
 +#define	 DMASTAT_BCMP	0x00000020	/* BLAST Complete */
 +#define	 DMASTAT_SINT	0x00000010	/* SCSI Interrupt */
 +#define	 DMASTAT_DONE	0x00000008	/* DMA Transfer Terminated */
 +#define	 DMASTAT_ABT	0x00000004	/* DMA Transfer Aborted */
 +#define	 DMASTAT_ERR	0x00000002	/* DMA Transfer Error */
 +#define	 DMASTAT_PWDN	0x00000001	/* Power Down Indicator */
 +
 +#define	DMA_SMDLA	0x58	/* Starting Memory Descriptor List Address */
 +#define	DMA_WMAC	0x5C	/* Working MDL Counter */
 +#define	DMA_SBAC	0x70	/* SCSI Bus and Control */
 +
 +#endif /* _AM53C974_H_ */
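
To illustrate how the Am53C974 DMA register definitions above combine, here is a small user-space C sketch (not part of the commit). It builds the DMA_CMD value that esp_pci_dma_go() writes to start a transfer and decodes a DMA_STAT value into the flag names from am53c974reg.h. The constant values are copied from the header; the helper function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from am53c974reg.h. */
#define	DMACMD_DIR	0x00000080	/* transfer direction (read:1) */
#define	DMACMD_IDLE	0x00000000
#define	DMACMD_START	0x00000003

#define	DMASTAT_PABT	0x00000040
#define	DMASTAT_BCMP	0x00000020
#define	DMASTAT_SINT	0x00000010
#define	DMASTAT_DONE	0x00000008
#define	DMASTAT_ABT	0x00000004
#define	DMASTAT_ERR	0x00000002
#define	DMASTAT_PWDN	0x00000001

/* DMA_CMD value for a command code and direction (datain != 0: read). */
uint32_t
am53c974_dmacmd(uint32_t cmd, int datain)
{
	return (cmd | (datain != 0 ? DMACMD_DIR : 0));
}

/* Write the names of the bits set in a DMA_STAT value into buf. */
void
am53c974_dmastat_str(uint32_t stat, char *buf, size_t len)
{
	static const struct {
		uint32_t bit;
		const char *name;
	} bits[] = {
		{ DMASTAT_PABT, "PABT" }, { DMASTAT_BCMP, "BCMP" },
		{ DMASTAT_SINT, "SINT" }, { DMASTAT_DONE, "DONE" },
		{ DMASTAT_ABT, "ABT" }, { DMASTAT_ERR, "ERR" },
		{ DMASTAT_PWDN, "PWDN" },
	};
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if ((stat & bits[i].bit) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, bits[i].name, len - strlen(buf) - 1);
	}
}
```

For a read, am53c974_dmacmd(DMACMD_START, 1) yields 0x83, the same value the driver writes with DMACMD_INTE left commented out.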
 
 Added: head/sys/dev/esp/esp_pci.c
 ==============================================================================
 --- /dev/null	00:00:00 1970	(empty, because file is newly added)
 +++ head/sys/dev/esp/esp_pci.c	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -0,0 +1,654 @@
 +/*-
 + * Copyright (c) 2011 Marius Strobl <marius@FreeBSD.org>
 + * All rights reserved.
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + * 1. Redistributions of source code must retain the above copyright
 + *    notice, this list of conditions and the following disclaimer.
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *    notice, this list of conditions and the following disclaimer in the
 + *    documentation and/or other materials provided with the distribution.
 + *
 + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 + * SUCH DAMAGE.
 + */
 +
 +/*	$NetBSD: pcscp.c,v 1.45 2010/11/13 13:52:08 uebayasi Exp $	*/
 +
 +/*-
 + * Copyright (c) 1997, 1998, 1999 The NetBSD Foundation, Inc.
 + * All rights reserved.
 + *
 + * This code is derived from software contributed to The NetBSD Foundation
 + * by Jason R. Thorpe of the Numerical Aerospace Simulation Facility,
 + * NASA Ames Research Center; Izumi Tsutsui.
 + *
 + * Redistribution and use in source and binary forms, with or without
 + * modification, are permitted provided that the following conditions
 + * are met:
 + * 1. Redistributions of source code must retain the above copyright
 + *    notice, this list of conditions and the following disclaimer.
 + * 2. Redistributions in binary form must reproduce the above copyright
 + *    notice, this list of conditions and the following disclaimer in the
 + *    documentation and/or other materials provided with the distribution.
 + *
 + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
 + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
 + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 + * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
 + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 + * POSSIBILITY OF SUCH DAMAGE.
 + */
 +
 +/*
 + * esp_pci.c: device dependent code for AMD Am53c974 (PCscsi-PCI)
 + * written by Izumi Tsutsui <tsutsui@NetBSD.org>
 + *
 + * Technical manual available at
 + * http://www.amd.com/files/connectivitysolutions/networking/archivednetworking/19113.pdf
 + */
 +
 +#include <sys/cdefs.h>
 +__FBSDID("$FreeBSD$");
 +
 +#include <sys/param.h>
 +#include <sys/systm.h>
 +#include <sys/bus.h>
 +#include <sys/endian.h>
 +#include <sys/kernel.h>
 +#include <sys/lock.h>
 +#include <sys/module.h>
 +#include <sys/mutex.h>
 +#include <sys/resource.h>
 +#include <sys/rman.h>
 +
 +#include <machine/bus.h>
 +#include <machine/resource.h>
 +
 +#include <cam/cam.h>
 +#include <cam/cam_ccb.h>
 +#include <cam/scsi/scsi_all.h>
 +
 +#include <dev/pci/pcireg.h>
 +#include <dev/pci/pcivar.h>
 +
 +#include <dev/esp/ncr53c9xreg.h>
 +#include <dev/esp/ncr53c9xvar.h>
 +
 +#include <dev/esp/am53c974reg.h>
 +
 +#define	PCI_DEVICE_ID_AMD53C974	0x20201022
 +
 +struct esp_pci_softc {
 +	struct ncr53c9x_softc	sc_ncr53c9x;	/* glue to MI code */
 +	struct device		*sc_dev;
 +
 +	struct resource *sc_res[2];
 +#define	ESP_PCI_RES_INTR	0
 +#define	ESP_PCI_RES_IO		1
 +
 +	bus_dma_tag_t		sc_pdmat;
 +
 +	bus_dma_tag_t		sc_xferdmat;	/* DMA tag for transfers */
 +	bus_dmamap_t		sc_xferdmam;	/* DMA map for transfers */
 +
 +	void			*sc_ih;		/* interrupt handler */
 +
 +	size_t			sc_dmasize;	/* DMA size */
 +	void			**sc_dmaaddr;	/* DMA address */
 +	size_t			*sc_dmalen;	/* DMA length */
 +	int			sc_active;	/* DMA state */
 +	int			sc_datain;	/* DMA Data Direction */
 +};
 +
 +static struct resource_spec esp_pci_res_spec[] = {
 +	{ SYS_RES_IRQ, 0, RF_SHAREABLE | RF_ACTIVE },	/* ESP_PCI_RES_INTR */
 +	{ SYS_RES_IOPORT, PCIR_BAR(0), RF_ACTIVE },	/* ESP_PCI_RES_IO */
 +	{ -1, 0 }
 +};
 +
 +#define	READ_DMAREG(sc, reg)						\
 +	bus_read_4((sc)->sc_res[ESP_PCI_RES_IO], (reg))
 +#define	WRITE_DMAREG(sc, reg, var)					\
 +	bus_write_4((sc)->sc_res[ESP_PCI_RES_IO], (reg), (var))
 +
 +#define	READ_ESPREG(sc, reg)						\
 +	bus_read_1((sc)->sc_res[ESP_PCI_RES_IO], (reg) << 2)
 +#define	WRITE_ESPREG(sc, reg, val)					\
 +	bus_write_1((sc)->sc_res[ESP_PCI_RES_IO], (reg) << 2, (val))
 +
 +static int	esp_pci_probe(device_t);
 +static int	esp_pci_attach(device_t);
 +static int	esp_pci_detach(device_t);
 +static int	esp_pci_suspend(device_t);
 +static int	esp_pci_resume(device_t);
 +
 +static device_method_t esp_pci_methods[] = {
 +	DEVMETHOD(device_probe,		esp_pci_probe),
 +	DEVMETHOD(device_attach,	esp_pci_attach),
 +	DEVMETHOD(device_detach,	esp_pci_detach),
 +	DEVMETHOD(device_suspend,	esp_pci_suspend),
 +	DEVMETHOD(device_resume,	esp_pci_resume),
 +
 +	KOBJMETHOD_END
 +};
 +
 +static driver_t esp_pci_driver = {
 +	"esp",
 +	esp_pci_methods,
 +	sizeof(struct esp_pci_softc)
 +};
 +
 +DRIVER_MODULE(esp, pci, esp_pci_driver, esp_devclass, 0, 0);
 +MODULE_DEPEND(esp, pci, 1, 1, 1);
 +
 +/*
 + * Functions and the switch for the MI code
 + */
 +static void	esp_pci_dma_go(struct ncr53c9x_softc *);
 +static int	esp_pci_dma_intr(struct ncr53c9x_softc *);
 +static int	esp_pci_dma_isactive(struct ncr53c9x_softc *);
 +
 +static int	esp_pci_dma_isintr(struct ncr53c9x_softc *);
 +static void	esp_pci_dma_reset(struct ncr53c9x_softc *);
 +static int	esp_pci_dma_setup(struct ncr53c9x_softc *, void **, size_t *,
 +		    int, size_t *);
 +static void	esp_pci_dma_stop(struct ncr53c9x_softc *);
 +static void	esp_pci_write_reg(struct ncr53c9x_softc *, int, uint8_t);
 +static uint8_t	esp_pci_read_reg(struct ncr53c9x_softc *, int);
 +static void	esp_pci_xfermap(void *arg, bus_dma_segment_t *segs, int nseg,
 +		    int error);
 +
 +static struct ncr53c9x_glue esp_pci_glue = {
 +	esp_pci_read_reg,
 +	esp_pci_write_reg,
 +	esp_pci_dma_isintr,
 +	esp_pci_dma_reset,
 +	esp_pci_dma_intr,
 +	esp_pci_dma_setup,
 +	esp_pci_dma_go,
 +	esp_pci_dma_stop,
 +	esp_pci_dma_isactive,
 +};
 +
 +static int
 +esp_pci_probe(device_t dev)
 +{
 +
 +	if (pci_get_devid(dev) == PCI_DEVICE_ID_AMD53C974) {
 +		device_set_desc(dev, "AMD Am53C974 Fast-SCSI");
 +		return (BUS_PROBE_DEFAULT);
 +	}
 +
 +	return (ENXIO);
 +}
 +
 +/*
 + * Attach this instance, and then all the sub-devices
 + */
 +static int
 +esp_pci_attach(device_t dev)
 +{
 +	struct esp_pci_softc *esc;
 +	struct ncr53c9x_softc *sc;
 +	int error;
 +
 +	esc = device_get_softc(dev);
 +	sc = &esc->sc_ncr53c9x;
 +
 +	NCR_LOCK_INIT(sc);
 +
 +	esc->sc_dev = dev;
 +	sc->sc_glue = &esp_pci_glue;
 +
 +	pci_enable_busmaster(dev);
 +
 +	error = bus_alloc_resources(dev, esp_pci_res_spec, esc->sc_res);
 +	if (error != 0) {
 +		device_printf(dev, "failed to allocate resources\n");
 +		bus_release_resources(dev, esp_pci_res_spec, esc->sc_res);
 +		return (error);
 +	}
 +
 +	error = bus_dma_tag_create(bus_get_dma_tag(dev), 1, 0,
 +	    BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL,
 +	    BUS_SPACE_MAXSIZE_32BIT, BUS_SPACE_UNRESTRICTED,
 +	    BUS_SPACE_MAXSIZE_32BIT, 0, NULL, NULL, &esc->sc_pdmat);
 +	if (error != 0) {
 +		device_printf(dev, "cannot create parent DMA tag\n");
 +		goto fail_res;
 +	}
 +
 +	/*
 +	 * XXX More of this should be in ncr53c9x_attach(), but
 +	 * XXX should we really poke around the chip that much in
 +	 * XXX the MI code?  Think about this more...
 +	 */
 +
 +	/*
 +	 * Set up static configuration info.
 +	 *
 +	 * XXX we should read the configuration from the EEPROM.
 +	 */
 +	sc->sc_id = 7;
 +	sc->sc_cfg1 = sc->sc_id | NCRCFG1_PARENB;
 +	sc->sc_cfg2 = NCRCFG2_SCSI2 | NCRCFG2_FE;
 +	sc->sc_cfg3 = NCRAMDCFG3_IDM | NCRAMDCFG3_FCLK;
 +	sc->sc_cfg4 = NCRAMDCFG4_GE12NS | NCRAMDCFG4_RADE;
 +	sc->sc_rev = NCR_VARIANT_AM53C974;
 +	sc->sc_features = NCR_F_FASTSCSI | NCR_F_DMASELECT;
 +	sc->sc_cfg3_fscsi = NCRAMDCFG3_FSCSI;
 +	sc->sc_freq = 40; /* MHz */
 +
 +	/*
 +	 * This is the value used to start sync negotiations
 +	 * Note that the NCR register "SYNCTP" is programmed
 +	 * in "clocks per byte", and has a minimum value of 4.
 +	 * The SCSI period used in negotiation is one-fourth
 +	 * of the time (in nanoseconds) needed to transfer one byte.
 +	 * Since the chip's clock is given in MHz, we have the following
 +	 * formula: 4 * period = (1000 / freq) * 4
 +	 */
 +	sc->sc_minsync = 1000 / sc->sc_freq;
 +
 +	sc->sc_maxxfer = DFLTPHYS;	/* see below */
 +	sc->sc_maxoffset = 15;
 +	sc->sc_extended_geom = 1;
 +
 +#define	MDL_SEG_SIZE	0x1000	/* 4kbyte per segment */
 +
 +	/*
 +	 * Create the DMA tag and map for the data transfers.
 +	 *
 +	 * Note: given that bus_dma(9) only adheres to the requested alignment
 +	 * for the first segment (and that also only for bus_dmamem_alloc()ed
 +	 * DMA maps) we can't use the Memory Descriptor List.  However, also
 +	 * when not using the MDL, the maximum transfer size apparently is
 +	 * limited to 4k so we have to split transfers up, which plain sucks.
 +	 */
 +	error = bus_dma_tag_create(esc->sc_pdmat, PAGE_SIZE, 0,
 +	    BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL,
 +	    MDL_SEG_SIZE, 1, MDL_SEG_SIZE, BUS_DMA_ALLOCNOW,
 +	    busdma_lock_mutex, &sc->sc_lock, &esc->sc_xferdmat);
 +	if (error != 0) {
 +		device_printf(dev, "cannot create transfer DMA tag\n");
 +		goto fail_pdmat;
 +	}
 +	error = bus_dmamap_create(esc->sc_xferdmat, 0, &esc->sc_xferdmam);
 +	if (error != 0) {
 +		device_printf(dev, "cannot create transfer DMA map\n");
 +		goto fail_xferdmat;
 +	}
 +
 +	error = bus_setup_intr(dev, esc->sc_res[ESP_PCI_RES_INTR],
 +	    INTR_MPSAFE | INTR_TYPE_CAM, NULL, ncr53c9x_intr, sc,
 +	    &esc->sc_ih);
 +	if (error != 0) {
 +		device_printf(dev, "cannot set up interrupt\n");
 +		goto fail_xferdmam;
 +	}
 +
 +	/* Do the common parts of attachment. */
 +	sc->sc_dev = esc->sc_dev;
 +	error = ncr53c9x_attach(sc);
 +	if (error != 0) {
 +		device_printf(esc->sc_dev, "ncr53c9x_attach failed\n");
 +		goto fail_intr;
 +	}
 +
 +	return (0);
 +
 + fail_intr:
 +	 bus_teardown_intr(esc->sc_dev, esc->sc_res[ESP_PCI_RES_INTR],
 +	    esc->sc_ih);
 + fail_xferdmam:
 +	bus_dmamap_destroy(esc->sc_xferdmat, esc->sc_xferdmam);
 + fail_xferdmat:
 +	bus_dma_tag_destroy(esc->sc_xferdmat);
 + fail_pdmat:
 +	bus_dma_tag_destroy(esc->sc_pdmat);
 + fail_res:
 +	bus_release_resources(dev, esp_pci_res_spec, esc->sc_res);
 +	NCR_LOCK_DESTROY(sc);
 +
 +	return (error);
 +}
 +
 +static int
 +esp_pci_detach(device_t dev)
 +{
 +	struct ncr53c9x_softc *sc;
 +	struct esp_pci_softc *esc;
 +	int error;
 +
 +	esc = device_get_softc(dev);
 +	sc = &esc->sc_ncr53c9x;
 +
 +	bus_teardown_intr(esc->sc_dev, esc->sc_res[ESP_PCI_RES_INTR],
 +	    esc->sc_ih);
 +	error = ncr53c9x_detach(sc);
 +	if (error != 0)
 +		return (error);
 +	bus_dmamap_destroy(esc->sc_xferdmat, esc->sc_xferdmam);
 +	bus_dma_tag_destroy(esc->sc_xferdmat);
 +	bus_dma_tag_destroy(esc->sc_pdmat);
 +	bus_release_resources(dev, esp_pci_res_spec, esc->sc_res);
 +	NCR_LOCK_DESTROY(sc);
 +
 +	return (0);
 +}
 +
 +static int
 +esp_pci_suspend(device_t dev)
 +{
 +
 +	return (ENXIO);
 +}
 +
 +static int
 +esp_pci_resume(device_t dev)
 +{
 +
 +	return (ENXIO);
 +}
 +
 +static void
 +esp_pci_xfermap(void *arg, bus_dma_segment_t *segs, int nsegs, int error)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)arg;
 +
 +	if (error != 0)
 +		return;
 +
 +	KASSERT(nsegs == 1, ("%s: bad transfer segment count %d", __func__,
 +	    nsegs));
 +	KASSERT(segs[0].ds_len <= MDL_SEG_SIZE,
 +	    ("%s: bad transfer segment length %ld", __func__,
 +	    (long)segs[0].ds_len));
 +
 +	/* Program the DMA Starting Physical Address. */
 +	WRITE_DMAREG(esc, DMA_SPA, segs[0].ds_addr);
 +}
 +
 +/*
 + * Glue functions
 + */
 +
 +static uint8_t
 +esp_pci_read_reg(struct ncr53c9x_softc *sc, int reg)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +
 +	return (READ_ESPREG(esc, reg));
 +}
 +
 +static void
 +esp_pci_write_reg(struct ncr53c9x_softc *sc, int reg, uint8_t v)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +
 +	WRITE_ESPREG(esc, reg, v);
 +}
 +
 +static int
 +esp_pci_dma_isintr(struct ncr53c9x_softc *sc)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +
 +	return (READ_ESPREG(esc, NCR_STAT) & NCRSTAT_INT) != 0;
 +}
 +
 +static void
 +esp_pci_dma_reset(struct ncr53c9x_softc *sc)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +
 +	WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE);
 +
 +	esc->sc_active = 0;
 +}
 +
 +static int
 +esp_pci_dma_intr(struct ncr53c9x_softc *sc)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +	bus_dma_tag_t xferdmat;
 +	bus_dmamap_t xferdmam;
 +	size_t dmasize;
 +	int datain, i, resid, trans;
 +	uint32_t dmastat;
 +	char *p = NULL;
 +
 +	xferdmat = esc->sc_xferdmat;
 +	xferdmam = esc->sc_xferdmam;
 +	datain = esc->sc_datain;
 +
 +	dmastat = READ_DMAREG(esc, DMA_STAT);
 +
 +	if ((dmastat & DMASTAT_ERR) != 0) {
 +		/* XXX not tested... */
 +		WRITE_DMAREG(esc, DMA_CMD, DMACMD_ABORT | (datain != 0 ?
 +		    DMACMD_DIR : 0));
 +
 +		device_printf(esc->sc_dev, "DMA error detected; Aborting.\n");
 +		bus_dmamap_sync(xferdmat, xferdmam, datain != 0 ?
 +		    BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
 +		bus_dmamap_unload(xferdmat, xferdmam);
 +		return (-1);
 +	}
 +
 +	if ((dmastat & DMASTAT_ABT) != 0) {
 +		/* XXX what should be done? */
 +		device_printf(esc->sc_dev, "DMA aborted.\n");
 +		WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE | (datain != 0 ?
 +		    DMACMD_DIR : 0));
 +		esc->sc_active = 0;
 +		return (0);
 +	}
 +
 +	KASSERT(esc->sc_active != 0, ("%s: DMA wasn't active", __func__));
 +
 +	/* DMA has stopped. */
 +
 +	esc->sc_active = 0;
 +
 +	dmasize = esc->sc_dmasize;
 +	if (dmasize == 0) {
 +		/* A "Transfer Pad" operation completed. */
 +		NCR_DMA(("%s: discarded %d bytes (tcl=%d, tcm=%d)\n",
 +		    __func__, READ_ESPREG(esc, NCR_TCL) |
 +		    (READ_ESPREG(esc, NCR_TCM) << 8),
 +		    READ_ESPREG(esc, NCR_TCL), READ_ESPREG(esc, NCR_TCM)));
 +		return (0);
 +	}
 +
 +	resid = 0;
 +	/*
 +	 * If a transfer onto the SCSI bus gets interrupted by the device
 +	 * (e.g. for a SAVEPOINTER message), the data in the FIFO counts
 +	 * as residual since the ESP counter registers get decremented as
 +	 * bytes are clocked into the FIFO.
 +	 */
 +	if (datain == 0 &&
 +	    (resid = (READ_ESPREG(esc, NCR_FFLAG) & NCRFIFO_FF)) != 0)
 +		NCR_DMA(("%s: empty esp FIFO of %d ", __func__, resid));
 +
 +	if ((sc->sc_espstat & NCRSTAT_TC) == 0) {
 +		/*
 +		 * "Terminal count" is off, so read the residue
 +		 * out of the ESP counter registers.
 +		 */
 +		if (datain != 0) {
 +			resid = READ_ESPREG(esc, NCR_FFLAG) & NCRFIFO_FF;
 +			while (resid > 1)
 +				resid =
 +				    READ_ESPREG(esc, NCR_FFLAG) & NCRFIFO_FF;
 +			WRITE_DMAREG(esc, DMA_CMD, DMACMD_BLAST | DMACMD_DIR);
 +
 +			for (i = 0; i < 0x8000; i++) /* XXX 0x8000 ? */
 +				if ((READ_DMAREG(esc, DMA_STAT) &
 +				    DMASTAT_BCMP) != 0)
 +					break;
 +
 +			/* See the below comments... */
 +			if (resid != 0)
 +				p = *esc->sc_dmaaddr;
 +		}
 +
 +		resid += READ_ESPREG(esc, NCR_TCL) |
 +		    (READ_ESPREG(esc, NCR_TCM) << 8) |
 +		    (READ_ESPREG(esc, NCR_TCH) << 16);
 +	} else
 +		while ((dmastat & DMASTAT_DONE) == 0)
 +			dmastat = READ_DMAREG(esc, DMA_STAT);
 +
 +	WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE | (datain != 0 ?
 +	    DMACMD_DIR : 0));
 +
 +	/* Sync the transfer buffer. */
 +	bus_dmamap_sync(xferdmat, xferdmam, datain != 0 ?
 +	    BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
 +	bus_dmamap_unload(xferdmat, xferdmam);
 +
 +	trans = dmasize - resid;
 +
 +	/*
 +	 * From the technical manual notes:
 +	 *
 +	 * "In some odd byte conditions, one residual byte will be left
 +	 *  in the SCSI FIFO, and the FIFO flags will never count to 0.
 +	 *  When this happens, the residual byte should be retrieved
 +	 *  via PIO following completion of the BLAST operation."
 +	 */
 +	if (p != NULL) {
 +		p += trans;
 +		*p = READ_ESPREG(esc, NCR_FIFO);
 +		trans++;
 +	}
 +
 +	if (trans < 0) {			/* transferred < 0 ? */
 +#if 0
 +		/*
 +		 * This situation can happen in perfectly normal operation
 +		 * if the ESP is reselected while using DMA to select
 +		 * another target.  As such, don't print the warning.
 +		 */
 +		device_printf(dev, "xfer (%d) > req (%d)\n", trans, dmasize);
 +#endif
 +		trans = dmasize;
 +	}
 +
 +	NCR_DMA(("%s: tcl=%d, tcm=%d, tch=%d; trans=%d, resid=%d\n", __func__,
 +	    READ_ESPREG(esc, NCR_TCL), READ_ESPREG(esc, NCR_TCM),
 +	    READ_ESPREG(esc, NCR_TCH), trans, resid));
 +
 +	*esc->sc_dmalen -= trans;
 +	*esc->sc_dmaaddr = (char *)*esc->sc_dmaaddr + trans;
 +
 +	return (0);
 +}
 +
 +static int
 +esp_pci_dma_setup(struct ncr53c9x_softc *sc, void **addr, size_t *len,
 +    int datain, size_t *dmasize)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +	int error;
 +
 +	WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE | (datain != 0 ? DMACMD_DIR :
 +	    0));
 +
 +	*dmasize = esc->sc_dmasize = ulmin(*dmasize, MDL_SEG_SIZE);
 +	esc->sc_dmaaddr = addr;
 +	esc->sc_dmalen = len;
 +	esc->sc_datain = datain;
 +
 +	/*
 +	 * There's no need to set up DMA for a "Transfer Pad" operation.
 +	 */
 +	if (*dmasize == 0)
 +		return (0);
 +
 +	/* Set the transfer length. */
 +	WRITE_DMAREG(esc, DMA_STC, *dmasize);
 +
 +	/*
 +	 * Load the transfer buffer and program the DMA address.
 +	 * Note that the NCR53C9x core can't handle EINPROGRESS so we set
 +	 * BUS_DMA_NOWAIT.
 +	 */
 +	error = bus_dmamap_load(esc->sc_xferdmat, esc->sc_xferdmam,
 +	    *esc->sc_dmaaddr, *dmasize, esp_pci_xfermap, sc, BUS_DMA_NOWAIT);
 +
 +	return (error);
 +}
 +
 +static void
 +esp_pci_dma_go(struct ncr53c9x_softc *sc)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +	int datain;
 +
 +	datain = esc->sc_datain;
 +
 +	/* No DMA transfer for a "Transfer Pad" operation */
 +	if (esc->sc_dmasize == 0)
 +		return;
 +
 +	/* Sync the transfer buffer. */
 +	bus_dmamap_sync(esc->sc_xferdmat, esc->sc_xferdmam, datain != 0 ?
 +	    BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE);
 +
 +	/* Set the DMA engine to the IDLE state. */
 +	/* XXX DMA Transfer Interrupt Enable bit is broken? */
 +	WRITE_DMAREG(esc, DMA_CMD, DMACMD_IDLE | /* DMACMD_INTE | */
 +	    (datain != 0 ? DMACMD_DIR : 0));
 +
 +	/* Issue a DMA start command. */
 +	WRITE_DMAREG(esc, DMA_CMD, DMACMD_START | /* DMACMD_INTE | */
 +	    (datain != 0 ? DMACMD_DIR : 0));
 +
 +	esc->sc_active = 1;
 +}
 +
 +static void
 +esp_pci_dma_stop(struct ncr53c9x_softc *sc)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +
 +	/* DMA stop */
 +	/* XXX what should we do here ? */
 +	WRITE_DMAREG(esc, DMA_CMD,
 +	    DMACMD_ABORT | (esc->sc_datain != 0 ? DMACMD_DIR : 0));
 +	bus_dmamap_unload(esc->sc_xferdmat, esc->sc_xferdmam);
 +
 +	esc->sc_active = 0;
 +}
 +
 +static int
 +esp_pci_dma_isactive(struct ncr53c9x_softc *sc)
 +{
 +	struct esp_pci_softc *esc = (struct esp_pci_softc *)sc;
 +
 +	/* XXX should we check esc->sc_active? */
 +	if ((READ_DMAREG(esc, DMA_CMD) & DMACMD_CMD) != DMACMD_IDLE)
 +		return (1);
 +
 +	return (0);
 +}
 
 Modified: head/sys/i386/conf/GENERIC
 ==============================================================================
 --- head/sys/i386/conf/GENERIC	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/sys/i386/conf/GENERIC	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -110,7 +110,7 @@ options 	AHC_REG_PRETTY_PRINT	# Print re
  device		ahd		# AHA39320/29320 and onboard AIC79xx devices
  options 	AHD_REG_PRETTY_PRINT	# Print register bitfields in debug
  					# output.  Adds ~215k to driver.
 -device		amd		# AMD 53C974 (Tekram DC-390(T))
 +device		esp		# AMD Am53C974 (Tekram DC-390(T))
  device		hptiop		# Highpoint RocketRaid 3xxx series
  device		isp		# Qlogic family
  #device		ispfw		# Firmware for QLogic HBAs- normally a module
 
 Modified: head/sys/modules/Makefile
 ==============================================================================
 --- head/sys/modules/Makefile	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/sys/modules/Makefile	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -89,6 +89,7 @@ SUBDIR=	${_3dfx} \
  	en \
  	${_ep} \
  	${_epic} \
 +	esp \
  	${_et} \
  	${_ex} \
  	${_exca} \
 
 Modified: head/sys/modules/esp/Makefile
 ==============================================================================
 --- head/sys/modules/esp/Makefile	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/sys/modules/esp/Makefile	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -3,7 +3,8 @@
  .PATH: ${.CURDIR}/../../dev/esp
  
  KMOD=	esp
 -SRCS=	device_if.h ${esp_sbus} bus_if.h ncr53c9x.c ${ofw_bus_if} opt_cam.h
 +SRCS=	device_if.h esp_pci.c ${esp_sbus} bus_if.h ncr53c9x.c ${ofw_bus_if}
 +SRCS+=	opt_cam.h pci_if.h
  
  .if ${MACHINE} == "sparc64"
  ofw_bus_if=	ofw_bus_if.h
 
 Modified: head/sys/pc98/conf/GENERIC
 ==============================================================================
 --- head/sys/pc98/conf/GENERIC	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/sys/pc98/conf/GENERIC	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -101,7 +101,7 @@ device		siis		# SiliconImage SiI3124/SiI
  # SCSI Controllers
  device		adv		# Advansys SCSI adapters
  device		ahc		# AHA2940 and onboard AIC7xxx devices
 -device		amd		# AMD 53C974 (Tekram DC-390(T))
 +device		esp		# AMD Am53C974 (Tekram DC-390(T))
  device		isp		# Qlogic family
  #device		ncr		# NCR/Symbios Logic
  device		sym		# NCR/Symbios Logic (newer chipsets + those of `ncr')
 
 Modified: head/sys/sparc64/conf/GENERIC
 ==============================================================================
 --- head/sys/sparc64/conf/GENERIC	Tue Nov  1 21:21:36 2011	(r227005)
 +++ head/sys/sparc64/conf/GENERIC	Tue Nov  1 21:26:57 2011	(r227006)
 @@ -103,11 +103,11 @@ device		ahc		# AHA2940 and onboard AIC7x
  options 	AHC_ALLOW_MEMIO	# Attempt to use memory mapped I/O
  options 	AHC_REG_PRETTY_PRINT	# Print register bitfields in debug
  					# output.  Adds ~128k to driver.
 +device		esp		# AMD Am53C974, Sun ESP and FAS	families
  device		isp		# Qlogic family
  device		ispfw		# Firmware module for Qlogic host adapters
  device		mpt		# LSI-Logic MPT-Fusion
  device		sym		# NCR/Symbios/LSI Logic 53C8XX/53C1010/53C1510D
 -device		esp		# NCR53c9x (FEPS/FAS366)
  
  # ATA/SCSI peripherals
  device		scbus		# SCSI bus (required for ATA/SCSI)
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From owner-freebsd-scsi@FreeBSD.ORG  Wed Nov  2 08:56:30 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 05000106564A
	for <freebsd-scsi@freebsd.org>; Wed,  2 Nov 2011 08:56:30 +0000 (UTC)
	(envelope-from peter.maloney@brockmann-consult.de)
Received: from moutng.kundenserver.de (moutng.kundenserver.de
	[212.227.126.171])
	by mx1.freebsd.org (Postfix) with ESMTP id A41808FC0C
	for <freebsd-scsi@freebsd.org>; Wed,  2 Nov 2011 08:56:29 +0000 (UTC)
Received: from [10.3.0.26] ([141.4.215.32])
	by mrelayeu.kundenserver.de (node=mrbap0) with ESMTP (Nemesis)
	id 0MWhTP-1RSWZ91Dsx-00XIsw; Wed, 02 Nov 2011 09:43:52 +0100
Message-ID: <4EB102C7.8080401@brockmann-consult.de>
Date: Wed, 02 Nov 2011 09:43:51 +0100
From: Peter Maloney <peter.maloney@brockmann-consult.de>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11
MIME-Version: 1.0
To: Jason Wolfe <nitroboost@gmail.com>
References: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>	<4EAEF431.7090108@brockmann-consult.de>
	<CAAAm0r1T1ifTQt5A5O+jwUoKoGjzcbho606wCt4SpM3AQ-WM3Q@mail.gmail.com>
In-Reply-To: <CAAAm0r1T1ifTQt5A5O+jwUoKoGjzcbho606wCt4SpM3AQ-WM3Q@mail.gmail.com>
X-Enigmail-Version: 1.1.2
X-Provags-ID: V02:K0:l9N7rDkQkC+AsK40qVaA1cTE/ku/nKfJ0okSl1Qynrs
	Ka2sNOCjWC1hyonoMbaQpXymtJ2LtwiwMBSuVq7vs921YGoT26
	Z8ys2XphzaR+0Liq/4uHWdt16gvXMCYlUm/6fHjoMrl7he8Cbk
	vgTIF77H3yUDH/0PRDhgxmIaUTbxWkdwCu8uyVIpST82509kWG
	0oiLWhcvNao78rhX3f+dynv4tmFKOAQJw5p1zrnnIIc0aSGFll
	5Rmh6LdrRTJv4xlwReOI2fFU4vXY3tznUq4L5uj+jVcarzejFp
	jJDa0fpZGNefckAoAs2ny1Lb7ST9xpafxr1Mc4q0f0WaTKU1Ik
	4wHAQsIyyhaGb9fbuVmbMqgH+3DhIcGlGuP1xQfsL
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-scsi@freebsd.org
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with
 upped disk tags
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Nov 2011 08:56:30 -0000

On 11/01/2011 09:32 PM, Jason Wolfe wrote:
> On Mon, Oct 31, 2011 at 12:17 PM, Peter Maloney
> <peter.maloney@brockmann-consult.de
> <mailto:peter.maloney@brockmann-consult.de>> wrote:
>
>     Dear Jason,
>
>     I get a similar problem on a system with an LSI 9211-8i with 20 SATA
>     disks attached (2 SSDs and 18 spinning disks). My system doesn't hang,
>     panic, or reset, though. I just lose access to one disk, which is then
>     marked FAULTED in my zpool status (with the ZFS file system). If I
>     physically remove the FAULTED disk and run "gpart recover da0", I get
>     a panic. Otherwise, the system keeps running in a degraded state. When
>     I reboot and resilver, some data is found damaged and repaired, not
>     just refreshed with the latest state. The server has 1 HBA and 2
>     backplanes, and I have the 2 mirrored root disks on different
>     backplanes. Maybe that is why mine runs degraded and yours hang.
>
>     This has happened twice so far (in around a month or two), and both
>     times it was one of the mirrored root disks (SSDs) that faulted.
>
>     My tags are set to 255. I will try reproducing it as you said, and
>     then, if it fails, reboot and try again with tags set to 2 as you
>     suggested.
>
>     And *thank you very much for this information*. This is the last
>     outstanding issue with this server. I hope this workaround helps.
>
>     # camcontrol tags /dev/da0
>     (pass0:mps0:0:7:0): device openings: 255
>
>
> Peter,
>
> Does this happen 'randomly' for you, or do you have some automated process
> running smartctl that trips the drives up occasionally?
It appears to be completely random, but it could be something specific
going on that I just didn't think of. I don't know how to trigger it. I
once wrote a script that looped over the disks with smartctl (which I
installed from ports) and recorded the device ID, disk size, etc. It
didn't cause a crash, and I didn't try looping it continuously to
provoke one.

The system uses "zfs send" to send the whole pool to another machine and
rsync to back up some servers onto it. It serves a lot of data over NFS,
and Samba is also running but not in use. The primary client of the NFS
shares is VMware ESXi, which has a terrible problem with synchronous
writes; that might put a heavier load on the system.
> The way I'm getting around it currently is to move
> /usr/local/sbin/smartctl elsewhere and replace it with a wrapper
> that drops the tags to 1, executes the relocated smartctl with the
> options passed, then sets the tags back to whatever you prefer. There
> will obviously be a small performance cost here, but it should be fairly
> quick and hopefully not even noticeable in your case.
In my reading, I found that people think reducing the I/O queues for ZFS
(via kernel parameters) actually improves performance (moving the
queueing into the OS, I guess), so if tags behave similarly, I don't
expect much of a drop. Luckily, this system of mine is not a performance
machine, just a huge file server. So if it is slower but more stable
that way, I will leave tags set to 2 permanently.
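A minimal sketch of the kind of smartctl wrapper Jason describes above, written in C to match the driver code elsewhere in this archive. It is not his script: the relocated binary path /usr/local/sbin/smartctl.real, the restore value of 255 (Peter's current setting), and the rule "the first argument starting with /dev/ is the disk" are all hypothetical choices. Tags are changed by running camcontrol(8) as "camcontrol tags <dev> -N <count>".

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical location of the relocated smartctl binary. */
#define	REAL_SMARTCTL	"/usr/local/sbin/smartctl.real"
/* Hypothetical tag count to restore afterwards ("whatever you prefer"). */
#define	RESTORE_TAGS	"255"

/* Return the first argument that names a /dev node, or NULL. */
static const char *
find_device_arg(int argc, char **argv)
{
	for (int i = 1; i < argc; i++)
		if (strncmp(argv[i], "/dev/", 5) == 0)
			return (argv[i]);
	return (NULL);
}

/* Run argv[0] with argv, wait for it, return its exit status or -1. */
static int
run(char *const argv[])
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		execv(argv[0], argv);
		_exit(127);		/* exec failed */
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}

/* Set the number of tagged openings on dev via camcontrol(8). */
static int
set_tags(const char *dev, const char *ntags)
{
	char *args[] = { "/sbin/camcontrol", "tags", (char *)dev, "-N",
	    (char *)ntags, NULL };

	return (run(args));
}

/* Drop tags to 1, run the real smartctl, then restore the tag count. */
int
smartctl_wrapper(int argc, char **argv)
{
	const char *dev = find_device_arg(argc, argv);
	int rv;

	/* camcontrol failures are ignored so smartctl still runs. */
	if (dev != NULL)
		set_tags(dev, "1");
	argv[0] = REAL_SMARTCTL;
	rv = run(argv);
	if (dev != NULL)
		set_tags(dev, RESTORE_TAGS);
	return (rv);
}
```

Installed as /usr/local/sbin/smartctl, its main() would just return (smartctl_wrapper(argc, argv)). Restoring a fixed count is simpler than parsing the previous value out of "camcontrol tags" output, at the cost of overwriting any per-disk setting.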
>
> If smartctl is not triggering these events for you, any idea what is?
I have no real clue, but my guess is that some NFS shares are using the
ZIL (ZFS log device) a lot, and since that device performs horribly
(around 1500 IOPS during ZIL use on a disk that scores 50-140k IOPS in
other tests), it overloads the I/O system and triggers the failure
purely through load rather than something specific like smartctl. So
for now, I have disabled my ZIL to see if it still crashes.

Also on my list of things to try:
-change to the IT firmware instead of IR, since ZFS prefers to have no
RAID in there at all.
-change the tags to 2
-try the LSI driver for the 9210-8i
<http://www.lsi.com/products/storagecomponents/Pages/LSISAS9210-8i.aspx>

Here is my forum thread about it:

http://forums.freebsd.org/showthread.php?t=26656

Are you using ZFS? Is your root volume in hardware RAID or software
RAID? I am curious because you say your systems hang, and mine just runs
degraded.
>
> Jason


Peter

-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------


From owner-freebsd-scsi@FreeBSD.ORG  Wed Nov  2 16:47:57 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 97504106564A
	for <freebsd-scsi@freebsd.org>; Wed,  2 Nov 2011 16:47:57 +0000 (UTC)
	(envelope-from nitroboost@gmail.com)
Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com
	[209.85.220.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 4DA3B8FC1C
	for <freebsd-scsi@freebsd.org>; Wed,  2 Nov 2011 16:47:56 +0000 (UTC)
Received: by vcbfk26 with SMTP id fk26so516273vcb.13
	for <freebsd-scsi@freebsd.org>; Wed, 02 Nov 2011 09:47:56 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=0+4Fu6u6ezyfLm6dY/R8fLWiwLgF7i3LKqnWpE5Tzt4=;
	b=ouR0+9TPNC18HcKMs6XtJlPaRJTiLtWtTaTn2/RimveYNIlhAFa599445UbrvvYb/S
	1pcvXOL2sn4qK67g9eAC6ETNXwFmwQ0ddYsE+scpHiWKYwO9SvowqEkfJPCDPxyFd/Sz
	jGeuyIVyugWv4+Mi2cT7TvtNDQJbYd4T8Qs58=
MIME-Version: 1.0
Received: by 10.182.59.5 with SMTP id v5mr1032159obq.78.1320252476286; Wed, 02
	Nov 2011 09:47:56 -0700 (PDT)
Received: by 10.182.35.193 with HTTP; Wed, 2 Nov 2011 09:47:56 -0700 (PDT)
In-Reply-To: <4EB102C7.8080401@brockmann-consult.de>
References: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>
	<4EAEF431.7090108@brockmann-consult.de>
	<CAAAm0r1T1ifTQt5A5O+jwUoKoGjzcbho606wCt4SpM3AQ-WM3Q@mail.gmail.com>
	<4EB102C7.8080401@brockmann-consult.de>
Date: Wed, 2 Nov 2011 09:47:56 -0700
Message-ID: <CAAAm0r1sL7o+eunNv0Yk8o06LXNXjr74uGea1uSJihNOotfD3A@mail.gmail.com>
From: Jason Wolfe <nitroboost@gmail.com>
To: Peter Maloney <peter.maloney@brockmann-consult.de>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-scsi@freebsd.org
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with
 upped disk tags
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Nov 2011 16:47:57 -0000

On Wed, Nov 2, 2011 at 1:43 AM, Peter Maloney <
peter.maloney@brockmann-consult.de> wrote:

>
> Are you using ZFS? Is your root volume in hardware RAID or software RAID?
> I am curious because you say your systems hang, and mine just runs degraded.
>

Peter,

I'm running UFS and no RAID, so yes, that likely explains why my systems
hang: each one loses its boot disk. On some less common occasions the
controller itself resets, so if you ever see that, I'll bet your system
would hang too, as it loses all root devices.

I have the official LSI driver for 8.2-RELEASE, which I'll be testing
today to see if I can reproduce the hangs.

Jason

From owner-freebsd-scsi@FreeBSD.ORG  Wed Nov  2 18:05:47 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 052B910657BF
	for <freebsd-scsi@freebsd.org>; Wed,  2 Nov 2011 18:05:47 +0000 (UTC)
	(envelope-from nitroboost@gmail.com)
Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com
	[209.85.215.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 81EE18FC3C
	for <freebsd-scsi@freebsd.org>; Wed,  2 Nov 2011 18:05:46 +0000 (UTC)
Received: by eyd10 with SMTP id 10so564610eyd.13
	for <freebsd-scsi@freebsd.org>; Wed, 02 Nov 2011 11:05:45 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type; bh=YAh7VKceP+hZP6peyXqr0OXPQM7Jg3Sw6qQlfJzITNo=;
	b=efjfBifr+2k0APqYap7yT3wkbhCXzMCKZGfrGSf8a2jWt7RFP5mlXG3PMWGtLCSgqh
	4igXugaPKf0Fw7/4VoJAeTkDHgxOshjPjunAGpQ1e2d1ph7x6lgS3BtLAQbd0TwCSC5U
	d4su8Ck+SRCXwCmLfP0wImLWco4CugsdDj+UU=
MIME-Version: 1.0
Received: by 10.182.17.103 with SMTP id n7mr1100067obd.68.1320257145101; Wed,
	02 Nov 2011 11:05:45 -0700 (PDT)
Received: by 10.182.35.193 with HTTP; Wed, 2 Nov 2011 11:05:44 -0700 (PDT)
In-Reply-To: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>
References: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>
Date: Wed, 2 Nov 2011 11:05:44 -0700
Message-ID: <CAAAm0r2TDHEcdN43MATU-ERzoDr=2Hy029YUTjuxh+9CBni1vw@mail.gmail.com>
From: Jason Wolfe <nitroboost@gmail.com>
To: freebsd-scsi@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with
 upped disk tags
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Nov 2011 18:05:47 -0000

On Tue, Nov 1, 2011 at 11:13 AM, Jason Wolfe <nitroboost@gmail.com> wrote:

> Luckily remote syslogging is enabled, so while nothing is kept locally, we
> see messages similar to these transmitted before the server hangs,
> requiring a power cycle:
>
> (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 510
> (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 713
> (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 942
> (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 356
> (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 492
> (da0:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 976
> (da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID 339
> (da11:mps0:0:12:0): SCSI command timeout on device handle 0x0015 SMID 746
> (da5:mps0:0:6:0): SCSI command timeout on device handle 0x000f SMID 74
> (da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID 613
> (da2:mps0:0:3:0): SCSI command timeout on device handle 0x000c SMID 16
> (da10:mps0:0:11:0): SCSI command timeout on device handle 0x0014 SMID 305
> (da1:mps0:0:2:0): SCSI command timeout on device handle 0x000b SMID 74
> (da6:mps0:0:7:0): SCSI command timeout on device handle 0x0010 SMID 594
>
> In some cases that would be followed by this, which would usually be the
> last transmission, though we don't see this in all cases.  It may just be
> that the system isn't always alive long enough to transmit:
>
> kernel: mps0: IOC Fault 0x40006003, Resetting
>
>
Hello,

Testing with the LSI-supplied driver, it appears it has a code path that
handles the condition that causes our driver to crash.  Here are two sets
of messages:

mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm 0xffffff800040bdf8
(da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072 SMID 97 command timeout cm 0xffffff800040bdf8 ccb 0xffffff00
mpslsi0: mpssas_alloc_tm freezing simq
mpslsi0: timedout cm 0xffffff800040bdf8 allocated tm 0xffffff8000409070
(da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 55 96 48 7f 0 0 80 0 length 65536 SMID 171 completed cm 0xffffff80004105a8 ccb 0xffffff03c3443y
(da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 54 f8 a4 3f 0 0 80 0 length 65536 SMID 762 completed cm 0xffffff8000434230 ccb 0xffffff001317ay
(da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072 SMID 97 completed timedout cm 0xffffff800040bdf8 ccb 0xffff1
(noperiph:mpslsi0:0:8:0): SMID 50 finished recovery after aborting TaskMID 97
mpslsi0: mpssas_free_tm releasing simq


mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm 0xffffff8000441e18
(da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length 131072 SMID 989 command timeout cm 0xffffff8000441e18 ccb 0xfffff0
mpslsi0: mpssas_alloc_tm freezing simq
mpslsi0: timedout cm 0xffffff8000441e18 allocated tm 0xffffff80004063e0
(da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 14 a1 4f 0 1 0 0 length 131072 SMID 857 completed cm 0xffffff8000439e38 ccb 0xffffff001316y
(da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 e4 98 57 0 0 80 0 length 65536 SMID 300 completed cm 0xffffff80004182a0 ccb 0xffffff0392f0y
(da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length 131072 SMID 989 completed timedout cm 0xffffff8000441e18 ccb 0xff1
(noperiph:mpslsi0:0:15:0): SMID 4 finished recovery after aborting TaskMID 989
mpslsi0: mpssas_free_tm releasing simq

The server ran for 10 minutes with these happening every 10-30 seconds.
With our community driver, the first instance of commands timing out
during this smartctl storm would cause the server to hang and sometimes
the controller to reset.  Hopefully this is helpful to someone.
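To read the CDB bytes in the messages above: a READ(10) (0x28) or WRITE(10) (0x2a) CDB carries the starting LBA big-endian in bytes 2-5 and the transfer length in blocks in bytes 7-8. A small C decoder follows; the 512-byte block size is an assumption not stated in the thread, but it reproduces the logged lengths (0x0100 = 256 blocks gives 131072 bytes for SMID 97, 0x0080 = 128 blocks gives 65536).

```c
#include <stdint.h>

/* Fields of a READ(10) (0x28) or WRITE(10) (0x2a) CDB. */
struct rw10 {
	uint8_t		opcode;		/* byte 0 */
	uint32_t	lba;		/* bytes 2-5, big-endian */
	uint16_t	nblocks;	/* bytes 7-8, big-endian */
};

struct rw10
decode_rw10(const uint8_t cdb[10])
{
	struct rw10 r;

	r.opcode = cdb[0];
	r.lba = (uint32_t)cdb[2] << 24 | (uint32_t)cdb[3] << 16 |
	    (uint32_t)cdb[4] << 8 | (uint32_t)cdb[5];
	r.nblocks = (uint16_t)(cdb[7] << 8 | cdb[8]);
	return (r);
}

/* Transfer size in bytes, assuming 512-byte logical blocks. */
uint32_t
rw10_bytes(const struct rw10 *r)
{
	return ((uint32_t)r->nblocks * 512);
}
```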

Jason

From owner-freebsd-scsi@FreeBSD.ORG  Thu Nov  3 10:31:42 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AF028106564A
	for <freebsd-scsi@freebsd.org>; Thu,  3 Nov 2011 10:31:42 +0000 (UTC)
	(envelope-from peter.maloney@brockmann-consult.de)
Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.17.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 5CFD08FC16
	for <freebsd-scsi@freebsd.org>; Thu,  3 Nov 2011 10:31:42 +0000 (UTC)
Received: from [10.3.0.26] ([141.4.215.32])
	by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis)
	id 0MaE2a-1RfdK904c4-00K1wW; Thu, 03 Nov 2011 11:31:41 +0100
Message-ID: <4EB26D8B.1090804@brockmann-consult.de>
Date: Thu, 03 Nov 2011 11:31:39 +0100
From: Peter Maloney <peter.maloney@brockmann-consult.de>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
References: <CAAAm0r2-pXLEZVoG7g_dkym6MzLJXggjOQh3a8t5QO90vPJvfw@mail.gmail.com>
	<CAAAm0r2TDHEcdN43MATU-ERzoDr=2Hy029YUTjuxh+9CBni1vw@mail.gmail.com>
In-Reply-To: <CAAAm0r2TDHEcdN43MATU-ERzoDr=2Hy029YUTjuxh+9CBni1vw@mail.gmail.com>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Provags-ID: V02:K0:j9R91YIvAcdO5yG4kFGHNpADIxxh0zjDlwLoz5RyZDt
	dxykdEygGq3v0xYZfDRgMvFPg51uo7sbjDaWP1U6DmDqaaOMPD
	qJnUooAJ+1l/k5H7bV+fWx0osCv0fRfGLnCePnbCpGjQjjjqQu
	thnnZAnqpviM0YNgXrAk40Dg8lfIhG+xQdbOoKpzWy1VR2w/o3
	WEoa66QCW2XdA8BpZ8YyuOMbf21UJWHBJ5CESbaKCy/kc+bT3w
	UAScDXs+6F77BQhC0mBsHKDaFGRlmpbxxGDlckgYFULcdwbGH7
	8XUY7PqF+KP/vrPjMOHbe+FzPWY8d4O1cwGNGbyg3yV6L3GH5H
	v3nltVs+zF8EooMysUOjJdA66Rr9GtFCG7S+AAVwd
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with
 upped disk tags
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Nov 2011 10:31:42 -0000

Dear Jason,

On 11/02/2011 07:05 PM, Jason Wolfe wrote:
> Hello,
> Testing with the LSI supplied driver, it appears they have a code path for
> this condition that causes our driver to crash.  Here are 2 sets of
> messages:
>
> mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm
> 0xffffff800040bdf8
> (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072
> SMID 97 command timeout cm 0xffffff800040bdf8 ccb 0xffffff00
> mpslsi0: mpssas_alloc_tm freezing simq
> mpslsi0: timedout cm 0xffffff800040bdf8 allocated tm 0xffffff8000409070
> (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 55 96 48 7f 0 0 80 0 length 65536
> SMID 171 completed cm 0xffffff80004105a8 ccb 0xffffff03c3443y
> (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 54 f8 a4 3f 0 0 80 0 length 65536
> SMID 762 completed cm 0xffffff8000434230 ccb 0xffffff001317ay
> (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072
> SMID 97 completed timedout cm 0xffffff800040bdf8 ccb 0xffff1
> (noperiph:mpslsi0:0:8:0): SMID 50 finished recovery after aborting TaskMID
> 97
> mpslsi0: mpssas_free_tm releasing simq
>
>
> mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm
> 0xffffff8000441e18
> (da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length
> 131072 SMID 989 command timeout cm 0xffffff8000441e18 ccb 0xfffff0
> mpslsi0: mpssas_alloc_tm freezing simq
> mpslsi0: timedout cm 0xffffff8000441e18 allocated tm 0xffffff80004063e0
> (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 14 a1 4f 0 1 0 0 length 131072
> SMID 857 completed cm 0xffffff8000439e38 ccb 0xffffff001316y
> (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 e4 98 57 0 0 80 0 length 65536
> SMID 300 completed cm 0xffffff80004182a0 ccb 0xffffff0392f0y
> (da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length
> 131072 SMID 989 completed timedout cm 0xffffff8000441e18 ccb 0xff1
> (noperiph:mpslsi0:0:15:0): SMID 4 finished recovery after aborting TaskMID
> 989
> mpslsi0: mpssas_free_tm releasing simq
>
> The server ran for 10 minutes with these happening every 10-30 seconds,
> with our community driver the first instance of commands timing out during
> this smartctl storm would cause the server to hang and sometimes the
> controller to reset.  Hopefully this is helpful to someone.
>

Does this mean it didn't hang, or did it run your smartctl -a test for 10
minutes before a hang?

I am also trying the mpslsi driver now, but with the mps driver I
couldn't reproduce the problem using "smartctl -a" (I also tried -A, -h
and -i). Tags were set to 255 on all disks. I only tried it on the backup
server, which had not crashed randomly on its own either. So I will just
have to assume it works if it doesn't do the same thing in a month or two.

However, with the mpslsi driver, during a scrub on the backup server
(probably during smartctl -a), I got these messages (including what
looks like a controller reset), yet no disks were lost and zpool status
reported no read errors. I can't get it to happen a second time, so I
hope that means our problems are over.
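For reading the ATA COMMAND PASS THROUGH(16) entries in the log that follows: under the SAT (SCSI/ATA Translation) CDB layout, byte 1 bits 4:1 hold the protocol, bytes 3-4 the features register, bytes 5-6 the sector count, the low bytes of LBA low/mid/high sit in bytes 8, 10 and 12, and byte 14 is the ATA command. The C sketch below assumes that layout and names only the three SMART subcommands that appear here. Decoded this way, the timed-out SMID 717 is SMART (0xB0) RETURN STATUS (feature 0xDA), and the other pass-through requests are SMART READ DATA (0xD0) and SMART READ LOG (0xD5) of log 0x06, which fits the guess that smartctl -a was running.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an ATA PASS-THROUGH(16) CDB (opcode 0x85), per SAT. */
struct ata_pt16 {
	uint8_t		protocol;	/* byte 1, bits 4:1 */
	uint16_t	features;	/* bytes 3-4 */
	uint16_t	count;		/* bytes 5-6 */
	uint8_t		lba_low;	/* byte 8 (bits 7:0) */
	uint8_t		lba_mid;	/* byte 10 (bits 7:0) */
	uint8_t		lba_high;	/* byte 12 (bits 7:0) */
	uint8_t		command;	/* byte 14 */
};

struct ata_pt16
decode_ata_pt16(const uint8_t cdb[16])
{
	struct ata_pt16 a;

	a.protocol = (cdb[1] >> 1) & 0x0f;
	a.features = (uint16_t)(cdb[3] << 8 | cdb[4]);
	a.count = (uint16_t)(cdb[5] << 8 | cdb[6]);
	a.lba_low = cdb[8];
	a.lba_mid = cdb[10];
	a.lba_high = cdb[12];
	a.command = cdb[14];
	return (a);
}

/* SMART commands use command 0xB0 with the 0x4F/0xC2 LBA signature. */
bool
is_smart(const struct ata_pt16 *a)
{
	return (a->command == 0xb0 && a->lba_mid == 0x4f &&
	    a->lba_high == 0xc2);
}

/* Names for the SMART subcommands seen in this log (features field). */
const char *
smart_feature_name(uint16_t features)
{
	switch (features) {
	case 0xd0:
		return ("SMART READ DATA");
	case 0xd5:
		return ("SMART READ LOG");
	case 0xda:
		return ("SMART RETURN STATUS");
	default:
		return ("other");
	}
}
```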

Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_scsiio_timeout
checking sc 0xffffff800f629000 cm 0xffffff800f65f698
Nov  3 09:17:10 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0
SMID 717 command timeout cm 0xffffff800f65f698 ccb
 0xffffff0026bbb800
Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_alloc_tm freezing simq
Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: timedout cm
0xffffff800f65f698 allocated tm 0xffffff800f6340f8
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 completed cm
0xffffff800f643cd8 ccb 0xffffff0026bd1000 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 completed cm
0xffffff800f654550 ccb 0xffffff0026b96000 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 completed cm
0xffffff800f664510 ccb 0xffffff003d438000 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 completed cm
0xffffff800f657b90 ccb 0xffffff00314ce800 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 completed cm
0xffffff800f65a630 ccb 0xffffff0026ba1800 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 completed cm
0xffffff800f65ece8 ccb 0xffffff0026bb1800 during recovery
ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0
SMID 717 completed timedout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800
during recov(da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 1c dc 68 73 0 0 2b
0 length 22016 SMID 690 completed cm 0xffffff800f65dc70 ccb
0xffffff0026bea800 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 1c dc 68 73 0 0 2b 0 length 22016 SMID 690 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 completed cm
0xffffff800f66d568 ccb 0xffffff0026bf9000 during recovery ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 completed cm
0xffffff800f65d5a8 ccb 0xffffff003d47f800 during recovery ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 completed cm
0xffffff800f641428 ccb 0xffffff0031536000 during recovery ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 completed cm
0xffffff800f63e3b8 ccb 0xffffff00314ec800 during recovery ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 terminated ioc 804b scsi
0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512
SMID 139 completed cm 0xffffff800f63c6a8 ccb 0xffffff0026a89000 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 139 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0
SMID 876 completed cm 0xffffff800f6690a0 ccb 0xffffff00314c8800 during
recovery ioc 8(pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 876 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512
SMID 661 completed cm 0xffffff800f65c058 ccb 0xffffff0026b7d000 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 661 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512
SMID 471 completed cm 0xffffff800f650848 ccb 0xffffff0026be7800 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 471 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512
SMID 215 completed cm 0xffffff800f641048 ccb 0xffffff0026bef800 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 215 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512
SMID 203 completed cm 0xffffff800f6404a8 ccb 0xffffff0026bb6000 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 203 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512
SMID 546 completed cm 0xffffff800f6550f0 ccb 0xffffff003d447800 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 546 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND
PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512
SMID 513 completed cm 0xffffff800f6530f8 ccb 0xffffff0026bcb800 during
recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB:
85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 513 terminated ioc
804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1
abort TaskMID 717 status 0x0 code 0x0 count 20
Nov  3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1
finished recovery after aborting TaskMID 717
Nov  3 09:17:11 bcnas1bak kernel: mpslsi0: mpssas_free_tm releasing simq
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB:
28 0 41 1e 9a 58 0 0 2a 0
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): CAM status: SCSI
Status Error
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI status:
Check Condition
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI sense: UNIT
ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
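The READ(10) entries above can be decoded by hand. In a READ(10) CDB (opcode 28h), bytes 2-5 hold the big-endian starting LBA and bytes 7-8 hold the transfer length in logical blocks. Below is a minimal C sketch. It assumes 512-byte logical blocks, which matches the logged lengths (2bh = 43 blocks = 22016 bytes). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a READ(10) CDB (SBC): opcode 0x28, 32-bit big-endian LBA in
 * bytes 2-5, 16-bit big-endian transfer length (in blocks) in bytes 7-8. */
struct read10 {
    uint32_t lba;
    uint16_t blocks;
};

/* Decode a 10-byte READ(10) CDB. Returns 0 on success, or -1 if the
 * opcode is not 0x28. */
static int decode_read10(const uint8_t cdb[10], struct read10 *out)
{
    assert(cdb != NULL && out != NULL);
    if (cdb[0] != 0x28)
        return -1;
    out->lba = (uint32_t)cdb[2] << 24 | (uint32_t)cdb[3] << 16 |
               (uint32_t)cdb[4] << 8 | (uint32_t)cdb[5];
    out->blocks = (uint16_t)(cdb[7] << 8 | cdb[8]);
    return 0;
}
```

For the CDB "28 0 1c dc 68 73 0 0 2b 0", this yields LBA 0x1cdc6873 and 43 blocks, which is the 22016 bytes shown in the log line. In other words, these are ordinary reads of about 21 KB that were swept up in the recovery.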


Peter

> Jason
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"


-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------


From owner-freebsd-scsi@FreeBSD.ORG  Thu Nov  3 11:22:05 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BB8C1106564A;
	Thu,  3 Nov 2011 11:22:05 +0000 (UTC)
	(envelope-from Karli.Sjoberg@slu.se)
Received: from Edge1-3.slu.se (edge1-3.slu.se [193.10.100.98])
	by mx1.freebsd.org (Postfix) with ESMTP id AE9F18FC18;
	Thu,  3 Nov 2011 11:22:04 +0000 (UTC)
Received: from Exchange2.ad.slu.se (193.10.100.95) by Edge1-3.slu.se
	(193.10.100.98) with Microsoft SMTP Server (TLS) id 8.3.213.0;
	Thu, 3 Nov 2011 12:22:01 +0100
Received: from exmbx3.ad.slu.se ([193.10.100.93]) by Exchange2.ad.slu.se
	([193.10.100.95]) with mapi; Thu, 3 Nov 2011 12:22:01 +0100
From: =?iso-8859-1?Q?Karli_Sj=F6berg?= <Karli.Sjoberg@slu.se>
To: "Kenneth D. Merry" <ken@freebsd.org>
Date: Thu, 3 Nov 2011 12:21:59 +0100
Thread-Topic: AOC-USAS2-L8i zfs panics and SCSI errors in messages
Thread-Index: AcyaGs4yiVLBqLRGSh6I0qMEfXlyoQ==
Message-ID: <666756B5-218E-48D6-99A7-56C7FB0D2E33@slu.se>
References: <82B38DBF-DD3A-46CD-93F6-02CDB6506E05@slu.se>
	<20111025193302.GA30409@nargothrond.kdm.org>
In-Reply-To: <20111025193302.GA30409@nargothrond.kdm.org>
Accept-Language: sv-SE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
acceptlanguage: sv-SE, en-US
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: "freebsd-scsi@freebsd.org" <freebsd-scsi@freebsd.org>,
	"fs@freebsd.org" <fs@freebsd.org>
Subject: Re: AOC-USAS2-L8i zfs panics and SCSI errors in messages
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Nov 2011 11:22:05 -0000

Hi,

I'm not alone!

By pure chance I was reading another thread on the forum, and it turns out
that peetaur has exactly the same problem as I do: timeouts, and sometimes
lost disks. His hardware is very different from mine, except that we both
have LSI controllers and run 8.2-STABLE. He has tried both the mps driver in
FreeBSD and the mps driver that LSI provides (phase 11), and still gets these
timeouts.

peetaur's system:
4U chassis from Supermicro (847E16-R1400LPB)
with 2x 1400 W redundant power supplies and 36 hot-swap bays for SAS or SATA

Motherboard from Supermicro
- Intel® 5520 (Tylersburg) Chipset
- 12 DIMM memory slots (max. 192GB DDR3)
- 2x 100/1000Base TX Gigabit Ethernet Port (Dual Intel® 82576 Gigabit Ethernet)
- 6x SATA (3 Gbps) Ports via ICH10R Controller
- PCI Slots: 7x (x8) PCI-E 2.0 (in x16 slots)
- Integrated IPMI 2.0 with Dedicated LAN
- Integrated Matrox G200eW Graphics
CPU
- 2x E5620 Intel Xeon (Westmere) Quad Core CPU, (80W) 2.40 GHz, 12 MB L3 Cache
RAM
- 48 GB (6x 8GB) DDR3 1333 DIMM, REG, ECC
SAS HBA
- 9211-8i
Network
- 10G Card with Dual-port Intel® 82598EB (CX4)
Disks
- 9x HDD 3TB SATA from Hitachi, 7.2k RPM, 64 MB Cache
- 9x HDD 3TB SATA from Seagate, 7.2k RPM, 64 MB Cache
- 2x consumer SSDs (boot, root, zil, cache)

#uname -a
FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 15:06:03 CEST 2011     root@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64

And an extract from /var/log/messages when using FreeBSD's mps driver:
Oct  4 08:57:05 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 568
Oct  4 08:57:05 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 998
Oct  4 08:57:13 bcnas1 kernel: mps0: (0:0:0) terminated ioc 804b scsi 0 state c xfer 0
Oct  4 08:57:13 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 568 complete
Oct  4 08:57:13 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 998
Oct  4 08:57:13 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 998 complete
Oct  4 08:58:13 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 973
Oct  4 08:58:13 bcnas1 kernel: (da3:mps0:0:0:0): SCSI command timeout on device handle 0x000a SMID 981
Oct  4 08:58:21 bcnas1 kernel: mps0: (0:0:0) terminated ioc 804b scsi 0 state c xfer 0
Oct  4 08:58:21 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 973 complete
Oct  4 08:58:21 bcnas1 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 981
Oct  4 08:58:21 bcnas1 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 981 complete
Oct  4 08:58:24 bcnas1 kernel: (da3:mps0:0:0:0): READ(6). CDB: 8 0 0 0 80 0
Oct  4 08:58:24 bcnas1 kernel: (da3:mps0:0:0:0): CAM status: SCSI Status Error
Oct  4 08:58:24 bcnas1 kernel: (da3:mps0:0:0:0): SCSI status: Check Condition
Oct  4 08:58:24 bcnas1 kernel: (da3:mps0:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct  4 09:00:14 bcnas1 kernel: mps0: mpssas_remove_complete on target 0x0000, IOCStatus= 0x0
Oct  4 09:00:14 bcnas1 kernel: (da3:mps0:0:0:0): lost device
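The end of that log follows a common pattern. After the driver aborts the timed-out commands, the next command to da3 fails with CHECK CONDITION, sense key UNIT ATTENTION and ASC/ASCQ 29h/00h. This is how a SCSI target reports that it has been reset or power-cycled. It does not by itself say why the reset happened.

The sketch below shows, with some hedging, how to extract the sense key and ASC/ASCQ from raw sense data in either fixed (70h/71h) or descriptor (72h/73h) format, following the SPC layout. The helper names are hypothetical, and this is not CAM's own sense-decoding code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sense_info {
    uint8_t key;   /* sense key, e.g. 0x6 = UNIT ATTENTION */
    uint8_t asc;   /* additional sense code */
    uint8_t ascq;  /* additional sense code qualifier */
};

/* Extract key/ASC/ASCQ from fixed (0x70/0x71) or descriptor (0x72/0x73)
 * format sense data. Returns 0 on success, -1 if too short or unknown. */
static int decode_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 1)
        return -1;
    switch (buf[0] & 0x7f) {       /* bit 7 is the VALID bit in fixed format */
    case 0x70: case 0x71:          /* fixed format */
        if (len < 14)
            return -1;
        out->key  = buf[2] & 0x0f;
        out->asc  = buf[12];
        out->ascq = buf[13];
        return 0;
    case 0x72: case 0x73:          /* descriptor format */
        if (len < 4)
            return -1;
        out->key  = buf[1] & 0x0f;
        out->asc  = buf[2];
        out->ascq = buf[3];
        return 0;
    default:
        return -1;
    }
}

/* True for "Power on, reset, or bus device reset occurred" (UNIT ATTENTION,
 * ASC 0x29). ASCQ 0x00 is the generic form seen in these logs; nonzero
 * ASCQ values give a more specific reset reason. */
static int is_reset_unit_attention(const struct sense_info *s)
{
    return s->key == 0x6 && s->asc == 0x29;
}
```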

And an extract from /var/log/messages when using LSI's mps driver:
Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f65f698
Nov  3 09:17:10 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 717 command timeout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800
Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_alloc_tm freezing simq
Nov  3 09:17:10 bcnas1bak kernel: mpslsi0: timedout cm 0xffffff800f65f698 allocated tm 0xffffff800f6340f8
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 completed cm 0xffffff800f643cd8 ccb 0xffffff0026bd1000 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 completed cm 0xffffff800f654550 ccb 0xffffff0026b96000 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 completed cm 0xffffff800f664510 ccb 0xffffff003d438000 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 completed cm 0xffffff800f657b90 ccb 0xffffff00314ce800 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 completed cm 0xffffff800f65a630 ccb 0xffffff0026ba1800 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 completed cm 0xffffff800f65ece8 ccb 0xffffff0026bb1800 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 717 completed timedout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800 during recov(da0:mpslsi0:0:10:0): R$
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 1c dc 68 73 0 0 2b 0 length 22016 SMID 690 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 completed cm 0xffffff800f66d568 ccb 0xffffff0026bf9000 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 completed cm 0xffffff800f65d5a8 ccb 0xffffff003d47f800 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 completed cm 0xffffff800f641428 ccb 0xffffff0031536000 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 completed cm 0xffffff800f63e3b8 ccb 0xffffff00314ec800 during recovery ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 terminated ioc 804b scsi 0 state c xfer 0
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 139 completed cm 0xffffff800f63c6a8 ccb 0xffffff0026a89000 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 876 completed cm 0xffffff800f6690a0 ccb 0xffffff00314c8800 during recovery ioc 8(pass0:mpslsi0:0:10:0):$
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 661 completed cm 0xffffff800f65c058 ccb 0xffffff0026b7d000 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 471 completed cm 0xffffff800f650848 ccb 0xffffff0026be7800 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 215 completed cm 0xffffff800f641048 ccb 0xffffff0026bef800 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 203 completed cm 0xffffff800f6404a8 ccb 0xffffff0026bb6000 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 546 completed cm 0xffffff800f6550f0 ccb 0xffffff003d447800 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov  3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 513 completed cm 0xffffff800f6530f8 ccb 0xffffff0026bcb800 during recovery ioc (pass0:mpslsi0:0:10:0):$
Nov  3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1 abort TaskMID 717 status 0x0 code 0x0 count 20
Nov  3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1 finished recovery after aborting TaskMID 717
Nov  3 09:17:11 bcnas1bak kernel: mpslsi0: mpssas_free_tm releasing simq
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): CAM status: SCSI Status Error
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI status: Check Condition
Nov  3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
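In the LSI-driver log, the command that actually timed out (SMID 717, on pass0) is not a disk read. It is an ATA PASS-THROUGH(16) command, and the READ(10)s were only swept up in the recovery.

The SAT layout of these CDBs places the features in byte 4, the sector count in byte 6, LBA low/mid/high in bytes 8/10/12, and the ATA command in byte 14. Decoding them that way shows ATA command B0h with the 4Fh/C2h signature, i.e. SMART:

- DAh is SMART RETURN STATUS.
- D0h is SMART READ DATA.
- D5h is SMART READ LOG (here log 06h).

A tool such as smartd polling the drive would produce this pattern, but that is an assumption: the thread does not say what issued these commands. Below is a minimal sketch with hypothetical names that decodes only the non-extended bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few taskfile fields from an ATA PASS-THROUGH(16) CDB (SAT, opcode 0x85).
 * Only the low-order (non-extended) register bytes are decoded. */
struct ata_pt16 {
    uint8_t protocol;  /* byte 1, bits 4:1 (3 = non-data, 4 = PIO data-in) */
    uint8_t features;  /* byte 4 */
    uint8_t count;     /* byte 6 */
    uint8_t lba_low;   /* byte 8 */
    uint8_t lba_mid;   /* byte 10 */
    uint8_t lba_high;  /* byte 12 */
    uint8_t command;   /* byte 14 */
};

/* Returns 0 on success, or -1 if the opcode is not 0x85. */
static int decode_ata_pt16(const uint8_t cdb[16], struct ata_pt16 *out)
{
    assert(cdb != NULL && out != NULL);
    if (cdb[0] != 0x85)
        return -1;
    out->protocol = (cdb[1] >> 1) & 0x0f;
    out->features = cdb[4];
    out->count    = cdb[6];
    out->lba_low  = cdb[8];
    out->lba_mid  = cdb[10];
    out->lba_high = cdb[12];
    out->command  = cdb[14];
    return 0;
}

/* Name the SMART subcommand (ATA command 0xB0 with the 0x4F/0xC2 signature
 * in LBA mid/high), selected by the Features register; NULL if not SMART. */
static const char *smart_subcommand(const struct ata_pt16 *t)
{
    if (t->command != 0xb0 || t->lba_mid != 0x4f || t->lba_high != 0xc2)
        return NULL;
    switch (t->features) {
    case 0xd0: return "SMART READ DATA";
    case 0xd5: return "SMART READ LOG";
    case 0xda: return "SMART RETURN STATUS";
    default:   return "other SMART subcommand";
    }
}
```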


/Karli

On 25 Oct 2011, at 21:33, Kenneth D. Merry wrote:

On Thu, Oct 20, 2011 at 13:28:17 +0200, Karli Sjöberg wrote:
Hi,

I'm in the process of moving data off a Sun/Oracle system onto a
Supermicro/FreeBSD system, using zfs send/recv between them. Twice now the
system has panicked while doing nothing at all, and it throws a lot of
SCSI/CAM-related errors during IO-intensive operations such as send/recv and
resilver; zpool has sometimes reported read/write errors on the hard drives.
The best part is that the errors in messages have involved every hard drive
at one time or another, even though the drives are connected through
separate cables, controllers and caddies. Specs:

HW:
1x  Supermicro X8SIL-F
2x  Supermicro AOC-USAS2-L8i
2x  Supermicro CSE-M35T-1B
1x  Intel Core i5 650 3,2GHz
4x  2GB 1333MHZ DDR3 ECC UDIMM
10x SAMSUNG HD204UI (in a raidz2 zpool)
1x  OCZ Vertex 3 240GB (L2ARC)

SW:
# uname -a
FreeBSD server 8.2-STABLE FreeBSD 8.2-STABLE #0: Mon Oct 10 09:12:25 UTC 2011     root@server:/usr/obj/usr/src/sys/GENERIC  amd64
# zpool get version pool1
NAME   PROPERTY  VALUE    SOURCE
pool1  version   28       default

I got the panic from the IPMI KVM:
http://i55.tinypic.com/synpzk.png

Looking at the panic, this is a ZFS panic.  Nothing the disks do should
be able to cause ZFS to panic.  ZFS is panicking in avl_add():

    /*
     * This is unfortunate.  We want to call panic() here, even for
     * non-DEBUG kernels.  In userland, however, we can't depend on anything
     * in libc or else the rtld build process gets confused.  So, all we can
     * do in userland is resort to a normal ASSERT().
     */
    if (avl_find(tree, new_node, &where) != NULL)
#ifdef _KERNEL
        panic("avl_find() succeeded inside avl_add()");
#else
        ASSERT(0);
#endif
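Ken's reading of the panic can be illustrated with a simplified tree. avl_add() is meant for callers that already know the node is not in the tree. If avl_find() does locate an equal node, the caller's bookkeeping is wrong, so the kernel panics rather than risk corrupting the tree.

The sketch below illustrates that invariant with a plain, unbalanced binary search tree. It is not the OpenSolaris/FreeBSD AVL code, and the names are hypothetical. It returns -1 where avl_add() would panic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration only; the real AVL code is
 * balanced and uses nodes embedded in the caller's structures. */
struct node {
    int key;
    struct node *left, *right;
};

/* Look up key. On a miss, return NULL and set *where to the link where the
 * key would be inserted (playing the role of avl_find()'s "where" cookie). */
static struct node *find(struct node **root, int key, struct node ***where)
{
    struct node **link = root;
    while (*link != NULL) {
        if (key == (*link)->key)
            return *link;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    *where = link;
    return NULL;
}

/* avl_add()-style insert: the caller promises the key is absent.
 * Returns 0 on success, or -1 in the case where avl_add() would panic. */
static int add(struct node **root, struct node *n)
{
    struct node **where;
    assert(root != NULL && n != NULL);
    if (find(root, n->key, &where) != NULL)
        return -1;   /* duplicate: a logic bug in the caller */
    n->left = n->right = NULL;
    *where = n;
    return 0;
}
```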

There are certainly timeouts and two terminated IOCs in the log below.  That
does suggest a hardware or driver problem, but it isn't very obvious what
it might be.

I have seen bad behavior with SATA drives behind 3Gb Maxim expanders
talking to 6Gb LSI controllers, but your configuration does not
involve any expanders, so it is not that particular STP issue.

My best guess, and it is a guess, is that either the drives are misbehaving
(e.g. a firmware problem) or you've got a cabling issue.

If you have more hardware available, you might try swapping out the cables
and/or drives to see if you can reproduce the drive errors with a
different setup.  If you swap the drives, I would use a different brand if
you've got them available.

I'm CCing the fs list; perhaps someone there can look at the stack trace
above and figure out what ZFS might be doing.

Again, ZFS should survive any errors from the drives, and the panic above
looks like ZFS is flagging a logic bug somewhere.


And an extract from /var/log/messages:
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): WRITE(10). CDB: 2a 0 6 13 66 f 0 0 f 0
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): CAM status: SCSI Status Error
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI status: Check Condition
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): WRITE(6). CDB: a 0 1 b2 2 0
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): CAM status: SCSI Status Error
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI status: Check Condition
Oct 19 17:37:19 fs2-7 kernel: (da6:mps1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 859
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 495
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 725
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 722
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI command timeout on device handle 0x000c SMID 438
Oct 19 17:40:38 fs2-7 kernel: mps1: (1:4:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 19 17:40:38 fs2-7 last message repeated 3 times
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 859 complete
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 495
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 495 complete
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 725
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 725 complete
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 722
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 722 complete
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_complete_tm_request: sending deferred task management request for handle 0x0c SMID 438
Oct 19 17:40:38 fs2-7 kernel: mps1: mpssas_abort_complete: abort request on handle 0x0c SMID 438 complete
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): WRITE(10). CDB: 2a 0 6 25 4f 75 0 0 b 0
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): CAM status: SCSI Status Error
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI status: Check Condition
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): WRITE(10). CDB: 2a 0 2d a5 10 ca 0 0 80 0
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): CAM status: SCSI Status Error
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI status: Check Condition
Oct 19 17:40:38 fs2-7 kernel: (da9:mps1:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:45:40 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 976
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 636
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 888
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI command timeout on device handle 0x000a SMID 983
Oct 19 17:45:41 fs2-7 kernel: mps0: (0:1:0) terminated ioc 804b scsi 0 state c xfer 0
Oct 19 17:45:41 fs2-7 last message repeated 2 times
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 976 complete
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 636
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 636 complete
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 888
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 888 complete
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_complete_tm_request: sending deferred task management request for handle 0x0a SMID 983
Oct 19 17:45:41 fs2-7 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 983 complete
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): WRITE(10). CDB: 2a 0 6 40 a7 2 0 0 3 0
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): CAM status: SCSI Status Error
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI status: Check Condition
Oct 19 17:45:41 fs2-7 kernel: (da1:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): WRITE(10). CDB: 2a 0 6 40 b0 9 0 0 9 0
Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): CAM status: SCSI Status Error
Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): SCSI status: Check Condition
Oct 19 17:45:42 fs2-7 kernel: (da1:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
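Each "terminated ioc 804b scsi 0 state c xfer 0" line packs several controller reply fields together: IOCStatus 804bh, SCSI status 0, SCSIState 0ch, and 0 bytes transferred.

The hedged C sketch below splits those fields. The constant values are assumed from LSI's MPI 2.0 headers (mpi2.h, mpi2_init.h), and the macro names are shortened and hypothetical, so check both against the headers shipped with the mps driver before relying on them.

Under those assumptions:
- 804bh means "SCSI IOC terminated", with a LogInfo word available.
- 0ch means the I/O was terminated and no SCSI status was returned.

Both fit a command the driver aborted.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED values, recalled from LSI's MPI 2.0 headers; names shortened.
 * Verify against mpi2.h / mpi2_init.h before use. */
#define IOCSTATUS_MASK                0x7fff
#define IOCSTATUS_LOGINFO_AVAILABLE   0x8000
#define IOCSTATUS_SCSI_IOC_TERMINATED 0x004b
#define SCSI_STATE_AUTOSENSE_VALID    0x01
#define SCSI_STATE_NO_SCSI_STATUS     0x04
#define SCSI_STATE_TERMINATED         0x08

struct reply_summary {
    int ioc_terminated;   /* IOC status code is SCSI IOC terminated */
    int loginfo;          /* a LogInfo word accompanies the reply */
    int no_scsi_status;   /* no SCSI status was returned for the I/O */
    int terminated;       /* the I/O was terminated */
    int sense_valid;      /* autosense data is valid */
};

/* Split the "ioc" and "state" values printed in the log into flags. */
static struct reply_summary summarize(uint16_t ioc_status, uint8_t scsi_state)
{
    struct reply_summary s;
    s.ioc_terminated = (ioc_status & IOCSTATUS_MASK) == IOCSTATUS_SCSI_IOC_TERMINATED;
    s.loginfo        = (ioc_status & IOCSTATUS_LOGINFO_AVAILABLE) != 0;
    s.no_scsi_status = (scsi_state & SCSI_STATE_NO_SCSI_STATUS) != 0;
    s.terminated     = (scsi_state & SCSI_STATE_TERMINATED) != 0;
    s.sense_valid    = (scsi_state & SCSI_STATE_AUTOSENSE_VALID) != 0;
    return s;
}
```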

What's going on?

Regards
Karli Sjöberg

Ken
--
Kenneth Merry
ken@FreeBSD.ORG


Kind regards
-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences
Box 7079 (Visiting Address Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone:  +46-(0)18-67 15 66
karli.sjoberg@slu.se


From owner-freebsd-scsi@FreeBSD.ORG  Thu Nov  3 23:53:27 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DB91B106564A
	for <freebsd-scsi@freebsd.org>; Thu,  3 Nov 2011 23:53:27 +0000 (UTC)
	(envelope-from chuck@tuffli.net)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id A7B8B8FC12
	for <freebsd-scsi@freebsd.org>; Thu,  3 Nov 2011 23:53:27 +0000 (UTC)
Received: by qadb12 with SMTP id b12so185330qad.13
	for <freebsd-scsi@freebsd.org>; Thu, 03 Nov 2011 16:53:27 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.182.7.66 with SMTP id h2mr483342oba.14.1320362947450; Thu, 03
	Nov 2011 16:29:07 -0700 (PDT)
Received: by 10.182.116.102 with HTTP; Thu, 3 Nov 2011 16:29:07 -0700 (PDT)
Date: Thu, 3 Nov 2011 16:29:07 -0700
Message-ID: <CAM0tzX1QkqmVpHR-DGp6Fj7i8DtgPhxXaT8SLH4Ondy05q36cg@mail.gmail.com>
From: Chuck Tuffli <chuck@tuffli.net>
To: freebsd-scsi <freebsd-scsi@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Subject: how to abort an ATIO/INOT
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Nov 2011 23:53:27 -0000

Hi -

I'm implementing a target mode driver using scsi_target as the back-end,
and sometimes see scsi_target hang when exiting. When it hangs, the call
stack appears to be

abort_all_pending
targdisable
targioctl(TARGIOCDISABLE)

with the "hang" due to the msleep on the pending_ccb_queue. If I
understand the code correctly (which I may not), the msleep is to wait
asynchronously for CCBs to abort.

But what about cases where the CCB completes before the msleep? For
example, some drivers call xpt_done on ATIO/INOT CCBs and then return
CAM_REQ_CMP for the abort (I copied this behavior in my driver). I believe
this causes the hang: the abort request completes (status ==
CAM_REQ_CMP), which leads to the msleep, but the xpt_done that would wake
up anything sleeping on the pending_ccb_queue has already run.
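The scenario described above is the classic lost-wakeup race. If the wakeup is delivered before the waiter goes to sleep, and the waiter then sleeps without re-checking, nothing will ever wake it.

The usual cure, whether in the kernel with msleep(9) or in userland, is:

1. Hold the lock.
2. Test the condition (here, whether any CCBs are still pending).
3. Sleep only while the condition is still true, using a sleep call that drops the lock atomically.

Below is a userland pthreads sketch of that pattern. The struct and function names are hypothetical stand-ins, and this is not the scsi_target code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for pending-CCB bookkeeping: a count of
 * outstanding CCBs protected by a mutex, plus a condition variable that
 * completions signal when the count reaches zero. */
struct pending {
    pthread_mutex_t lock;
    pthread_cond_t  drained;
    int             count;
};

/* Completion path (the analogue of a CCB finishing via xpt_done()). */
static void pending_complete(struct pending *p)
{
    pthread_mutex_lock(&p->lock);
    assert(p->count > 0);
    if (--p->count == 0)
        pthread_cond_broadcast(&p->drained);
    pthread_mutex_unlock(&p->lock);
}

/* Waiter (the analogue of waiting for aborts to finish). Sleeping
 * unconditionally would hang if the completion already ran; re-checking
 * the count under the lock before each sleep makes an early completion
 * harmless and also tolerates spurious wakeups. */
static void pending_wait_drained(struct pending *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->count > 0)
        pthread_cond_wait(&p->drained, &p->lock);
    pthread_mutex_unlock(&p->lock);
}
```

Because the count is re-checked under the lock, a completion that runs before the waiter starts leaves count at zero, and pending_wait_drained() returns without sleeping.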

So, should target drivers not return CAM_REQ_CMP unless a CCB needs to
be asynchronously aborted? What about CTIO? Does that have a potential
race?

---chuck