From owner-freebsd-scsi@freebsd.org  Tue Jul  7 12:02:24 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 83606995AAC
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Tue,  7 Jul 2015 12:02:24 +0000 (UTC)
 (envelope-from lists@yamagi.org)
Received: from mail1.yamagi.org (yugo.yamagi.org [212.48.122.103])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4AC59104F
 for <freebsd-scsi@freebsd.org>; Tue,  7 Jul 2015 12:02:23 +0000 (UTC)
 (envelope-from lists@yamagi.org)
Received: from [192.168.100.101] (helo=aka)
 by mail1.yamagi.org with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256)
 (Exim 4.85 (FreeBSD)) (envelope-from <lists@yamagi.org>)
 id 1ZCQz7-0000LK-GC; Tue, 07 Jul 2015 13:24:22 +0200
Date: Tue, 7 Jul 2015 13:24:16 +0200
From: Yamagi Burmeister <lists@yamagi.org>
To: freebsd-scsi@freebsd.org
Subject: Device timeouts(?) with LSI SAS3008 on mpr(4)
Message-Id: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
X-Mailer: Sylpheed 3.4.2 (GTK+ 2.24.27; amd64-portbld-freebsd10.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Jul 2015 12:02:24 -0000

Hello,
I've got 3 new Supermicro servers based upon the X10DRi-LN4+ platform.
Each server is equipped with 2 LSI SAS9300-8i-SQL SAS adapters. Each
adapter serves 8 Intel DC S3700 SSDs. The operating system is
10.1-STABLE as of r283938 on 2 servers and r285196 on the last one.

The controllers identify themselves as:

----

mpr0: <Avago Technologies (LSI) SAS3008> port 0x6000-0x60ff mem 0xc7240000-0xc724ffff,0xc7200000-0xc723ffff irq 32 at device 0.0 on pci2
mpr0: IOCFacts  : MsgVersion: 0x205
        HeaderVersion: 0x2300
        IOCNumber: 0
        IOCExceptions: 0x0
        MaxChainDepth: 128
        NumberOfPorts: 1
        RequestCredit: 10240
        ProductID: 0x2221
        IOCRequestFrameSize: 32
        MaxInitiators: 32
        MaxTargets: 1024
        MaxSasExpanders: 42
        MaxEnclosures: 43
        HighPriorityCredit: 128
        MaxReplyDescriptorPostQueueDepth: 65504
        ReplyFrameSize: 32
        MaxVolumes: 0
        MaxDevHandle: 1106
        MaxPersistentEntries: 128
mpr0: Firmware: 08.00.00.00, Driver: 09.255.01.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>

----

08.00.00.00 is the latest available firmware.


Since day one, 'dmesg' has been cluttered with CAM errors:

----

mpr1: Sending reset from mprsas_send_abort for target ID 5
        (da11:mpr1:0:5:0): WRITE(10). CDB: 2a 00 4c 15 1f 88 00 00 08 00 length 4096 SMID 554 terminated ioc 804b scsi 0 state c xfer 0
(da11:mpr1:0:5:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 506 ter(da11:mpr1:0:5:0): READ(10). CDB: 28 00 4c 2b 95 c0 00 00 10 00 minated ioc 804b scsi 0 state c xfer 0
(da11:mpr1:0:5:0): CAM status: Command timeout mpr1: (da11:Unfreezing devq for target ID 5 mpr1:0:5:0): Retrying command
(da11:mpr1:0:5:0): READ(10). CDB: 28 00 4c 2b 95 c0 00 00 10 00
(da11:mpr1:0:5:0): CAM status: SCSI Status Error
(da11:mpr1:0:5:0): SCSI status: Check Condition
(da11:mpr1:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da11:mpr1:0:5:0): Retrying command (per sense data)
(da11:mpr1:0:5:0): READ(10). CDB: 28 00 4c 22 b5 b8 00 00 18 00
(da11:mpr1:0:5:0): CAM status: SCSI Status Error
(da11:mpr1:0:5:0): SCSI status: Check Condition
(da11:mpr1:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da11:mpr1:0:5:0): Retrying command (per sense data)
(noperiph:mpr1:0:4294967295:0): SMID 2 Aborting command 0xfffffe0001601a30

mpr1: Sending reset from mprsas_send_abort for target ID 2
        (da8:mpr1:0:2:0): WRITE(10). CDB: 2a 00 59 81 ae 18 00 00 30 00 length 24576 SMID 898 terminated ioc 804b scsi 0 state c xfer 0
(da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 77 cc e0 00 00 18 00 length 12288 SMID 604 terminated ioc 804b scsi 0 state c xfer 0
mpr1: Unfreezing devq for target ID 2
(da8:mpr1:0:2:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
(da8:mpr1:0:2:0): CAM status: Command timeout
(da8:mpr1:0:2:0): Retrying command
(da8:mpr1:0:2:0): WRITE(10). CDB: 2a 00 59 81 ae 18 00 00 30 00
(da8:mpr1:0:2:0): CAM status: SCSI Status Error
(da8:mpr1:0:2:0): SCSI status: Check Condition
(da8:mpr1:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da8:mpr1:0:2:0): Retrying command (per sense data)
(da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 41 3d 08 00 00 10 00
(da8:mpr1:0:2:0): CAM status: SCSI Status Error
(da8:mpr1:0:2:0): SCSI status: Check Condition
(da8:mpr1:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da8:mpr1:0:2:0): Retrying command (per sense data)
(noperiph:mpr1:0:4294967295:0): SMID 3 Aborting command 0xfffffe000160b660

----
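
As a cross-check on the log above, the READ(10) and WRITE(10) CDBs can
be decoded by hand. The following is only an illustrative sketch, not
something taken from the driver: the helper name decode_rw10 is
hypothetical, and a 512-byte logical block size is assumed. Under that
assumption the transfer length in each CDB should match the "length"
value that mpr(4) printed for the same command.

```python
def decode_rw10(cdb_hex, block_size=512):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as a hex string.

    block_size=512 is an assumption for illustration; the real logical
    block size depends on the device.
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are accepted
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    if cdb[0] not in opcodes:
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return {
        "op": opcodes[cdb[0]],
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
    }


# CDBs copied from the dmesg excerpt above.
write_cmd = decode_rw10("2a 00 4c 15 1f 88 00 00 08 00")
read_cmd = decode_rw10("28 00 59 77 cc e0 00 00 18 00")
print(write_cmd["op"], hex(write_cmd["lba"]), write_cmd["bytes"])
print(read_cmd["op"], hex(read_cmd["lba"]), read_cmd["bytes"])
```

With 512-byte blocks the two commands decode to 4096 and 12288 bytes,
which agrees with the "length 4096" and "length 12288" fields in the
log.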

ZFS doesn't like this and sees read errors or even write errors. In
extreme cases the device is marked as FAULTED:

----

  pool: examplepool
 state: DEGRADED
status: One or more devices are faulted in response to persistent
errors. Sufficient replicas exist for the pool to continue functioning
in a degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the
device repaired.
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	examplepool DEGRADED     0     0     0
	  raidz1-0  ONLINE       0     0     0
	    da3p1   ONLINE       0     0     0
	    da4p1   ONLINE       0     0     0
	    da5p1   ONLINE       0     0     0
	logs
	  da1p1     FAULTED      3     0     0  too many errors
	cache
	  da1p2     FAULTED      3     0     0  too many errors
	spares
	  da2p1     AVAIL   

errors: No known data errors

----

The problems arise on all 3 machines and on all SSDs nearly daily, so I
strongly suspect a software issue. Does anyone have an idea what's
going on and what I can do to solve this problem? More information can
be provided if necessary.

Regards,
Yamagi

-- 
Homepage:  www.yamagi.org
XMPP:      yamagi@yamagi.org
GnuPG/GPG: 0xEFBCCBCB

From owner-freebsd-scsi@freebsd.org  Tue Jul  7 15:37:25 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id F0A36995F34
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Tue,  7 Jul 2015 15:37:25 +0000 (UTC)
 (envelope-from stephen.mcconnell@avagotech.com)
Received: from mail-vn0-x233.google.com (mail-vn0-x233.google.com
 [IPv6:2607:f8b0:400c:c0f::233])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id AE1451D7E
 for <freebsd-scsi@freebsd.org>; Tue,  7 Jul 2015 15:37:25 +0000 (UTC)
 (envelope-from stephen.mcconnell@avagotech.com)
Received: by vnbf7 with SMTP id f7so16381826vnb.0
 for <freebsd-scsi@freebsd.org>; Tue, 07 Jul 2015 08:37:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=avagotech.com; s=google;
 h=from:references:in-reply-to:mime-version:thread-index:date
 :message-id:subject:to:content-type;
 bh=bbcIyltYojc0gLxutuNbAofzT86Z78VJ6ws17oqPbPU=;
 b=B4xUhOhTvOxWa33Gsv69eLnhzzIv3uuAlWHOafq1Mw2gpD3fleLheXaa5iVXZm/B3p
 9VvnfyZlGQ1gPrBHRr4+160pjCy5pmbjOEsBI7X0vyBirBGvg7ccG79PLAnZcnXNIxdO
 xcpWd7v9JGf/lEi0PG9TzR6r+dV7rh6reqY3c=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:from:references:in-reply-to:mime-version
 :thread-index:date:message-id:subject:to:content-type;
 bh=bbcIyltYojc0gLxutuNbAofzT86Z78VJ6ws17oqPbPU=;
 b=HHD1AV/cn6oD5fqL23ECpPaROMCyUtwwkUmJAuP3/QuEDxfpoTW/O7uA8hFr7zoaVW
 DRW5g6XOsUT8ltx96rX2qrNhsE7W9b7PlVC4UxcVPCjEoc16qUSu60OJoUrlN5sTQR+u
 0ILiuiOG7MdHdE1kNyUuXlUyKDdXNmxmP9osRCVSfKCHs5Gaur19F2CJ0IP3wI085TEF
 j1I74ADABgyjt+XuNxUlhV5Bu70Q0FYaFU58CgQkVQDMo8yzfPhoE2gZa/wTznyeKF/4
 p+4mbNCCACTIG5yKUPyNwgS2KxmGEU7jAWVRoWoxjr4RbqIoUYWX1Twtn3i9XYGiFWGc
 Tklg==
X-Gm-Message-State: ALoCoQkrTMkMVjHo5jsGmJNUAQiAji4yWMYv1PzS7doSgku44xAKFmCMpEOYYNt93kry9b3W/NRk
X-Received: by 10.52.114.230 with SMTP id jj6mr4871589vdb.66.1436283444321;
 Tue, 07 Jul 2015 08:37:24 -0700 (PDT)
From: Stephen Mcconnell <stephen.mcconnell@avagotech.com>
References: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
In-Reply-To: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
MIME-Version: 1.0
X-Mailer: Microsoft Outlook 14.0
Thread-Index: AQGYg4GhL/MCbKjUmWvd1rpHhP9ph55AjWnQ
Date: Tue, 7 Jul 2015 09:37:22 -0600
Message-ID: <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
Subject: RE: Device timeouts(?) with LSI SAS3008 on mpr(4)
To: Yamagi Burmeister <lists@yamagi.org>, freebsd-scsi@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Jul 2015 15:37:26 -0000

Hi Yamagi,

I see two drives that are having problems.  Are there others?  Can you try
removing those drives and let me know what happens?  To me, it actually
looks like those drives could be faulty.

Steve


From owner-freebsd-scsi@freebsd.org  Tue Jul  7 16:31:48 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E0317996D42
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Tue,  7 Jul 2015 16:31:47 +0000 (UTC)
 (envelope-from lists@yamagi.org)
Received: from mail1.yamagi.org (yugo.yamagi.org [212.48.122.103])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A3ED21543
 for <freebsd-scsi@freebsd.org>; Tue,  7 Jul 2015 16:31:47 +0000 (UTC)
 (envelope-from lists@yamagi.org)
Received: from p4fed1304.dip0.t-ipconnect.de ([79.237.19.4]
 helo=kosei.home.yamagi.org.dhcp.yamagi.org)
 by mail1.yamagi.org with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256)
 (Exim 4.85 (FreeBSD)) (envelope-from <lists@yamagi.org>)
 id 1ZCVmX-0004Nc-7L; Tue, 07 Jul 2015 18:31:42 +0200
Date: Tue, 7 Jul 2015 18:31:35 +0200
From: Yamagi Burmeister <lists@yamagi.org>
To: stephen.mcconnell@avagotech.com
Cc: freebsd-scsi@freebsd.org
Subject: Re: Device timeouts(?) with LSI SAS3008 on mpr(4)
Message-Id: <20150707183135.2c3f5aa45696b55a17e2f87f@yamagi.org>
In-Reply-To: <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
References: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
 <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
X-Mailer: Sylpheed 3.4.2 (GTK+ 2.24.28; x86_64-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Jul 2015 16:31:48 -0000

Hello Stephen,
I'm seeing those errors on all 3 servers and on all 16 devices. The 2
dmesg entries were just an example. It seems to be random where they
occur. Maybe the second controller, mpr1, has a higher chance than
mpr0, but I'm not sure.
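
One way to put a number on the impression that the second controller is
hit more often would be to tally the reset messages per controller and
per target. The following is a minimal sketch, assuming the dmesg text
is available as a string and contains the "Sending reset from
mprsas_send_abort" line in the form shown in the original report;
count_resets is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "mpr1: Sending reset from mprsas_send_abort for target ID 5"
RESET_RE = re.compile(
    r"(mpr\d+): Sending reset from mprsas_send_abort for target ID (\d+)"
)


def count_resets(dmesg_text):
    """Return (resets per controller, resets per (controller, target))."""
    per_controller = Counter()
    per_target = Counter()
    for ctrl, target in RESET_RE.findall(dmesg_text):
        per_controller[ctrl] += 1
        per_target[(ctrl, int(target))] += 1
    return per_controller, per_target


# The two reset lines quoted in the original report.
sample = (
    "mpr1: Sending reset from mprsas_send_abort for target ID 5\n"
    "mpr1: Sending reset from mprsas_send_abort for target ID 2\n"
)
per_controller, per_target = count_resets(sample)
for ctrl, n in sorted(per_controller.items()):
    print(ctrl, n)
```

Run over a full day of dmesg output from each server, the per-target
counts would also show whether the resets really are spread across all
devices.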

My co-worker suspected FreeBSD's power management. On one of the
servers I forced C-states to C1 and deactivated powerd. In the last 2
hours no new errors arose, but it's far too early to draw conclusions.

Regards,
Yamagi

On Tue, 7 Jul 2015 09:37:22 -0600
Stephen Mcconnell <stephen.mcconnell@avagotech.com> wrote:

> Hi Yamagi,
> 
> I see two drives that are having problems.  Are there others?  Can you try
> to remove those drives and let me know what happens.  To me, it actually
> looks like those drives could be faulty.
> 
> Steve


-- 
Homepage:  www.yamagi.org
XMPP:      yamagi@yamagi.org
GnuPG/GPG: 0xEFBCCBCB

From owner-freebsd-scsi@freebsd.org  Tue Jul  7 16:42:49 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id DFA8A996EE0
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Tue,  7 Jul 2015 16:42:49 +0000 (UTC)
 (envelope-from killing@multiplay.co.uk)
Received: from mail-wg0-f44.google.com (mail-wg0-f44.google.com [74.125.82.44])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 7782C1BD7
 for <freebsd-scsi@freebsd.org>; Tue,  7 Jul 2015 16:42:48 +0000 (UTC)
 (envelope-from killing@multiplay.co.uk)
Received: by wgjx7 with SMTP id x7so173066878wgj.2
 for <freebsd-scsi@freebsd.org>; Tue, 07 Jul 2015 09:42:47 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:subject:to:references:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-type
 :content-transfer-encoding;
 bh=OPfJK/aoQr5oNpjlvg/f/SNciqznpgHA1PLpn3DvNGQ=;
 b=Z2AJ3hTnm/09A04soxjuMRc/BpFp0OWGBwY0xZHqzOFp7HekCcvD29fn32WivwJxL5
 CYW7EEu5u9hMDrTxcGTuaNPEAfcRFisF7SDiu3kFZFNLJze9W0a1+CqcKU9wtxCacz9a
 631nEDf78X2UobNPPusJ1Sg7qUJ85ZcP3gmxgzS2Mui6vf/HHc5gzG192q4BewQ0if8I
 zb5UAscjD/dyyleU2VQBv4eNl462G/N18KKoXjWJSLNzirLZlFMjVHKKnMhaVQ+TbTmC
 FsJQG0hW0ocBcquoIzlWBbO3Z+3tH3fHHzuYncAC9JhCPA1gMg2qYewhfuDDXiQp1kvQ
 c7PA==
X-Gm-Message-State: ALoCoQlLqkQD9JZ0WSQtEwwF2jQ1VJIUz3ZDF3VNNWEZkmfe2LHVhdP1EZ0HWcr9UxKR46EEDPbm
X-Received: by 10.194.185.8 with SMTP id ey8mr10351763wjc.118.1436287367064;
 Tue, 07 Jul 2015 09:42:47 -0700 (PDT)
Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk.
 [82.69.141.170])
 by mx.google.com with ESMTPSA id pd7sm34212434wjb.27.2015.07.07.09.42.46
 for <freebsd-scsi@freebsd.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 07 Jul 2015 09:42:46 -0700 (PDT)
Subject: Re: Device timeouts(?) with LSI SAS3008 on mpr(4)
To: freebsd-scsi@freebsd.org
References: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
 <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
 <20150707183135.2c3f5aa45696b55a17e2f87f@yamagi.org>
From: Steven Hartland <killing@multiplay.co.uk>
Message-ID: <559C0184.4050102@multiplay.co.uk>
Date: Tue, 7 Jul 2015 17:42:44 +0100
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101
 Thunderbird/38.0.1
MIME-Version: 1.0
In-Reply-To: <20150707183135.2c3f5aa45696b55a17e2f87f@yamagi.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Jul 2015 16:42:50 -0000

Have you eliminated the midplane / cabling as the issue? That's a very
common cause.

On 07/07/2015 17:31, Yamagi Burmeister wrote:
> Hello Stephen,
> I'm seeing those errors on all 3 servers and on all 16 devices. The 2
> dmesg entries were just an example. It seems to be random where they
> occur. Maybe the second controller, mpr1, has a higher chance than
> mpr0, but I'm not sure.
>
> My co-worker suspected FreeBSD's power management. On one of the
> servers I forced C-states to C1 and deactivated powerd. In the last 2
> hours no new errors arose, but it's far too early to draw conclusions.
>
> Regards,
> Yamagi
>>> asc:29,0 (Power on, reset, or bus device reset occurred)
>>> (da8:mpr1:0:2:0): Retrying command (per sense data)
>>> (noperiph:mpr1:0:4294967295:0): SMID 3 Aborting command
>>> 0xfffffe000160b660
>>>
>>> ----
>>>
>>> ZFS doesn't like this and sees read errors or even write errors. In
>>> extreme cases the device is marked as FAULTED:
>>>
>>> ----
>>>
>>>    pool: examplepool
>>>   state: DEGRADED
>>> status: One or more devices are faulted in response to persistent errors.
>>> 	Sufficient replicas exist for the pool to continue functioning in a
>>> 	degraded state.
>>> action: Replace the faulted device, or use 'zpool clear' to mark the device
>>> 	repaired.
>>>    scan: none requested
>>> config:
>>>
>>> 	NAME        STATE     READ WRITE CKSUM
>>> 	examplepool DEGRADED     0     0     0
>>> 	  raidz1-0  ONLINE       0     0     0
>>> 	    da3p1   ONLINE       0     0     0
>>> 	    da4p1   ONLINE       0     0     0
>>> 	    da5p1   ONLINE       0     0     0
>>> 	logs
>>> 	  da1p1     FAULTED      3     0     0  too many errors
>>> 	cache
>>> 	  da1p2     FAULTED      3     0     0  too many errors
>>> 	spares
>>> 	  da2p1     AVAIL
>>>
>>> errors: No known data errors
>>>
>>> ----
>>>
>>> The problems arise on all 3 machines and on all SSDs nearly daily. So I
>>> highly suspect a software issue. Does anyone have an idea what's going
>>> on and what I can do to solve these problems? More information can be
>>> provided if necessary.
>>>
>>> Regards,
>>> Yamagi
>>>
>>> --
>>> Homepage:  www.yamagi.org
>>> XMPP:      yamagi@yamagi.org
>>> GnuPG/GPG: 0xEFBCCBCB
>>> _______________________________________________
>>> freebsd-scsi@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
>


From owner-freebsd-scsi@freebsd.org  Tue Jul  7 18:30:50 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 90B42994428;
 Tue,  7 Jul 2015 18:30:50 +0000 (UTC)
 (envelope-from rdarbha@juniper.net)
Received: from na01-bl2-obe.outbound.protection.outlook.com
 (mail-bl2on0105.outbound.protection.outlook.com [65.55.169.105])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
 (Client CN "mail.protection.outlook.com",
 Issuer "MSIT Machine Auth CA 2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 010AB1F28;
 Tue,  7 Jul 2015 18:30:49 +0000 (UTC)
 (envelope-from rdarbha@juniper.net)
Received: from DM2PR0501MB1150.namprd05.prod.outlook.com (10.160.245.152) by
 DM2PR0501MB1151.namprd05.prod.outlook.com (10.160.245.153) with Microsoft
 SMTP Server (TLS) id 15.1.201.16; Tue, 7 Jul 2015 18:30:41 +0000
Received: from DM2PR0501MB1150.namprd05.prod.outlook.com ([10.160.245.152]) by
 DM2PR0501MB1150.namprd05.prod.outlook.com ([10.160.245.152]) with
 mapi id 15.01.0201.000; Tue, 7 Jul 2015 18:30:41 +0000
From: Raviprakash Darbha <rdarbha@juniper.net>
To: "freebsd-scsi@freebsd.org" <freebsd-scsi@freebsd.org>,
 "freebsd-geom@freebsd.org" <freebsd-geom@freebsd.org>
CC: Raviprakash Darbha <rdarbha@juniper.net>
Subject: questions about camcontrol eject
Thread-Topic: questions about camcontrol eject
Thread-Index: AQHQuOMHih5nUrGIREWDx+FWI6Z8KQ==
Date: Tue, 7 Jul 2015 18:30:41 +0000
Message-ID: <FE8D9CAE-003A-43AA-A7CD-84B4372243C7@juniper.net>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
authentication-results: freebsd.org; dkim=none (message not signed)
 header.d=none;
x-ms-exchange-messagesentrepresentingtype: 1
x-originating-ip: [66.129.239.14]
MIME-Version: 1.0
X-OriginatorOrg: juniper.net
X-MS-Exchange-CrossTenant-originalarrivaltime: 07 Jul 2015 18:30:41.6811 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: bea78b3c-4cdb-4130-854a-1d193232e5f4
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM2PR0501MB1151
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Jul 2015 18:30:50 -0000

Hello All

I have been trying to get camcontrol eject working on my router with 2 drives
for some time and have some observations from the code.

While allocating memory for the ccb we either have a malloc option or a memory
pool. In the eject case we choose the memory pool, as it's low priority.
After getting the ccb and setting the relevant fields, it is submitted to the
ata_action routine, but it fails there, returning an error code.
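
As a rough illustration of the pool-versus-malloc choice described above, the following standalone C sketch mirrors the test on the CCB function code. The numeric flag values are stand-ins chosen for the example (the real definitions live in sys/cam/cam_ccb.h and should be checked there), and uses_device_pool is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration only; see sys/cam/cam_ccb.h. */
#define XPT_FC_QUEUED   0x100u
#define XPT_FC_USER_CCB 0x200u

/*
 * Hypothetical helper: returns true when a CCB with this function code
 * would be taken from the per-device pool (queued and not user-supplied),
 * and false when it would simply be malloced.
 */
static bool
uses_device_pool(uint32_t func_code)
{
	return (func_code & XPT_FC_QUEUED) != 0 &&
	    (func_code & XPT_FC_USER_CCB) == 0;
}
```

Under these assumptions a queued request without the user-CCB flag takes the pool path, while an immediate request or a user-supplied CCB takes the malloc path.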

//Code snippets
from sys/cam/scsi/scsi_pass.c


                /*
                 * Non-immediate CCBs need a CCB from the per-device pool
                 * of CCBs, which is scheduled by the transport layer.
                 * Immediate CCBs and user-supplied CCBs should just be
                 * malloced.
                 */
                if ((inccb->ccb_h.func_code & XPT_FC_QUEUED)
                 && ((inccb->ccb_h.func_code & XPT_FC_USER_CCB) == 0)) {
                        ccb = cam_periph_getccb(periph, priority);
                        ccb_malloced = 0;

                } else {
                        ccb = xpt_alloc_ccb_nowait();

                        if (ccb != NULL)
                                xpt_setup_ccb(&ccb->ccb_h, periph->path,
                                              priority);
                        ccb_malloced = 1;

                }

                if (ccb == NULL) {
                        xpt_print(periph->path, "unable to allocate CCB\n");
                        error = ENOMEM;
                        break;
                }

                error = passsendccb(periph, ccb, inccb);


from sys/cam/ata/ata_xpt.c

  {
                struct cam_ed *device;
                u_int   maxlen = 0;

                device = start_ccb->ccb_h.path->device;
                if (device->protocol == PROTO_SCSI &&
                    (device->flags & CAM_DEV_IDENTIFY_DATA_VALID)) {
                        uint16_t p =
                            device->ident_data.config & ATA_PROTO_MASK;

                        maxlen =
                            (device->ident_data.config == ATA_PROTO_CFA) ? 0 :
                            (p == ATA_PROTO_ATAPI_16) ? 16 :
                            (p == ATA_PROTO_ATAPI_12) ? 12 : 0;
///// maxlen is still set to 0.
                }
                if (start_ccb->csio.cdb_len > maxlen) {
                        start_ccb->ccb_h.status = CAM_REQ_INVALID;
                        xpt_done(start_ccb);
                        break;
///// hence returning from  here.
                }
                xpt_action_default(start_ccb);
                break;
        }


My question is whether this code path is expected to behave this way, in which
case I am missing something, or whether this is a bug. In the latter case I
assume the ccb_hdr is not set up correctly when we get the ccb from the pool,
so I am considering calling xpt_setup_ccb in that case too, to get the right
values in the device structure.

Any help is greatly appreciated. Please let me know if more information is
needed.

Thanks
Ravi

From owner-freebsd-scsi@freebsd.org  Wed Jul  8 05:45:25 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6BE07995651
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Wed,  8 Jul 2015 05:45:25 +0000 (UTC)
 (envelope-from lists@yamagi.org)
Received: from mail1.yamagi.org (yugo.yamagi.org [212.48.122.103])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 2EE151F48
 for <freebsd-scsi@freebsd.org>; Wed,  8 Jul 2015 05:45:24 +0000 (UTC)
 (envelope-from lists@yamagi.org)
Received: from p4fed1304.dip0.t-ipconnect.de ([79.237.19.4]
 helo=kosei.home.yamagi.org.dhcp.yamagi.org)
 by mail1.yamagi.org with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256)
 (Exim 4.85 (FreeBSD)) (envelope-from <lists@yamagi.org>)
 id 1ZCiAY-000GCZ-6Y; Wed, 08 Jul 2015 07:45:19 +0200
Date: Wed, 8 Jul 2015 07:45:12 +0200
From: Yamagi Burmeister <lists@yamagi.org>
To: stephen.mcconnell@avagotech.com
Cc: freebsd-scsi@freebsd.org
Subject: Re: Device timeouts(?) with LSI SAS3008 on mpr(4)
Message-Id: <20150708074512.e676c8a9a5b7c6d56d357a02@yamagi.org>
In-Reply-To: <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
References: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
 <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
X-Mailer: Sylpheed 3.4.2 (GTK+ 2.24.28; x86_64-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Jul 2015 05:45:25 -0000

Good morning,
it wasn't the power management. Last night the errors occurred on da6,
da7 and da9. This is the same machine as yesterday:

Jul  8 05:06:21 mars kernel: (noperiph:mpr1:0:4294967295:0): SMID 83 Aborting command 0xfffffe0001a684e0
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): READ(10). CDB: 28 00 48 0a 44 98 00 00 08 00 length 4096 SMID 556 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): READ(10). CDB: 28 00 48 10 bb a8 00 00 20 00 length 16384 SMID 745 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 680 term(da7:mpr1:0:1:0): WRITE(10). CDB: 2a 00 56 1b 1c 38 00 00 08 00 
Jul  8 05:06:21 mars kernel: inated ioc 804b scsi 0 state c xfer 0
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): CAM status: Command timeout
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): Retrying command
Jul  8 05:06:21 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): READ(10). CDB: 28 00 48 0a 44 98 00 00 08 00 length 4096 SMID 696 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 05:06:21 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): READ(10). CDB: 28 00 48 10 bb a8 00 00 20 00 length 16384 SMID 517 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 05:06:21 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): WRITE(10). CDB: 2a 00 56 1b 1c 38 00 00 08 00 length 4096 SMID 905 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 05:06:21 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 05:06:21 mars kernel: (da7:mpr1:0:1:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 290 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 05:06:22 mars kernel: (da7:mpr1:0:1:0): READ(10). CDB: 28 00 48 0a 44 98 00 00 08 00 
Jul  8 05:06:22 mars kernel: (da7:mpr1:0:1:0): CAM status: SCSI Status Error
Jul  8 05:06:22 mars kernel: (da7:mpr1:0:1:0): SCSI status: Check Condition
Jul  8 05:06:22 mars kernel: (da7:mpr1:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Jul  8 05:06:22 mars kernel: (da7:mpr1:0:1:0): Retrying command (per sense data)
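
For readers who want to check the log_info lines above by hand, the 32-bit value can be split into the fields the driver prints. The sketch below assumes the layout implied by the message itself (top nibble bus type, next nibble originator, then an 8-bit code and a 16-bit sub-code) and assumes the originator numbering 0 = IOP, 1 = PL, 2 = IR; the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the log_info word; names are illustrative. */
struct loginfo_fields {
	unsigned bus_type;	/* bits 31..28 */
	unsigned originator;	/* bits 27..24 */
	unsigned code;		/* bits 23..16 */
	unsigned sub_code;	/* bits 15..0 */
};

/* Split a log_info value into its four fields. */
static struct loginfo_fields
decode_loginfo(uint32_t v)
{
	struct loginfo_fields f;

	f.bus_type = (v >> 28) & 0xfu;
	f.originator = (v >> 24) & 0xfu;
	f.code = (v >> 16) & 0xffu;
	f.sub_code = v & 0xffffu;
	return f;
}

/* Assumed originator numbering: 0 = IOP, 1 = PL, 2 = IR. */
static const char *
originator_name(unsigned originator)
{
	switch (originator) {
	case 0:
		return "IOP";
	case 1:
		return "PL";
	case 2:
		return "IR";
	default:
		return "unknown";
	}
}
```

Applied to 0x31110e00 this gives originator 1 (PL), code 0x11 and sub_code 0x0e00, which agrees with the text the kernel printed.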

Jul  8 06:33:26 mars kernel: (noperiph:mpr1:0:4294967295:0): SMID 84 Aborting command 0xfffffe0001a32fc0
Jul  8 06:33:27 mars kernel: (da9:mpr1:0:3:0): READ(10). CDB: 28 00 48 0f bc 90 00 00 20 00 length 16384 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:33:27 mars kernel: (da9:mpr1:0:3:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 719 term(da9:mpr1:0:3:0): WRITE(10). CDB: 2a 00 48 3c d0 58 00 00 10 00 
Jul  8 06:33:27 mars kernel: inated ioc 804b scsi 0 state c xfer 0
Jul  8 06:33:27 mars kernel: (da9:mpr1:0:3:0): CAM status: Command timeout
Jul  8 06:33:27 mars kernel: (da9:mpr1:0:3:0): Retrying command
Jul  8 06:33:27 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 06:33:27 mars kernel: (da9:mpr1:0:3:0): READ(10). CDB: 28 00 48 0f bc 90 00 00 20 00 length 16384 SMID 851 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:33:27 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 06:33:27 mars kernel: (da9:mpr1:0:3:0): WRITE(10). CDB: 2a 00 48 3c d0 58 00 00 10 00 length 8192 SMID 576 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:33:27 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 06:33:27 mars kernel: (da9:mpr1:0:3:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 854 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:33:28 mars kernel: (da9:mpr1:0:3:0): READ(10). CDB: 28 00 48 0f bc 90 00 00 20 00 
Jul  8 06:33:28 mars kernel: (da9:mpr1:0:3:0): CAM status: SCSI Status Error
Jul  8 06:33:28 mars kernel: (da9:mpr1:0:3:0): SCSI status: Check Condition
Jul  8 06:33:28 mars kernel: (da9:mpr1:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Jul  8 06:33:28 mars kernel: (da9:mpr1:0:3:0): Retrying command (per sense data)
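
Every episode above includes the same ATA COMMAND PASS THROUGH(16) CDB, so it may be worth decoding it. The sketch below follows my reading of the SAT layout for the 16-byte pass-through CDB (protocol and extend in byte 1, features in bytes 3 and 4, sector count in bytes 5 and 6, device in byte 13, ATA command in byte 14); please treat the field positions as an assumption to verify against the SAT specification. The names ata_pt16, decode_ata_pt16 and sample_cdb are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Selected fields of an ATA PASS-THROUGH(16) CDB; names are illustrative. */
struct ata_pt16 {
	unsigned protocol;	/* byte 1, bits 4..1 */
	unsigned extend;	/* byte 1, bit 0 */
	unsigned features;	/* byte 3 (high) and byte 4 (low) */
	unsigned count;		/* byte 5 (high) and byte 6 (low) */
	unsigned device;	/* byte 13 */
	unsigned command;	/* byte 14 */
};

/* The CDB exactly as it appears in the kernel messages. */
static const uint8_t sample_cdb[16] = {
	0x85, 0x0d, 0x06, 0x00, 0x01, 0x00, 0x01, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x06, 0x00
};

/* Extract the fields listed above from a 16-byte pass-through CDB. */
static struct ata_pt16
decode_ata_pt16(const uint8_t cdb[16])
{
	struct ata_pt16 d;

	d.protocol = (cdb[1] >> 1) & 0xfu;
	d.extend = cdb[1] & 0x1u;
	d.features = ((unsigned)cdb[3] << 8) | cdb[4];
	d.count = ((unsigned)cdb[5] << 8) | cdb[6];
	d.device = cdb[13];
	d.command = cdb[14];
	return d;
}
```

For the logged CDB this yields command 0x06 with features 0x0001 and a count of one 512-byte block, which matches the "length 512" in the log. As far as I can tell from the ATA command set, 0x06 with feature bit 0 set is DATA SET MANAGEMENT with TRIM, so the timed-out pass-through commands look like TRIM requests; that reading is offered as a hint, not a diagnosis.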

Jul  8 06:35:10 mars kernel: (noperiph:mpr1:0:4294967295:0): SMID 85 Aborting command 0xfffffe0001a70c10
Jul  8 06:35:10 mars kernel: (da6:mpr1:0:0:0): READ(10). CDB: 28 00 48 30 4a 40 00 00 18 00 length 12288 SMID 541 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:35:10 mars kernel: (da6:mpr1:0:0:0): WRITE(10). CDB: 2a 00 48 59 82 e8 00 00 10 00 length 8192 SMID 467 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:35:10 mars kernel: (da6:mpr1:0:0:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 
Jul  8 06:35:10 mars kernel: (da6:mpr1:0:0:0): CAM status: Command timeout
Jul  8 06:35:10 mars kernel: (da6:mpr1:0:0:0): Retrying command
Jul  8 06:35:10 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 06:35:10 mars kernel: (da6:mpr1:0:0:0): READ(10). CDB: 28 00 48 30 4a 40 00 00 18 00 length 12288 SMID 870 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:35:10 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 06:35:10 mars kernel: (da6:mpr1:0:0:0): WRITE(10). CDB: 2a 00 48 59 82 e8 00 00 10 00 length 8192 SMID 478 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:35:10 mars kernel: mpr1: log_info(0x31110e00): originator(PL), code(0x11), sub_code(0x0e00)
Jul  8 06:35:10 mars kernel: (da6:mpr1:0:0:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 764 terminated ioc 804b scsi 0 state c xfer 0
Jul  8 06:35:11 mars kernel: (da6:mpr1:0:0:0): READ(10). CDB: 28 00 48 30 4a 40 00 00 18 00 
Jul  8 06:35:11 mars kernel: (da6:mpr1:0:0:0): CAM status: SCSI Status Error
Jul  8 06:35:11 mars kernel: (da6:mpr1:0:0:0): SCSI status: Check Condition
Jul  8 06:35:11 mars kernel: (da6:mpr1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Jul  8 06:35:11 mars kernel: (da6:mpr1:0:0:0): Retrying command (per sense data)
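
The READ(10) and WRITE(10) CDBs in these messages can be decoded the same way. A 10-byte read/write CDB carries a big-endian 32-bit LBA in bytes 2 to 5 and a big-endian 16-bit block count in bytes 7 and 8. The sketch below assumes 512-byte logical blocks, which is what the "length" values in the log suggest; rw10 and decode_rw10 are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a READ(10) or WRITE(10) CDB. */
struct rw10 {
	uint8_t opcode;		/* 0x28 = READ(10), 0x2a = WRITE(10) */
	uint32_t lba;		/* bytes 2..5, big endian */
	uint16_t blocks;	/* bytes 7..8, big endian */
};

/* Extract opcode, starting LBA and transfer length from a 10-byte CDB. */
static struct rw10
decode_rw10(const uint8_t cdb[10])
{
	struct rw10 r;

	r.opcode = cdb[0];
	r.lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	    ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	r.blocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
	return r;
}
```

For the da6 read "28 00 48 30 4a 40 00 00 18 00" this gives LBA 0x48304a40 and 0x18 = 24 blocks, and 24 * 512 = 12288 bytes, which matches the logged length.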

Regards,
Yamagi


On Tue, 7 Jul 2015 09:37:22 -0600
Stephen Mcconnell <stephen.mcconnell@avagotech.com> wrote:

> Hi Yamagi,
> 
> I see two drives that are having problems.  Are there others?  Can you try
> to remove those drives and let me know what happens.  To me, it actually
> looks like those drives could be faulty.
> 
> Steve
> 


-- 
Homepage:  www.yamagi.org
XMPP:      yamagi@yamagi.org
GnuPG/GPG: 0xEFBCCBCB

From owner-freebsd-scsi@freebsd.org  Wed Jul  8 05:47:02 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D503399570A
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Wed,  8 Jul 2015 05:47:02 +0000 (UTC)
 (envelope-from lists@yamagi.org)
Received: from mail1.yamagi.org (yugo.yamagi.org [212.48.122.103])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 976F21FBF
 for <freebsd-scsi@freebsd.org>; Wed,  8 Jul 2015 05:47:01 +0000 (UTC)
 (envelope-from lists@yamagi.org)
Received: from p4fed1304.dip0.t-ipconnect.de ([79.237.19.4]
 helo=kosei.home.yamagi.org.dhcp.yamagi.org)
 by mail1.yamagi.org with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256)
 (Exim 4.85 (FreeBSD)) (envelope-from <lists@yamagi.org>)
 id 1ZCiC9-000GEB-N7; Wed, 08 Jul 2015 07:46:59 +0200
Date: Wed, 8 Jul 2015 07:46:52 +0200
From: Yamagi Burmeister <lists@yamagi.org>
To: killing@multiplay.co.uk
Cc: freebsd-scsi@freebsd.org
Subject: Re: Device timeouts(?) with LSI SAS3008 on mpr(4)
Message-Id: <20150708074652.07a815e6aa08526d569f3077@yamagi.org>
In-Reply-To: <559C0184.4050102@multiplay.co.uk>
References: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
 <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
 <20150707183135.2c3f5aa45696b55a17e2f87f@yamagi.org>
 <559C0184.4050102@multiplay.co.uk>
X-Mailer: Sylpheed 3.4.2 (GTK+ 2.24.28; x86_64-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Jul 2015 05:47:02 -0000

Hello Steven,
since the issue occurs on all 3 servers, that's at least unlikely. But
I'll see what I can do.

Regards,
Yamagi

On Tue, 7 Jul 2015 17:42:44 +0100
Steven Hartland <killing@multiplay.co.uk> wrote:

> Have you eliminated the midplane / cabling as the issue as that's very 
> common.
> 
> On 07/07/2015 17:31, Yamagi Burmeister wrote:
> > Hello Stephen,
> > I'm seeing those errors on all 3 servers and on all 16 devices. The 2
> > dmesg entries were just an example. It seems to be random where they
> > occur. Maybe the second controller mpr1 has a higher chance than
> > mpr0, but I'm not sure.
> >
> > My co-worker suspected FreeBSD's power management. On one of the servers
> > I forced C-states to C1 and deactivated powerd. In the last 2 hours no
> > new errors arose but it's far too early to draw conclusions.
> >
> > Regards,
> > Yamagi
> >
> > On Tue, 7 Jul 2015 09:37:22 -0600
> > Stephen Mcconnell <stephen.mcconnell@avagotech.com> wrote:
> >
> >> Hi Yamagi,
> >>
> >> I see two drives that are having problems.  Are there others?  Can you try
> >> to remove those drives and let me know what happens.  To me, it actually
> >> looks like those drives could be faulty.
> >>
> >> Steve
> >>
> >>> -----Original Message-----
> >>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-
> >>> scsi@freebsd.org] On Behalf Of Yamagi Burmeister
> >>> Sent: Tuesday, July 07, 2015 5:24 AM
> >>> To: freebsd-scsi@freebsd.org
> >>> Subject: Device timeouts(?) with LSI SAS3008 on mpr(4)
> >>>
> >>> Hello,
> >>> I've got 3 new Supermicro servers based upon the X10DRi-LN4+ platform.
> >>> Each server is equipped with 2 LSI SAS9300-8i-SQL SAS adapters. Each
> >>> adapter serves 8 Intel DC S3700 SSDs. Operating system is 10.1-STABLE
> >>> as of r283938 on 2 servers and r285196 on the last one.
> >>>
> >>> The controllers identify themselves as:
> >>>
> >>> ----
> >>>
> >>> mpr0: <Avago Technologies (LSI) SAS3008> port 0x6000-0x60ff mem
> >>> 0xc7240000-0xc724ffff,0xc7200000-0xc723ffff irq 32 at device 0.0 on pci2
> >>> mpr0: IOCFacts  : MsgVersion: 0x205
> >>>          HeaderVersion: 0x2300
> >>>          IOCNumber: 0
> >>>          IOCExceptions: 0x0
> >>>          MaxChainDepth: 128
> >>>          NumberOfPorts: 1
> >>>          RequestCredit: 10240
> >>>          ProductID: 0x2221
> >>>          IOCRequestFrameSize: 32
> >>>          MaxInitiators: 32
> >>>          MaxTargets: 1024
> >>>          MaxSasExpanders: 42
> >>>          MaxEnclosures: 43
> >>>          HighPriorityCredit: 128
> >>>          MaxReplyDescriptorPostQueueDepth: 65504
> >>>          ReplyFrameSize: 32
> >>>          MaxVolumes: 0
> >>>          MaxDevHandle: 1106
> >>>          MaxPersistentEntries: 128
> >>> mpr0: Firmware: 08.00.00.00, Driver: 09.255.01.00-fbsd
> >>> mpr0: IOCCapabilities:
> >>> 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
> >>>
> >>> ----
> >>>
> >>> 08.00.00.00 is the last available firmware.
> >>>
> >>>
> >>> Since day one 'dmesg' is cluttered with CAM errors:
> >>>
> >>> ----
> >>>
> >>> mpr1: Sending reset from mprsas_send_abort for target ID 5
> >>>          (da11:mpr1:0:5:0): WRITE(10). CDB: 2a 00 4c 15 1f 88 00 00 08
> >>> 00 length 4096 SMID 554 terminated ioc 804b scsi 0 state c xfer 0
> >>> (da11:mpr1:0:5:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00
> >>> 01 00 00 00 00 00 00 40 06 00 length 512 SMID 506 ter(da11:mpr1:0:5:0):
> >>> READ(10). CDB: 28 00 4c 2b 95 c0 00 00 10 00 minated ioc 804b scsi 0
> >> state c
> >>> xfer 0 (da11:mpr1:0:5:0): CAM status: Command timeout mpr1:
> >>> (da11:Unfreezing devq for target ID 5 mpr1:0:5:0): Retrying command
> >>> (da11:mpr1:0:5:0): READ(10). CDB: 28 00 4c 2b 95 c0 00 00 10 00
> >>> (da11:mpr1:0:5:0): CAM status: SCSI Status Error (da11:mpr1:0:5:0):
> >>> SCSI status: Check Condition (da11:mpr1:0:5:0): SCSI sense: UNIT
> >> ATTENTION
> >>> asc:29,0 (Power on, reset, or bus device reset occurred)
> >>> (da11:mpr1:0:5:0): Retrying command (per sense data) (da11:mpr1:0:5:0):
> >>> READ(10). CDB: 28 00 4c 22 b5 b8 00 00 18 00 (da11:mpr1:0:5:0): CAM
> >>> status: SCSI Status Error (da11:mpr1:0:5:0): SCSI status: Check
> >> Condition
> >>> (da11:mpr1:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset,
> >> or
> >>> bus device reset occurred) (da11:mpr1:0:5:0): Retrying command (per
> >> sense
> >>> data) (noperiph:mpr1:0:4294967295:0): SMID 2 Aborting command
> >>> 0xfffffe0001601a30
> >>>
> >>> mpr1: Sending reset from mprsas_send_abort for target ID 2
> >>>          (da8:mpr1:0:2:0): WRITE(10). CDB: 2a 00 59 81 ae 18 00 00 30 00
> >> length
> >>> 24576 SMID 898 terminated ioc 804b scsi 0 state c xfer 0
> >>> (da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 77 cc e0 00 00 18 00 length
> >>> 12288 SMID 604 terminated ioc 804b scsi 0 state c xfer 0 mpr1:
> >>> Unfreezing devq for target ID 2 (da8:mpr1:0:2:0): ATA COMMAND PASS
> >>> THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
> >>> (da8:mpr1:0:2:0): CAM status: Command timeout (da8:mpr1:0:2:0):
> >>> Retrying command (da8:mpr1:0:2:0): WRITE(10). CDB: 2a 00 59 81 ae 18 00
> >>> 00 30 00 (da8:mpr1:0:2:0): CAM status: SCSI Status Error
> >>> (da8:mpr1:0:2:0): SCSI status: Check Condition (da8:mpr1:0:2:0): SCSI
> >>> sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset
> >>> occurred) (da8:mpr1:0:2:0): Retrying command (per sense data)
> >>> (da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 41 3d 08 00 00 10 00
> >>> (da8:mpr1:0:2:0): CAM status: SCSI Status Error (da8:mpr1:0:2:0): SCSI
> >>> status: Check Condition (da8:mpr1:0:2:0): SCSI sense: UNIT ATTENTION
> >>> asc:29,0 (Power on, reset, or bus device reset occurred)
> >>> (da8:mpr1:0:2:0): Retrying command (per sense data)
> >>> (noperiph:mpr1:0:4294967295:0): SMID 3 Aborting command
> >>> 0xfffffe000160b660
> >>>
> >>> ----
> >>>
> >>> ZFS doesn't like this and sees read errors or even write errors. In
> >> extreme cases
> >>> the device is marked as FAULTED:
> >>>
> >>> ----
> >>>
> >>>    pool: examplepool
> >>>   state: DEGRADED
> >>> status: One or more devices are faulted in response to persistent
> >> errors.
> >>> Sufficient replicas exist for the pool to continue functioning in a
> >> degraded state.
> >>> action: Replace the faulted device, or use 'zpool clear' to mark the
> >> device
> >>> repaired.
> >>>    scan: none requested
> >>> config:
> >>>
> >>> 	NAME        STATE     READ WRITE CKSUM
> >>> 	examplepool DEGRADED     0     0     0
> >>> 	  raidz1-0  ONLINE       0     0     0
> >>> 	    da3p1   ONLINE       0     0     0
> >>> 	    da4p1   ONLINE       0     0     0
> >>> 	    da5p1   ONLINE       0     0     0
> >>> 	logs
> >>> 	  da1p1     FAULTED      3     0     0  too many errors
> >>> 	cache
> >>> 	  da1p2     FAULTED      3     0     0  too many errors
> >>> 	spares
> >>> 	  da2p1     AVAIL
> >>>
> >>> errors: No known data errors
> >>>
> >>> ----
> >>>
> >>> The problems arise on all 3 machines and all SSDs nearly daily, so I
> >>> highly suspect a software issue. Does anyone have an idea what's going
> >>> on and what I can do to solve this problem? More information can be
> >>> provided if necessary.
> >>>
> >>> Regards,
> >>> Yamagi
> >>>
> >>> --
> >>> Homepage:  www.yamagi.org
> >>> XMPP:      yamagi@yamagi.org
> >>> GnuPG/GPG: 0xEFBCCBCB
> >>> _______________________________________________
> >>> freebsd-scsi@freebsd.org mailing list
> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> >>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
> >
> 


-- 
Homepage:  www.yamagi.org
XMPP:      yamagi@yamagi.org
GnuPG/GPG: 0xEFBCCBCB

From owner-freebsd-scsi@freebsd.org  Wed Jul  8 07:35:21 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 88B5899698F
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Wed,  8 Jul 2015 07:35:21 +0000 (UTC)
 (envelope-from killing@multiplay.co.uk)
Received: from mail-wg0-f49.google.com (mail-wg0-f49.google.com [74.125.82.49])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 1ED371D25
 for <freebsd-scsi@freebsd.org>; Wed,  8 Jul 2015 07:35:20 +0000 (UTC)
 (envelope-from killing@multiplay.co.uk)
Received: by wgck11 with SMTP id k11so187741550wgc.0
 for <freebsd-scsi@freebsd.org>; Wed, 08 Jul 2015 00:35:18 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:subject:to:references:cc:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-type
 :content-transfer-encoding;
 bh=Bv1GYo55/ZOiCYSpeqLXrv+L7UWzkM6EcxbbJKWYUbc=;
 b=IqMzuvy3zvf1BRy47qiyOzSKI7RwDgIvsk5dE4QF6Slu1W/Eo7mfiBSsseD10hJCiu
 b8cNlsQCB5ftD8vW57wBp+yYFOyP6HYknGfodI2FgedtjYbgzjobUl8uFo8u7Nv1JfT/
 ovMzpvHm8n+I0egWj8LfQlo5nPBtavBcbvu1MU7peqNgoXVyg87MIQE+9mgvAlIF6ZWh
 khP3LMjE04gHV0awFz57pc2kNqPH+E3Fl5YEoZicFK2AalEvzAsmlMH/ysnxBwsciC+M
 abv8ZJfsOu+ux+mTbS9U5GLlZG2de0qTeUE+aZJXX1o3U/oUN7TZmCCrZTSSH2XGFMx2
 Nwnw==
X-Gm-Message-State: ALoCoQkQ6fHpz42Ah2N/Ga1XVaqOf7IK4Wn3Sh6jDJJZn7VImFfXaMPtsKHUOpetAGZovidwmSgX
X-Received: by 10.180.188.48 with SMTP id fx16mr71531067wic.35.1436340918422; 
 Wed, 08 Jul 2015 00:35:18 -0700 (PDT)
Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk.
 [82.69.141.170])
 by smtp.gmail.com with ESMTPSA id c2sm1945437wjf.18.2015.07.08.00.35.17
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 08 Jul 2015 00:35:17 -0700 (PDT)
Subject: Re: Device timeouts(?) with LSI SAS3008 on mpr(4)
To: Yamagi Burmeister <lists@yamagi.org>
References: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
 <9426ced85d7def424e106fdefd7448ae@mail.gmail.com>
 <20150707183135.2c3f5aa45696b55a17e2f87f@yamagi.org>
 <559C0184.4050102@multiplay.co.uk>
 <20150708074652.07a815e6aa08526d569f3077@yamagi.org>
Cc: freebsd-scsi@freebsd.org
From: Steven Hartland <killing@multiplay.co.uk>
Message-ID: <559CD2B3.7000404@multiplay.co.uk>
Date: Wed, 8 Jul 2015 08:35:15 +0100
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101
 Thunderbird/38.0.1
MIME-Version: 1.0
In-Reply-To: <20150708074652.07a815e6aa08526d569f3077@yamagi.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Jul 2015 07:35:21 -0000

Actually not; it could indicate that a design problem with the midplane / 
backplane is the cause of the issue.

We've had a number of Supermicro and Dell chassis that, when used in 
combination with 6Gbps+ devices, particularly SSDs, exhibited timeouts 
like the ones you describe. All turned out to be backplane issues.

We proved this by connecting the drives directly to the controller with 
high quality cables, eliminating the hotswap backplane, after which the 
timeouts stopped.

This is a PITA to test as power is supplied by the hotswap backplane, 
but I wouldn't recommend you look anywhere else till you've eliminated 
this as a potential cause.
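
To compare the situation before and after bypassing the backplane, one
option is to tally the "Command timeout" lines per device from saved
dmesg output. The following is only a minimal sketch in Python, under
stated assumptions: the function name count_timeouts and the sample text
are hypothetical, and the pattern assumes the (daN:mprN:...) prefix format
seen in the log excerpt quoted in this thread. Interleaved kernel messages
can split a line, so the counts are best read as a lower bound.

```python
import re
from collections import Counter

# Matches a CAM peripheral prefix such as "(da11:mpr1:0:5:0):" that is
# directly followed by a "Command timeout" status. The prefix layout is an
# assumption based on the dmesg excerpt in this thread.
TIMEOUT_RE = re.compile(
    r"\((da\d+):mpr\d+:\d+:\d+:\d+\):\s*CAM status: Command timeout"
)


def count_timeouts(log_text):
    """Return a Counter mapping device name (e.g. 'da11') to timeout count."""
    return Counter(TIMEOUT_RE.findall(log_text))


# Hypothetical sample input, shaped like the messages quoted in this thread.
sample = (
    "(da11:mpr1:0:5:0): CAM status: Command timeout\n"
    "(da8:mpr1:0:2:0): CAM status: Command timeout\n"
    "(da11:mpr1:0:5:0): CAM status: SCSI Status Error\n"
    "(da11:mpr1:0:5:0): CAM status: Command timeout\n"
)

counts = count_timeouts(sample)
for device, n in sorted(counts.items()):
    print(device, n)
```

Feeding it one capture taken with the backplane in place and one taken
with the drives cabled directly should show whether the per-device counts
actually drop to zero.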

     Regards
     Steve

On 08/07/2015 06:46, Yamagi Burmeister wrote:
> Hello Steven,
> since the issue occurs on all 3 servers it's at least unlikely. But
> I'll see what I can do.
>
> Regards,
> Yamagi
>
> On Tue, 7 Jul 2015 17:42:44 +0100
> Steven Hartland <killing@multiplay.co.uk> wrote:
>
>> Have you eliminated the midplane / cabling as the issue as that's very
>> common.
>>


From owner-freebsd-scsi@freebsd.org  Wed Jul  8 13:55:34 2015
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 66E50995D9B
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Wed,  8 Jul 2015 13:55:34 +0000 (UTC)
 (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org
 [IPv6:2001:1900:2254:206a::16:76])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 52D0C1BD8
 for <freebsd-scsi@FreeBSD.org>; Wed,  8 Jul 2015 13:55:34 +0000 (UTC)
 (envelope-from bugzilla-noreply@freebsd.org)
Received: from bugs.freebsd.org ([127.0.1.118])
 by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t68DtYLH028983
 for <freebsd-scsi@FreeBSD.org>; Wed, 8 Jul 2015 13:55:34 GMT
 (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-scsi@FreeBSD.org
Subject: [Bug 200883] Installing FreeBSD 10.1-RELEASE-amd64-{disk1|dvd1}.iso
 fails to install on Dell C6220, bootonly.iso works
Date: Wed, 08 Jul 2015 13:55:34 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: misc
X-Bugzilla-Version: 10.1-RELEASE
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: bcr@FreeBSD.org
X-Bugzilla-Status: In Progress
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-200883-5312-9YVRVXHRzd@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-200883-5312@https.bugs.freebsd.org/bugzilla/>
References: <bug-200883-5312@https.bugs.freebsd.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Jul 2015 13:55:34 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200883
--- Comment #4 from Benedict Reuschling <bcr@FreeBSD.org> ---
I just tested it with FreeBSD-10.2-PRERELEASE-amd64-20150625-r284813-disc1.iso
. The same issue as before: the viewer connection gets terminated during the
installation process, ejecting the media in the process. I can reliably
reproduce the issue each time.

Note: I did install the same machine a couple of times with the
FreeBSD-11.0-CURRENT-amd64-r283577-20150526.disc1.iso . In one of these
instances, the viewer crashed as well. But this was only one instance and next
time, the installer completed just fine and I couldn't reproduce the error like
in 10.X. 

We should try to identify which MFC is missing that makes a difference between
10.X and 11-CURRENT and the behaviour I'm experiencing.

-- 
You are receiving this mail because:
You are on the CC list for the bug.