From owner-freebsd-scsi@freebsd.org  Tue Jun  7 19:53:08 2016
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8DFFEB6D843
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Tue,  7 Jun 2016 19:53:08 +0000 (UTC)
 (envelope-from killing@multiplay.co.uk)
Received: from mail-wm0-x232.google.com (mail-wm0-x232.google.com
 [IPv6:2a00:1450:400c:c09::232])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 087DB1528
 for <freebsd-scsi@freebsd.org>; Tue,  7 Jun 2016 19:53:08 +0000 (UTC)
 (envelope-from killing@multiplay.co.uk)
Received: by mail-wm0-x232.google.com with SMTP id k204so83570160wmk.0
 for <freebsd-scsi@freebsd.org>; Tue, 07 Jun 2016 12:53:07 -0700 (PDT)
X-Received: by 10.28.26.138 with SMTP id a132mr4425191wma.82.1465329186240;
 Tue, 07 Jun 2016 12:53:06 -0700 (PDT)
Received: from [10.10.1.58] (liv3d.labs.multiplay.co.uk. [82.69.141.171])
 by smtp.gmail.com with ESMTPSA id c62sm20884456wmd.1.2016.06.07.12.53.04
 for <freebsd-scsi@freebsd.org> (version=TLSv1/SSLv3 cipher=OTHER);
 Tue, 07 Jun 2016 12:53:05 -0700 (PDT)
Subject: Re: Avago LSI SAS 3008 & Intel SSD Timeouts
To: freebsd-scsi@freebsd.org
References: <30c04d8b-80cb-c637-26dc-97caebad3acb@mindpackstudios.com>
 <b30f968c-cc41-f7de-5a54-35bed961e65a@multiplay.co.uk>
 <08C01646-9AF3-4E89-A545-C051A284E039@sarenet.es>
 <986e03a7-5dc8-f5e0-5a17-4bf49459f905@mindpackstudios.com>
 <2823D96D-881D-4D40-B610-FC8292FA2FC5@sarenet.es>
 <4072b65d-25d4-2a79-5911-573517b0ee57@mindpackstudios.com>
 <583dddc6-4614-9900-88f7-27347866d7aa@mindpackstudios.com>
From: Steven Hartland <killing@multiplay.co.uk>
Message-ID: <331da785-c88b-d74e-512a-37bdb618d512@multiplay.co.uk>
Date: Tue, 7 Jun 2016 20:53:10 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To: <583dddc6-4614-9900-88f7-27347866d7aa@mindpackstudios.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Jun 2016 19:53:08 -0000

CDB: 85 is ATA PASS-THROUGH(16), and in your logs it's carrying ATA 
command 06 (DATA SET MANAGEMENT), i.e. a TRIM. I know you tried it before 
using BIO delete, but assuming you're running ZFS, can you set the 
following in loader.conf and see how you get on?
vfs.zfs.trim.enabled=0
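
To double-check the TRIM reading, here's a minimal sketch that decodes one 
of the ATA PASS-THROUGH(16) CDBs from your log, assuming the CDB byte 
layout from the T10 SAT spec. decode_ata_pt16 is just an illustrative 
helper (nothing in CAM), and the lookup tables only cover the values 
needed here.

```python
# Minimal ATA PASS-THROUGH(16) (SCSI opcode 0x85) decoder, following the
# CDB layout in T10 SAT.  Tables only cover the values relevant here.
SAT_PROTOCOLS = {3: "non-data", 4: "PIO data-in", 5: "PIO data-out",
                 6: "DMA", 12: "FPDMA"}
ATA_COMMANDS = {0x06: "DATA SET MANAGEMENT"}


def decode_ata_pt16(cdb_hex):
    """Split a hex-dumped ATA PASS-THROUGH(16) CDB into its ATA fields."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH(16) CDB")
    # LBA is interleaved: bytes 7/9/11 carry bits 31:24, 39:32, 47:40 and
    # bytes 8/10/12 carry bits 7:0, 15:8, 23:16.
    lba = (cdb[11] << 40 | cdb[9] << 32 | cdb[7] << 24
           | cdb[12] << 16 | cdb[10] << 8 | cdb[8])
    return {
        "protocol": SAT_PROTOCOLS.get((cdb[1] >> 1) & 0x0F, "other"),
        "extend": bool(cdb[1] & 0x01),           # 48-bit command
        "t_dir_from_device": bool(cdb[2] & 0x08),
        "features": cdb[3] << 8 | cdb[4],       # DSM: bit 0 = TRIM
        "count": cdb[5] << 8 | cdb[6],          # 512-byte blocks of ranges
        "lba": lba,
        "device": cdb[13],
        "command": ATA_COMMANDS.get(cdb[14], hex(cdb[14])),
    }


# One of the CDBs logged for da10/da2/da3.
info = decode_ata_pt16("85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00")
print(info["command"], info["protocol"], info["features"] & 0x01, info["count"])
# DATA SET MANAGEMENT DMA 1 1
```

If I've got the layout right, that's DATA SET MANAGEMENT with the TRIM 
bit set and one 512-byte block of ranges (matching "length 512"), sent 
with the plain DMA protocol rather than FPDMA, so not as an NCQ command. 
That fits your identify output showing no "Receive & Send FPDMA Queued" 
support.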

     Regards
     Steve


On 07/06/2016 20:24, list-news wrote:
> I have additional confirmation that it's not faulty hardware.
>
> I moved the 4 disks that carry the PostgreSQL database over to another 
> server (same model - TWIN 2028-DECR), imported the zpool and fired up 
> my application.
>
> This server is running much older firmware on the SAS controller, and 
> has a different CPU, memory, etc.
>
> Errors appear within the first couple of minutes and continue every few 
> minutes (note the timestamps on each drive timeout):
>
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 
> 0e 74 79 e0 00 00 08 00 length 4096 SMID 582 terminated ioc 804b scsi 
> 0 state c xfer 0
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 
> 0e 74 79 e8 00 00 08 00 length 4096 SMID 1009 terminated ioc 804b scsi 
> 0 state c xfer 0
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): ATA COMMAND PASS 
> THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 
> length 512 SMID 315 terminated ioc 804b scsi 0 state c xfer 0
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 
> 33 91 5c 68 00 00 08 00 length 4096 SMID 183 terminated ioc 804b scsi 
> 0 state c xfer 0
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 
> 36 f2 39 40 00 00 10 00 length 8192 SMID 446 terminated ioc 804b scsi 
> 0 state c xfer 0
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): SYNCHRONIZE CACHE(10). 
> CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 715 terminated ioc 
> 804b scsi 0 state c xfer 0
> Jun  7 13:08:32 s17 kernel: mpr0: Unfreezing devq for target ID 14
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 
> 36 ea dc 60 00 00 08 00
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): CAM status: Command 
> timeout
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): Retrying command
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 
> 0e 74 79 e0 00 00 08 00
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): CAM status: SCSI 
> Status Error
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): SCSI status: Check 
> Condition
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): SCSI sense: UNIT 
> ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> Jun  7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): Retrying command (per 
> sense data)
> Jun  7 13:11:08 s17 kernel: (noperiph:mpr0:0:4294967295:0): SMID 4 
> Aborting command 0xfffffe0000be0140
> Jun  7 13:11:08 s17 kernel: mpr0: Sending reset from mprsas_send_abort 
> for target ID 10
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d 
> f6 ee f0 00 00 08 00 length 4096 SMID 335 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d 
> f6 ee d8 00 00 10 00 length 8192 SMID 262 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): ATA COMMAND PASS 
> THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 
> length 512 SMID 692 terminated ioc 804b scsi 0 state c xfer 0
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 19 
> be 13 a0 00 00 10 00 length 8192 SMID 509 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 21 
> 3c 00 d8 00 00 08 00 length 4096 SMID 911 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 21 
> 3c 00 d0 00 00 08 00 length 4096 SMID 918 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 21 
> 3c 00 c8 00 00 08 00 length 4096 SMID 585 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): SYNCHRONIZE CACHE(10). 
> CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 297 terminated ioc 
> 804b scsi 0 state c xfer 0
> Jun  7 13:11:08 s17 kernel: mpr0: Unfreezing devq for target ID 10
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 35 
> 26 ca f0 00 00 08 00
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): CAM status: Command 
> timeout
> Jun  7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): Retrying command
> Jun  7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d 
> f6 ee f0 00 00 08 00
> Jun  7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): CAM status: SCSI Status 
> Error
> Jun  7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): SCSI status: Check 
> Condition
> Jun  7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): SCSI sense: UNIT 
> ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> Jun  7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): Retrying command (per 
> sense data)
> Jun  7 13:13:04 s17 kernel: (noperiph:mpr0:0:4294967295:0): SMID 5 
> Aborting command 0xfffffe0000bfcca0
> Jun  7 13:13:04 s17 kernel: mpr0: Sending reset from mprsas_send_abort 
> for target ID 10
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): ATA COMMAND PASS 
> THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 
> length 512 SMID 504 terminated ioc 804b scsi 0 state c xfer 0
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 1b 
> 8d 99 48 00 00 08 00 length 4096 SMID 677 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 13 
> 6b df b8 00 00 10 00 length 8192 SMID 563 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d 
> f7 cd a8 00 00 08 00 length 4096 SMID 723 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d 
> f7 cd b0 00 00 08 00 length 4096 SMID 335 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): SYNCHRONIZE CACHE(10). 
> CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 478 terminated ioc 
> 804b scsi 0 state c xfer 0
> Jun  7 13:13:04 s17 kernel: mpr0: Unfreezing devq for target ID 10
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 1e 
> d6 de f0 00 00 08 00
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): CAM status: Command 
> timeout
> Jun  7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): Retrying command
> Jun  7 13:13:05 s17 kernel: mpr0: log_info(0x31120440): 
> originator(PL), code(0x12), sub_code(0x0440)
> Jun  7 13:13:05 s17 kernel: mpr0: (da2:mpr0:0:10:0): ATA COMMAND PASS 
> THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
> Jun  7 13:13:05 s17 kernel: log_info(0x31120440): originator(PL), 
> code(0x12), sub_code(0x0440)
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request 
> completed with an error
> Jun  7 13:13:05 s17 kernel: mpr0: (da2:log_info(0x31120440): 
> originator(PL), code(0x12), sub_code(0x0440)
> Jun  7 13:13:05 s17 kernel: mpr0:0:mpr0: 10:log_info(0x31120440): 
> originator(PL), code(0x12), sub_code(0x0440)
> Jun  7 13:13:05 s17 kernel: 0): mpr0: Retrying command
> Jun  7 13:13:05 s17 kernel: log_info(0x31120440): originator(PL), 
> code(0x12), sub_code(0x0440)
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): SYNCHRONIZE CACHE(10). 
> CDB: 35 00 00 00 00 00 00 00 00 00
> Jun  7 13:13:05 s17 kernel: mpr0: (da2:mpr0:0:10:0): CAM status: CCB 
> request completed with an error
> Jun  7 13:13:05 s17 kernel: log_info(0x31120440): originator(PL), 
> code(0x12), sub_code(0x0440)
> Jun  7 13:13:05 s17 kernel: (da2:mpr0: mpr0:0:log_info(0x31120440): 
> originator(PL), code(0x12), sub_code(0x0440)
> Jun  7 13:13:05 s17 kernel: 10:0): Retrying command
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 1b 
> 8d 99 48 00 00 08 00
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request 
> completed with an error
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 13 
> 6b df b8 00 00 10 00
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request 
> completed with an error
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d 
> f7 cd a8 00 00 08 00
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request 
> completed with an error
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d 
> f7 cd b0 00 00 08 00
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request 
> completed with an error
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 1e 
> d6 de f0 00 00 08 00
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request 
> completed with an error
> Jun  7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command
> Jun  7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): SYNCHRONIZE CACHE(10). 
> CDB: 35 00 00 00 00 00 00 00 00 00
> Jun  7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): CAM status: SCSI Status 
> Error
> Jun  7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): SCSI status: Check 
> Condition
> Jun  7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): SCSI sense: UNIT 
> ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> Jun  7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): Error 6, Retries exhausted
> Jun  7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): Invalidating pack
> Jun  7 13:15:11 s17 kernel: (noperiph:mpr0:0:4294967295:0): SMID 6 
> Aborting command 0xfffffe0000c1e960
> Jun  7 13:15:11 s17 kernel: mpr0: Sending reset from mprsas_send_abort 
> for target ID 11
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): ATA COMMAND PASS 
> THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 
> length 512 SMID 942 terminated ioc 804b scsi 0 state c xfer 0
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): READ(10). CDB: 28 00 23 
> 7f 21 c0 00 00 08 00 length 4096 SMID 359 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): READ(10). CDB: 28 00 31 
> bb 68 30 00 00 08 00 length 4096 SMID 597 terminated ioc 804b scsi 0 
> state c xfer 0
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): READ(10). CDB: 28 00 19 
> 80 02 68 00 00 50 00 length 40960 SMID 786 terminated ioc 804b scsi 0 
> state c xfer(da3:mpr0:0:11:0): READ(10). CDB: 28 00 22 02 ea 38 00 00 
> 10 00
> Jun  7 13:15:12 s17 kernel: 0
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): CAM status: Command 
> timeout
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): READ(10). CDB: 28 00 19 
> 7e 0d 30 00 00 10 00 length 8192 SMID 602 terminated ioc 804b scsi 0 
> state c xfer (da3:0
> Jun  7 13:15:12 s17 kernel: mpr0:0:    (da3:mpr0:0:11:0): SYNCHRONIZE 
> CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 441 
> terminated ioc 804b scsi 0 sta11:te c xfer 0
> Jun  7 13:15:12 s17 kernel: 0): mpr0: Retrying command
> Jun  7 13:15:12 s17 kernel: Unfreezing devq for target ID 11
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): SYNCHRONIZE CACHE(10). 
> CDB: 35 00 00 00 00 00 00 00 00 00
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): CAM status: SCSI Status 
> Error
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): SCSI status: Check 
> Condition
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): SCSI sense: UNIT 
> ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> Jun  7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): Retrying command (per 
> sense data)
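
To put numbers on "every few minutes", a rough sketch that pulls the 
"Unfreezing devq for target ID" lines out of the log above (re-joined 
onto one line each) and measures the gaps between them. It assumes all 
entries are from the same day; unfreeze_events and intervals are just 
illustrative names.

```python
import re

# Captures HH:MM:SS and the target ID from syslog lines such as
# "Jun  7 13:08:32 s17 kernel: mpr0: Unfreezing devq for target ID 14".
UNFREEZE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) .*Unfreezing devq for target ID (\d+)"
)


def unfreeze_events(lines):
    """Return (seconds since midnight, target ID) for each devq unfreeze."""
    events = []
    for line in lines:
        match = UNFREEZE_RE.search(line)
        if match:
            hh, mm, ss, target = (int(g) for g in match.groups())
            events.append((hh * 3600 + mm * 60 + ss, target))
    return events


def intervals(events):
    """Seconds between consecutive events; assumes one day, no midnight wrap."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


LOG = [
    "Jun  7 13:08:32 s17 kernel: mpr0: Unfreezing devq for target ID 14",
    "Jun  7 13:11:08 s17 kernel: mpr0: Unfreezing devq for target ID 10",
    "Jun  7 13:13:04 s17 kernel: mpr0: Unfreezing devq for target ID 10",
    "Jun  7 13:15:12 s17 kernel: Unfreezing devq for target ID 11",
]

events = unfreeze_events(LOG)
print([target for _, target in events])  # [14, 10, 10, 11]
print(intervals(events))                 # [156, 116, 128]
```

So that's roughly one reset every two minutes, on mpr targets 14, 10, 10 
and 11 (da10, da2, da2 and da3 going by the log prefixes).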
>
> gstat output:
> (I'm guessing I caught this during the da2 error)
>
> #gstat -do
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d    o/s   ms/o  %busy Name
>    70      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da2
>     0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da3
>     0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da10
>     0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da11
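
Reading that snapshot, da2 has 70 commands queued and nothing completing. 
A tiny sketch of that check, assuming gstat -do rows in the layout above 
(L(q) first, ops/s second, device name after the "|"); stalled_devices 
is a hypothetical helper.

```python
# Rows from the `gstat -do` snapshot above, as plain text.
ROWS = [
    "   70      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da2",
    "    0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da3",
    "    0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da10",
    "    0      0      0      0    0.0      0      0    0.0      0      0    0.0      0    0.0    0.0| da11",
]


def stalled_devices(rows):
    """Return devices with commands queued (L(q) > 0) but zero ops/s."""
    stalled = []
    for row in rows:
        stats, _, name = row.partition("|")
        cols = stats.split()
        queue_len, ops_per_sec = int(cols[0]), int(cols[1])
        if queue_len > 0 and ops_per_sec == 0:
            stalled.append(name.strip())
    return stalled


print(stalled_devices(ROWS))  # ['da2']
```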
>
>
> I then set the number of tags to 1 for each device:
>
> #camcontrol tags da2 -N 1
> #camcontrol tags da3 -N 1
> #camcontrol tags da10 -N 1
> #camcontrol tags da11 -N 1
>
> No errors for the last hour, and the system is still running at full load.
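
If anyone wants to repeat this across a pool, a small sketch that applies 
the same tag limit to every member. The device list is the four disks 
above; apply_tags is a hypothetical helper that just shells out to 
"camcontrol tags <dev> -N <n>" and only prints the commands unless 
dry_run=False.

```python
import subprocess

# The four zpool members from this thread; adjust for your own pool.
POOL_DEVICES = ["da2", "da3", "da10", "da11"]


def tag_commands(devices, tags):
    """Build one `camcontrol tags <dev> -N <tags>` argv per device."""
    return [["camcontrol", "tags", dev, "-N", str(tags)] for dev in devices]


def apply_tags(devices, tags, dry_run=True):
    """Print (dry run) or execute the camcontrol commands; return them."""
    cmds = tag_commands(devices, tags)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if camcontrol fails.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    apply_tags(POOL_DEVICES, 1)
```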
>
> Everything points to an NCQ firmware issue.  Intel says its S3610 
> SSDs support NCQ with 32 tags, but I've reproduced the errors with 
> tags set to 8 plenty of times.
>
> (See NCQ line below.)
>
> # camcontrol identify da2
>
> pass2: <INTEL SSDSC2BX480G4 G2010150> ACS-2 ATA SATA 3.x device
> pass2: 1200.000MB/s transfers, Command Queueing Enabled
> protocol              ATA/ATAPI-9 SATA 3.x
> device model          INTEL SSDSC2BX480G4
> firmware revision     G2010150
> serial number         [redacted]
> WWN                   [redacted]
> cylinders             16383
> heads                 16
> sectors/track         63
> sector size           logical 512, physical 4096, offset 0
> LBA supported         268435455 sectors
> LBA48 supported       937703088 sectors
> PIO supported         PIO4
> DMA supported         WDMA2 UDMA6
> media RPM             non-rotating
>
> Feature                      Support  Enabled   Value Vendor
> read ahead                     yes    yes
> write cache                    yes    yes
> flush cache                    yes    yes
> overlap                        no
> Tagged Command Queuing (TCQ)   no     no
> Native Command Queuing (NCQ)   yes              32 tags
> NCQ Queue Management           no
> NCQ Streaming                  no
> Receive & Send FPDMA Queued    no
> SMART                          yes    yes
> microcode download             yes    yes
> security                       yes    no
> power management               yes    yes
> advanced power management      no     no
> automatic acoustic management  no     no
> media status notification      no     no
> power-up in Standby            no     no
> write-read-verify              no     no
> unload                         yes    yes
> general purpose logging        yes    yes
> free-fall                      no     no
> Data Set Management (DSM/TRIM) yes
> DSM - max 512byte blocks       yes              4
> DSM - deterministic read       yes              zeroed
> Host Protected Area (HPA)      yes      no 937703088/937703088
> HPA - Security                 no
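
For reference, the "DSM - max 512byte blocks 4" line bounds how much a 
single TRIM can cover. Per ACS-2, each 512-byte DSM block holds 64 
eight-byte LBA range entries (48-bit starting LBA, 16-bit sector count). 
A sketch of that arithmetic; encode_range and max_trim_per_command are 
illustrative helpers only.

```python
import struct

SECTOR_SIZE = 512        # DSM payload is sent in 512-byte blocks
ENTRY_SIZE = 8           # one LBA range entry = 64 bits
MAX_RANGE_LEN = 0xFFFF   # range length is a 16-bit field (0 = unused entry)


def encode_range(lba, sectors):
    """Pack one TRIM range entry: bits 47:0 LBA, bits 63:48 sector count."""
    if not 0 <= lba < (1 << 48):
        raise ValueError("LBA must fit in 48 bits")
    if not 1 <= sectors <= MAX_RANGE_LEN:
        raise ValueError("range length must be 1..65535 sectors")
    # ATA data structures are little-endian.
    return struct.pack("<Q", (sectors << 48) | lba)


def max_trim_per_command(max_dsm_blocks):
    """Return (range entries, sectors) one DSM TRIM can cover."""
    entries = max_dsm_blocks * SECTOR_SIZE // ENTRY_SIZE
    return entries, entries * MAX_RANGE_LEN


entries, sectors = max_trim_per_command(4)   # "DSM - max 512byte blocks 4"
print(entries, sectors, sectors * SECTOR_SIZE)  # 256 16776960 8589803520
```

So with 4 blocks a single DSM command can cover at most 256 ranges, or 
16,776,960 sectors (just under 8 GiB); the CDBs in the log only send one 
block (COUNT 1, length 512).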
>
> And there doesn't appear to be any way to disable it in the firmware, 
> which would be a nice test.  I did attempt this, with no luck:
> # camcontrol negotiate da2 -T disable
> (pass2:mpr0:0:10:0): transfer speed: 1200.000MB/s
> (pass2:mpr0:0:10:0): tagged queueing: enabled
> camcontrol: XPT_SET_TRANS_SETTINGS CCB failed
>
> -Kyle
>
>
> On 6/7/16 12:09 PM, list-news wrote:
>> The system is a Twin.  I mentioned this in my first post, but I 
>> probably wasn't clear.
>>
>> The twin unit is this one:
>> https://www.supermicro.com/products/system/2u/2028/sys-2028tp-decr.cfm
>>
>> I've tried all the components from twin nodes A and B (CPU / memory / 
>> mainboard / controller) and still get the errors.  The backplane was 
>> the original suspect; it has been RMA'd and replaced, and the errors 
>> continue.  I've even swapped out the power supplies for those from 
>> another identical unit I have here.
>>
>> In every case the errors continue, until I do this:
>> #camcontrol tags daX -N 1
>> (for each drive in the zpool)
>>
>> Then the errors stop.
>>
>> The system logs errors every few minutes while my application is 
>> running.  Set tags to -N 1, and everything goes quiet: with 16 cores at 
>> 100% CPU and the drives 80% busy at ~15k IOPS for about 5 hours solid 
>> until a batch finishes, no errors are reported.  If I set tags to 
>> -N 255 for each device, errors start again within 5 minutes and 
>> continue every 2-5 minutes until the batch is finished.
>>
>> -Kyle
>>
>>> I would try, if possible, to swap the controller.
>>>
>>> Borja.
>>>
>>
>> _______________________________________________
>> freebsd-scsi@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
>