From owner-freebsd-scsi@freebsd.org Tue Jun 7 19:53:08 2016 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8DFFEB6D843 for ; Tue, 7 Jun 2016 19:53:08 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wm0-x232.google.com (mail-wm0-x232.google.com [IPv6:2a00:1450:400c:c09::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 087DB1528 for ; Tue, 7 Jun 2016 19:53:08 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by mail-wm0-x232.google.com with SMTP id k204so83570160wmk.0 for ; Tue, 07 Jun 2016 12:53:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to; bh=OyWE/LDvMML2a8/vBR7DS72Be3W4ajYMeIMfKmubKFc=; b=qaoRJQJk4pOrwFH6RMlcq465W+/CNyRbatsuAaD2N6mN43z/pcZhXJQ0GtIm5FC95A l2arOrHD9T8JlN6MI2oB+nFOo94W3EbxP2ZjhZpaufw1LewsvRFq6H3OPC3MOUiA9ha+ FVt7552OWKtfF7TYvMzFAJnDnBZnzZQoaILFR8WmQzf1i8FGWRbQ1+y7WerAh/msB1G+ ZwFkT0PqiN9Z4ZQRNjDdlzwn8AmitDYxSjv2+5YaoA4fol7PpnBTn3gesRCTjublXD78 5AGSQjJWt+v5vQNaKeodmPWHuwc7jJN51cOnNXtvmQqRYedKOT7jzNIklhRrCoJYZ6oS qBNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to; bh=OyWE/LDvMML2a8/vBR7DS72Be3W4ajYMeIMfKmubKFc=; b=kgoRWam3kz2/jpeAhpbCccVOnbwtUh6sEwPnkI9uEVe7GqSvnTebL/CuBf50OBmREP iQbQsS3xI8AP/qDHua/BKL5sz1DJ7ybEIva8SIGHO/jGv/1+EHnvudR0IB3LzGZxgFrX LMdaxd8s+zYcKR7Mb7iT3onBO1d0J6vIAQRjDLQHLI9QWp64JMdHewbNVORu3Ue2sWVT Kc9MsLCOf7VgvropWsi9EcEIBLuPYbsBmYVZtsecBieOxFPp35yqRz0W1bJdybZ8HW31 TUCCBeceozJO7iw7OylJTcm1lhiPHkKCy3FMO6Ugtdu+6ovWA+mp5CoKD7dQqiZM/CRv 7Q3Q== X-Gm-Message-State: ALyK8tJJlke8H9vvETKopxIL3Dl+FjGE+B4tJFUeFza43ENMIGTio0ezcU6oJimIUkyNFX6P X-Received: by 10.28.26.138 with SMTP id a132mr4425191wma.82.1465329186240; Tue, 07 Jun 2016 12:53:06 -0700 (PDT) Received: from [10.10.1.58] (liv3d.labs.multiplay.co.uk. [82.69.141.171]) by smtp.gmail.com with ESMTPSA id c62sm20884456wmd.1.2016.06.07.12.53.04 for (version=TLSv1/SSLv3 cipher=OTHER); Tue, 07 Jun 2016 12:53:05 -0700 (PDT) Subject: Re: Avago LSI SAS 3008 & Intel SSD Timeouts To: freebsd-scsi@freebsd.org References: <30c04d8b-80cb-c637-26dc-97caebad3acb@mindpackstudios.com> <08C01646-9AF3-4E89-A545-C051A284E039@sarenet.es> <986e03a7-5dc8-f5e0-5a17-4bf49459f905@mindpackstudios.com> <2823D96D-881D-4D40-B610-FC8292FA2FC5@sarenet.es> <4072b65d-25d4-2a79-5911-573517b0ee57@mindpackstudios.com> <583dddc6-4614-9900-88f7-27347866d7aa@mindpackstudios.com> From: Steven Hartland Message-ID: <331da785-c88b-d74e-512a-37bdb618d512@multiplay.co.uk> Date: Tue, 7 Jun 2016 20:53:10 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <583dddc6-4614-9900-88f7-27347866d7aa@mindpackstudios.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jun 2016 19:53:08 -0000 CDB: 85 is a TRIM command IIRC, I know you tried it before using BIO delete but assuming your running ZFS can you set the following in loader.conf and see how you get on. vfs.zfs.trim.enabled=0 Regards Steve On 07/06/2016 20:24, list-news wrote: > I have additional confirmation that it's not faulty hardware. > > I moved the 4 disks that carry the postgresql database over to another > server (same model - TWIN 2028-DECR). Mounted the zpool and fired up > my application. > > This server is using a much earlier firmware on the SAS controller. > Different CPU / Memory / etc. > > Errors happen within the first couple minutes, and continue every few > minutes (notice time-stamps for each drive timeout every few minutes): > > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 > 0e 74 79 e0 00 00 08 00 length 4096 SMID 582 terminated ioc 804b scsi > 0 state c xfer 0 > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 > 0e 74 79 e8 00 00 08 00 length 4096 SMID 1009 terminated ioc 804b scsi > 0 state c xfer 0 > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): ATA COMMAND PASS > THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 > length 512 SMID 315 terminated ioc 804b scsi 0 state c xfer 0 > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 > 33 91 5c 68 00 00 08 00 length 4096 SMID 183 terminated ioc 804b scsi > 0 state c xfer 0 > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 > 36 f2 39 40 00 00 10 00 length 8192 SMID 446 terminated ioc 804b scsi > 0 state c xfer 0 > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): SYNCHRONIZE CACHE(10). > CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 715 terminated ioc > 804b scsi 0 state c xfer 0 > Jun 7 13:08:32 s17 kernel: mpr0: Unfreezing devq for target ID 14 > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 > 36 ea dc 60 00 00 08 00 > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): CAM status: Command > timeout > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): Retrying command > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): READ(10). CDB: 28 00 > 0e 74 79 e0 00 00 08 00 > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): CAM status: SCSI > Status Error > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): SCSI status: Check > Condition > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): SCSI sense: UNIT > ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) > Jun 7 13:08:32 s17 kernel: (da10:mpr0:0:14:0): Retrying command (per > sense data) > Jun 7 13:11:08 s17 kernel: (noperiph:mpr0:0:4294967295:0): SMID 4 > Aborting command 0xfffffe0000be0140 > Jun 7 13:11:08 s17 kernel: mpr0: Sending reset from mprsas_send_abort > for target ID 10 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d > f6 ee f0 00 00 08 00 length 4096 SMID 335 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d > f6 ee d8 00 00 10 00 length 8192 SMID 262 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): ATA COMMAND PASS > THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 > length 512 SMID 692 terminated ioc 804b scsi 0 state c xfer 0 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 19 > be 13 a0 00 00 10 00 length 8192 SMID 509 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 21 > 3c 00 d8 00 00 08 00 length 4096 SMID 911 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 21 > 3c 00 d0 00 00 08 00 length 4096 SMID 918 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 21 > 3c 00 c8 00 00 08 00 length 4096 SMID 585 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): SYNCHRONIZE CACHE(10). > CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 297 terminated ioc > 804b scsi 0 state c xfer 0 > Jun 7 13:11:08 s17 kernel: mpr0: Unfreezing devq for target ID 10 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 35 > 26 ca f0 00 00 08 00 > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): CAM status: Command > timeout > Jun 7 13:11:08 s17 kernel: (da2:mpr0:0:10:0): Retrying command > Jun 7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d > f6 ee f0 00 00 08 00 > Jun 7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): CAM status: SCSI Status > Error > Jun 7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): SCSI status: Check > Condition > Jun 7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): SCSI sense: UNIT > ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) > Jun 7 13:11:09 s17 kernel: (da2:mpr0:0:10:0): Retrying command (per > sense data) > Jun 7 13:13:04 s17 kernel: (noperiph:mpr0:0:4294967295:0): SMID 5 > Aborting command 0xfffffe0000bfcca0 > Jun 7 13:13:04 s17 kernel: mpr0: Sending reset from mprsas_send_abort > for target ID 10 > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): ATA COMMAND PASS > THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 > length 512 SMID 504 terminated ioc 804b scsi 0 state c xfer 0 > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 1b > 8d 99 48 00 00 08 00 length 4096 SMID 677 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 13 > 6b df b8 00 00 10 00 length 8192 SMID 563 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d > f7 cd a8 00 00 08 00 length 4096 SMID 723 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d > f7 cd b0 00 00 08 00 length 4096 SMID 335 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): SYNCHRONIZE CACHE(10). > CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 478 terminated ioc > 804b scsi 0 state c xfer 0 > Jun 7 13:13:04 s17 kernel: mpr0: Unfreezing devq for target ID 10 > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 1e > d6 de f0 00 00 08 00 > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): CAM status: Command > timeout > Jun 7 13:13:04 s17 kernel: (da2:mpr0:0:10:0): Retrying command > Jun 7 13:13:05 s17 kernel: mpr0: log_info(0x31120440): > originator(PL), code(0x12), sub_code(0x0440) > Jun 7 13:13:05 s17 kernel: mpr0: (da2:mpr0:0:10:0): ATA COMMAND PASS > THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 > Jun 7 13:13:05 s17 kernel: log_info(0x31120440): originator(PL), > code(0x12), sub_code(0x0440) > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request > completed with an error > Jun 7 13:13:05 s17 kernel: mpr0: (da2:log_info(0x31120440): > originator(PL), code(0x12), sub_code(0x0440) > Jun 7 13:13:05 s17 kernel: mpr0:0:mpr0: 10:log_info(0x31120440): > originator(PL), code(0x12), sub_code(0x0440) > Jun 7 13:13:05 s17 kernel: 0): mpr0: Retrying command > Jun 7 13:13:05 s17 kernel: log_info(0x31120440): originator(PL), > code(0x12), sub_code(0x0440) > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): SYNCHRONIZE CACHE(10). > CDB: 35 00 00 00 00 00 00 00 00 00 > Jun 7 13:13:05 s17 kernel: mpr0: (da2:mpr0:0:10:0): CAM status: CCB > request completed with an error > Jun 7 13:13:05 s17 kernel: log_info(0x31120440): originator(PL), > code(0x12), sub_code(0x0440) > Jun 7 13:13:05 s17 kernel: (da2:mpr0: mpr0:0:log_info(0x31120440): > originator(PL), code(0x12), sub_code(0x0440) > Jun 7 13:13:05 s17 kernel: 10:0): Retrying command > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 1b > 8d 99 48 00 00 08 00 > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request > completed with an error > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 13 > 6b df b8 00 00 10 00 > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request > completed with an error > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d > f7 cd a8 00 00 08 00 > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request > completed with an error > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 0d > f7 cd b0 00 00 08 00 > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request > completed with an error > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): READ(10). CDB: 28 00 1e > d6 de f0 00 00 08 00 > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): CAM status: CCB request > completed with an error > Jun 7 13:13:05 s17 kernel: (da2:mpr0:0:10:0): Retrying command > Jun 7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): SYNCHRONIZE CACHE(10). > CDB: 35 00 00 00 00 00 00 00 00 00 > Jun 7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): CAM status: SCSI Status > Error > Jun 7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): SCSI status: Check > Condition > Jun 7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): SCSI sense: UNIT > ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) > Jun 7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): Error 6, Retries exhausted > Jun 7 13:13:06 s17 kernel: (da2:mpr0:0:10:0): Invalidating pack > Jun 7 13:15:11 s17 kernel: (noperiph:mpr0:0:4294967295:0): SMID 6 > Aborting command 0xfffffe0000c1e960 > Jun 7 13:15:11 s17 kernel: mpr0: Sending reset from mprsas_send_abort > for target ID 11 > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): ATA COMMAND PASS > THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 > length 512 SMID 942 terminated ioc 804b scsi 0 state c xfer 0 > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): READ(10). CDB: 28 00 23 > 7f 21 c0 00 00 08 00 length 4096 SMID 359 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): READ(10). CDB: 28 00 31 > bb 68 30 00 00 08 00 length 4096 SMID 597 terminated ioc 804b scsi 0 > state c xfer 0 > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): READ(10). CDB: 28 00 19 > 80 02 68 00 00 50 00 length 40960 SMID 786 terminated ioc 804b scsi 0 > state c xfer(da3:mpr0:0:11:0): READ(10). CDB: 28 00 22 02 ea 38 00 00 > 10 00 > Jun 7 13:15:12 s17 kernel: 0 > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): CAM status: Command > timeout > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): READ(10). CDB: 28 00 19 > 7e 0d 30 00 00 10 00 length 8192 SMID 602 terminated ioc 804b scsi 0 > state c xfer (da3:0 > Jun 7 13:15:12 s17 kernel: mpr0:0: (da3:mpr0:0:11:0): SYNCHRONIZE > CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 441 > terminated ioc 804b scsi 0 sta11:te c xfer 0 > Jun 7 13:15:12 s17 kernel: 0): mpr0: Retrying command > Jun 7 13:15:12 s17 kernel: Unfreezing devq for target ID 11 > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): SYNCHRONIZE CACHE(10). > CDB: 35 00 00 00 00 00 00 00 00 00 > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): CAM status: SCSI Status > Error > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): SCSI status: Check > Condition > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): SCSI sense: UNIT > ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred) > Jun 7 13:15:12 s17 kernel: (da3:mpr0:0:11:0): Retrying command (per > sense data) > > gstat output: > (I'm guessing I caught this during the da2 error) > > #gstat -do > L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps > ms/d o/s ms/o %busy Name > 70 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0 0.0 0.0| da2 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0 0.0 0.0| da3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0 0.0 0.0| da10 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0 0.0 0.0| da11 > > > I then set the tags down to 1 for each device: > > #camcontrol tags da2 -N 1 > #camcontrol tags da3 -N 1 > #camcontrol tags da10 -N 1 > #camcontrol tags da11 -N 1 > > And, no errors for the last hour, system still running at full load. > > Everything is feeling like an NCQ firmware issue. Intel s3610 says it > supports NCQ in it's SSDs with 32 tags. But I've pulled the errors > with tags set to 8 plenty of times. > > (See NCQ line below.) > > # camcontrol identify da2 > > pass2: ACS-2 ATA SATA 3.x device > pass2: 1200.000MB/s transfers, Command Queueing Enabled > protocol ATA/ATAPI-9 SATA 3.x > device model INTEL SSDSC2BX480G4 > firmware revision G2010150 > serial number [redacted] > WWN [redacted] > cylinders 16383 > heads 16 > sectors/track 63 > sector size logical 512, physical 4096, offset 0 > LBA supported 268435455 sectors > LBA48 supported 937703088 sectors > PIO supported PIO4 > DMA supported WDMA2 UDMA6 > media RPM non-rotating > > Feature Support Enabled Value Vendor > read ahead yes yes > write cache yes yes > flush cache yes yes > overlap no > Tagged Command Queuing (TCQ) no no > Native Command Queuing (NCQ) yes 32 tags > NCQ Queue Management no > NCQ Streaming no > Receive & Send FPDMA Queued no > SMART yes yes > microcode download yes yes > security yes no > power management yes yes > advanced power management no no > automatic acoustic management no no > media status notification no no > power-up in Standby no no > write-read-verify no no > unload yes yes > general purpose logging yes yes > free-fall no no > Data Set Management (DSM/TRIM) yes > DSM - max 512byte blocks yes 4 > DSM - deterministic read yes zeroed > Host Protected Area (HPA) yes no 937703088/937703088 > HPA - Security no > > And it doesn't appear I have any way to deactivate it in firmware. > Which would be a nice test. I did attempt this with no luck: > # camcontrol negotiate da2 -T disable > (pass2:mpr0:0:10:0): transfer speed: 1200.000MB/s > (pass2:mpr0:0:10:0): tagged queueing: enabled > camcontrol: XPT_SET_TRANS_SETTINGS CCB failed > > -Kyle > > > On 6/7/16 12:09 PM, list-news wrote: >> The system is a Twin. In the first post I mentioned this but I >> probably wasn't clear. >> >> The twin unit is this one: >> https://www.supermicro.com/products/system/2u/2028/sys-2028tp-decr.cfm >> >> I've used all components from twin node A and B (cpu / memory / >> mainboard / controller). I still get the errors. The backplane was >> the original thought of concern, and that has been RMA'd and replaced >> - errors continue. I've even swapped out power supplies with another >> identical unit I have here. >> >> In every case the errors continue, until I do this: >> #camcontrol daX -N 1 >> (for each drive in the zpool) >> >> Then the errors stop. >> >> The system errors every few minutes while my application is running. >> Set tags to -N 1, and everything goes quiet. 16 cores at 100% cpu >> and drives 80% busy @ ~15k IO p/s, for about 5 hours solid before it >> finishes a batch, no errors are reported with -N set to 1. If I set >> tags with -N 255 for each device, errors start again within 5 >> minutes, and continue every 2-5 minutes, until the batch is finished. >> >> -Kyle >> >>> I would try, if possible, to swap the controller. >>> >>> >>> >>> >>> >>> >>> Borja. >>> >>> >> >> _______________________________________________ >> freebsd-scsi@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi >> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" > > > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"