From owner-freebsd-scsi@freebsd.org  Tue Jun  7 19:53:27 2016
Return-Path: <owner-freebsd-scsi@freebsd.org>
Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4468FB6D89A
 for <freebsd-scsi@mailman.ysv.freebsd.org>;
 Tue,  7 Jun 2016 19:53:27 +0000 (UTC)
 (envelope-from list-news@mindpackstudios.com)
Received: from mail.furymx.com (mindpack.mx1.furymx.net [64.141.130.10])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0DC4B167E
 for <freebsd-scsi@freebsd.org>; Tue,  7 Jun 2016 19:53:26 +0000 (UTC)
 (envelope-from list-news@mindpackstudios.com)
Received: from mindpack.furymx.net (mindpack.mx1.furymx.net [10.10.1.10])
 by mail.furymx.com (Postfix) with ESMTP id 8E6561ED4C5
 for <freebsd-scsi@freebsd.org>; Tue,  7 Jun 2016 14:53:25 -0500 (CDT)
X-Virus-Scanned: amavisd-new at furymx.com
Received: from mail.furymx.com ([10.10.1.10])
 by mindpack.furymx.net (mail.furymx.com [10.10.1.10]) (amavisd-new, port 10024)
 with ESMTP id dQxViKVTZADo for <freebsd-scsi@freebsd.org>;
 Tue,  7 Jun 2016 14:53:24 -0500 (CDT)
Received: from vortex.local (c-98-215-180-176.hsd1.in.comcast.net
 [98.215.180.176])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 (Authenticated sender: kyle@mindpackstudios.com)
 by mail.furymx.com (Postfix) with ESMTPSA id 56DDA1ED4BD
 for <freebsd-scsi@freebsd.org>; Tue,  7 Jun 2016 14:53:24 -0500 (CDT)
Subject: Re: Avago LSI SAS 3008 & Intel SSD Timeouts
To: freebsd-scsi@freebsd.org
References: <30c04d8b-80cb-c637-26dc-97caebad3acb@mindpackstudios.com>
 <b30f968c-cc41-f7de-5a54-35bed961e65a@multiplay.co.uk>
 <08C01646-9AF3-4E89-A545-C051A284E039@sarenet.es>
 <986e03a7-5dc8-f5e0-5a17-4bf49459f905@mindpackstudios.com>
 <2823D96D-881D-4D40-B610-FC8292FA2FC5@sarenet.es>
 <4072b65d-25d4-2a79-5911-573517b0ee57@mindpackstudios.com>
 <6f861c77-d9c9-9710-7be6-5b08f1047fe5@multiplay.co.uk>
From: list-news <list-news@mindpackstudios.com>
Message-ID: <d9fb93a6-d3ad-7009-3301-d6bd29be376b@mindpackstudios.com>
Date: Tue, 7 Jun 2016 14:53:23 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
 Gecko/20100101 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <6f861c77-d9c9-9710-7be6-5b08f1047fe5@multiplay.co.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi/>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
 <mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Jun 2016 19:53:27 -0000

I don't believe the mainboard has any SATA ports.  It does have a PCIe 
slot IIRC though, and I may be able to rig something up with another LSI 
adapter I have lying around, if I can get it to fit and find a way to 
power the drives.

Although, this seems unlikely unless you are seeing something I'm not?

With that last test: If it's the SAS controller, 3 different ones 
running two different firmware versions are all causing the issue.  If 
it's the backplane, I have now tested 3 of them as well, two of which I 
can confirm have different revision numbers.

Errors never appear with tags set to 1 for each drive (effectively 
eliminating NCQ as I understand it).  My rough understanding is that a 
higher tag count allows the SAS adapter to send more commands to the 
drive in parallel, allowing the drive to make the decisions about 
command ordering.  If that is accurate, and the controller firmware were 
bad, I assume this would be a far more common bug that would have been 
fixed already.
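
For anyone who wants to repeat the tag change, here is a rough sketch of 
how it could be applied per drive.  The device names (da0, da1, da2) and 
the helper name set_tags_cmds are placeholders of mine, not the actual 
layout of this system.  It only prints the commands, so nothing changes 
until the output is reviewed and piped to sh as root:

```shell
#!/bin/sh
# Dry run: print the camcontrol command that would set the number of
# queued tags for each listed drive.  The device names passed in below
# are hypothetical; substitute the daX devices that back your zpool.
set_tags_cmds() {
    tags="$1"   # desired tag count, e.g. 1 (no queueing) or 255
    shift
    for dev in "$@"; do
        echo "camcontrol tags $dev -N $tags"
    done
}

# Example: show the commands that would drop three drives to one tag.
set_tags_cmds 1 da0 da1 da2
```

Running "camcontrol tags daX -v" afterwards should show the tag count 
currently in effect for a device.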

On the other hand, if it only happens during heavy SYNCHRONIZE CACHE 
commands in parallel on certain Intel SSDs, and only on controllers 
(maybe 12 Gbps?) that can outrun the drive firmware or cause a race 
condition (my suspicion here), it seems far more likely this would 
have gone unnoticed by Intel.

-Kyle


On 6/7/16 2:02 PM, Steven Hartland wrote:
> Have you tried direct attaching the drives?
>
> On 07/06/2016 18:09, list-news wrote:
>> The system is a Twin.  In the first post I mentioned this but I 
>> probably wasn't clear.
>>
>> The twin unit is this one:
>> https://www.supermicro.com/products/system/2u/2028/sys-2028tp-decr.cfm
>>
>> I've used all components from twin nodes A and B (CPU / memory / 
>> mainboard / controller).  I still get the errors.  The backplane was 
>> the original suspect, and that has been RMA'd and replaced 
>> - errors continue.  I've even swapped out power supplies with another 
>> identical unit I have here.
>>
>> In every case the errors continue, until I do this:
>> # camcontrol tags daX -N 1
>> (for each drive in the zpool)
>>
>> Then the errors stop.
>>
>> The system errors every few minutes while my application is running.  
>> With tags set to 1 (-N 1), everything goes quiet: 16 cores at 100% CPU 
>> and drives 80% busy at ~15k IOPS for about 5 hours solid until it 
>> finishes a batch, with no errors reported.  If I set tags with -N 255 
>> for each device, errors start again within 5 minutes and continue 
>> every 2-5 minutes until the batch is finished.
>>
>> -Kyle
>>
>>> I would try, if possible, to swap the controller.
>>>
>>> Borja.
>>>
>>>
>>
>> _______________________________________________
>> freebsd-scsi@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"