Date: Tue, 7 Jun 2016 14:53:23 -0500
From: list-news <list-news@mindpackstudios.com>
To: freebsd-scsi@freebsd.org
Subject: Re: Avago LSI SAS 3008 & Intel SSD Timeouts
Message-ID: <d9fb93a6-d3ad-7009-3301-d6bd29be376b@mindpackstudios.com>
In-Reply-To: <6f861c77-d9c9-9710-7be6-5b08f1047fe5@multiplay.co.uk>
References: <30c04d8b-80cb-c637-26dc-97caebad3acb@mindpackstudios.com> <b30f968c-cc41-f7de-5a54-35bed961e65a@multiplay.co.uk> <08C01646-9AF3-4E89-A545-C051A284E039@sarenet.es> <986e03a7-5dc8-f5e0-5a17-4bf49459f905@mindpackstudios.com> <2823D96D-881D-4D40-B610-FC8292FA2FC5@sarenet.es> <4072b65d-25d4-2a79-5911-573517b0ee57@mindpackstudios.com> <6f861c77-d9c9-9710-7be6-5b08f1047fe5@multiplay.co.uk>
I don't believe the mainboard has any SATA ports. It does have a PCIe slot, IIRC, and I may be able to rig something up with another LSI adapter I have lying around, if I can get it to fit and find a way to power the drives. That said, I doubt it would turn anything up, unless you are seeing something I'm not.

As of that last test: if it's the SAS controller, then 3 different controllers running two different firmware versions are all causing the issue. If it's the backplane, I have now tested 3 of those as well, two of which I can confirm have different revision numbers.

Errors never appear with tags set to 1 for each drive (which, as I understand it, effectively disables NCQ). My limited understanding is that a higher tag count allows the SAS adapter to send more commands to the drive in parallel, leaving the drive to decide the order in which to execute them. If that is accurate and the controller firmware were at fault, I would expect this to be a far more common bug, one that would have been fixed already.

On the other hand, suppose it only happens during heavy parallel SYNCHRONIZE CACHE commands, only on certain Intel SSDs, and only with controllers (12 Gbps, maybe?) that can outrun the drive firmware or trigger a race condition (my suspicion here). Then it seems far more likely that it would have gone unnoticed by Intel.

-Kyle

On 6/7/16 2:02 PM, Steven Hartland wrote:
> Have you tried direct attaching the drives?
>
> On 07/06/2016 18:09, list-news wrote:
>> The system is a Twin. In the first post I mentioned this but I
>> probably wasn't clear.
>>
>> The twin unit is this one:
>> https://www.supermicro.com/products/system/2u/2028/sys-2028tp-decr.cfm
>>
>> I've used all components from twin node A and B (cpu / memory /
>> mainboard / controller). I still get the errors. The backplane was
>> the original thought of concern, and that has been RMA'd and replaced
>> - errors continue. I've even swapped out power supplies with another
>> identical unit I have here.
>>
>> In every case the errors continue, until I do this:
>> # camcontrol tags daX -N 1
>> (for each drive in the zpool)
>>
>> Then the errors stop.
>>
>> The system errors every few minutes while my application is running.
>> Set tags to -N 1, and everything goes quiet. 16 cores at 100% cpu
>> and drives 80% busy @ ~15k IO p/s, for about 5 hours solid before it
>> finishes a batch, no errors are reported with -N set to 1. If I set
>> tags with -N 255 for each device, errors start again within 5
>> minutes, and continue every 2-5 minutes, until the batch is finished.
>>
>> -Kyle
>>
>>> I would try, if possible, to swap the controller.
>>>
>>> Borja.
>>
>> _______________________________________________
>> freebsd-scsi@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
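Changing the tag count "for each drive in the zpool" by hand is easy to get wrong when toggling between 1 and 255 during testing, so a small loop helps. This is a minimal sketch, not a tested tool: it relies only on the `tags` subcommand of FreeBSD's camcontrol(8) with `-N` to set the number of tags. The helper name `set_tags`, the `CAMCONTROL` override variable, and the device names da0 through da3 are hypothetical; substitute the pool's actual members.

```sh
#!/bin/sh
# Hypothetical helper: set the tag count (queue depth) on each listed drive.
# Usage: set_tags <count> <dev> [<dev> ...]
# A count of 1 allows one command in flight per drive; larger values
# (such as the 255 used in this thread) allow deeper queues.
# Setting CAMCONTROL=echo prints the commands instead of running them.
set_tags() {
    count=$1
    shift
    for dev in "$@"; do
        # One invocation per device, e.g. "camcontrol tags da0 -N 1".
        # Stop at the first failure so a partial change is noticed.
        ${CAMCONTROL:-camcontrol} tags "$dev" -N "$count" || return 1
    done
}

# Example (hypothetical device names; run as root):
# set_tags 1 da0 da1 da2 da3
```

Running `camcontrol tags da0 -v` afterwards should print the device's current tagged-queueing fields, which is a quick way to confirm the change took effect on each drive.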