Date: Fri, 22 May 2015 04:11:46 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Warner Losh <imp@bsdimp.com>, Neffi <nefftd@gmail.com>
Cc: hackers@freebsd.org, imp@freebsd.org
Subject: Re: Botched NCQ on SSD - cannot disable?
Message-ID: <555E8252.2060307@FreeBSD.org>
In-Reply-To: <8EDE2E6C-FED8-498B-9211-E3534A28D2FC@bsdimp.com>
References: <CA%2BK1YHFrxHt5rVU%2BLsH9UN37dr_7or1C7rEB0eHfJisU7sPE0Q@mail.gmail.com> <8EDE2E6C-FED8-498B-9211-E3534A28D2FC@bsdimp.com>
On 21.05.2015 21:54, Warner Losh wrote:
>
>> On May 21, 2015, at 12:42 PM, Neffi <nefftd@gmail.com> wrote:
>>
>> I was discussing this issue in freenode/#freebsd and I was
>> recommended to shoot an email to you fellows about it.
>>
>> I've got a Samsung 840 EVO SSD (model MZ-7TE250BW), which uses
>> Samsung's own controller from what I can gather. I had issues of
>> mass data corruption when used under Linux, and several programs
>> crashing unexpectedly when used under FreeBSD. I've gone through
>> 2 drives under warranty with the same issue before customer
>> service suggested to disable drive queuing.
>>
>> After some research it seems as though this drive (and several
>> other common SSDs) report that they support NCQ, but in fact are
>> botched and will have all sorts of problems with NCQ enabled
>> ranging from poor performance, to I/O stalls to data corruption.
>>
>> Sure enough the logs on Linux spit out something along the lines
>> of:
>>
>>> ata1: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x10 frozen
>>> ata1.00: failed command: READ FPDMA QUEUED
>>
>> This happens several times when used on Linux, in the few hours
>> leading up to total filesystem corruption.
>>
>> The recommendation in the Linux world is to disable NCQ on these
>> drives, for which there is an easy boot-time tunable. This
>> fixes the issue. No more data corruption.
>>
>> There doesn't seem to be a tunable for this anywhere on FreeBSD.
>> camcontrol(8) mentions setting the tags used, but only between
>> some hardcoded limits, with a default of 2 -- not sufficient to
>> disable NCQ on the drive. It looks like presently the only option
>> is to manually patch the quirks for this drive in the kernel and
>> recompile before I can even install the system to the drive.
>
> One option is to use drives that don’t suck so bad.
>
> If you are using the AHCI controller, it has quirks for some cards
> that don’t properly fill in the NCQ tags, but so far that’s a tiny
> list of mostly older gear. What’s the host controller you are
> using?
>
> Also, just because the command that hung on the drive is an NCQ
> command, that doesn’t mean disabling NCQ commands will keep you
> safe. That’s just the first one that’s issued after the firmware
> wedges (or could be: that’s a very common scenario for this kind of
> failure mode).
>
> There’s a quirk for the 840 EVO, but that’s just to force 4k sector
> size.
>
> While I haven’t used this generation of Samsung SSDs, I’d be highly
> surprised if this issue was really a problem in the drive instead
> of some cabling issue, or other environmental issue leading to the
> wedge.
>
> It’s true there’s no way to totally disable NCQ, but if the drive
> is hanging with NCQ depth of 2, I’d be highly surprised if it is
> actually NCQ causing this...

IIRC camcontrol can disable NCQ, even though it is not very intuitive:

    camcontrol negotiate adaX -T disable
    camcontrol reset <CAM bus number where adaX is connected>

-- 
Alexander Motin
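
The two commands above can be wrapped in a small script so the bus number does not have to be looked up by hand. This is only a sketch under stated assumptions: the helper names bus_of and disable_ncq are hypothetical, and the parsing assumes that `camcontrol devlist` prints lines of the form `<model>  at scbus0 target 0 lun 0 (ada0,pass0)`. Only the `negotiate ... -T disable` and `reset` invocations come from the message itself.

```shell
#!/bin/sh
# Sketch only: bus_of and disable_ncq are hypothetical helper names.

# Print the CAM bus number for device $1, reading `camcontrol devlist`
# output on stdin. Assumes lines shaped like:
#   <model>  at scbus0 target 0 lun 0 (ada0,pass0)
bus_of() {
    sed -n "s/.* at scbus\([0-9][0-9]*\) target .*[(,]$1[,)].*/\1/p" | head -n 1
}

# Disable tagged queueing (NCQ) on device $1, then reset its bus so the
# new setting gets negotiated. Must be run as root.
disable_ncq() {
    dev=$1
    bus=$(camcontrol devlist | bus_of "$dev")
    [ -n "$bus" ] || { echo "no CAM bus found for $dev" >&2; return 1; }
    camcontrol negotiate "$dev" -T disable &&
        camcontrol reset "$bus"
}
```

After sourcing the file, `disable_ncq ada0` would run both steps for ada0. The thread does not say whether the setting survives a reboot, so it may have to be reapplied at boot.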
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?555E8252.2060307>