From owner-freebsd-hackers@FreeBSD.ORG Fri May 22 01:11:52 2015
Message-ID: <555E8252.2060307@FreeBSD.org>
Date: Fri, 22 May 2015 04:11:46 +0300
From: Alexander Motin
To: Warner Losh, Neffi
CC: hackers@freebsd.org, imp@freebsd.org
Subject: Re: Botched NCQ on SSD - cannot disable?
References: <8EDE2E6C-FED8-498B-9211-E3534A28D2FC@bsdimp.com>
In-Reply-To: <8EDE2E6C-FED8-498B-9211-E3534A28D2FC@bsdimp.com>

On 21.05.2015 21:54, Warner Losh wrote:
>
>> On May 21, 2015, at 12:42 PM, Neffi wrote:
>>
>> I was discussing this issue in freenode/#freebsd and I was
>> recommended to shoot an email to you fellows about it.
>>
>> I've got a Samsung 840 EVO SSD (model MZ-7TE250BW), which uses
>> Samsung's own controller from what I can gather. I had issues of
>> mass data corruption when used under Linux, and several programs
>> crashing unexpectedly when used under FreeBSD. I've gone through
>> 2 drives under warranty with the same issue before customer
>> service suggested to disable drive queuing.
>>
>> After some research it seems as though this drive (and several
>> other common SSDs) report that they support NCQ, but in fact are
>> botched and will have all sorts of problems with NCQ enabled,
>> ranging from poor performance to I/O stalls to data corruption.
>>
>> Sure enough the logs on Linux spit out something along the lines
>> of:
>>
>>> ata1: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x10 frozen
>>> ata1.00: failed command: READ FPDMA QUEUED
>>
>> This happens several times when used on Linux, in the few hours
>> leading up to total filesystem corruption.
>>
>> The recommendation in the Linux world is to disable NCQ on these
>> drives, for which there is an easy boot-time tunable. This
>> fixes the issue. No more data corruption.
>>
>> There doesn't seem to be a tunable for this anywhere on FreeBSD.
>> camcontrol(8) mentions setting the tags used, but only between
>> some hardcoded limits, with a default of 2 -- not sufficient to
>> disable NCQ on the drive. It looks like presently the only option
>> is to manually patch the quirks for this drive in the kernel and
>> recompile before I can even install the system to the drive.
>
> One option is to use drives that don’t suck so bad.
>
> If you are using the AHCI controller, it has quirks for some cards
> that don’t properly fill in the NCQ tags, but so far that’s a tiny
> list of mostly older gear. What’s the host controller you are
> using?
>
> Also, just because the command that hung on the drive is an NCQ
> command, that doesn’t mean disabling NCQ commands will keep you
> safe. That’s just the first one that’s issued after the firmware
> wedges (or could be: that’s a very common scenario for this kind of
> failure mode).
>
> There’s a quirk for the 840 EVO, but that’s just to force 4k sector
> size.
>
> While I haven’t used this generation of Samsung SSDs, I’d be highly
> surprised if this issue was really a problem in the drive instead
> of some cabling issue, or other environmental issue leading to the
> wedge.
>
> It’s true there’s no way to totally disable NCQ, but if the drive
> is hanging with NCQ depth of 2, I’d be highly surprised if it is
> actually NCQ causing this...

IIRC camcontrol can disable NCQ, even though it is not very intuitive:
`camcontrol negotiate adaX -T disable ; camcontrol reset `

-- 
Alexander Motin