From: Jeremy Chadwick
To: Tom Evans
Cc: Harald Schmalzbauer, Claudius Herder, freebsd-stable@freebsd.org, Oscar Prieto, Martin Sugioarto
Date: Wed, 15 Feb 2012 02:42:05 -0800
Subject: Re: problems with AHCI on FreeBSD 8.2
Message-ID: <20120215104205.GA19734@icarus.home.lan>

On
Wed, Feb 15, 2012 at 10:19:37AM +0000, Tom Evans wrote:
> On Tue, Feb 14, 2012 at 7:52 PM, Jeremy Chadwick wrote:
> > On Tue, Feb 14, 2012 at 08:31:23PM +0100, Oscar Prieto wrote:
> >> I used to had tons of ahci errors in my 4 disk raidz1 worth of
> >> HD154UIs when the rig was built a year ago or so (with 8.0 Release),
> >> but they dissapeared after tuning ZFS.
> >>
> >> Sadly i also got a new timeout days ago followed with smartcl erros i
> >> still keep unchecked but i guess they cold be legit, i still have to
> >> test/swap cables and give it a try.
>
> Interesting. I have 9 SAMSUNG HD154UI 1AG01118 in my raidz setup,
> haven't had a problem with any of them yet (touch wood).
>
> > Further details which pertain to Samsung drives:
> >
> > In your case, you run smartd(8), which periodically hits the drive with
> > SMART requests, pulling attribute data down and parsing it.  I believe
> > your model is fine for this, but for similar Samsung models, I must
> > strongly advise against this.  There are well-documented problems with
> > Samsung firmwares and SMART behaviour which can result in data loss (yes
> > you read that right).  Please see smartmontools' Wiki page on the matter
> > for full details.  Just make sure you're running a fixed firmware:
> >
> > http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks
> >
>
> Yikes, I have just this week installed a HD204UI. From that page,
> drives manufactured after December 2010 should not be affected, which
> is fortunate as the linked firmware page doesn't seem to exist
> anymore, Samsung no longer seem to offer support for their drives and
> point you at Seagate, whose site (of course!) only has downloads for
> current Seagate drives.
>
>
> Hmm reading later on in the thread there is a patch to mark certain
> drives as having flaky NCQ - in the patch it is for the SAMSUNG
> HD154UI.
> As I mentioned before, I have 9 SAMSUNG HD154UI, all of which
> use ahci(4) and NCQ, and all work perfectly, no timeouts. This is
> using 9-STABLE.
>
> I suspect that there may be more going on than 'flaky NCQ', and that
> perhaps disabling NCQ masks the real issue.

It could simply be a firmware bug in the drive, which is what some
others have alluded to (and I'm in agreement with).

I would love to say "compare firmware versions on your drives", except
there is real in-the-field proof that firmware version strings often do
not get updated/changed between firmwares (at least in the case of some
Seagate and Western Digital disks).  Furthermore, NCQ can "play
differently" with different AHCI controllers.

That said, the disks / firmware versions mentioned by people involved in
this thread / referenced threads are:

* Victor Balada Diaz -- SAMSUNG HD154UI, firmware 1AG01118
* Claudius Herder -- SAMSUNG HD753LJ, firmware 1AA01118
* Oscar Prieto -- SAMSUNG HD154UI, firmware 1AG01118
  - NOTE: In Oscar's case, his drives exhibit other problems.  I would
    provide a link but the web archive for freebsd-stable does not show
    my mail which contains analysis of the situation.
* Harald Schmalzbauer -- not provided, but hints at Samsung EG drives

For this to be thorough, one would need to check which AHCI controllers
are being used and compare those as well.

I think Scott's theory is probably on the ball here, as it pertains to
tag exhaustion, which would manifest itself in the described fashion:

http://lists.freebsd.org/pipermail/freebsd-stable/2012-February/066177.html

I'd urge people experiencing this problem to issue the command Scott
provided on all their Samsung disks and see if the problem goes away
after that.  If it does, great.  I acknowledge there is no loader.conf
tunable for doing this, so the workaround has to be applied after
boot-up, e.g. from an rc.d script or something similar.
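For anyone scripting the "run it on every Samsung disk" step, here is a
minimal sketch of the disk-selection part in Python.  Assumptions: it
parses text in the usual `camcontrol devlist` layout, treats the last
token inside the angle brackets as the firmware revision, and takes the
per-disk command as a template string.  The command Scott provided is in
the linked message and is not reproduced here, so `your-command {dev}`
is only a placeholder.  The device names (ada0, ada1, ada2) and the
non-Samsung entry in the sample are made up for illustration.

```python
import re

# Expected line shape (assumed `camcontrol devlist` layout):
#   <SAMSUNG HD154UI 1AG01118>   at scbus0 target 0 lun 0 (ada0,pass0)
DEVLIST_LINE = re.compile(
    r"^<(?P<ident>[^>]*)>\s+at\s+\S+.*\((?P<devs>[^)]*)\)\s*$"
)


def parse_devlist(text):
    """Return (model, firmware, device) tuples found in devlist text."""
    disks = []
    for line in text.splitlines():
        m = DEVLIST_LINE.match(line.strip())
        if not m:
            continue
        parts = m.group("ident").split()
        if len(parts) < 2:
            continue
        # Assumption: the last token is the firmware revision string.
        model, firmware = " ".join(parts[:-1]), parts[-1]
        # Drop the passthrough node and keep the disk device (ada0, ...).
        names = [d for d in m.group("devs").split(",")
                 if not d.startswith("pass")]
        if names:
            disks.append((model, firmware, names[0]))
    return disks


def samsung_disks(text):
    """Keep only the entries whose model string starts with SAMSUNG."""
    return [d for d in parse_devlist(text)
            if d[0].upper().startswith("SAMSUNG")]


def build_commands(disks, template):
    """Fill the {dev} placeholder of a command template for each disk."""
    return [template.format(dev=dev) for _, _, dev in disks]


# Illustrative input only; model/firmware pairs for the Samsung disks
# are the ones listed in this thread, everything else is invented.
SAMPLE = """\
<SAMSUNG HD154UI 1AG01118>         at scbus0 target 0 lun 0 (ada0,pass0)
<SAMSUNG HD753LJ 1AA01118>         at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD1002FAEX-00Z3A0 05.01D05>   at scbus2 target 0 lun 0 (ada2,pass2)
"""
```

In real use one would feed it the captured output of `camcontrol
devlist` instead of SAMPLE, then pass the result of
`build_commands(samsung_disks(output), "your-command {dev}")` to
whatever runs the commands at boot.  It also prints model and firmware
side by side, which helps when comparing notes, keeping in mind that
the firmware string is not a reliable indicator.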
-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |