Date:      Wed, 15 Feb 2012 02:42:05 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Tom Evans <tevans.uk@googlemail.com>
Cc:        Harald Schmalzbauer <h.schmalzbauer@omnilan.de>, Claudius Herder <claudius@ambtec.de>, freebsd-stable@freebsd.org, Oscar Prieto <oscarmpp@googlemail.com>, Martin Sugioarto <martin@sugioarto.com>
Subject:   Re: problems with AHCI on FreeBSD 8.2
Message-ID:  <20120215104205.GA19734@icarus.home.lan>
In-Reply-To: <CAFHbX1KY9vAPFXiEQXv=M%2BqQQg17TU8BUc_P8M9FiB-gJ0FpvQ@mail.gmail.com>
References:  <20120214100513.GA94501@icarus.home.lan> <20120214135435.GQ2010@equilibrium.bsdes.net> <20120214141601.GA98986@icarus.home.lan> <4F3A83DE.3000200@ambtec.de> <20120214165029.GA1852@icarus.home.lan> <4F3A971F.9040407@omnilan.de> <20120214192319.44ff7aff@zelda.sugioarto.com> <CAK9wqRpjRXtkBqL%2BgX5gY3foqz-O5mT-qg7Z=_t2m=Q3rZizJg@mail.gmail.com> <20120214195255.GA5064@icarus.home.lan> <CAFHbX1KY9vAPFXiEQXv=M%2BqQQg17TU8BUc_P8M9FiB-gJ0FpvQ@mail.gmail.com>

On Wed, Feb 15, 2012 at 10:19:37AM +0000, Tom Evans wrote:
> On Tue, Feb 14, 2012 at 7:52 PM, Jeremy Chadwick
> <freebsd@jdc.parodius.com> wrote:
> > On Tue, Feb 14, 2012 at 08:31:23PM +0100, Oscar Prieto wrote:
> >> I used to have tons of ahci errors in my 4 disk raidz1 worth of
> >> HD154UIs when the rig was built a year ago or so (with 8.0 Release),
> >> but they disappeared after tuning ZFS.
> >>
> >> Sadly I also got a new timeout days ago followed by smartctl errors I
> >> still haven't checked, but I guess they could be legit; I still have to
> >> test/swap cables and give it a try.
> 
> Interesting. I have 9 SAMSUNG HD154UI 1AG01118 in my raidz setup,
> haven't had a problem with any of them yet (touch wood).
> 
> > Further details which pertain to Samsung drives:
> >
> > In your case, you run smartd(8), which periodically hits the drive with
> > SMART requests, pulling attribute data down and parsing it.  I believe
> > your model is fine for this, but for similar Samsung models, I must
> > strongly advise against this.  There are well-documented problems with
> > Samsung firmwares and SMART behaviour which can result in data loss (yes
> > you read that right).  Please see smartmontools' Wiki page on the matter
> > for full details.  Just make sure you're running a fixed firmware:
> >
> > http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks
> >
> 
> Yikes, I have just this week installed a HD204UI. From that page,
> drives manufactured after December 2010 should not be affected, which
> is fortunate as the linked firmware page doesn't seem to exist
> anymore, Samsung no longer seem to offer support for their drives and
> point you at Seagate, whose site (of course!) only has downloads for
> current Seagate drives.
> 
> 
> Hmm reading later on in the thread there is a patch to mark certain
> drives as having flaky NCQ - in the patch it is for the SAMSUNG
> HD154UI. As I mentioned before, I have 9 SAMSUNG HD154UI, all of which
> use ahci(4) and NCQ, and all work perfectly, no timeouts. This is
> using 9-STABLE.
> 
> I suspect that there may be more going on than 'flaky NCQ', and that
> perhaps disabling NCQ masks the real issue.

It could simply be a firmware bug in the drive, which is what some
others have alluded to (and I'm in agreement with).  I would love to say
"compare firmware versions on your drives", except there is real
in-the-field proof that firmware version strings often do not get
updated/changed between firmwares (at least in the case of some Seagate
and Western Digital disks).  Furthermore, NCQ can "play differently" with
different AHCI controllers.

That said, the disks / firmware versions mentioned by people involved in
this thread / referenced threads are:

* Victor Balada Diaz  -- SAMSUNG HD154UI, firmware 1AG01118
* Claudius Herder     -- SAMSUNG HD753LJ, firmware 1AA01118
* Oscar Prieto        -- SAMSUNG HD154UI, firmware 1AG01118
  - NOTE: In Oscar's case, his drives exhibit other problems.  I
    would provide a link but the web archive for freebsd-stable does
    not show my mail which contains analysis of the situation
* Harald Schmalzbauer -- not provided, but hints at Samsung EG drives

For this to be thorough, one would need to check which AHCI
controllers are being used and compare those as well.
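
As a rough aid for that comparison, here is a minimal Python sketch that
groups disks by model and reported firmware string.  It assumes the usual
line shape of "camcontrol devlist" output; the sample text, the scbus
numbers and the helper names (parse_devlist, group_by_firmware) are
illustrative assumptions, not output from anyone in this thread.  Keep in
mind the caveat above: identical firmware strings do not prove identical
firmware.

```python
import re
from collections import defaultdict

# Assumed line shape of `camcontrol devlist` output, for example:
#   <SAMSUNG HD154UI 1AG01118>   at scbus0 target 0 lun 0 (ada0,pass0)
DEVLIST_RE = re.compile(
    r"<(?P<ident>[^>]+)>\s+at scbus(?P<bus>\d+) target \d+ lun \d+ "
    r"\((?P<devs>[^)]+)\)"
)


def parse_devlist(text):
    """Return one dict per disk found in devlist-style text."""
    disks = []
    for line in text.splitlines():
        m = DEVLIST_RE.search(line)
        if not m:
            continue
        # The last whitespace-separated token is treated as the firmware
        # revision; everything before it is treated as the model.
        model, _, firmware = m.group("ident").strip().rpartition(" ")
        names = m.group("devs").split(",")
        # Prefer the disk device (e.g. ada0) over the passN device.
        disk = next((n for n in names if not n.startswith("pass")), names[0])
        disks.append({
            "disk": disk,
            "bus": int(m.group("bus")),
            "model": model.strip(),
            "firmware": firmware,
        })
    return disks


def group_by_firmware(disks):
    """Map (model, firmware) to the list of disk names reporting it."""
    groups = defaultdict(list)
    for d in disks:
        groups[(d["model"], d["firmware"])].append(d["disk"])
    return dict(groups)


# Hypothetical sample; only the model/firmware strings come from this thread.
SAMPLE = """\
<SAMSUNG HD154UI 1AG01118>         at scbus0 target 0 lun 0 (ada0,pass0)
<SAMSUNG HD154UI 1AG01118>         at scbus1 target 0 lun 0 (pass1,ada1)
<SAMSUNG HD753LJ 1AA01118>         at scbus2 target 0 lun 0 (ada2,pass2)
"""

if __name__ == "__main__":
    grouped = group_by_firmware(parse_devlist(SAMPLE))
    for (model, fw), names in sorted(grouped.items()):
        print(f"{model} firmware {fw}: {', '.join(names)}")
```

On a real system the text would come from running "camcontrol devlist"
and feeding its output to parse_devlist(); the controller behind each
scbus would still have to be looked up separately (for example in dmesg).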

I think Scott's theory is probably on-the-ball here, as it pertains to
tag exhaustion, which would manifest itself in the described fashion:

http://lists.freebsd.org/pipermail/freebsd-stable/2012-February/066177.html

I'd urge people experiencing this problem to issue the command Scott
provided on all their Samsung disks and see if the problem goes away
after that.  If it does, great.  I acknowledge there is no loader.conf
tunable for doing this, so for now the workaround would be an rc.d
script (or similar) that issues the command after boot-up.

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |



