Date: Wed, 15 Feb 2012 00:34:20 +0100
From: Victor Balada Diaz <victor@bsdes.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>, Alexander Motin <mav@freebsd.org>, freebsd-stable@freebsd.org, Claudius Herder <claudius@ambtec.de>
Subject: Re: problems with AHCI on FreeBSD 8.2
Message-ID: <20120214233420.GU2010@equilibrium.bsdes.net>
In-Reply-To: <20120214230958.GA8434@icarus.home.lan>
References: <20120214091909.GP2010@equilibrium.bsdes.net> <20120214100513.GA94501@icarus.home.lan> <20120214135435.GQ2010@equilibrium.bsdes.net> <20120214141601.GA98986@icarus.home.lan> <4F3A83DE.3000200@ambtec.de> <20120214165029.GA1852@icarus.home.lan> <4F3A971F.9040407@omnilan.de> <20120214221527.GT2010@equilibrium.bsdes.net> <20120214230958.GA8434@icarus.home.lan>
[-- Attachment #1 --]
On Tue, Feb 14, 2012 at 03:09:58PM -0800, Jeremy Chadwick wrote:
> On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
> > On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
> > > Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
> > > > On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> > > >> Hello,
> > > >>
> > > >> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
> > > >> persists on FreeBSD 9.0 release.
> > > >>
> > > >> Switching from ahci to ataahci resolved the problem for me too.
> > > >>
> > > >> I'm using gmirror for swap, system is on a zpool and the problem first
> > > >> occurred during a zpool scrub, but it is easily reproducible with dd.
> > > >>
> > > >> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> > > >> of=/dev/null is not an issue.
> > > >> Sometimes I need to power off the server because after a reboot one disk
> > > >> is still missing.
> > > >>
> > > >> I really would like to help in this issue, so let me know if you need
> > > >> any more information.
> > > > I find it interesting that, at least so far, the only people reporting
> > > > problems of this type with the ahci.ko driver are people using Samsung
> > > > disks. The only difference is that your models are F1s while the OP's
> > > > are F2s.
> > >
> > > I saw such timeouts long ago; mav@ had a look at my postings and
> > > mentioned it could be an NCQ problem.
> > > I suspected the disks' firmware.
> > > I never tracked it down further, because replacing the Samsung (F3
> > > in that case) disks with Hitachi ones solved all my problems and gave a
> > > big performance kick as well (with zfs).
> > > You can find the discussion here:
> > > http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> > >
> >
> > You gave me a good idea: try to disable NCQ and see if that's the fault. So
> > I went and applied the attached patch. With it applied, I can no longer
> > reproduce the issue with the ahci driver.
> >
> > I know this is not a solution because it disables NCQ at the controller level
> > instead of the disk level, but at least we know for sure where the problem is.
> >
> > I think the solution would be to add a new quirk ADA_Q_NONCQ in sys/cam/ata/ata_da.c.
> > The quirk infrastructure is already in place, so adding a new quirk for this seems
> > easy.
> >
> > Is someone interested? Do you think there is a better solution?
> >
> > If someone is interested I can build a patch to add the ADA_Q_NONCQ quirk and add my drives
> > to it.
>
> I took a stab at this, but I don't feel confident this is the proper
> solution/method. I worry there's some sort of chicken-or-the-egg
> condition here (quirk setup/matching comes *after* SATA capabilities
> detection), or that it makes the code messier. Need mav@'s
> recommendations on this.
>
> Below is for RELENG_8. I should note I haven't tested if this works, or
> even compiles -- normally I don't provide such patches without testing
> so I apologise in advance / user beware.

You're amazingly fast. Thanks for all your help :)

Your patch starts applying the quirks before this code runs:

	snprintf(announce_buf, sizeof(announce_buf),
	    "kern.cam.ada.%d.quirks", periph->unit_number);
	quirks = softc->quirks;
	TUNABLE_INT_FETCH(announce_buf, &quirks);

So you're breaking quirk setting at boot time: a value given through
this tunable is read too late to take effect.

See my attached patch. I can confirm it works for me.
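
To illustrate the ordering issue, here is a minimal standalone sketch. It is
not the driver code: the function effective_flags(), the FLAG_/QUIRK_ macros
and the "tunable" pointer are hypothetical stand-ins, and the bit values are
assumptions chosen for the example. The point it tries to show is only that
the quirk mask should be final (table value, then boot-time override) before
anything acts on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's flag and quirk bits. */
#define FLAG_CAN_NCQ	0x0001u
#define QUIRK_NONE	0x00u
#define QUIRK_4K	0x01u
#define QUIRK_NONCQ	0x02u

/*
 * Resolve the effective quirks first, then derive the flags from them.
 * table_quirks: value matched from the built-in quirk table.
 * tunable:      boot-time override, or NULL when none was set
 *               (assumed stand-in for the fetched tunable value).
 */
static uint32_t
effective_flags(uint32_t detected_flags, uint32_t table_quirks,
    const uint32_t *tunable)
{
	uint32_t quirks = table_quirks;

	/* Step 1: a boot-time override replaces the table value. */
	if (tunable != NULL)
		quirks = *tunable;

	/* Step 2: only now act on the final quirk mask. */
	if (quirks & QUIRK_NONCQ)
		detected_flags &= ~FLAG_CAN_NCQ;

	return (detected_flags);
}
```

With this ordering, an override of QUIRK_NONE re-enables NCQ on a drive that
the table would otherwise mark as broken, and an override of QUIRK_NONCQ
disables it on a drive the table does not list. If step 2 ran before step 1,
neither override would have any effect on NCQ.
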
Regards.
--
The most convincing proof that intelligent life exists on other
planets is that they have not tried to contact us.
[-- Attachment #2 --]
--- ata_da.c	2012-02-14 22:17:54.000000000 +0100
+++ ata_da.c	2012-02-14 22:58:05.000000000 +0100
@@ -91,6 +91,7 @@
 typedef enum {
 	ADA_Q_NONE = 0x00,
 	ADA_Q_4K = 0x01,
+	ADA_Q_NONCQ = 0x02,
 } ada_quirks;
 
 typedef enum {
@@ -162,6 +163,14 @@
 		/*quirks*/ADA_Q_4K
 	},
 	{
+		/*
+		 * Samsung have NCQ broken:
+		 * http://lists.freebsd.org/pipermail/freebsd-stable/2012-February/066168.html
+		 */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD154UI*", "*" },
+		/*quirks*/ADA_Q_NONCQ
+	},
+	{
 		/* Samsung Advanced Format (4k) drives */
 		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD155UI*", "*" },
 		/*quirks*/ADA_Q_4K
@@ -967,6 +976,10 @@
 	softc->disk->d_maxsize = maxio;
 	softc->disk->d_unit = periph->unit_number;
 	softc->disk->d_flags = 0;
+	/* Disable NCQ if needed */
+	if (softc->flags & ADA_FLAG_CAN_NCQ &&
+	    softc->quirks & ADA_Q_NONCQ)
+		softc->flags ^= ADA_FLAG_CAN_NCQ;
 	if (softc->flags & ADA_FLAG_CAN_FLUSHCACHE)
 		softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE;
 	if ((softc->flags & ADA_FLAG_CAN_TRIM) ||
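
A note on the last hunk: the XOR there clears ADA_FLAG_CAN_NCQ only because
the enclosing if has already checked that the bit is set. The standalone
sketch below contrasts the two idioms; toggle_ncq(), clear_ncq() and the bit
value are hypothetical and exist only for this example. An AND with the
complement would be an alternative that does not depend on the guard.

```c
#include <stdint.h>

#define FLAG_CAN_NCQ	0x0001u	/* assumed bit value, for illustration */

/* Toggle: clears the bit if it was set, but sets it if it was clear. */
static uint32_t
toggle_ncq(uint32_t flags)
{
	return (flags ^ FLAG_CAN_NCQ);
}

/* Clear: leaves the bit unset whatever the input was. */
static uint32_t
clear_ncq(uint32_t flags)
{
	return (flags & ~FLAG_CAN_NCQ);
}
```

Both functions give the same result when the bit is set on entry, which is
the only case the patch reaches; they differ only when it is already clear.
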
