Date: Tue, 14 Feb 2012 15:09:58 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Victor Balada Diaz <victor@bsdes.net>
Cc: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>, Alexander Motin <mav@freebsd.org>, freebsd-stable@freebsd.org, Claudius Herder <claudius@ambtec.de>
Subject: Re: problems with AHCI on FreeBSD 8.2
Message-ID: <20120214230958.GA8434@icarus.home.lan>
In-Reply-To: <20120214221527.GT2010@equilibrium.bsdes.net>
References: <20120214091909.GP2010@equilibrium.bsdes.net>
 <20120214100513.GA94501@icarus.home.lan>
 <20120214135435.GQ2010@equilibrium.bsdes.net>
 <20120214141601.GA98986@icarus.home.lan>
 <4F3A83DE.3000200@ambtec.de>
 <20120214165029.GA1852@icarus.home.lan>
 <4F3A971F.9040407@omnilan.de>
 <20120214221527.GT2010@equilibrium.bsdes.net>

On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
> On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
> > Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
> > > On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> > >> Hello,
> > >>
> > >> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
> > >> persists on FreeBSD 9.0 release.
> > >>
> > >> Switching from ahci to ataahci resolved the problem for me too.
> > >>
> > >> I'm using gmirror for swap, system is on a zpool and the problem first
> > >> occurred during a zpool scrub, but it is easily reproducible with dd.
> > >>
> > >> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> > >> of=/dev/null is not an issue.
> > >> Sometimes I need to power off the server because after a reboot one disk
> > >> is still missing.
> > >>
> > >> I really would like to help in this issue, so let me know if you need
> > >> any more information.
> > > I find it interesting that, at least so far, the only people reporting
> > > problems of this type with the ahci.ko driver are people using Samsung
> > > disks.  The only difference is that your models are F1s while the OP's
> > > are F2s.
> >
> > I saw such timeouts long ago and mav@ had a look at my postings and he
> > mentioned it could be an NCQ problem.
> > I suspected the disks' firmware.
> > I never tracked it down further, because replacing the Samsung (F3
> > in that case) disks with Hitachi ones solved all my problems and gave a
> > big performance kick as well (with zfs).
> > You can find the discussion here:
> > http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> >
>
> You gave me a good idea: try to disable NCQ and see if that's the fault. So
> I went and applied the attached patch. After it, I can no longer reproduce
> the issue with the ahci driver.
>
> I know this is not a solution because it disables NCQ at controller level
> instead of disk level, but at least we know for sure where the problem is.
>
> I think the solution would be to add a new quirk ADA_Q_NONCQ in sys/cam/ata/ata_da.c.
> The quirks infrastructure is already built, so adding a new quirk for this seems
> easy.
>
> Is someone interested? Do you think there is a better solution?
>
> If someone is interested I can build a patch to add the ADA_Q_NONCQ quirk and add my drives
> to it.

I took a stab at this, but I don't feel confident this is the proper
solution/method.  I worry there's some sort of chicken-or-the-egg
condition here (quirk setup/matching comes *after* SATA capabilities
detection), or that it makes the code messier.  Need mav@'s
recommendations on this.

Below is for RELENG_8.  I should note I haven't tested if this works, or
even compiles -- normally I don't provide such patches without testing
so I apologise in advance / user beware.

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

diff -ruN /usr/src/sys/cam/ata/ata_da.c src/sys/cam/ata/ata_da.c
--- /usr/src/sys/cam/ata/ata_da.c	2012-02-10 17:22:25.000000000 -0800
+++ src/sys/cam/ata/ata_da.c	2012-02-14 15:07:07.988814133 -0800
@@ -90,7 +90,8 @@
 
 typedef enum {
 	ADA_Q_NONE		= 0x00,
-	ADA_Q_4K		= 0x01,
+	ADA_Q_4K		= 0x01,	/* 4k sectors */
+	ADA_Q_NONCQ		= 0x02,	/* device has flaky NCQ support */
 } ada_quirks;
 
 typedef enum {
@@ -162,6 +163,11 @@
 		/*quirks*/ADA_Q_4K
 	},
 	{
+		/* Samsung Spinpoint F2 EG (EcoGreen) drives */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD154UI*", "*" },
+		/*quirks*/ADA_Q_NONCQ,
+	},
+	{
 		/* Samsung Advanced Format (4k) drives */
 		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD155UI*", "*" },
 		/*quirks*/ADA_Q_4K
@@ -887,9 +893,6 @@
 		softc->flags |= ADA_FLAG_CAN_FLUSHCACHE;
 	if (cgd->ident_data.support.command1 & ATA_SUPPORT_POWERMGT)
 		softc->flags |= ADA_FLAG_CAN_POWERMGT;
-	if (cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ &&
-	    (cgd->inq_flags & SID_DMA) && (cgd->inq_flags & SID_CmdQue))
-		softc->flags |= ADA_FLAG_CAN_NCQ;
 	if (cgd->ident_data.support_dsm & ATA_SUPPORT_DSM_TRIM) {
 		softc->flags |= ADA_FLAG_CAN_TRIM;
 		softc->trim_max_ranges = TRIM_MAX_RANGES;
@@ -916,6 +919,15 @@
 	else
 		softc->quirks = ADA_Q_NONE;
 
+	/*
+	 * Do not enable NCQ for devices which have the ADA_Q_NONCQ quirk.
+	 */
+	if (!(softc->quirks & ADA_Q_NONCQ)) {
+		if (cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ &&
+		    (cgd->inq_flags & SID_DMA) && (cgd->inq_flags & SID_CmdQue))
+			softc->flags |= ADA_FLAG_CAN_NCQ;
+	}
+
 	bzero(&cpi, sizeof(cpi));
 	xpt_setup_ccb(&cpi.ccb_h, periph->path, CAM_PRIORITY_NONE);
 	cpi.ccb_h.func_code = XPT_PATH_INQ;
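
The ordering the patch relies on (resolve quirks first, then decide on
NCQ) can be tried outside the kernel with a rough userland sketch like
the one below.  It is only an illustration, not the driver's code: the
names quirk_entry, quirk_table, lookup_quirks, can_use_ncq and hw_ncq_ok
are hypothetical, POSIX fnmatch(3) is an assumed stand-in for the CAM
quirk matching done in ata_da.c, and hw_ncq_ok collapses the three
capability tests from the patch into a single flag.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Quirk bits, mirroring the values proposed in the patch. */
enum { Q_NONE = 0x00, Q_4K = 0x01, Q_NONCQ = 0x02 };

/* Hypothetical stand-in for a quirk table entry: model glob plus bits. */
struct quirk_entry {
	const char *model_glob;
	int quirks;
};

static const struct quirk_entry quirk_table[] = {
	{ "SAMSUNG HD154UI*", Q_NONCQ },	/* Spinpoint F2 EG */
	{ "SAMSUNG HD155UI*", Q_4K },		/* Advanced Format (4k) */
};

/* Return the quirk bits of the first entry whose glob matches the model. */
static int
lookup_quirks(const char *model)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (fnmatch(quirk_table[i].model_glob, model, 0) == 0)
			return (quirk_table[i].quirks);
	}
	return (Q_NONE);
}

/*
 * Decide whether NCQ may be enabled.  hw_ncq_ok stands for the combined
 * ATA_SUPPORT_NCQ / SID_DMA / SID_CmdQue test from the patch; the quirk
 * lookup happens before that result is acted on.
 */
static int
can_use_ncq(const char *model, int hw_ncq_ok)
{
	int quirks = lookup_quirks(model);

	return (hw_ncq_ok && !(quirks & Q_NONCQ));
}
```

With this, can_use_ncq("SAMSUNG HD154UI", 1) is false even though the
hardware flag says NCQ is available, while an unlisted model with the
same flag keeps NCQ.  Whether the real driver has everything it needs at
the point where the quirk is consulted is exactly the open question
above, and this sketch does not answer it.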