Date:      Tue, 14 Feb 2012 15:09:58 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Victor Balada Diaz <victor@bsdes.net>
Cc:        Harald Schmalzbauer <h.schmalzbauer@omnilan.de>, Alexander Motin <mav@freebsd.org>, freebsd-stable@freebsd.org, Claudius Herder <claudius@ambtec.de>
Subject:   Re: problems with AHCI on FreeBSD 8.2
Message-ID:  <20120214230958.GA8434@icarus.home.lan>
In-Reply-To: <20120214221527.GT2010@equilibrium.bsdes.net>
References:  <20120214091909.GP2010@equilibrium.bsdes.net> <20120214100513.GA94501@icarus.home.lan> <20120214135435.GQ2010@equilibrium.bsdes.net> <20120214141601.GA98986@icarus.home.lan> <4F3A83DE.3000200@ambtec.de> <20120214165029.GA1852@icarus.home.lan> <4F3A971F.9040407@omnilan.de> <20120214221527.GT2010@equilibrium.bsdes.net>

On Tue, Feb 14, 2012 at 11:15:27PM +0100, Victor Balada Diaz wrote:
> On Tue, Feb 14, 2012 at 06:17:19PM +0100, Harald Schmalzbauer wrote:
> > Jeremy Chadwick wrote on 14.02.2012 17:50 (localtime):
> > > On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> > >> Hello,
> > >>
> > >> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
> > >> persists on FreeBSD 9.0 release.
> > >>
> > >> Switching from ahci to ataahci resolved the problem for me too.
> > >>
> > >> I'm using gmirror for swap, system is on a zpool and the problem first
> > >> occurred during a zpool scrub, but it is easily reproducible with dd.
> > >>
> > >> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> > >> of=/dev/null is not an issue.
> > >> Sometimes I need to power off the server because after a reboot one disk
> > >> is still missing.
> > >>
> > >> I really would like to help in this issue, so let me know if you need
> > >> any more information.
> > > I find it interesting that, at least so far, the only people reporting
> > > problems of this type with the ahci.ko driver are people using Samsung
> > > disks.  The only difference is that your models are F1s while the OP's
> > > are F2s.
> > 
> > I saw such timeouts long ago; mav@ had a look at my postings and
> > mentioned it could be an NCQ problem.
> > I suspected the disks' firmware.
> > I never tracked it down further, because replacing the Samsung (F3
> > in that case) disks with Hitachi ones solved all my problems and gave a
> > big performance kick as well (with ZFS).
> > You can find the discussion here:
> > http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> > 
> 
> You gave me a good idea: try disabling NCQ and see if that's the fault. So
> I went and applied the attached patch. With it, I can no longer reproduce
> the issue with the ahci driver.
> 
> I know this is not a solution because it disables NCQ at controller level
> instead of disk level, but at least we know for sure where the problem is.
> 
> I think the solution would be to add a new quirk ADA_Q_NONCQ in sys/cam/ata/ata_da.c.
> The quirks infrastructure is already built, so adding a new quirk for this seems
> easy.
> 
> Is someone interested? Do you think there is a better solution?
> 
> If someone is interested, I can build a patch to add an ADA_Q_NONCQ quirk and add my drives
> to it.

I took a stab at this, but I don't feel confident this is the proper
solution/method.  I worry there's some sort of chicken-or-the-egg
condition here (quirk setup/matching comes *after* SATA capabilities
detection), or that it makes the code messier.  Need mav@'s
recommendations on this.

Below is for RELENG_8.  I should note I haven't tested if this works, or
even compiles -- normally I don't provide such patches without testing
so I apologise in advance / user beware.
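
To make the ordering concern concrete, here is a minimal userland sketch
of the idea: look up the quirks first, then decide whether to set the NCQ
flag.  Everything in it is a simplified stand-in and an assumption, not
the real CAM code: the Q_* and FLAG_* names, struct quirk_entry,
lookup_quirks() and probe_flags() are all hypothetical, and POSIX
fnmatch() is swapped in for the kernel's own pattern matching.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical stand-ins for ADA_Q_* and ADA_FLAG_CAN_NCQ. */
enum { Q_NONE = 0x00, Q_4K = 0x01, Q_NONCQ = 0x02 };
enum { FLAG_CAN_NCQ = 0x01 };

/* Simplified quirk entry: a model-string pattern plus its quirk bits. */
struct quirk_entry {
	const char *product;	/* shell-style pattern, e.g. "SAMSUNG HD154UI*" */
	int quirks;
};

static const struct quirk_entry quirk_table[] = {
	{ "SAMSUNG HD154UI*", Q_NONCQ },	/* flaky NCQ */
	{ "SAMSUNG HD155UI*", Q_4K },		/* 4k sectors */
};

/* Return the quirk bits of the first matching entry, or Q_NONE. */
int
lookup_quirks(const char *model)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (fnmatch(quirk_table[i].product, model, 0) == 0)
			return (quirk_table[i].quirks);
	}
	return (Q_NONE);
}

/*
 * Quirk lookup happens first; the NCQ flag is set only if the drive
 * reports NCQ support and the quirk table does not veto it.
 */
int
probe_flags(const char *model, int drive_reports_ncq)
{
	int flags = 0;
	int quirks = lookup_quirks(model);

	if (!(quirks & Q_NONCQ) && drive_reports_ncq)
		flags |= FLAG_CAN_NCQ;
	return (flags);
}
```

With this ordering, a model string matching "SAMSUNG HD154UI*" never gets
the NCQ flag even if the drive advertises NCQ, while any other drive keeps
whatever it reports.  Whether the real registration path can be reordered
this way is exactly the open question for mav@.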

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

diff -ruN /usr/src/sys/cam/ata/ata_da.c src/sys/cam/ata/ata_da.c
--- /usr/src/sys/cam/ata/ata_da.c	2012-02-10 17:22:25.000000000 -0800
+++ src/sys/cam/ata/ata_da.c	2012-02-14 15:07:07.988814133 -0800
@@ -90,7 +90,8 @@
 
 typedef enum {
 	ADA_Q_NONE		= 0x00,
-	ADA_Q_4K		= 0x01,
+	ADA_Q_4K		= 0x01,	/* 4k sectors */
+	ADA_Q_NONCQ		= 0x02,	/* device has flaky NCQ support */
 } ada_quirks;
 
 typedef enum {
@@ -162,6 +163,11 @@
 		/*quirks*/ADA_Q_4K
 	},
 	{
+		/* Samsung Spinpoint F2 EG (EcoGreen) drives */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD154UI*", "*" },
+		/*quirks*/ADA_Q_NONCQ,
+	},
+	{
 		/* Samsung Advanced Format (4k) drives */
 		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD155UI*", "*" },
 		/*quirks*/ADA_Q_4K
@@ -887,9 +893,6 @@
 		softc->flags |= ADA_FLAG_CAN_FLUSHCACHE;
 	if (cgd->ident_data.support.command1 & ATA_SUPPORT_POWERMGT)
 		softc->flags |= ADA_FLAG_CAN_POWERMGT;
-	if (cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ &&
-	    (cgd->inq_flags & SID_DMA) && (cgd->inq_flags & SID_CmdQue))
-		softc->flags |= ADA_FLAG_CAN_NCQ;
 	if (cgd->ident_data.support_dsm & ATA_SUPPORT_DSM_TRIM) {
 		softc->flags |= ADA_FLAG_CAN_TRIM;
 		softc->trim_max_ranges = TRIM_MAX_RANGES;
@@ -916,6 +919,15 @@
 	else
 		softc->quirks = ADA_Q_NONE;
 
+	/*
+	 * Do not enable NCQ for devices which have the ADA_Q_NONCQ quirk.
+	 */
+	if (!(softc->quirks & ADA_Q_NONCQ)) {
+		if (cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ &&
+		    (cgd->inq_flags & SID_DMA) && (cgd->inq_flags & SID_CmdQue))
+			softc->flags |= ADA_FLAG_CAN_NCQ;
+	}
+
 	bzero(&cpi, sizeof(cpi));
 	xpt_setup_ccb(&cpi.ccb_h, periph->path, CAM_PRIORITY_NONE);
 	cpi.ccb_h.func_code = XPT_PATH_INQ;
