From: Juergen Lock
Date: Sat, 25 Jul 2009 22:11:14 +0200
To: Alexander Motin
Cc: freebsd-current@FreeBSD.org
Subject: Re: RFC: ATA to CAM integration patch (and gjournaled previuos nodes)
Message-ID: <20090725201114.GA62249@triton.kn-bremen.de>
In-Reply-To: <4A6B5AAE.9010809@FreeBSD.org>
References: <4A4517BE.9040504@FreeBSD.org> <4A4FEBBC.30203@omnilan.de> <4A5053A8.2060902@FreeBSD.org> <90a5caac0907050729k4b24f2eaj64f7d752bddff1ea@mail.gmail.com> <200907062116.n66LGk5t013830@triton.kn-bremen.de> <20090725190001.GA56987@triton.kn-bremen.de> <4A6B5AAE.9010809@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

On Sat, Jul 25, 2009 at 10:19:10PM +0300, Alexander Motin wrote:
> Juergen Lock wrote:
> > On Mon, Jul 06, 2009 at 11:16:46PM +0200, Juergen Lock wrote:
> >> I tried this on the box with that optical drive that head no
> >> longer likes (fails to be probed and generates an irq storm, see
> >> http://docs.freebsd.org/cgi/mid.cgi?20090628101656.GA38983
> >> ), and with ahci.ko loaded by loader.conf I got timeouts followed by
> >> a panic:
> >> http://people.freebsd.org/~nox/cam-ata.20090704-panic1.jpg
> >> http://people.freebsd.org/~nox/cam-ata.20090704-panic2.jpg
> >> [...]
> >
> > Ok I managed to dig myself out of this mess by connecting the problem
> > drive to a jmicron pcie card that fell into my hands yesterday; I updated
> > the test install to head from today and started reinstalling ports (bc of
> > the shlib bumps) and testing the new hplip port on head (seems to work
> > no worse than on 7), when suddenly ahci got problems: it printed endless
> > retrying messages with the box' disk access led on solid, causing processes
> > to get stuck. I was still able to switch to a console and enter ddb,
> > but dumping (call doadump) failed and I didn't know what to look for
> > otherwise, so I'm afraid I can't give more info about this hang. :(
> > Anyway, could this be caused by ncq? I have disabled ahci.ko for now,
> > we'll see if this `fixes' it...
>
> Difficult to say without seeing those messages. NCQ errors actually may
> lead to series (up to 32) of retries, as if there were several running
> commands when error happened, all other commands are aborted and retried
> after error recovery process completes.

Ah so the recovery could take several minutes? Maybe I didn't wait long
enough then...

> I haven't experimented with
> really broken drives, but artificially generated NCQ errors were handled
> properly on my tests.
>
OK I guess I should take a photo next time it happens...

Btw, can the max # of `tags' be lowered with ncq too in case a drive can't
handle too many?
I think it's `camcontrol tags' for scsi...

> > Here is the dmesg with ahci and the jmicron:
> >
> > atapci0: port 0xbf00-0xbf07,0xbe00-0xbe03,0xbd00-0xbd07,0xbc00-0xbc03,0xbb00-0xbb0f mem 0xfd8fe000-0xfd8fffff irq 17 at device 0.0 on pci2
> > atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xbb00
> > ioapic0: routing intpin 17 (PCI IRQ 17) to lapic 0 vector 49
> > atapci0: [MPSAFE]
> > atapci0: [ITHREAD]
> > atapci0: Reserved 0x2000 bytes for rid 0x24 type 3 at 0xfd8fe000
> > atapci0: AHCI called from vendor specific driver
> > atapci0: AHCI v1.00 controller with 2 3Gbps ports, PM supported
> > atapci0: Caps: 64bit NCQ ALP AL CLO 3Gbps PM PMD SSC PSC 32cmd 2ports
> > ata2: on atapci0
> > ata2: AHCI reset...
> > ata2: hardware reset ...
> > ata2: SATA connect timeout status=00000000
> > ata2: AHCI reset done: phy reset found no device
> > ata2: [MPSAFE]
> > ata2: [ITHREAD]
> > ata3: on atapci0
> > ata3: AHCI reset...
> > ata3: hardware reset ...
> > ata3: SATA connect time=0ms status=00000113
> > ata3: ready wait time=11ms
> > ata3: software reset port 15...
> > ata3: ready wait time=0ms
> > ata3: SIGNATURE: eb140101
> > ata3: AHCI reset done: devices=00010000
> > ata3: [MPSAFE]
> > ata3: [ITHREAD]
> > ata4: on atapci0
> > atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0xbf00
> > atapci0: Reserved 0x4 bytes for rid 0x14 type 4 at 0xbe00
> > ata4: reset tp1 mask=03 ostat0=60 ostat1=70
> > ata4: stat0=0x20 err=0x20 lsb=0x20 msb=0x20
> > ata4: stat1=0x30 err=0x30 lsb=0x30 msb=0x30
> > ata4: reset tp2 stat0=20 stat1=30 devices=0x0
> > ata4: [MPSAFE]
> > ata4: [ITHREAD]
>
> As I can see here, your JMicron configured for combined mode, not for
> plain AHCI, so it was handled by ata(4), not by ahci(4).

Ah that can be configured? Anyway there's only an optical drive on it atm
so it's probably not _that_ important. :)

Thanx,
Juergen