From owner-freebsd-current@FreeBSD.ORG Fri Jun 26 22:32:57 2009 Return-Path: Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E9B9C1065679; Fri, 26 Jun 2009 22:32:57 +0000 (UTC) (envelope-from nox@jelal.kn-bremen.de) Received: from smtp.kn-bremen.de (gelbbaer.kn-bremen.de [78.46.108.116]) by mx1.freebsd.org (Postfix) with ESMTP id 65BD18FC08; Fri, 26 Jun 2009 22:32:57 +0000 (UTC) (envelope-from nox@jelal.kn-bremen.de) Received: by smtp.kn-bremen.de (Postfix, from userid 10) id 183841E00261; Sat, 27 Jun 2009 00:32:56 +0200 (CEST) Received: from triton.kn-bremen.de (noident@localhost [127.0.0.1]) by triton.kn-bremen.de (8.14.3/8.14.3) with ESMTP id n5QMQTtp002596; Sat, 27 Jun 2009 00:26:29 +0200 (CEST) (envelope-from nox@triton.kn-bremen.de) Received: (from nox@localhost) by triton.kn-bremen.de (8.14.3/8.14.3/Submit) id n5QMQTtQ002595; Sat, 27 Jun 2009 00:26:29 +0200 (CEST) (envelope-from nox) From: Juergen Lock Date: Sat, 27 Jun 2009 00:26:29 +0200 To: freebsd-current@FreeBSD.org Message-ID: <20090626222629.GA1909@triton.kn-bremen.de> References: <20090625202536.GA2645@triton.kn-bremen.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090625202536.GA2645@triton.kn-bremen.de> User-Agent: Mutt/1.5.20 (2009-06-14) X-Mailman-Approved-At: Mon, 29 Jun 2009 00:49:37 +0000 Cc: mav@FreeBSD.org Subject: ata irq storm/probe failure; tdq_notify pagefault (was: pmap_extract panic @boot (20090623-JPSNAP-amd64-dvd1.iso)) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Jun 2009 22:32:58 -0000 On Thu, Jun 25, 2009 at 10:25:36PM +0200, Juergen Lock wrote: > I wanted to try a head snapshot iso today and it failed to boot: > http://people.freebsd.org/~nox/20090623-JPSNAP-amd64-bootpanic1.jpg > http://people.freebsd.org/~nox/20090623-JPSNAP-amd64-bootpanic2.jpg > (I tried to get a serial console but couldn't get this box' serial > port working for some reason... ): > > As you can see in the jpegs it panic'd at pmap_extract+0x106, which > I _think_ is in line 963: (unfortunately there's no kernel.symbols on > the iso) > pdep = pmap_pde(pmap, va); > (see > http://fxr.watson.org/fxr/source/amd64/amd64/pmap.c#L963 > ), and that got invoked from isa_dmarangecheck+0x98: > http://fxr.watson.org/fxr/source/amd64/isa/isa_dma.c#L374 > [...] After I realized ppc_isa_attach() belongs to the parallel port driver which is unused on this box I was able to work around this issue by disabling it in the bios. But the dmesg I posted already points at the next problem... > [...] > start_init: trying /sbin/init > start_init: trying /sbin/oinit > start_init: trying /sbin/init.bak > start_init: trying /rescue/init > start_init: trying /stand/sysinstall > ata5: CONNECT requested > ata5: reiniting channel .. > ata5: AHCI reset... > ata5: hardware reset ... > ata5: SATA connect timeout status=00000001 > ata5: AHCI reset done: phy reset found no device > ata5: reinit done .. > ata5: DISCONNECT requested > ata5: reiniting channel .. > ata5: AHCI reset... > ata5: hardware reset ... > ata5: SATA connect timeout status=00000001 > ata5: AHCI reset done: phy reset found no device > ata5: reinit done .. > ata5: CONNECT requested > ata5: reiniting channel .. > ata5: AHCI reset... > ata5: hardware reset ... > ata5: SATA connect timeout status=00000001 > ata5: AHCI reset done: phy reset found no device > ata5: reinit done .. > ata5: DISCONNECT requested > ata5: reiniting channel .. > ata5: AHCI reset... > ata5: hardware reset ... > ata5: SATA connect timeout status=00000000 > ata5: AHCI reset done: phy reset found no device > ata5: reinit done .. > ata5: CONNECT requested > ata5: reiniting channel .. > ata5: AHCI reset... > ata5: hardware reset ... > ata5: SATA connect timeout status=00000000 > ata5: AHCI reset done: phy reset found no device > ata5: reinit done .. > ata5: DISCONNECT requested > ata5: reiniting channel .. > ata5: AHCI reset... > ata5: hardware reset ... > ata5: SATA connect timeout status=00000001 > ata5: AHCI reset done: phy reset found no device > ata5: reinit done .. > ata5: CONNECT requested > ata5: reiniting channel .. > ata5: AHCI reset... > ata5: hardware reset ... > ata5: SATA connect timeout status=00000001 > ata5: AHCI reset done: phy reset found no device > ata5: reinit done .. > ata5: DISCONNECT requested > ata5: reiniting channel .. > ata5: AHCI reset... > ata5: hardware reset ... > [...] i.e. ata5 isnt probed correctly anymore (its an optical drive), and meanwhile I discovered there's an irq storm on irq 22 too (shared between atapci0 and fwohci0), causing one cpu to be at near 100%. This doesn't seem to occur with a 20090315 snapshot so _maybe_ its related to this commit: mav 2009-03-30 22:18:38 UTC FreeBSD src repository Modified files: sys/dev/ata ata-pci.c ata-pci.h ata-sata.c sys/dev/ata/chipsets ata-ahci.c ata-intel.c ata-jmicron.c ata-marvell.c ata-nvidia.c ata-promise.c ata-siliconimage.c ata-sis.c ata-via.c Log: SVN rev 190581 on 2009-03-30 22:18:38Z by mav Integrate user/mav/ata branch: Add ch_suspend/ch_resume methods for PCI controllers and implement them for AHCI. Refactor AHCI channel initialization according to it. Fix Port Multipliers operation. It is far from perfect yet, but works now. Tested with JMicron JMB363 AHCI + SiI 3726 PMP pair. Previous version was also tested with SiI 4726 PMP. Hardware sponsored by: Vitsch Electronics / VEHosting.nl (I first tried manually reverting commit 191568 in the 20090623 kernel as this box has an amd/ati sb600, but that seemed to make no difference.) While doing that I discovered another problem tho: I installed onto an usb key for testing, and booting that panic'd in tdq_notify+0x4e with a pagefault due to a null pointer here: movzbl 0x27a(%rax),%edx (it was trying to read from 0x27a; the backtrace was almost empty as this seems to be part of the scheduler - I can post the backtrace too if anyone's interested.) I then got it to boot into single user, and from there got into multiuser, and then discovered the ata issue... The good news is the new usb stack seems to like my printer better! (I didn't actually test printing itself but at least it got detected at boot, which on 7-stable it doesn't, I always have to plug it into another port atm after boot before I can use it...) Ok that should be enough for now. Cheers, :) Juergen PS: ata5 related parts of stable dmesg: ... ata5: on atapci0 ata5: SATA connect time=0ms ata5: SIGNATURE: eb140101 ata5: ahci_reset devices=0x4 ata5: [MPSAFE] ata5: [ITHREAD] ... ata5-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=40 wire ata5: device_reset timeout=660us acd1: DVDR drive at ata5 as master acd1: read 5512KB/s (5512KB/s) write 5512KB/s (5512KB/s), 16384KB buffer, SATA150 acd1: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, packet acd1: Writes: CDR, CDRW, DVDR, test write, burnproof acd1: Audio: play, 2 volume levels acd1: Mechanism: ejectable tray, unlocked acd1: Medium: no/blank disc ... ...and the other optical drive that also gets detected on head (this dmesg still from stable:) ... ata3: on atapci0 ata3: SATA connect time=0ms ata3: SIGNATURE: eb140101 ata3: ahci_reset devices=0x4 ata3: [MPSAFE] ata3: [ITHREAD] ... ata3-master: pio=PIO4 wdma=WDMA2 udma=UDMA66 cable=40 wire ata3: device_reset timeout=120us acd0: DVDR drive at ata3 as master acd0: read 6890KB/s (6890KB/s) write 5511KB/s (6890KB/s), 2000KB buffer, SATA150 acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, DVDRAM, packet acd0: Writes: CDR, CDRW, DVDR, test write, burnproof acd0: Audio: play, 256 volume levels acd0: Mechanism: ejectable tray, unlocked acd0: Medium: no/blank disc ... ..so maybe (wild guess!) UDMA100 is related? Here is the head dmesg related to that drive for comparison: ... ata3: on atapci0 ata3: AHCI reset... ata3: hardware reset ... ata3: SATA connect time=10ms status=00000113 ata3: ready wait time=183ms ata3: software reset port 15... ata3: ahci_issue_cmd timeout: 3000 of 3000ms, status=00000001 ata3: port is not ready (timeout 0ms) tfd = 00000180 ata3: software reset clear timeout ata3: software reset port 0... ata3: ready wait time=0ms ata3: SIGNATURE: eb140101 ata3: AHCI reset done: devices=00010000 ata3: [MPSAFE] ata3: [ITHREAD] ... ata3: Identifying devices: 00010000 ata3: New devices: 00010000 ata3-master: pio=PIO4 wdma=WDMA2 udma=UDMA66 cable=40 wire ata3: device_reset timeout=120us uhub3: 6 ports with 6 removable, self powered uhub6: 6 ports with 6 removable, self powered acd0: DVDR drive at ata3 as master acd0: read 17583KB/s (17583KB/s) write 5410KB/s (5410KB/s), 2000KB buffer, +SATA150 acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, DVDRAM, packet acd0: Writes: CDR, CDRW, DVDR, test write, burnproof acd0: Audio: play, 256 volume levels acd0: Mechanism: ejectable tray, unlocked acd0: Medium: DVD 120mm data disc