From: John Baldwin
To: David Naylor
Cc: Alexander Motin, FreeBSD-Current
Date: Mon, 9 May 2011 14:48:51 -0400
Subject: Re: [regression] unable to boot: no GEOM devices found.
Message-Id: <201105091448.51961.jhb@freebsd.org>
In-Reply-To: <201105092024.41588.naylor.b.david@gmail.com>
References: <201104152329.59294.naylor.b.david@gmail.com> <201105092024.41588.naylor.b.david@gmail.com>
List-Id: Discussions about the use of FreeBSD-current

On Monday, May 09, 2011 2:24:37 pm David Naylor wrote:
> On Friday 15 April 2011 23:29:55 David Naylor wrote:
> > On Friday 15 April 2011 18:28:06 John Baldwin wrote:
> > > On Wednesday, April 13, 2011 1:07:06 pm David Naylor wrote:
> > > > On Tuesday 12 April 2011 22:12:55 Alexander Motin wrote:
> > > > > David Naylor wrote:
> > > > > > On Tuesday 12 April 2011 08:17:51 Alexander Motin wrote:
> > > > > >> David Naylor wrote:
> > > > > >>> I am running -current and since a few days ago (at least
> > > > > >>> 2011/04/11) I am unable to boot.
> > > > > >>>
> > > > > >>> The boot process stops when it looks to find a bootable device.
> > > > > >>> The prompt (when pressing '?') does not display any device and
> > > > > >>> yielding one second (or more) to the kernel (by pressing '.')
> > > > > >>> does not improve the situation.
> > > > > >>>
> > > > > >>> A known working date is 2011/02/20.
> > > > > >>>
> > > > > >>> I am running amd64 on a nVidia MCP51 chipset.
> > > > > >>
> > > > > >> MCP51... again...
> > > > >
> > > > > +ata2: reiniting channel ..
> > > > > +ata2: SATA connect time=0ms status=00000113
> > > > > +ata2: reset tp1 mask=01 ostat0=58 ostat1=00
> > > > > +ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > +ata2: reset tp2 stat0=50 stat1=00 devices=0x1
> > > > > +ata2: reinit done ..
> > > > > +unknown: FAILURE - ATA_IDENTIFY timed out LBA=0
> > > > >
> > > > > As soon as all devices detected but not responding to commands, I
> > > > > would suppose that there is something wrong with ATA interrupts.
> > > > > There is a long chain of interrupt problems in this chipset.
> > > > > I have already tried to debug one case where ATA wasn't generating
> > > > > interrupts at all. Unfortunately, without success -- requests were
> > > > > executing, but not generating interrupts, it wasn't looked like ATA
> > > > > driver problem.
> > > > >
> > > > > What's about possible candidate to revision triggering your problem,
> > > > > I would look on this message:
> > > > > +pcib0: Enabling MSI window for HyperTransport slave at pci0:0:9:0
> > > > >
> > > > > At least it is recent (SVN revs 219737,219740 on 2011-03-18 by jhb)
> > > > > and it is interrupt related.
> > > >
> > > > I reverted those two revs and everything works again.
> > >
> > > Hmm, can you provide a full boot verbose dmesg? Alternatively, can you
> > > see if the device at pci0:0:9:0 is a PCI-PCI bridge?
> >
> > I can provide a verbose dmesg if the following is not enough:
> >
> > none17@pci0:0:9:0: class=0x050000 card=0x50011458 chip=0x027010de
> > rev=0xa2 hdr=0x00
> >     vendor     = 'NVIDIA Corporation'
> >     device     = 'MCP51 Host Bridge'
> >     class      = memory
> >     subclass   = RAM
> >
> > I see two PCI-PCI bridges at pci0:0:3:0 and pci0:0:16:0. I've attached the
> > full `pciconf -lv` output.
>
> FYI, this issue is still present on current (~24 hours old). Reverting the
> above mentioned revisions still fixes the problem.

Yes, I'm still chewing on how best to fix this. The problem is that for the
most part we should enable the MSI mapping window everywhere, but for certain
broken Nvidia chipsets it seems that doing so breaks INTx interrupts and we
need to not enable it (and disable MSI globally) on those chipsets.

Linux has some grotty code to allow PCI devices to figure out which Host
Bridge device on PCI bus 0 is the real host bridge for each HT slave and to
selectively enable it in the host bridge when an MSI interrupt is first
enabled. They also have a quirk to disable MSI altogether on certain nvidia
chipsets if the MSI mapping window is not enabled by the BIOS.
I attempted to implement the latter, but it broke perfectly good nvidia
chipsets on older ppc-based Macs.

I think I want to just disable MSI entirely on busted chipsets like yours,
but I need to come up with a good way to detect your chipset (and similar).

-- 
John Baldwin
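
One obvious way to detect a chipset like this is to match on the PCI IDs shown in the `pciconf -lv` output quoted above. The following is only a minimal standalone sketch of that idea, not the fix that was committed and not the kernel's actual quirk table. The struct layout, the table and the function names (`msi_quirk`, `msi_blacklist`, `msi_quirk_lookup`, `msi_blacklisted`) are hypothetical; the only values taken from the thread are `chip=0x027010de` and the device description. It relies on the fact that pciconf's `chip=` field carries the device ID in the upper 16 bits and the vendor ID in the lower 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* pciconf prints chip=0xDDDDVVVV: device ID in the high half, vendor ID in the low half. */
static uint16_t chip_vendor(uint32_t chip) { return (uint16_t)(chip & 0xffff); }
static uint16_t chip_device(uint32_t chip) { return (uint16_t)(chip >> 16); }

#define VENDOR_NVIDIA 0x10de

/* Hypothetical quirk entry for this sketch; not a real FreeBSD structure. */
struct msi_quirk {
	uint16_t	vendor;
	uint16_t	device;
	const char	*desc;
};

/* The single entry is the host bridge reported at pci0:0:9:0 in the thread. */
static const struct msi_quirk msi_blacklist[] = {
	{ VENDOR_NVIDIA, 0x0270, "MCP51 Host Bridge" },
};

/* Return the matching entry for a chip value, or NULL if there is none. */
static const struct msi_quirk *
msi_quirk_lookup(const struct msi_quirk *tbl, size_t n, uint32_t chip)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < n; i++) {
		if (tbl[i].vendor == chip_vendor(chip) &&
		    tbl[i].device == chip_device(chip))
			return (&tbl[i]);
	}
	return (NULL);
}

/* True if MSI should be disabled for the bridge with this chip value. */
static bool
msi_blacklisted(uint32_t chip)
{
	return (msi_quirk_lookup(msi_blacklist,
	    sizeof(msi_blacklist) / sizeof(msi_blacklist[0]), chip) != NULL);
}
```

With this table, `msi_blacklisted(0x027010de)` matches the bridge from the report. A plain ID list only covers chipsets that someone has already reported, so it does not by itself answer the "(and similar)" part of the problem.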