Date: Thu, 1 May 2014 15:52:57 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: jhb@FreeBSD.org
Cc: stable@FreeBSD.org
Subject: Re: Thinkpad R60 hangs when booting recent 8.4-STABLE
Message-ID: <201405012253.s41Mqvsd037832@gw.catspoiler.org>
In-Reply-To: <201405011424.31981.jhb@freebsd.org>
On 1 May, John Baldwin wrote:
> On Wednesday, April 30, 2014 6:45:02 pm Don Lewis wrote:
>> On 30 Apr, John Baldwin wrote:
>> > On Wednesday, April 30, 2014 1:30:01 pm Don Lewis wrote:
>> >> On 30 Apr, John Baldwin wrote:
>> >> > On Tuesday, April 29, 2014 9:17:19 pm Don Lewis wrote:
>> >> >> On 29 Apr, John Baldwin wrote:
>> >> >> > On Monday, April 28, 2014 8:56:03 pm Don Lewis wrote:
>> >> >>
>> >> >> I just took a closer look at the dmesg output from the two kernels.
>> >> >>
>> >> >> >> agp0: <Intel 82855 host to AGP bridge> on hostb0
>> >> >> >> hostb0: Reserved 0x10000000 bytes for rid 0x10 type 3 at 0xd0000000
>> >> >>
>> >> >> The above line is different with the r262226 kernel:
>> >> >> hostb0: Reserved 0x10000000 bytes for rid 0x10 type 3 at 0
>> >> >
>> >> > Yes, a resource at 0 is going to break things. 9.2 has the NEW_PCIB option
>> >> > enabled. You can try enabling that for 8.4 to see if it fixes this issue.
>> >> > If it does, it narrows down where to look for the bug.
>> >>
>> >> It behaves the same way with NEW_PCIB. I see hostb at 0 and then the
>> >> hang shortly thereafter.
>> >
>> > Ok. hostb isn't actually behind a bridge so that probably makes sense. The
>> > one other reporter who sent me debug output had a BAR on his vgapci0 device
>> > that ended up being at 0 as well (and an active BAR at 0 is pretty much
>> > guaranteed to hose a box).
>> >
>> > Are you up for doing some printf sleuthing? There are two odd things that I
>> > see so far:
>>
>> Yup, I've already started down that path.
>>
>> > 1) the base address of 0. The question here is if pci_add_map() in
>> > sys/dev/pci/pci.c decides to set start to 0 explicitly, or if it happens
>> > further up the callchain (should be bus_alloc_resource calls in
>> > sys/dev/acpica/acpi_pcib_acpi.c, sys/x86/x86/nexus.c and then in the
>> > rman code itself in sys/kern/subr_rman.c)
>> >
>> > 2) The 'reserved' printfs during boot probe. Those come from a printf in
>> > pci_alloc_resource() in sys/dev/pci/pci.c. However, that should not be called
>> > until a driver attaches to a device and calls bus_alloc_resource(). It should
>> > not be called from pci_add_child() as it seems to be now.
>>
>> What I know so far is that for hostb0, pci_alloc_resource() is being
>> called with start=0x0 and end=0xffffffff, resource_list_find() is
>> succeeding, we don't call pci_alloc_map(), and rman_get_start(rle->res)
>> is returning 0. I don't see a call for pci_add_map() for hostb0 unless
>> it is much earlier and scrolled off the screen.
>
> The call to pci_add_map() is earlier. It is called from pci_add_child() when
> we scan the PCI bus during attach of the PCI bus device itself. In theory,
> pci_alloc_resource() should not be called until a driver actually attaches
> to the device and calls bus_alloc_resource() from its probe or attach routine.
>
>> For debugging #2, should I back out r262226 so that the machine boots
>> and I can capture the full dmesg buffer?
>
> Yes. I think you could just add a panic at that printf line and get a
> backtrace for now as a first step as it is occurring way too early.

pci_alloc_resource()
bus_alloc_resource()
pci_hostb_alloc_resource()
bus_alloc_resource()
agp_generic_attach()
agp_intel_attach()
device_attach()
device_probe_and_attach()
bus_generic_attach()
pci_hostb_attach()
device_attach()
device_probe_and_attach()
bus_generic_attach()
acpi_pci_attach()
device_attach()
device_probe_and_attach()
bus_generic_attach()
acpi_attach()
device_attach()
device_probe_and_attach()
bus_generic_attach()
nexus_acpi_attach()
device_attach()
device_probe_and_attach()
bus_generic_new_pass()
bus_set_pass()
root_bus_configure()
configure()
mi_startup()
begin()

Since the hang is triggered by passing the size as part of the flags argument
to resource_list_alloc(), I tried upgrading subr_rman.c to the version in HEAD.
No change in behavior :-(