Date:      Thu, 1 May 2014 15:52:57 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        jhb@FreeBSD.org
Cc:        stable@FreeBSD.org
Subject:   Re: Thinkpad R60 hangs when booting recent 8.4-STABLE
Message-ID:  <201405012253.s41Mqvsd037832@gw.catspoiler.org>
In-Reply-To: <201405011424.31981.jhb@freebsd.org>

On  1 May, John Baldwin wrote:
> On Wednesday, April 30, 2014 6:45:02 pm Don Lewis wrote:
>> On 30 Apr, John Baldwin wrote:
>> > On Wednesday, April 30, 2014 1:30:01 pm Don Lewis wrote:
>> >> On 30 Apr, John Baldwin wrote:
>> >> > On Tuesday, April 29, 2014 9:17:19 pm Don Lewis wrote:
>> >> >> On 29 Apr, John Baldwin wrote:
>> >> >> > On Monday, April 28, 2014 8:56:03 pm Don Lewis wrote:
>> >> >> 
>> >> >> I just took a closer look at the dmesg output from the two kernels.
>> >> >> 
>> >> >> >> agp0: <Intel 82855 host to AGP bridge> on hostb0
>> >> >> >> hostb0: Reserved 0x10000000 bytes for rid 0x10 type 3 at 0xd0000000
>> >> >> 
>> >> >> The above line is different with the r262226 kernel:
>> >> >>  hostb0: Reserved 0x10000000 bytes for rid 0x10 type 3 at 0
>> >> > 
>> >> > Yes, a resource at 0 is going to break things.  9.2 has the NEW_PCIB
>> >> > option enabled.  You can try enabling that for 8.4 to see if it fixes
>> >> > this issue.  If it does, it narrows down where to look for the bug.
>> >> 
>> >> It behaves the same way with NEW_PCIB.  I see hostb at 0 and then the
>> >> hang shortly thereafter.
>> > 
>> > Ok.  hostb isn't actually behind a bridge so that probably makes sense.  The
>> > one other reporter who sent me debug output had a BAR on his vgapci0 device
>> > that ended up being at 0 as well (and an active BAR at 0 is pretty much
>> > guaranteed to hose a box).
>> > 
>> > Are you up for doing some printf sleuthing?  There are two odd things that I 
>> > see so far:
>> 
>> Yup, I've already started down that path.
>> 
>> > 1) the base address of 0.  The question here is if pci_add_map() in 
>> > sys/dev/pci/pci.c decides to set start to 0 explicitly, or if it happens 
>> > further up the callchain (should be bus_alloc_resource calls in 
>> > sys/dev/acpica/acpi_pcib_acpi.c, sys/x86/x86/nexus.c and then in the
>> > rman code itself in sys/kern/subr_rman.c)
>> > 
>> > 2) The 'reserved' printfs during boot probe.  Those come from a printf in 
>> > pci_alloc_resource() in sys/dev/pci/pci.c.  However, that should not be called 
>> > until a driver attaches to a device and calls bus_alloc_resource().  It should 
>> > not be called from pci_add_child() as it seems to be now.
>> 
>> What I know so far is that for hostb0, pci_alloc_resource() is being
>> called with start=0x0 and end=0xffffffff, resource_list_find() is
>> succeeding, we don't call pci_alloc_map(), and rman_get_start(rle->res)
>> is returning 0.  I don't see a call for pci_add_map() for hostb0 unless
>> it is much earlier and scrolled off the screen.
> 
> The call to pci_add_map() is earlier.  It is called from pci_add_child() when
> we scan the PCI bus during attach of the PCI bus device itself.  In theory,
> pci_alloc_resource() should not be called until a driver actually attaches
> to the device and calls bus_alloc_resource() from its probe or attach routine.
> 
>> For debugging #2, should I back out r262226 so that the machine boots
>> and I can capture the full dmesg buffer?
> 
> Yes.  I think you could just add a panic at that printf line and get a
> backtrace for now as a first step as it is occurring way too early.

pci_alloc_resource()
bus_alloc_resource()
pci_hostb_alloc_resource()
bus_alloc_resource()
agp_generic_attach()
agp_intel_attach()
device_attach()
device_probe_and_attach()
bus_generic_attach()
pci_hostb_attach()
device_attach()
device_probe_and_attach()
bus_generic_attach()
acpi_pci_attach()
device_attach()
device_probe_and_attach()
bus_generic_attach()
acpi_attach()
device_attach()
device_probe_and_attach()
bus_generic_attach()
nexus_acpi_attach()
device_attach()
device_probe_and_attach()
bus_generic_new_pass()
bus_set_pass()
root_bus_configure()
configure()
mi_startup()
begin()

Since the hang is triggered by passing the size as part of the flags
argument to resource_list_alloc(), I tried upgrading subr_rman.c to the
version in HEAD.  No change in behavior :-(
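
As a rough illustration of what "size as part of the flags argument" can
look like, here is a minimal userland sketch that packs the log2 of a BAR
size into an alignment field of a flags word and reads it back.  The
SK_* names and sk_* helpers are hypothetical, and the shift and mask
values are assumptions modeled on the RF_ALIGNMENT_* macros in
sys/sys/rman.h; check them against the tree in use.  This is not the
code path from r262226, only a model of the encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins modeled on RF_ALIGNMENT_* in sys/sys/rman.h. */
#define SK_ALIGNMENT_SHIFT	10
#define SK_ALIGNMENT_MASK	(0x3Fu << SK_ALIGNMENT_SHIFT)

/* Return log2 of a power-of-two size, e.g. 0x10000000 gives 28. */
static unsigned
sk_log2_size(uint64_t size)
{
	unsigned n = 0;

	/* Only non-zero powers of two are meaningful here. */
	assert(size != 0 && (size & (size - 1)) == 0);
	while (size > 1) {
		size >>= 1;
		n++;
	}
	return (n);
}

/* Store the log2 alignment in the flags word, leaving other bits alone. */
static unsigned
sk_pack_alignment(unsigned flags, unsigned log2_align)
{
	/* The field is six bits wide in this sketch. */
	assert(log2_align <= 0x3F);
	return ((flags & ~SK_ALIGNMENT_MASK) |
	    (log2_align << SK_ALIGNMENT_SHIFT));
}

/* Recover the log2 alignment from the flags word. */
static unsigned
sk_unpack_alignment(unsigned flags)
{
	return ((flags & SK_ALIGNMENT_MASK) >> SK_ALIGNMENT_SHIFT);
}
```

For the 0x10000000-byte hostb0 range in the dmesg output above,
sk_log2_size() gives 28, so the flags word would carry 28 in the
alignment field next to whatever other flag bits were already set.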
