From owner-freebsd-stable@FreeBSD.ORG Fri May 2 18:48:13 2014 Return-Path: Delivered-To: stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A5EDAF7; Fri, 2 May 2014 18:48:13 +0000 (UTC) Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 79B641CB0; Fri, 2 May 2014 18:48:13 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id s42Im5ld040702; Fri, 2 May 2014 11:48:09 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201405021848.s42Im5ld040702@gw.catspoiler.org> Date: Fri, 2 May 2014 11:48:05 -0700 (PDT) From: Don Lewis Subject: Re: Thinkpad R60 hangs when booting recent 8.4-STABLE To: jhb@FreeBSD.org In-Reply-To: <201405021405.32570.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: stable@FreeBSD.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 May 2014 18:48:13 -0000 On 2 May, John Baldwin wrote: > On Friday, May 02, 2014 5:39:19 am Don Lewis wrote: >> On 1 May, John Baldwin wrote: >> > On Thursday, May 01, 2014 1:55:41 am Don Lewis wrote: >> >> On 30 Apr, John Baldwin wrote: >> >> >> >> > Are you up for doing some printf sleuthing? There are two odd things that I >> >> > see so far: >> >> > >> >> > 1) the base address of 0. The question here is if pci_add_map() in >> >> > sys/dev/pci/pci.c decides to set start to 0 explicitly, or if it happens >> >> > further up the callchain (should be bus_alloc_resource calls in >> >> > sys/dev/acpica/acpi_pcib_acpi.c, sys/x86/x86/nexus.c and then in the >> >> > rman code itself in sys/kern/subr_rman.c) >> >> > >> >> > 2) The 'reserved' printfs during boot probe. Those come from a printf in >> >> > pci_alloc_resource() in sys/dev/pci/pci.c. However, that should not be called >> >> > until a driver attaches to a device and calls bus_alloc_resource(). It should >> >> > not be called from pci_add_child() as it seems to be now. >> >> >> >> The call graph for the four earlier ones that you previously pointed >> >> out (not hostb0) is: >> >> pci_add_child() >> >> pci_add_resources() >> >> *_early_takeover() >> >> [I suspect] >> >> bus_alloc_resource_any() >> >> pci_alloc_resource() >> >> >> >> These are the three system uhci controllers and the system ehci >> >> controller, which apparently pass this test: >> >> pci_get_class(dev) == PCIC_SERIALBUS && >> >> pci_get_subclass(dev) == PCIS_SERIALBUS_USB >> > >> > Oh, ok. That is fine, and that explains why it was selective in the past (and >> > only for I/O resources). That just leaves 1) then. It would be especially good >> > to know what pci_add_map() does when it sees this BAR during the bus probe. >> >> In either case after reading the BAR, base=0xd0000000, >> testval=0xf0000008. and the following gets printed: >> map[10]: type Prefetchable Memory, range 32, base 0xd0000000, size 28, enabled >> Then we call resource_list_add(..., d0000000, dfffffff, 10000000) >> >> When we don't pass the size in flags (w/o r262226), >> resource_list_alloc() eventually calls nexus_alloc_resource() with the >> expected start, end, and count values. After returning from >> pci_add_map, we write 0xd0000000 to the BAR. >> >> If we *do* pass the size in flags (with r262226), then >> resource_list_alloc() never calls nexus_alloc_resource(), but it does >> return a non-NULL value. We then write 0 to the BAR. >> >> Digging into resource_list_alloc(), I see start, count, and end having >> the expected values before the second call to BUS_ALLOC_RESOURCE() >> but afterwards rman_get_start(rle->res) is 0 and rman_get_end(rle->res) >> is 0xfffffff. We don't call nexus_alloc_resource(). Setting flags to 0 >> for this specific call to BUS_ALLOC_RESOURCE() based on the value of >> start fixes the problem. >> >> The culprit appears to be the call to rman_reserve_resource() in >> acpi_alloc_resource(). With r262226 it "succeeds", but the returned >> resource has the wrong start and end addresses. Without r262226, >> rman_reserve_resource returns NULL and we call BUS_ALLOC_RESOURCE(), >> which calls nexus_alloc_resource(), which returns the expected range. >> >> I enabled debug.rman_debug and saw the following: >> >> rman_reserve_resource_bound: request: >> [0xd0000000, 0xdfffffff], length 0x10000000, flags 28736, device (null) >> considering [0xfec00000, 0xffffffff] >> truncated region: [0, 0xdfffffff]; size 0xe0000000 (requested >> 0x10000000) >> candidate region [0, 0xdfffffff], size 0xe0000000 >> allocating at the end >> >> This is with subr_rman.c from HEAD. >> >> This expression will overflow: >> if (s->r_start + count - 1 > end) { >> I don't understand the point of this test. Is it an optimization to >> exit the loop early based on an assumption about the ordering of >> elements in the list? > > Yes. The elements in an rman list are supposed to be sorted. > >> This expression will also overflow: >> rstart = (rstart + amask) & ~amask; >> which is where I think the start address is getting set to 0. >> >> >> After reverting subr_rman.c and applying the following patch, I see: >> considering [0xfec00000, 0xffffffff] >> no unshared regions found >> Then nexus_alloc_resource() gets called and I see >> considering [0x3ff60000, 0xfebfffff] >> and then all the right stuff happens. >> >> >> Index: sys/kern/subr_rman.c >> =================================================================== >> --- sys/kern/subr_rman.c (revision 262226) >> +++ sys/kern/subr_rman.c (working copy) >> @@ -468,11 +468,9 @@ >> */ >> for (s = r; s; s = TAILQ_NEXT(s, r_link)) { >> DPRINTF(("considering [%#lx, %#lx]\n", s->r_start, s->r_end)); >> - if (s->r_start + count - 1 > end) { >> - DPRINTF(("s->r_start (%#lx) + count - 1> end (%#lx)\n", >> - s->r_start, end)); >> - break; >> - } >> + if (s->r_end < start + count - 1 || >> + s->r_start > end - count + 1) >> + continue; >> if (s->r_flags & RF_ALLOCATED) { >> DPRINTF(("region is allocated\n")); >> continue; >> >> > > I think this looks good. > >> I have no idea why 9.2-STABLE was able to boot on this machine ... > > acpi_alloc_resource() works differently in 9.x and later. It only > tries the internal rmans if nexus_alloc_resource() fails. In 8.x it > tries the internal rmans first. That change was made in r216674. Yeah, I enabled rman debugging and booted a 9.2 snapshot CD. I only saw one call to rman_reserve_resource_bound(), which looked like it was from nexus_alloc_resource(). Here's a more paranoid patch, though I don't like the code duplication between the two loops: Index: sys/kern/subr_rman.c =================================================================== --- sys/kern/subr_rman.c (revision 262226) +++ sys/kern/subr_rman.c (working copy) @@ -461,6 +461,11 @@ } amask = (1ul << RF_ALIGNMENT(flags)) - 1; + if (start + amask < start) { + DPRINTF(("start+amask wrapped around\n")); + goto out; + } + /* If bound is 0, bmask will also be 0 */ bmask = ~(bound - 1); /* @@ -468,11 +473,10 @@ */ for (s = r; s; s = TAILQ_NEXT(s, r_link)) { DPRINTF(("considering [%#lx, %#lx]\n", s->r_start, s->r_end)); - if (s->r_start + count - 1 > end) { - DPRINTF(("s->r_start (%#lx) + count - 1> end (%#lx)\n", - s->r_start, end)); - break; - } + if (s->r_end < start + count - 1 || + s->r_start > end - count + 1 || + s->r_start + amask < s->r_start) + continue; if (s->r_flags & RF_ALLOCATED) { DPRINTF(("region is allocated\n")); continue; @@ -584,8 +588,10 @@ goto out; for (s = r; s; s = TAILQ_NEXT(s, r_link)) { - if (s->r_start > end) - break; + if (s->r_end < start + count - 1 || + s->r_start > end - count + 1 || + s->r_start + amask < s->r_start) + continue; if ((s->r_flags & flags) != flags) continue; rstart = ulmax(s->r_start, start);