From owner-freebsd-stable@FreeBSD.ORG  Fri May  2 18:48:13 2014
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A5EDAF7;
 Fri,  2 May 2014 18:48:13 +0000 (UTC)
Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 79B641CB0;
 Fri,  2 May 2014 18:48:13 +0000 (UTC)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
 by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id s42Im5ld040702;
 Fri, 2 May 2014 11:48:09 -0700 (PDT)
 (envelope-from truckman@FreeBSD.org)
Message-Id: <201405021848.s42Im5ld040702@gw.catspoiler.org>
Date: Fri, 2 May 2014 11:48:05 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: Thinkpad R60 hangs when booting recent 8.4-STABLE
To: jhb@FreeBSD.org
In-Reply-To: <201405021405.32570.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: stable@FreeBSD.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 May 2014 18:48:13 -0000

On  2 May, John Baldwin wrote:
> On Friday, May 02, 2014 5:39:19 am Don Lewis wrote:
>> On  1 May, John Baldwin wrote:
>> > On Thursday, May 01, 2014 1:55:41 am Don Lewis wrote:
>> >> On 30 Apr, John Baldwin wrote:
>> >> 
>> >> > Are you up for doing some printf sleuthing?  There are two odd things that I 
>> >> > see so far:
>> >> > 
>> >> > 1) the base address of 0.  The question here is if pci_add_map() in 
>> >> > sys/dev/pci/pci.c decides to set start to 0 explicitly, or if it happens 
>> >> > further up the callchain (should be bus_alloc_resource calls in 
>> >> > sys/dev/acpica/acpi_pcib_acpi.c, sys/x86/x86/nexus.c and then in the
>> >> > rman code itself in sys/kern/subr_rman.c)
>> >> > 
>> >> > 2) The 'reserved' printfs during boot probe.  Those come from a printf in 
>> >> > pci_alloc_resource() in sys/dev/pci/pci.c.  However, that should not be called 
>> >> > until a driver attaches to a device and calls bus_alloc_resource().  It should 
>> >> > not be called from pci_add_child() as it seems to be now.
>> >> 
>> >> The call graph for the four earlier ones that you previously pointed
>> >> out (not hostb0) is:
>> >> 	pci_add_child()
>> >> 		pci_add_resources()
>> >> 			*_early_takeover()
>> >> 				[I suspect]
>> >> 				bus_alloc_resource_any()
>> >> 					pci_alloc_resource()
>> >> 					
>> >> These are the three system uhci controllers and the system ehci
>> >> controller, which apparently pass this test:
>> >> 	pci_get_class(dev) == PCIC_SERIALBUS &&
>> >> 	pci_get_subclass(dev) == PCIS_SERIALBUS_USB
>> > 
>> > Oh, ok.  That is fine, and that explains why it was selective in the past (and
>> > only for I/O resources).  That just leaves 1) then.  It would be especially good
>> > to know what pci_add_map() does when it sees this BAR during the bus probe.
>> 
>> In either case after reading the BAR, base=0xd0000000,
>> testval=0xf0000008. and the following gets printed:
>>         map[10]: type Prefetchable Memory, range 32, base 0xd0000000, size 28, enabled
>> Then we call resource_list_add(..., d0000000, dfffffff, 10000000)
>> 
>> When we don't pass the size in flags (w/o r262226),
>> resource_list_alloc() eventually calls nexus_alloc_resource() with the
>> expected start, end, and count values.  After returning from
>> pci_add_map, we write 0xd0000000 to the BAR.
>> 
>> If we *do* pass the size in flags (with r262226), then
>> resource_list_alloc() never calls nexus_alloc_resource(), but it does
>> return a non-NULL value.  We then write 0 to the BAR.
>> 
>> Digging into resource_list_alloc(), I see start, count, and end having
>> the expected values before the second call to BUS_ALLOC_RESOURCE()
>> but afterwards rman_get_start(rle->res) is 0 and rman_get_end(rle->res)
>> is 0xfffffff.  We don't call nexus_alloc_resource(). Setting flags to 0
>> for this specific call to BUS_ALLOC_RESOURCE() based on the value of
>> start fixes the problem.
>> 
>> The culprit appears to be the call to rman_reserve_resource() in
>> acpi_alloc_resource().  With r262226 it "succeeds", but the returned
>> resource has the wrong start and end addresses.  Without r262226,
>> rman_reserve_resource returns NULL and we call BUS_ALLOC_RESOURCE(),
>> which calls nexus_alloc_resource(), which returns the expected range.
>> 
>> I enabled debug.rman_debug and saw the following:
>> 
>> rman_reserve_resource_bound: <ACPI I/O memory addresses> request:
>> [0xd0000000, 0xdfffffff], length 0x10000000, flags 28736, device (null)
>> considering [0xfec00000, 0xffffffff]
>> truncated region: [0, 0xdfffffff]; size 0xe0000000 (requested
>> 0x10000000)
>> candidate region [0, 0xdfffffff], size 0xe0000000
>> allocating at the end
>> 
>> This is with subr_rman.c from HEAD.
>> 
>> This expression will overflow:
>> 		if (s->r_start + count - 1 > end) {
>> I don't understand the point of this test.  Is it an optimization to
>> exit the loop early based on an assumption about the ordering of
>> elements in the list?
> 
> Yes.  The elements in an rman list are supposed to be sorted.
> 
>> This expression will also overflow:
>> 			rstart = (rstart + amask) & ~amask;
>> which is where I think the start address is getting set to 0.
>> 
>> 
>> After reverting subr_rman.c and applying the following patch, I see:
>> considering [0xfec00000, 0xffffffff]
>> no unshared regions found
>> Then nexus_alloc_resource() gets called and I see
>> considering [0x3ff60000, 0xfebfffff]
>> and then all the right stuff happens.
>> 
>> 
>> Index: sys/kern/subr_rman.c
>> ===================================================================
>> --- sys/kern/subr_rman.c	(revision 262226)
>> +++ sys/kern/subr_rman.c	(working copy)
>> @@ -468,11 +468,9 @@
>>  	 */
>>  	for (s = r; s; s = TAILQ_NEXT(s, r_link)) {
>>  		DPRINTF(("considering [%#lx, %#lx]\n", s->r_start, s->r_end));
>> -		if (s->r_start + count - 1 > end) {
>> -			DPRINTF(("s->r_start (%#lx) + count - 1> end (%#lx)\n",
>> -			    s->r_start, end));
>> -			break;
>> -		}
>> +		if (s->r_end < start + count - 1 ||
>> +		    s->r_start > end - count + 1)
>> +			continue;
>>  		if (s->r_flags & RF_ALLOCATED) {
>>  			DPRINTF(("region is allocated\n"));
>>  			continue;
>> 
>> 
> 
> I think this looks good.
> 
>> I have no idea why 9.2-STABLE was able to boot on this machine ...
> 
> acpi_alloc_resource() works differently in 9.x and later.  It only
> tries the internal rmans if nexus_alloc_resource() fails.  In 8.x it
> tries the internal rmans first.  That change was made in r216674.

Yeah, I enabled rman debugging and booted a 9.2 snapshot CD.  I only saw
one call to rman_reserve_resource_bound(), which looked like it was from
nexus_alloc_resource().

Here's a more paranoid patch, though I don't like the code duplication
between the two loops:


Index: sys/kern/subr_rman.c
===================================================================
--- sys/kern/subr_rman.c	(revision 262226)
+++ sys/kern/subr_rman.c	(working copy)
@@ -461,6 +461,11 @@
 	}
 
 	amask = (1ul << RF_ALIGNMENT(flags)) - 1;
+	if (start + amask < start) {
+		DPRINTF(("start+amask wrapped around\n"));
+		goto out;
+	}
+
 	/* If bound is 0, bmask will also be 0 */
 	bmask = ~(bound - 1);
 	/*
@@ -468,11 +473,10 @@
 	 */
 	for (s = r; s; s = TAILQ_NEXT(s, r_link)) {
 		DPRINTF(("considering [%#lx, %#lx]\n", s->r_start, s->r_end));
-		if (s->r_start + count - 1 > end) {
-			DPRINTF(("s->r_start (%#lx) + count - 1> end (%#lx)\n",
-			    s->r_start, end));
-			break;
-		}
+		if (s->r_end < start + count - 1 ||
+		    s->r_start > end - count + 1 ||
+		    s->r_start + amask < s->r_start)
+			continue;
 		if (s->r_flags & RF_ALLOCATED) {
 			DPRINTF(("region is allocated\n"));
 			continue;
@@ -584,8 +588,10 @@
 		goto out;
 
 	for (s = r; s; s = TAILQ_NEXT(s, r_link)) {
-		if (s->r_start > end)
-			break;
+		if (s->r_end < start + count - 1 ||
+		    s->r_start > end - count + 1 ||
+		    s->r_start + amask < s->r_start)
+			continue;
 		if ((s->r_flags & flags) != flags)
 			continue;
 		rstart = ulmax(s->r_start, start);