Date:      Thu, 6 Mar 2008 08:44:10 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-hackers@freebsd.org
Cc:        anholt@FreeBSD.org, Frédéric PRACA <frederic.praca@freebsd-fr.org>
Subject:   Re: Kernel crash on Asus A7N8X-X
Message-ID:  <200803060844.10772.jhb@freebsd.org>
In-Reply-To: <200803060831.27056.jhb@freebsd.org>
References:  <1204671599.47cdd46f6b1e2@imp.free.fr> <200803060831.27056.jhb@freebsd.org>

On Thursday 06 March 2008 08:31:26 am John Baldwin wrote:
> On Tuesday 04 March 2008 05:59:59 pm Frédéric PRACA wrote:
> > Hello dear hackers,
> > I own a Asus A7N8X-X motherboard (NForce2 chipset) with a Radeon 9600
> > video card. After upgrading from 6.3 to 7.0, I launched xorg which
> > crashed the kernel. After looking in the kernel core dump, I found that
> > the agp_nvidia_flush_tlb function of /usr/src/sys/pci/agp_nvidia.c
> > crashed on the line 377. The loop fails from the beginning (when i==0).
> > I commented out the two last loops and it seems to work now but as I
> > didn't understand what is this code for, I'd like to have some
> > explanation about it and want to know if someone got the same problem.
>
> The Linux AGP driver has the same code.  It appears to be forcing a read of
> the TLB registers to force prior writes to clear the TLB entries to flush
> perhaps?  I'm not sure why you are getting a panic.  What kind of fault did
> you get?  (The original kernel panic messages would be needed.)

Actually, it looks like you have a 64MB aperture and with either a 32MB or
64MB aperture this loop runs off the end of the GATT (GATT has 16384 entries
* 4 bytes == 64k == 16 pages on x86) so if it dies before it starts the next
loop that might explain it.  The patch below makes it walk the full GATT
reading the first word from each page to force a flush w/o walking off the
end of the GATT.
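
For anyone who wants to check the arithmetic, here is a rough userland sketch of the size calculation.  It assumes 4 KB pages and one 32-bit GATT entry per aperture page, which matches the 16384-entry figure for a 64MB aperture; the helper names (gatt_entries, gatt_pages, old_loop_overruns) and X86_PAGE_SIZE are made up for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define X86_PAGE_SIZE	4096u		/* assumption: 4 KB pages, as on i386 */
#define OLD_LOOP_PAGES	(32u + 1u)	/* bound hard-coded in the unpatched loop */

/* Number of GATT entries, assuming one 32-bit entry per aperture page. */
static size_t
gatt_entries(size_t aperture_bytes)
{
	return (aperture_bytes / X86_PAGE_SIZE);
}

/* Number of pages occupied by the GATT itself. */
static size_t
gatt_pages(size_t aperture_bytes)
{
	return (gatt_entries(aperture_bytes) * sizeof(uint32_t) /
	    X86_PAGE_SIZE);
}

/* Nonzero if the unpatched 33-iteration loop would read past the GATT. */
static int
old_loop_overruns(size_t aperture_bytes)
{
	assert(aperture_bytes >= X86_PAGE_SIZE);
	return (OLD_LOOP_PAGES > gatt_pages(aperture_bytes));
}
```

With a 64MB aperture, gatt_pages((size_t)64 << 20) comes out to 16, so a loop that touches 33 pages goes well past the end of the table.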

Actually, this is what appears to have happened:

(gdb) set $start = 0xd4d05000  (ag_virtual)
(gdb) set $fva = 3570491392    (eva in trap_pfault() frame)
(gdb) p ($fva - $start) / 4
$2 = 17408

That's well over your current ag_entries of 16384.  Try this patch (note
Linux's in-kernel agp driver has the same bug):

Index: agp_nvidia.c
===================================================================
RCS file: /host/cvs/usr/cvs/src/sys/dev/agp/agp_nvidia.c,v
retrieving revision 1.13
diff -u -r1.13 agp_nvidia.c
--- agp_nvidia.c	12 Nov 2007 21:51:37 -0000	1.13
+++ agp_nvidia.c	6 Mar 2008 13:37:43 -0000
@@ -347,7 +347,7 @@
 	struct agp_nvidia_softc *sc;
 	u_int32_t wbc_reg, temp;
 	volatile u_int32_t *ag_virtual;
-	int i;
+	int i, pages;
 
 	sc = (struct agp_nvidia_softc *)device_get_softc(dev);
 
@@ -373,9 +373,10 @@
 	ag_virtual = (volatile u_int32_t *)sc->gatt->ag_virtual;
 
 	/* Flush TLB entries. */
-	for(i = 0; i < 32 + 1; i++)
+	pages = sc->gatt->ag_entries * sizeof(u_int32_t) / PAGE_SIZE;
+	for(i = 0; i < pages; i++)
 		temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
-	for(i = 0; i < 32 + 1; i++)
+	for(i = 0; i < pages; i++)
 		temp = ag_virtual[i * PAGE_SIZE / sizeof(u_int32_t)];
 
 	return (0);
 	return (0);
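
If it helps to see the new bound outside the kernel, the following is a minimal userland sketch of the same walk over an ordinary array standing in for the GATT.  It is only an illustration under the assumption of 4 KB pages: SIM_PAGE_SIZE, flush_walk_max_index and fault_entry_index are hypothetical names, and nothing here touches real AGP hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE	4096u	/* assumption: x86 page size */

/*
 * Walk a GATT-sized table the way the patched loop does, reading the
 * first word of every page.  Returns the highest index read, or
 * (size_t)-1 if the table is smaller than one page and nothing is read.
 */
static size_t
flush_walk_max_index(const volatile uint32_t *table, size_t entries)
{
	size_t pages = entries * sizeof(uint32_t) / SIM_PAGE_SIZE;
	size_t stride = SIM_PAGE_SIZE / sizeof(uint32_t);
	size_t max_index = (size_t)-1;
	uint32_t temp;
	size_t i;

	assert(table != NULL);
	for (i = 0; i < pages; i++) {
		temp = table[i * stride];	/* forced read, value discarded */
		(void)temp;
		max_index = i * stride;
	}
	return (max_index);
}

/* Entry index of a faulting address, same arithmetic as the gdb session. */
static uint32_t
fault_entry_index(uint32_t fault_va, uint32_t table_va)
{
	return ((fault_va - table_va) / (uint32_t)sizeof(uint32_t));
}
```

For a 16384-entry table the highest index read is 15360, which stays inside the table, whereas the faulting access from the gdb session works out to index 17408.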

-- 
John Baldwin


