From: Andrea Cocito
To: freebsd-hackers@freebsd.org
Date: Tue, 5 Aug 2003 12:22:13 +0200
Subject: Patch for Alladin dallas (ALi) AGP kernel panic [long]

Hello,

first of all: I am completely new to both FreeBSD kernel internals and PC/Intel hardware, and the last time I touched a *BSD kernel was a few years ago, so I might be wrong... but still: the patch works and definitely fixes an existing bug.

My machine would not run anything newer than 4.7: the kernel panics after AGP probing, trying to call contigmalloc1() with size = 0. The problem seems to be shared by several machines based on the ALi chipset. Compiling a custom kernel without the "agp" device works.

Looking into the code and several panic outputs I found these issues:

- I am almost sure that this box does not have an AGP bus at all (it has onboard video; maybe there is an internal AGP bus, but there is certainly no connector). 4.7 did not show any agp* device at boot; 4.8 and 5.* without agp* just see the vga* on isa* and work. So *maybe* someone could investigate whether the device is really an AGP bus (or give me a clue on how to check it).

- If it is an AGP bus, then for some reason it reports an aperture size of zero. Does this make any sense? Again: my knowledge of AGP is NULL. Could we leave the device attached without allocating anything there?

- In /src/sys/pci/agp_ali.c and others there is a loop that is supposed to try allocating smaller and smaller apertures until it either reaches zero size and fails or manages to allocate something. Whatever the two points above turn out to be, the loop is simply broken: as is, it will either allocate the requested size at the first attempt, crash if the size was zero, or fail if it ever tries to reduce the aperture. This is what I patched.

The attached patch fixes that loop so that it does something meaningful: it tries to allocate a progressively smaller aperture until it succeeds or the size reaches zero; if it fails, it detaches the device and returns ENOMEM. As I said: whatever the real origin of the problem is, this loop was broken, and it now does what the original code probably intended, so that at least the system boots. Maybe this also helps others who are having trouble with some Toshiba laptops using that chipset and who reported the same panic on several lists.

The patch for agp_ali.c (context diff, -rc3) is pasted below. The same loop should be fixed in several other agp_*.c sources; let me know which patch style is preferred and whether the list supports attachments, and I'll be glad to send the complete diff for all pertinent files.

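To make the intended behaviour of the loop described above concrete, here is a small standalone sketch in userland C. It is not driver code: struct fake_agp, fake_alloc_gatt() and fake_set_aperture() are made-up stand-ins for the softc, agp_alloc_gatt() and AGP_SET_APERTURE(), and the check on the setter's return value is an extra assumption of the sketch, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; not the real agp softc. */
struct fake_agp {
	uint32_t aperture;	/* current aperture size in bytes */
	uint32_t largest_ok;	/* largest size the fake allocator can satisfy */
	int	 alloc_calls;	/* number of allocation attempts so far */
};

/* Stand-in for agp_alloc_gatt(): succeeds only for sizes <= largest_ok. */
void *
fake_alloc_gatt(struct fake_agp *a)
{
	static char token;	/* any non-NULL address will do */

	a->alloc_calls++;
	return (a->aperture <= a->largest_ok) ? (void *)&token : NULL;
}

/* Stand-in for AGP_SET_APERTURE(): always accepts the new size. */
int
fake_set_aperture(struct fake_agp *a, uint32_t size)
{
	a->aperture = size;
	return 0;
}

/*
 * Same control flow as the loop described above: never call the
 * allocator with a zero aperture, halve the aperture after each
 * failed attempt, and give up with ENOMEM once it reaches zero.
 */
int
attach_sketch(struct fake_agp *a)
{
	void *gatt = NULL;

	assert(a != NULL);

	while (a->aperture != 0) {
		gatt = fake_alloc_gatt(a);
		if (gatt != NULL)
			break;
		/* Assumption: stop if the setter refuses, so we cannot spin. */
		if (fake_set_aperture(a, a->aperture / 2) != 0)
			break;
	}

	return (gatt != NULL) ? 0 : ENOMEM;
}
```

With an initial aperture of 0 the allocator is never called and attach_sketch() returns ENOMEM. With a 64 MB aperture and a fake allocator that can only satisfy 16 MB, it succeeds on the third attempt with the aperture left at 16 MB.
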
Also let me know whether returning ENOMEM when we are asked for an aperture size of 0 is OK, or whether it should rather return EINVAL (this option looks better to me). It is then up to someone who knows the hardware and kernel internals better to work on the real solution: either determine whether the probe should fail because this is not really an AGP bus, or find a way to read the aperture size correctly.

Ciao,

A.

========= CUT HERE ==========
*** agp_ali.c.unpatched Mon Aug 4 09:25:13 2003
--- agp_ali.c Mon Aug 4 12:13:44 2003
***************
*** 101,121 ****
  		return error;
  
  	sc->initial_aperture = AGP_GET_APERTURE(dev);
  
! 	for (;;) {
  		gatt = agp_alloc_gatt(dev);
  		if (gatt)
  			break;
! 
! 		/*
! 		 * Probably contigmalloc failure. Try reducing the
! 		 * aperture so that the gatt size reduces.
! 		 */
! 		if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2)) {
  			agp_generic_detach(dev);
  			return ENOMEM;
- 		}
  	}
  	sc->gatt = gatt;
  
  	/* Install the gatt. */
--- 101,120 ----
  		return error;
  
  	sc->initial_aperture = AGP_GET_APERTURE(dev);
+ 	gatt = NULL;
  
! 	while (AGP_GET_APERTURE(dev) != 0) {
  		gatt = agp_alloc_gatt(dev);
  		if (gatt)
  			break;
! 		AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2);
! 	}
! 
! 	if (!gatt) {
  			agp_generic_detach(dev);
  			return ENOMEM;
  	}
+ 
  	sc->gatt = gatt;
  
  	/* Install the gatt. */
========= CUT HERE ==========

PS: please cc: me on replies, I'm not on the list. Thanks.

----------
Andrea Cocito <andrea.cocito@ieo-research.it>
IEO -- European Institute of Oncology - Bioinformatics group
tel: +39 02 57489 857
fax: +39 02 57489 851

"Imagination is more important than knowledge" -Albert Einstein