From owner-freebsd-current@FreeBSD.ORG Sat Jul 31 09:10:45 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9624F16A4CE for ; Sat, 31 Jul 2004 09:10:45 +0000 (GMT) Received: from spider.deepcore.dk (cpe.atm2-0-53484.0x50a6c9a6.abnxx9.customer.tele.dk [80.166.201.166]) by mx1.FreeBSD.org (Postfix) with ESMTP id CCC3843D69 for ; Sat, 31 Jul 2004 09:10:44 +0000 (GMT) (envelope-from sos@DeepCore.dk) Received: from [192.168.0.132] ([192.168.0.132]) by spider.deepcore.dk (8.12.11/8.12.10) with ESMTP id i6V9AHWs022598; Sat, 31 Jul 2004 11:10:22 +0200 (CEST) (envelope-from sos@DeepCore.dk) Message-ID: <410B61F2.90504@DeepCore.dk> Date: Sat, 31 Jul 2004 11:10:10 +0200 From: =?ISO-8859-1?Q?S=F8ren_Schmidt?= User-Agent: Mozilla Thunderbird 0.7.1 (X11/20040711) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Nate Lawson References: <410AD054.8070202@root.org> In-Reply-To: <410AD054.8070202@root.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-mail-scanned: by DeepCore Virus & Spam killer v1.4 cc: current@freebsd.org Subject: Re: memory corruption/panic solved ("FAILURE - ATAPI_IDENTIFY no interrupt") X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2004 09:10:45 -0000 Nate Lawson wrote: > I've tracked down the source of the memory corruption in -current that > results when booting with various CD and DVD drives (especially the ones > that come with Thinkpads including T23, R32, T41, etc.) The panic is > obvious when running with INVARIANTS ("memory modified after free") but > not so obvious in other configurations. For instance, without > INVARIANTS, part of the rt_info structure is corrupted on my wireless > card, resulting in a panic during ifconfig on boot. This is likely the > source of other problems, including phk's ACPI panic (again, only > triggered when booting with the CD drive in the bay.) OK, first thanks for digging into this! > The root problem is that ata_timeout() fires and calls ata_pio_read() > which overwrites 512 bytes random memory. There are actually two bugs > here that overwrite memory. The code path is as follows: > > 1. ata runs an IDENTIFY command on each drive. It reaches this stack: > > ata_getparam() > ata_identify_devices() > ata_boot_attach() > > 2. ata_getparam() allocates a request and runs it: > ata_alloc_request() > loop on retries (2 max) > fill out an immediate read request for 512 bytes (DEV_BSIZE) > *** Bug 1: transfersize is 512 bytes but sizeof(struct ata_request) is > much less (~80 bytes). > ata_queue_request() starts the request and arms a timeout Its correct to allocate via ata_alloc_request, the data is *not* put into the request, but into another memory area (atadev->param) so this is not a bug. > 3. Completion for first request results in FAILURE - no interrupt. > ata_completed() calls ata_generic_interrupt() which calls ata_pio_read() > *** Bug 1: here's where the 512 - sizeof(struct ata_request) bytes > following "request" are overwritten. See above, the data hits the malloc'd buffer (atadev->param) *not* the request. > . The completion code also updates request->donecount (an > upward-counting residual) from 0 to 512 after the request has been > completed. > > 4. ata_getparam runs through the loop again and starts another identify > *** Bug 2: donecount is now 512 so ata_pio_read() will write on > request->data + 512 with 512 bytes. Whatever follows request->data + > 512 is overwritten. Thats the real bug, donecount should be reset on retry.. > @@ -561,11 +561,12 @@ > request->data = (caddr_t)atadev->param; > request->bytecount = sizeof(struct ata_params); > request->transfersize = DEV_BSIZE; > + request->donecount = 0; > ata_queue_request(request); > if (!(error = request->result)) > break; > } This part should be all thats needed IMHO... -Søren