Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 31 Jul 2004 11:10:10 +0200
From:      =?ISO-8859-1?Q?S=F8ren_Schmidt?= <sos@DeepCore.dk>
To:        Nate Lawson <nate@root.org>
Cc:        current@freebsd.org
Subject:   Re: memory corruption/panic solved ("FAILURE - ATAPI_IDENTIFY no interrupt")
Message-ID:  <410B61F2.90504@DeepCore.dk>
In-Reply-To: <410AD054.8070202@root.org>
References:  <410AD054.8070202@root.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Nate Lawson wrote:
> I've tracked down the source of the memory corruption in -current that
> results when booting with various CD and DVD drives (especially the ones
> that come with Thinkpads including T23, R32, T41, etc.)  The panic is
> obvious when running with INVARIANTS ("memory modified after free") but
> not so obvious in other configurations.  For instance, without
> INVARIANTS, part of the rt_info structure is corrupted on my wireless
> card, resulting in a panic during ifconfig on boot.  This is likely the
> source of other problems, including phk's ACPI panic (again, only
> triggered when booting with the CD drive in the bay.)

OK, first thanks for digging into this!

> The root problem is that ata_timeout() fires and calls ata_pio_read()
> which overwrites 512 bytes random memory.  There are actually two bugs 
> here that overwrite memory.  The code path is as follows:
> 
> 1. ata runs an IDENTIFY command on each drive.  It reaches this stack:
> 
> ata_getparam()
> ata_identify_devices()
> ata_boot_attach()
> 
> 2. ata_getparam() allocates a request and runs it:
> ata_alloc_request()
> loop on retries (2 max)
> fill out an immediate read request for 512 bytes (DEV_BSIZE)
> *** Bug 1: transfersize is 512 bytes but sizeof(struct ata_request) is 
> much less (~80 bytes).
> ata_queue_request() starts the request and arms a timeout

Its correct to allocate via ata_alloc_request, the data is *not* put 
into the request, but into another memory area (atadev->param) so this 
is not a bug.

> 3. Completion for first request results in FAILURE - no interrupt. 
> ata_completed() calls ata_generic_interrupt() which calls ata_pio_read()
> *** Bug 1: here's where the 512 - sizeof(struct ata_request) bytes 
> following "request" are overwritten.

See above, the data hits the malloc'd buffer (atadev->param) *not* the 
request.

> . The completion code also updates request->donecount (an 
> upward-counting residual) from 0 to 512 after the request has been 
> completed.
> 
> 4. ata_getparam runs through the loop again and starts another identify
> *** Bug 2: donecount is now 512 so ata_pio_read() will write on 
> request->data + 512 with 512 bytes.  Whatever follows request->data + 
> 512 is overwritten.

Thats the real bug, donecount should be reset on retry..

> @@ -561,11 +561,12 @@
>  		request->data = (caddr_t)atadev->param;
>  		request->bytecount = sizeof(struct ata_params);
>  		request->transfersize = DEV_BSIZE;
> +		request->donecount = 0;
>  		ata_queue_request(request);
>  		if (!(error = request->result))
>  		    break;
>  	    }

This part should be all thats needed IMHO...

-Søren



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?410B61F2.90504>