Date:      Wed, 26 Oct 2005 08:09:10 -0600
From:      Scott Long <scottl@samsco.org>
To:        Jacques Caron <jc@oxado.com>
Cc:        freebsd-amd64@freebsd.org, sos@freebsd.org
Subject:   Re: busdma dflt_lock on amd64 > 4 GB
Message-ID:  <435F8E06.9060507@samsco.org>
In-Reply-To: <6.2.3.4.0.20051026131012.03a80a20@pop.interactivemediafactory.net>
References:  <6.2.3.4.0.20051025171333.03a15490@pop.interactivemediafactory.net> <6.2.3.4.0.20051026131012.03a80a20@pop.interactivemediafactory.net>

Jacques Caron wrote:

> Hi all,
> 
> Continuing on this story... [I took the liberty of CC'ing Scott and 
> Soren]. The PR is amd64/87977, though it turns out it isn't 
> amd64-specific but rather >4GB-specific.
> 
> There is a real problem somewhere between ata and bus_dma for 
> boxes with more than 4 GB of RAM and more than two ata disks:
> * bounce buffers will be needed
> * ata will have bus_dma allocate bounce buffers:
> hw.busdma.zone1.total_bpages: 32
> hw.busdma.zone1.free_bpages: 32
> hw.busdma.zone1.reserved_bpages: 0
> hw.busdma.zone1.active_bpages: 0
> hw.busdma.zone1.total_bounced: 27718
> hw.busdma.zone1.total_deferred: 0
> hw.busdma.zone1.lowaddr: 0xffffffff
> hw.busdma.zone1.alignment: 2
> hw.busdma.zone1.boundary: 65536
> 
> * if I do a dd with bs=256000, 16 bounce pages will be used (most of 
> the time). As long as I stay on the same disk, no more pages are used.
> * as soon as I access another disk (e.g. with another dd with the same 
> bs=256000), another set of 16 pages is used (bus_dma tags and maps 
> are allocated on a per-channel basis), and all 32 bounce pages are in 
> use (most of the time)
> * and if I try to access a third disk, more bounce pages are needed and:
> - one of ata_dmaalloc's calls to bus_dma_tag_create has ALLOCNOW set
> - busdma_machdep will not allocate more bounce pages in that case (the 
> limit is imposed by maxsize in that situation, which has already been 
> reached)
> - ata_dmaalloc will fail
> - but some other bus_dma_tag_create call without ALLOCNOW set will 
> still cause bounce pages to be allocated; that allocation is deferred, 
> the non-existent lockfunc (the dflt_lock default) gets called, and the 
> kernel panics.
> 
> Adding the standard lockfunc will (probably) solve the panic issue, but 
> there will still be a problem with DMA in ata.
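
For concreteness, the fix being suggested above amounts to passing the
stock busdma_lock_mutex/Giant pair instead of NULL when the tag is
created; a minimal sketch, with hypothetical sizes rather than ata's
real ones:

    error = bus_dma_tag_create(
        NULL,                       /* parent tag */
        2, 65536,                   /* alignment, boundary (per the sysctls) */
        BUS_DMA_MAXADDR_32BIT,      /* lowaddr: bounce anything above 4 GB */
        BUS_DMA_MAXADDR,            /* highaddr */
        NULL, NULL,                 /* filter, filterarg */
        256 * 1024,                 /* maxsize (hypothetical) */
        32,                         /* nsegments (hypothetical) */
        65536,                      /* maxsegsz */
        0,                          /* flags */
        busdma_lock_mutex, &Giant,  /* the standard lockfunc */
        &dma_tag);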

Actually, it won't.  It'll result in silent data corruption.  What is
happening is that bus_dmamap_load() is returning EINPROGRESS, but the
ATA driver ignores it and assumes that the load failed.  Later on the
busdma subsystem tries to run the callback but trips over the 
intentional assertion.  If the standard lock were used, the callback
would succeed and start spamming memory that either had been freed or
was already being used by other ATA commands.
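
To make the failure mode concrete, a correctly written driver has to
treat EINPROGRESS as "deferred", not "failed"; a sketch, with all
xxx_* names hypothetical:

    static void
    xxx_dma_callback(void *arg, bus_dma_segment_t *segs, int nseg, int error)
    {
            struct xxx_request *req = arg;

            /* Runs immediately, or later if the load was deferred. */
            if (error != 0) {
                    xxx_fail_request(req, error);
                    return;
            }
            xxx_program_segments(req, segs, nseg);
    }

    error = bus_dmamap_load(dma_tag, req->map, req->data, req->bytecount,
        xxx_dma_callback, req, BUS_DMA_WAITOK);
    if (error == EINPROGRESS) {
            /*
             * Not a failure: bounce pages are exhausted and the
             * callback will run once some are freed.  The request
             * and its buffer must stay alive until then, which is
             * exactly what ata fails to guarantee.
             */
            return (0);
    }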

So, the panic is doing exactly what it is supposed to do.  It's guarding
against bugs in the driver.  The workaround for this is to use the
BUS_DMA_NOWAIT flag in all instances of bus_dmamap_load() where
deferrals can happen.  This, however, means that using bounce pages
remains fragile and that the driver is still likely to return ENOMEM
to the upper layers.  C'est la vie, I guess.  At one time I had patches that
made ATA use the busdma API correctly (it is one of the few remaining
that does not), but they rotted over time.
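
The workaround, sketched with the same hypothetical names as above:

    error = bus_dmamap_load(dma_tag, req->map, req->data, req->bytecount,
        xxx_dma_callback, req, BUS_DMA_NOWAIT);
    if (error != 0) {
            /*
             * With BUS_DMA_NOWAIT there is no EINPROGRESS case: when
             * bounce pages run out, the load fails outright with
             * ENOMEM and the driver must retry later or pass the
             * error up.  Fragile, but it doesn't corrupt memory.
             */
            return (error);
    }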


> 
> The same problems most probably exist with many other drivers.
> 
> I think we thus have two issues:
> - providing a lockfunc in nearly all bus_dma_tag_create calls (or 
> having a better default than a panic)

No.  Some tags specifically should not permit deferrals.  A good example
is tags for static memory that is allocated with bus_dmamem_alloc().
Just about every other modern driver honors the API correctly.  iir is
one exception that I can think of, but it'll require a significant
rewrite to fix.
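
A sketch of such a tag, again with hypothetical names and sizes; the
NULL lockfunc deliberately selects the panicking default, since a
deferral on this tag can only mean a driver bug:

    error = bus_dma_tag_create(NULL, PAGE_SIZE, 0,
        BUS_DMA_MAXADDR_32BIT, BUS_DMA_MAXADDR, NULL, NULL,
        sizeof(struct xxx_shared), 1, sizeof(struct xxx_shared),
        0, NULL, NULL, &shared_tag);    /* NULL lockfunc: defer == bug */
    if (error == 0)
            error = bus_dmamem_alloc(shared_tag, &shared_mem,
                BUS_DMA_NOWAIT | BUS_DMA_ZERO, &shared_map);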

> - allocating more bounce pages when needed in the ALLOCNOW case (with 
> logic similar to that used to allocate bounce pages in the non-ALLOCNOW 
> case)

Bounce pages cannot be reclaimed to the system, so overallocating just
wastes memory.  The whole point of the deferral mechanism is to allow
you to allocate enough pages for a normal load while also being able to
handle sporadic spikes in load (like when the syncer runs) without
trapping memory.  Eight years of use has shown this to be a good
strategy; FreeBSD continues to perform better under memory pressure than
other operating systems like Linux.

Scott


