From owner-freebsd-amd64@FreeBSD.ORG Wed Oct 26 14:09:13 2005
Message-ID: <435F8E06.9060507@samsco.org>
Date: Wed, 26 Oct 2005 08:09:10 -0600
From: Scott Long
To: Jacques Caron
Cc: freebsd-amd64@freebsd.org, sos@freebsd.org
Subject: Re: busdma dflt_lock on amd64 > 4 GB
References: <6.2.3.4.0.20051025171333.03a15490@pop.interactivemediafactory.net> <6.2.3.4.0.20051026131012.03a80a20@pop.interactivemediafactory.net>
In-Reply-To: <6.2.3.4.0.20051026131012.03a80a20@pop.interactivemediafactory.net>
List-Id: Porting FreeBSD to the AMD64 platform

Jacques Caron wrote:
> Hi all,
>
> Continuing on this story...
> [I took the liberty of CC'ing Scott and
> Soren], pr is amd64/87977 though it finally isn't amd64-specific but
> >4GB-specific.
>
> There is really a big problem somewhere between ata and bus_dma for
> boxes with more than 4 GB RAM and more than 2 ata disks:
> * bounce buffers will be needed
> * ata will have bus_dma allocate bounce buffers:
> hw.busdma.zone1.total_bpages: 32
> hw.busdma.zone1.free_bpages: 32
> hw.busdma.zone1.reserved_bpages: 0
> hw.busdma.zone1.active_bpages: 0
> hw.busdma.zone1.total_bounced: 27718
> hw.busdma.zone1.total_deferred: 0
> hw.busdma.zone1.lowaddr: 0xffffffff
> hw.busdma.zone1.alignment: 2
> hw.busdma.zone1.boundary: 65536
>
> * if I do a dd with a bs=256000, 16 bounce pages will be used (most of
> the time). As long as I stay on the same disk, no more pages will be used.
> * as soon as I access another disk (e.g. with another dd with the same
> bs=256000), another set of 16 pages will be used (bus_dma tags and maps
> are allocated on a per-channel basis), and all 32 bounce pages will be
> used (most of the time)
> * and if I try to access a third disk, more bounce pages are needed and:
> - one of ata_dmaalloc calls to bus_dma_tag_create has ALLOCNOW set
> - busdma_machdep will not allocate more bounce pages in that case (the
> limit is imposed by maxsize in that situation, which has already been
> reached)
> - ata_dmaalloc will fail
> - but some other bus_dma_tag_create call without ALLOCNOW set will still
> cause bounce pages to be allocated, but deferred, and the non-existent
> lockfunc to be called, and panic.
>
> Adding the standard lockfunc will (probably) solve the panic issue, but
> there will still be a problem with DMA in ata.

Actually, it won't.  It'll result in silent data corruption.  What is
happening is that bus_dmamap_load() is returning EINPROGRESS, but the ATA
driver ignores it and assumes that the load failed.  Later on the busdma
subsystem tries to run the callback but trips over the intentional
assertion.
If the standard lock were used, then the callback would succeed
and start spamming memory that either had been freed or is in the process
of being used by other ATA commands.

So, the panic is doing exactly what it is supposed to do.  It's guarding
against bugs in the driver.  The workaround for this is to use the NOWAIT
flag in all instances of bus_dmamap_load() where deferrals can happen.
This, however, means that using bounce pages still remains fragile and
that the driver is still likely to return ENOMEM to the upper layers.
C'est la vie, I guess.  At one time I had patches that made ATA use the
busdma API correctly (it is one of the few remaining that does not), but
they rotted over time.

>
> The same problems most probably exist with many other drivers.
>
> I think we thus have two issues:
> - providing a lockfunc in nearly all bus_dma_tag_create calls (or have a
> better default than a panic)

No.  Some tags specifically should not permit deferrals.  A good example is
tags for static memory that is allocated with bus_dmamem_alloc().  Just
about every other modern driver honors the API correctly.  iir is one
exception that I can think of, but it'll require a significant rewrite to
fix.

> - allocating more bounce pages when needed in the ALLOCNOW case (with a
> logic similar to that used to allocate bounce pages in the non-ALLOCNOW
> case)

Bounce pages cannot be reclaimed to the system, so overallocating just
wastes memory.  The whole point of the deferral mechanism is to allow you
to allocate enough pages for a normal load while also being able to handle
sporadic spikes in load (like when the syncer runs) without trapping
memory.  Eight years of use has shown this to be a good strategy; FreeBSD
continues to perform better under memory pressure than other operating
systems like Linux.

Scott
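
For illustration only, the three outcomes of a load discussed above
(started, deferred with EINPROGRESS, refused with ENOMEM under NOWAIT)
can be sketched as a small userland model.  This is not the busdma API
and not the ATA driver: every name here (mock_pool, mock_dmamap_load,
mock_driver_submit, MOCK_NOWAIT) is hypothetical, and the 4096-byte page
size and 64 KB transfer size are assumptions chosen to line up with the
32-page pool and 16-page transfers in the report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real busdma API. */
#define MOCK_PAGE_SIZE 4096 /* assumed bounce page size */
#define MOCK_NOWAIT    0x1  /* models the NOWAIT flag: never defer */

struct mock_pool {
    int total_pages; /* bounce pages in the pool */
    int free_pages;  /* bounce pages not currently in use */
    int deferred;    /* loads queued until pages are released */
};

/* Number of bounce pages needed for a transfer of len bytes. */
static int pages_needed(size_t len)
{
    return (int)((len + MOCK_PAGE_SIZE - 1) / MOCK_PAGE_SIZE);
}

/*
 * Model of a load: returns 0 when pages were available, ENOMEM when they
 * were not and MOCK_NOWAIT was given, and EINPROGRESS when the load was
 * queued and a callback would run later.
 */
static int mock_dmamap_load(struct mock_pool *p, size_t len, int flags)
{
    int need = pages_needed(len);

    assert(need > 0);
    if (need <= p->free_pages) {
        p->free_pages -= need;
        return 0;
    }
    if (flags & MOCK_NOWAIT)
        return ENOMEM;
    p->deferred++;
    return EINPROGRESS;
}

enum req_state { REQ_STARTED, REQ_PENDING, REQ_FAILED };

/*
 * Driver-side handling.  EINPROGRESS is not a failure: the request and
 * its buffers must stay alive because the callback will still run.
 * Treating it as a failure and freeing the request is the bug that the
 * panic guards against.
 */
static enum req_state mock_driver_submit(struct mock_pool *p, size_t len,
    int flags)
{
    switch (mock_dmamap_load(p, len, flags)) {
    case 0:
        return REQ_STARTED;
    case EINPROGRESS:
        return REQ_PENDING;
    default:
        return REQ_FAILED;
    }
}
```

In this model, two 64 KB loads drain a 32-page pool.  A third load
without MOCK_NOWAIT comes back as REQ_PENDING, and a driver that cannot
keep the request alive would pass MOCK_NOWAIT instead and get REQ_FAILED,
which is the ENOMEM case mentioned above.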