From: Søren Schmidt
Date: Wed, 26 Oct 2005 17:11:14 +0200
To: Scott Long
Cc: freebsd-amd64@FreeBSD.ORG
Subject: Re: busdma dflt_lock on amd64 > 4 GB
In-Reply-To: <435F8E06.9060507@samsco.org>
Message-Id: <08A81034-AB5D-4BFC-8F53-21501073D674@FreeBSD.ORG>

On 26/10/2005, at 16:09, Scott Long wrote:

> Jacques Caron wrote:
>> Hi all,
>> Continuing on this story... [I took the liberty of CC'ing Scott
>> and Soren]; the PR is amd64/87977, though in the end it isn't
>> amd64-specific but >4GB-specific.
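Why only machines with more than 4 GB are affected follows from the address check alone. The sketch below is a simplified, hypothetical helper (needs_bounce is not a busdma function, and the real code checks page by page rather than a whole range): a transfer has to go through a bounce page whenever any byte of it lies above the tag's lowaddr, which is 0xffffffff for a device that can only generate 32-bit DMA addresses. When all RAM sits below 4 GB, no buffer ever crosses that line.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, NOT the busdma_machdep.c code.
 * Returns 1 if any byte of [paddr, paddr + len) lies above lowaddr,
 * the highest physical address the device can reach (0xffffffff for
 * a 32-bit DMA engine), meaning the data must be bounced. */
static int
needs_bounce(uint64_t paddr, uint64_t len, uint64_t lowaddr)
{
    if (len == 0)
        return 0;
    /* The range must not wrap around the 64-bit address space. */
    assert(paddr <= UINT64_MAX - (len - 1));
    /* Simplification: test the last byte of the whole range at once. */
    return paddr + len - 1 > lowaddr;
}
```

A page that ends exactly at 0xffffffff is still reachable; one that starts at or straddles the 4 GB line is not.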
>> There is really a big problem somewhere between ata and bus_dma
>> for boxes with more than 4 GB RAM and more than 2 ata disks:
>> * bounce buffers will be needed
>> * ata will have bus_dma allocate bounce buffers:
>>     hw.busdma.zone1.total_bpages: 32
>>     hw.busdma.zone1.free_bpages: 32
>>     hw.busdma.zone1.reserved_bpages: 0
>>     hw.busdma.zone1.active_bpages: 0
>>     hw.busdma.zone1.total_bounced: 27718
>>     hw.busdma.zone1.total_deferred: 0
>>     hw.busdma.zone1.lowaddr: 0xffffffff
>>     hw.busdma.zone1.alignment: 2
>>     hw.busdma.zone1.boundary: 65536
>> * if I do a dd with a bs=256000, 16 bounce pages will be used
>>   (most of the time). As long as I stay on the same disk, no more
>>   pages will be used.
>> * as soon as I access another disk (e.g. with another dd with the
>>   same bs=256000), another set of 16 pages will be used (bus_dma
>>   tags and maps are allocated on a per-channel basis), and all 32
>>   bounce pages will be used (most of the time)
>> * and if I try to access a third disk, more bounce pages are
>>   needed and:
>>   - one of ata_dmaalloc calls to bus_dma_tag_create has ALLOCNOW set
>>   - busdma_machdep will not allocate more bounce pages in that case
>>     (the limit is imposed by maxsize in that situation, which has
>>     already been reached)
>>   - ata_dmaalloc will fail
>>   - but some other bus_dma_tag_create call without ALLOCNOW set will
>>     still cause bounce pages to be allocated, but deferred, and the
>>     non-existent lockfunc to be called, and panic.
>> Adding the standard lockfunc will (probably) solve the panic
>> issue, but there will still be a problem with DMA in ata.
>>
>
> Actually, it won't. It'll result in silent data corruption. What is
> happening is that bus_dmamap_load() is returning EINPROGRESS, but the
> ATA driver ignores it and assumes that the load failed. Later on the
> busdma subsystem tries to run the callback but trips over the
> intentional assertion.
> If the standard lock was used, then the callback would succeed and
> start spamming memory that either had been freed or is in the process
> of being used by other ATA commands.

Ehm, according to the man page the load should succeed for at least
one map when the ALLOCNOW flag is set. ATA only uses one map, so there
is no way that spamming can happen.
The bug in ATA is that the sg_tag and the work_tag are not created with
the ALLOCNOW flag, so if all resources are used up before those tags
are created, things get messy. The patch below takes care of that
problem.

> So, the panic is doing exactly what it is supposed to do. It's
> guarding against bugs in the driver. The workaround for this is to
> use the NOWAIT flag in all instances of bus_dmamap_load() where
> deferrals can happen. This, however, means that using bounce pages
> still remains fragile and that the driver is still likely to return
> ENOMEM to the upper layers. C'est la vie, I guess. At one time I had
> patches that made ATA use the busdma API correctly (it is one of the
> few remaining drivers that does not), but they rotted over time.

As long as ATA doesn't do tags, there is nothing to gain by changing
this at all, except cluttering the code with all the callback crap
that's not needed.
According to the man page, bus_dmamap_load takes no flags, so that's
why that's not done. Besides, it's not needed, as shown above.

Søren Schmidt
sos@FreeBSD.org
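The three possible outcomes of a map load discussed in this thread can be illustrated with a small userland model. This is a hypothetical sketch, not the busdma_machdep.c code: bounce_zone, model_dmamap_load, model_unload, MODEL_NOWAIT and classify_load are invented names, and the model only reports the status a load would return; it never actually runs a deferred callback. It assumes the semantics described above: a load that finds enough free bounce pages runs its callback immediately, a load without a NOWAIT flag is queued and returns EINPROGRESS, and a NOWAIT load (as Scott describes it) fails with ENOMEM instead.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userland model of one shared bounce-page zone.
 * This is NOT the busdma_machdep.c implementation. */
struct bounce_zone {
    int total_pages;   /* cf. hw.busdma.zoneN.total_bpages */
    int free_pages;    /* cf. hw.busdma.zoneN.free_bpages  */
};

/* Hypothetical flag standing in for the NOWAIT behaviour Scott describes. */
#define MODEL_NOWAIT 0x1

struct load_result {
    int error;         /* 0, EINPROGRESS or ENOMEM */
    int callback_ran;  /* 1 if the callback ran before the call returned */
};

/* Try to map a transfer that needs pages_needed bounce pages. */
static struct load_result
model_dmamap_load(struct bounce_zone *z, int pages_needed, int flags)
{
    struct load_result r = { 0, 0 };

    if (pages_needed <= z->free_pages) {
        /* Enough pages: take them and run the callback synchronously. */
        z->free_pages -= pages_needed;
        r.callback_ran = 1;
        return r;
    }
    if (flags & MODEL_NOWAIT) {
        /* Refuse to defer: the caller must fail or retry the request. */
        r.error = ENOMEM;
        return r;
    }
    /* Real busdma would queue the load and run the callback later, once
     * pages are freed; this model only reports that status. */
    r.error = EINPROGRESS;
    return r;
}

/* Return bounce pages to the zone, as an unload would. */
static void
model_unload(struct bounce_zone *z, int pages)
{
    z->free_pages += pages;
    assert(z->free_pages <= z->total_pages);
}

/* Driver-side view: 1 = proceed now, 0 = wait for the callback,
 * -1 = hard failure. Treating EINPROGRESS as -1 is the bug Scott
 * describes: the request is torn down while its callback is pending. */
static int
classify_load(int error)
{
    switch (error) {
    case 0:
        return 1;
    case EINPROGRESS:
        return 0;
    default:
        return -1;
    }
}
```

With the 32-page zone and 16 pages per channel from Jacques' report, the first two loads succeed and drain the zone, a third plain load comes back as EINPROGRESS, and a third NOWAIT load comes back as ENOMEM. A driver that maps EINPROGRESS to failure, as ATA did, frees a request whose callback busdma still intends to run.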