From: Jacques Caron
Date: Wed, 26 Oct 2005 00:09:55 +0200
To: freebsd-amd64@freebsd.org
Subject: busdma dflt_lock on amd64 > 4 GB

Hi all,

It seems there is a continuing story about bus_dma (or rather its use
by drivers) and systems with more than 4 GB RAM.
I submitted a PR for this issue:

http://www.freebsd.org/cgi/query-pr.cgi?pr=87977

I know it happens on amd64 machines, though after looking a bit
further and trying to figure out the whole busdma thing, the issue
might be more general (as busdma_machdep.c is exactly the same for
i386 and amd64). As it has been discussed around here a number of
times, and because there are probably more amd64 systems with more
than 4 GB RAM than other types, I've selected this list. Let me know
if another list would be more suitable.

What I understand (please correct me if I'm wrong) is that:

- busdma will use bounce buffers when needed, and this includes the
  use of devices that are limited to 32-bit addressing (most of them,
  I would guess?) when there is more than 4 GB RAM
- I'm not 100% sure, but it seems bounce buffers are a limited
  resource (that's at least what sysctl -a | grep busdma tells me,
  and that really looks like a bottleneck, btw)
- apparently busdma will defer the allocation of bounce buffers when
  there aren't enough available (and this can happen pretty quickly
  in some situations, though I haven't yet figured out the difference
  between the two zones): two simultaneous dd's from two disks with a
  large block size (bs=256000) will use up all available bounce
  buffer pages in zone1...
- if that happens, busdma_swi will eventually call the lockfunc
  associated with the dma tag, and panic if none is defined

Now, it seems that many drivers don't provide a lockfunc to
bus_dma_tag_create. The commit log for the lockfunc addition says:

"The only time that NULL, NULL should ever be used is when the driver
ensures that bus_dmamap_load() will not be deferred."

The problem is: what does this mean? How can a driver "ensure that
bus_dmamap_load will not be deferred"?

Calls to bus_dma_tag_create are not consistent in drivers:

- some drivers are apparently cautious: twe will either have
  BUS_DMA_ALLOCNOW and no lockfunc, or no flags and use
  busdma_lock_mutex and Giant.
  Is this the right approach?
- other drivers are *very* cautious: fxp will always use
  busdma_lock_mutex and Giant.
- other drivers don't care at all: bge and ata never provide a
  lockfunc, and in most cases don't use any flags either.

My (humble) opinion and a few questions:

- clarification is needed of the cases in which a lockfunc is
  required. I fear it is always needed unless the created tag is only
  used as a "parent" for others, or (maybe?) if BUS_DMA_ALLOCNOW is
  set.
- an audit of bus_dma_tag_create calls in most drivers is needed, at
  least regarding the lockfunc args (bge also has weird
  lowaddr/hiaddr, as has already been reported)
- maybe dflt_lock should actually use the Giant mutex by default
  rather than panicking
- or maybe the lockfunc call in busdma_swi is not needed? I'm not
  well versed in kernelese, so I really have no idea
- is using Giant the best option, or should each driver use a
  different mutex, or...?

I will try a kernel with a modified ata driver that passes
busdma_lock_mutex, &Giant where needed tomorrow and report back. I
think this will actually fix the issue, but I don't know whether it
might cause other issues or degrade performance, or whether there is
a better solution...

Any hints welcome,

Jacques.
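
For anyone following the thread, the deferred-load path as I understand
it can be modelled in plain userspace C. This is only a sketch under
stated assumptions, not the code in busdma_machdep.c: every model_*
name is hypothetical, the panic is replaced by a flag, and the "mutex"
is a simple counter rather than a kernel mutex. It shows a tag created
without a lockfunc falling back to a default that "panics" as soon as
the deferred path tries to take the lock.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; all model_* names are hypothetical. */
typedef enum { MODEL_LOCK, MODEL_UNLOCK } model_lock_op_t;
typedef void model_lock_fn(void *arg, model_lock_op_t op);
typedef void model_callback_fn(void *arg);

/* Stand-in for a kernel mutex such as Giant: just bookkeeping. */
struct model_mtx {
    int held;
    int acquisitions;
};

/* Stand-in for the part of a DMA tag that matters here. */
struct model_tag {
    model_lock_fn *lockfunc;
    void *lockfuncarg;
};

/* Set when the default lockfunc is reached (models the panic). */
static int model_panicked;

/* Default lockfunc: models "panic if the driver gave no lockfunc". */
void model_dflt_lock(void *arg, model_lock_op_t op)
{
    (void)arg;
    (void)op;
    model_panicked = 1;
}

/* Mutex-based lockfunc: models passing a lock function plus a mutex. */
void model_lock_mutex(void *arg, model_lock_op_t op)
{
    struct model_mtx *m = arg;

    assert(m != NULL);
    if (op == MODEL_LOCK) {
        m->held = 1;
        m->acquisitions++;
    } else {
        m->held = 0;
    }
}

/* Tag creation: a NULL lockfunc silently becomes the default one. */
void model_tag_create(struct model_tag *tag, model_lock_fn *lockfunc,
    void *lockfuncarg)
{
    if (lockfunc != NULL) {
        tag->lockfunc = lockfunc;
        tag->lockfuncarg = lockfuncarg;
    } else {
        tag->lockfunc = model_dflt_lock;
        tag->lockfuncarg = NULL;
    }
}

/*
 * Deferred completion: take the driver's lock, run the callback,
 * release the lock. Returns 0 on success, -1 if the model "panicked".
 */
int model_swi(struct model_tag *tag, model_callback_fn *cb, void *cbarg)
{
    model_panicked = 0;
    tag->lockfunc(tag->lockfuncarg, MODEL_LOCK);
    if (model_panicked)
        return -1;
    cb(cbarg);
    tag->lockfunc(tag->lockfuncarg, MODEL_UNLOCK);
    return 0;
}

/* Example callback: counts how many deferred loads completed. */
void model_count_cb(void *arg)
{
    (*(int *)arg)++;
}
```

In this model, creating a tag with model_lock_mutex and a model_mtx and
then calling model_swi runs the callback with the lock held, while a
tag created with NULL, NULL makes model_swi return -1 without ever
running the callback. That matches the behaviour described above only
as far as my reading of it is right.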