Date: Wed, 26 Oct 2005 00:09:55 +0200
From: Jacques Caron <jc@oxado.com>
To: freebsd-amd64@freebsd.org
Subject: busdma dflt_lock on amd64 > 4 GB
Message-ID: <6.2.3.4.0.20051025171333.03a15490@pop.interactivemediafactory.net>

Hi all,

It seems there is a continuing story about bus_dma (or rather its use by drivers) and systems with more than 4 GB of RAM. I submitted a PR for this issue:

http://www.freebsd.org/cgi/query-pr.cgi?pr=87977

I know it happens on amd64 machines. After looking a bit further and trying to figure out the whole busdma thing, I think the issue might be more general, as busdma_machdep.c is exactly the same for i386 and amd64. Still, it has been discussed around here a number of times, and there are probably more amd64 systems with more than 4 GB of RAM than other types, so I've selected this list. Let me know if another list would be more suitable.

What I understand (please correct me if I'm wrong) is that:

- busdma will use bounce buffers when needed, and this includes the use of devices that are limited to 32-bit addressing (most of them, I would guess?) when there is more than 4 GB of RAM
- I'm not 100% sure, but it seems bounce buffers are a limited resource (that's at least what sysctl -a | grep busdma tells me, and that really looks like a bottleneck, btw)
- apparently busdma will defer the allocation of bounce buffers when there aren't enough available. This can happen pretty quickly in some situations, though I haven't yet figured out the difference between the two zones: two simultaneous dd's from two disks with a large block size (bs=256000) will use up all available bounce buffer pages in zone1...
- if that happens, busdma_swi will eventually call the lockfunc associated with the DMA tag, and panic if none is defined

Now, it seems that many drivers don't provide a lockfunc to bus_dma_tag_create. The commit log for the lockfunc addition says:

"The only time that NULL, NULL should ever be used is when the driver ensures that bus_dmamap_load() will not be deferred."

The problem is: what does this mean? How can a driver "ensure that bus_dmamap_load will not be deferred"?
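To make the sequence above concrete, here is a minimal user-space sketch in C of the deferral path as I understand it. It is not kernel code and not the real implementation: every model_* name, the page counts and simulate() are hypothetical stand-ins, the "mutex" is just a counter, and the real bus_dmamap_load and busdma_swi do considerably more. The only point is to show why a tag created with no lockfunc is harmless until a load is actually deferred.

```c
/* User-space model only; all model_* names and numbers are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODEL_LOCK, MODEL_UNLOCK } model_lock_op;
typedef void model_lockfunc(void *arg, model_lock_op op);

struct model_tag { model_lockfunc *lockfunc; void *lockfuncarg; };
struct model_map { struct model_tag *tag; int pages; bool deferred; bool loaded; };

static int free_pages;   /* bounce pages are a limited pool */
static bool panicked;    /* stands in for a kernel panic */

/* Stand-in for dflt_lock: reaching it at all is the failure. */
static void model_dflt_lock(void *arg, model_lock_op op)
{
    (void)arg; (void)op;
    panicked = true;
}

/* Stand-in for busdma_lock_mutex: the "mutex" is a held counter. */
static void model_lock_mutex(void *arg, model_lock_op op)
{
    int *held = arg;
    *held += (op == MODEL_LOCK) ? 1 : -1;
}

/* Passing NULL, NULL selects the panicking default. */
static void model_tag_create(struct model_tag *t, model_lockfunc *fn, void *arg)
{
    t->lockfunc = (fn != NULL) ? fn : model_dflt_lock;
    t->lockfuncarg = (fn != NULL) ? arg : NULL;
}

/* Load now if enough bounce pages are free, otherwise defer. */
static bool model_load(struct model_map *m)
{
    if (m->pages <= free_pages) {
        free_pages -= m->pages;
        m->loaded = true;
        return true;
    }
    m->deferred = true;
    return false;
}

/* Give the bounce pages back to the pool. */
static void model_unload(struct model_map *m)
{
    assert(m->loaded);
    free_pages += m->pages;
    m->loaded = false;
}

/* Stand-in for busdma_swi: retry a deferred load under the tag's lockfunc. */
static void model_swi(struct model_map *m)
{
    if (!m->deferred || m->pages > free_pages)
        return;
    m->tag->lockfunc(m->tag->lockfuncarg, MODEL_LOCK);
    if (panicked)
        return;              /* a real kernel would have stopped here */
    free_pages -= m->pages;
    m->deferred = false;
    m->loaded = true;
    m->tag->lockfunc(m->tag->lockfuncarg, MODEL_UNLOCK);
}

/* Returns 1 if the deferred load completed, -1 on model panic, 0 otherwise. */
int simulate(bool with_lockfunc)
{
    int held = 0;
    struct model_tag tag;
    struct model_map a = { &tag, 3, false, false };
    struct model_map b = { &tag, 3, false, false };

    free_pages = 4;
    panicked = false;
    model_tag_create(&tag, with_lockfunc ? model_lock_mutex : NULL, &held);

    model_load(&a);          /* takes 3 of the 4 pages */
    model_load(&b);          /* only 1 page left, so this load is deferred */
    model_unload(&a);        /* pages come back */
    model_swi(&b);           /* the deferred load is retried here */

    if (panicked)
        return -1;
    return (b.loaded && held == 0) ? 1 : 0;
}
```

In this model, simulate(false) plays the part of a driver that passes NULL, NULL and hits the default lockfunc as soon as one load is deferred, while simulate(true) plays the part of a driver that supplies a mutex-style lockfunc and completes the deferred load. The two competing maps are meant to mirror the two simultaneous dd's.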
Calls to bus_dma_tag_create are not consistent across drivers:

- some drivers are apparently cautious: twe will either have BUS_DMA_ALLOCNOW and no lockfunc, or no flags and use busdma_lock_mutex and Giant. Is this the right approach?
- other drivers are *very* cautious: fxp will always use busdma_lock_mutex and Giant.
- other drivers don't care at all: bge and ata never provide a lockfunc, and in most cases don't use any flags either.

My (humble) opinion and a few questions:

- clarification of the cases when a lockfunc is required or not is needed. I fear it is always needed unless the created tag is only used as a "parent" for others, or (maybe?) if BUS_DMA_ALLOCNOW is set.
- an audit of bus_dma_tag_create calls in most drivers is needed, at least regarding the lockfunc args (bge also has weird lowaddr/hiaddr values, as has already been reported)
- maybe dflt_lock should actually use the Giant mutex by default rather than panicking
- or maybe the lockfunc call in busdma_swi is not needed? I'm really not versed in kernelese, so I really have no idea
- is using Giant the best option, or should each driver use a different mutex, or...?

Tomorrow I will try a kernel with a modified ata driver that passes busdma_lock_mutex, &Giant where needed, and report back. I think this will actually fix the issue, but I don't know whether it might cause other issues or degrade performance, or whether there is a better solution...

Any hints welcome,

Jacques.