From owner-freebsd-mips@FreeBSD.ORG  Mon Aug 20 19:01:38 2012
From: John Baldwin <jhb@freebsd.org>
To: Ian Lepore, scottl@freebsd.org, Justin Gibbs
Cc: arm@freebsd.org, mips@freebsd.org, Peter Jeremy
Date: Mon, 20 Aug 2012 14:34:16 -0400
Subject: Re: On-stack allocation of DMA S/G lists
Message-Id: <201208201434.16538.jhb@freebsd.org>
In-Reply-To: <201208071406.45172.jhb@freebsd.org>
References: <20120703111753.GB72292@server.rulingia.com>
	<1344355782.1128.186.camel@revolution.hippie.lan>
	<201208071406.45172.jhb@freebsd.org>

[ Adding Scott Long and Justin Gibbs for their input ]

On Tuesday, August 07, 2012 2:06:44 pm John Baldwin wrote:
> On Tuesday, August 07, 2012 12:09:42 pm Ian Lepore wrote:
> > On Mon, 2012-08-06 at 10:26 -0400, John Baldwin wrote:
> > > On Thursday, July 12, 2012 8:26:05 am John Baldwin wrote:
> > > > On Sunday, July 08, 2012 7:05:16 am Peter Jeremy wrote:
> > > > > BTW(2): Whilst studying busdma_machdep.c for arm and mips, I've
> > > > > noticed they appear to potentially allocate substantial kernel stack
> > > > > under some conditions as several bus_dma(9) functions include:
> > > > > 	bus_dma_segment_t dm_segments[dmat->nsegments];
> > > > > What prevents this overflowing the kernel stack?
> > > >
> > > > That does seem dubious.  x86 stores the array in the tag instead.
> > >
> > > I have an untested patch to change bus-dma on arm and mips to allocate a
> > > dynamic S/G list in each DMA tag on first use instead of using on-stack
> > > allocation (which I think is rather bogus).  Can folks review and test
> > > this patch please?  Thanks.
> > >
> > > http://www.FreeBSD.org/~jhb/patches/arm_mips_dynamic_dma_segs.patch
> >
> > I'm worried about changing a per-mapping-call resource to a per-dma-tag
> > resource here.  What prevents the situation where you have two
> > bus_dmamap_load() calls in progress at the same time using different
> > buffers but the same tag?
> >
> > I can't find anything in the docs that indicates you have to provide
> > external locking of the tag for map load/unload calls, or that even
> > implies the tag can be modified by a mapping operation.  The lockfunc
> > stuff related to creating the tag is documented as being used only
> > during a deferred callback.
>
> Actually, I do think it is implicit that you won't do concurrent loads
> on a DMA tag, though that may not be obvious.  Keep in mind that this
> is what x86's bus_dma has always done.  For storage drivers you certainly
> can't do this or you risk completing I/O requests out-of-order, which can
> break an upper-layer assumption in a filesystem.  Note that all other
> platforms do this as well; only arm and mips allocate on the stack.

One thing I should note is that it is assumed that the lock specified by
the lockfunc stuff is held when you invoke bus_dmamap_load*().
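To make that concrete, the usual idiom looks something like the sketch
below.  (This is only an illustration; the foo_* names, softc layout, and
tag parameters are made up rather than taken from any real driver.)  The
mutex you pass to bus_dma_tag_create() as the lockfuncarg for
busdma_lock_mutex() is the same mutex you hold around the load call, so
the callback always runs under that lock:

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <machine/bus.h>

struct foo_softc {
	device_t	dev;
	struct mtx	mtx;	/* initialized elsewhere with mtx_init() */
	bus_dma_tag_t	dtag;
	bus_dmamap_t	dmap;
};

static void
foo_load_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	struct foo_softc *sc = arg;

	mtx_assert(&sc->mtx, MA_OWNED);
	/* Program segs[0 .. nseg-1] into the device's S/G descriptors. */
}

static int
foo_start_io(struct foo_softc *sc, void *buf, bus_size_t len)
{
	int error;

	error = bus_dma_tag_create(bus_get_dma_tag(sc->dev),
	    1, 0,			/* alignment, boundary */
	    BUS_SPACE_MAXADDR,		/* lowaddr */
	    BUS_SPACE_MAXADDR,		/* highaddr */
	    NULL, NULL,			/* filter, filterarg */
	    MAXPHYS,			/* maxsize */
	    32,				/* nsegments */
	    PAGE_SIZE,			/* maxsegsz */
	    0,				/* flags */
	    busdma_lock_mutex,		/* lockfunc */
	    &sc->mtx,			/* lockfuncarg: same lock as below */
	    &sc->dtag);
	if (error != 0)
		return (error);
	error = bus_dmamap_create(sc->dtag, 0, &sc->dmap);
	if (error != 0)
		return (error);

	mtx_lock(&sc->mtx);		/* lock held across the load */
	error = bus_dmamap_load(sc->dtag, sc->dmap, buf, len,
	    foo_load_cb, sc, 0);
	mtx_unlock(&sc->mtx);
	return (error);
}

If the load cannot complete right away, bus_dmamap_load() returns
EINPROGRESS and the callback is invoked later; in that deferred case
busdma brackets the callback with the tag's lockfunc (here
busdma_lock_mutex() on the same mutex), which is why the load-time lock
and the "deferred callback" lock end up being one and the same.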
> > The existing code seems to go out of its way to avoid modifying the tag
> > during a mapping operation.  For example, it decides at tag creation
> > time whether any bounce pages might ever be needed for the tag, and if
> > so it pre-sets a bounce zone in the tag, then at mapping time the bounce
> > zone is protected with its own lock when it gets modified.  To me this
> > feels like a way to specifically avoid the need to lock or modify the
> > tag during a mapping operation.
> >
> > Assuming that all of the foregoing is moot for some reason I've
> > overlooked, then on a purely implementation level, could all the
> > duplicated code to allocate the array when necessary be moved into
> > bus_dmamap_load_buffer(), triggered by a NULL 'segs' pointer?
>
> Nope, bus_dmamap_load() doesn't know which of M_NOWAIT / M_WAITOK is
> appropriate to use.
>
> > And just for the record, looking at the problem from an even more
> > distant vantage... is there really a problem with stack-allocating the
> > segments?  On a 64-bit arch the struct is like 16 bytes.  Typical usage
> > is to allocate a tag allowing 1 or just a few segments.  Is anyone
> > really going to create a tag specifying hundreds of segments that would
> > overflow the stack?  If they try, wouldn't failing the tag create be
> > good enough?
>
> I/O devices can allocate tags with several S/G elements.  An mfi(4) tag
> on i386 would use a 256 byte segments array (512 on amd64).  That's not
> entirely trivial.  It would be worse if you couldn't depend on
> dmat->nsegments and had to always allocate the full size.  Presumably
> though we require C99 at that point (and it requires that?).

Also, we are moving to a model of using a larger MAXPHYS (some places
already modify FreeBSD to bump MAXPHYS up), and we are looking at using
direct dispatch for disk I/O rather than always bouncing requests through
g_up/g_down, to gain greater concurrency in the disk I/O subsystem.  Both
of these are going to place greater stress on the kernel stack and argue
for less kernel stack use, not more.  Scott noted that a 1MB I/O request
can require up to 256 segments, which is 4k of segment list.  That would
be too much space on the stack.

One other option is to move the S/G lists into the dma map structure
instead of being in the tag, but to date that has not been done because
it would introduce a significant amount of memory overhead.

Also, I don't see a great need to allow concurrent loads within a single
tag.  Typically you allocate a single tag for each "thing" that can accept
a stream of DMA requests (so one tag for each command queue on an HBA, or
for each descriptor ring on a NIC).  For example, in the case of a multiq
NIC you want to use separate tags so that each tag uses a per-ring lock.
But typically that stream of DMA requests requires serialization, so you
would never end up wanting to have concurrent loads on the same tag.

-- 
John Baldwin
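As a quick sanity check of the 4k figure above, here is a small standalone
computation (illustrative only; it assumes 4KB pages and the 64-bit layout
of bus_dma_segment_t, i.e. an 8-byte ds_addr plus an 8-byte ds_len):

#include <stdio.h>

int
main(void)
{
	const unsigned io_size   = 1024 * 1024;	/* 1MB I/O request */
	const unsigned page_size = 4096;	/* worst case: one segment per page */
	const unsigned seg_size  = 8 + 8;	/* 64-bit ds_addr + ds_len */
	unsigned nsegs = io_size / page_size;	/* 256 segments */

	printf("%u segments -> %u bytes of S/G list\n",
	    nsegs, nsegs * seg_size);
	return (0);
}

This prints "256 segments -> 4096 bytes of S/G list", matching the figure
quoted from Scott.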