From: Scott Long <scottl@samsco.org>
To: David Christensen
Cc: pyunyh@gmail.com, current@freebsd.org
Subject: Re: Getting/Forcing Greater than 4KB Buffer Allocations
Date: Thu, 19 Jul 2007 00:56:34 -0400

David Christensen wrote:
>> > Thanks Pyun but I'm really just looking for a way to test that I
>> > can handle the number of segments I've advertised that I can
>> > support.  I believe my code is correct but when all I see are
>> > allocations of 3 segments I just can't prove it.  I was hoping that
>> > running a utility such as "stress" would help fragment memory and
>> > force more variable responses but that hasn't happened yet.
>> >
>>
>> It seems you've used the following code to create the jumbo dma tag.
>>
>>         /*
>>          * Create a DMA tag for RX mbufs.
>>          */
>>         if (bus_dma_tag_create(sc->parent_tag,
>>             1,
>>             BCE_DMA_BOUNDARY,
>>             sc->max_bus_addr,
>>             BUS_SPACE_MAXADDR,
>>             NULL, NULL,
>>             MJUM9BYTES,
>>             BCE_MAX_SEGMENTS,
>>             MJUM9BYTES,
>>             ^^^^^^^^^^
>>             0,
>>             NULL, NULL,
>>             &sc->rx_mbuf_tag)) {
>>                 BCE_PRINTF("%s(%d): Could not allocate RX mbuf DMA tag!\n",
>>                     __FILE__, __LINE__);
>>                 rc = ENOMEM;
>>                 goto bce_dma_alloc_exit;
>>         }
>>
>> If you want to have > 9 dma segments, change maxsegsz (MJUM9BYTES) to
>> 1024.  bus_dma honors the maxsegsz argument, so you wouldn't get a dma
>> segment larger than maxsegsz.
>> With MJUM9BYTES maxsegsz you would get up to 4 dma segments on
>> systems with 4K PAGE_SIZE.  (You would have got up to 3 dma segments
>> if you used a PAGE_SIZE alignment argument.)
>
> I don't want more segments, I just want to get a distribution of
> segments up to the max size I specified.  For example, since my
> BCE_MAX_SEGMENTS size is 8, I want to make sure I get mbufs that are
> spread over 1, 2, 3, 4, 5, 6, 7, and 8 segments.
>
> It turns out that if I reduce the amount of memory in the system (from
> 8GB to 2GB) I will get more mbufs coalesced into 2 segments, rather
> than the more typical 3 segments, but that's good enough for my
> testing now.
>

Dave,

I'm trying to catch up on this thread, but I'm utterly confused as to
what you're looking for.  Let's try talking through a few scenarios
here:

1. Your hardware has slots for 3 SG elements, and all three MUST be
filled without exception.  Therefore, you want segments that are 4k,
4k, and 1k (or some slight variation of that if the buffer is
misaligned).  To do this, set the maxsegs to 3 and the maxsegsize to
4k.  This will ensure that busdma does no coalescing (more on this
topic later) and will always give you 3 segments for 9k of contiguous
buffer.  If the actual buffer winds up being <= 8k, busdma won't
guarantee that you'll get 3 segments, and you'll have to fake something
up in your driver.  If the buffer winds up being a fragmented mbuf
chain, busdma won't guarantee 3 segments either, but that case is
already handled via m_defrag().

2. Your hardware can only handle 4k segments, but is less restrictive
on the min/max number of segments.  The solution is the same as above.

3. Your hardware has slots for 8 SG elements, and all 8 MUST be filled
without exception.  There's no easy solution for this, as it's a fairly
bizarre situation.  I'll only discuss it further if you confirm that
it's actually the case here.

As for coalescing segments, I'm considering a new busdma back-end that
greatly streamlines loads by eliminating cycle-consuming tasks like
segment coalescing.  The original justification for coalescing was that
DMA engines operated faster with fewer segments.  That might still be
true, but the extra host CPU cycles and cache-line misses probably
result in a net loss.  I'm also going to axe bounce-buffer support,
since it bloats the I-cache.  The target for this new back-end is
drivers for hardware that doesn't need these services and that is also
sensitive to the number of host CPU cycles being consumed, i.e. modern
1Gb and 10Gb adapters.

The question I have is whether this new back-end should be accessible
directly, through yet another bus_dmamap_load_foo variant that drivers
need to know about specifically, or indirectly and automatically, via
the existing bus_dmamap_load_foo variants.  The tradeoff is further API
pollution vs. the opportunity for even more efficiency through no
indirect function calls and no cache misses from accessing the busdma
tag.  I don't like API pollution, since it makes code harder to
maintain, but the opportunity for the best possible performance is also
appealing.

Scott
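
For reference, Pyun's suggestion above works out to roughly the sketch
below.  It reuses the bce(4) names from the quoted code (sc->parent_tag,
BCE_DMA_BOUNDARY, sc->max_bus_addr, BCE_PRINTF); only nsegments and
maxsegsz change.  Since busdma never emits a segment larger than
maxsegsz, nsegments also has to grow to at least maxsize / maxsegsz, or
loads of a full 9k buffer will start failing with EFBIG:

        /*
         * Sketch only: the same rx_mbuf_tag creation as quoted above,
         * but with maxsegsz dropped to 1024 so a 9k jumbo buffer is
         * carved into up to 9 segments instead of the usual 3-4.
         * howmany() from <sys/param.h> keeps nsegments large enough
         * to cover maxsize.
         */
        if (bus_dma_tag_create(sc->parent_tag,  /* parent */
            1,                                  /* alignment */
            BCE_DMA_BOUNDARY,                   /* boundary */
            sc->max_bus_addr,                   /* lowaddr */
            BUS_SPACE_MAXADDR,                  /* highaddr */
            NULL, NULL,                         /* filter, filterarg */
            MJUM9BYTES,                         /* maxsize */
            howmany(MJUM9BYTES, 1024),          /* nsegments (9) */
            1024,                               /* maxsegsz */
            0,                                  /* flags */
            NULL, NULL,                         /* lockfunc, lockarg */
            &sc->rx_mbuf_tag)) {
                BCE_PRINTF("%s(%d): Could not allocate RX mbuf DMA tag!\n",
                    __FILE__, __LINE__);
                rc = ENOMEM;
                goto bce_dma_alloc_exit;
        }

Scenario 1 above is the same call with nsegments = 3 and maxsegsz =
PAGE_SIZE; that combination also keeps busdma from coalescing adjacent
pages into a single segment.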
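
And the m_defrag() fallback mentioned in scenario 1 is roughly the
following idiom in a TX encap path.  This is only a sketch with
illustrative names (xxx_encap, struct xxx_softc, sc->tx_mbuf_tag), not
the actual bce(4) code:

        static int
        xxx_encap(struct xxx_softc *sc, bus_dmamap_t map,
            struct mbuf **m_head)
        {
                bus_dma_segment_t segs[BCE_MAX_SEGMENTS];
                struct mbuf *m;
                int error, nsegs;

                error = bus_dmamap_load_mbuf_sg(sc->tx_mbuf_tag, map,
                    *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
                if (error == EFBIG) {
                        /* Chain needs more than BCE_MAX_SEGMENTS */
                        /* entries; collapse it and retry once.   */
                        m = m_defrag(*m_head, M_DONTWAIT);
                        if (m == NULL)
                                return (ENOBUFS);
                        *m_head = m;
                        error = bus_dmamap_load_mbuf_sg(sc->tx_mbuf_tag,
                            map, *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
                }
                if (error != 0)
                        return (error);

                /* The tag guarantees nsegs <= BCE_MAX_SEGMENTS here. */
                /* ... hand segs[0..nsegs-1] to the chip ... */
                return (0);
        }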