From: Andrew Gallatin <gallatin@cs.duke.edu>
Date: Thu, 20 Jun 2002 12:25:58 -0400 (EDT)
To: Bosko Milekic
Cc: "Kenneth D. Merry", current@FreeBSD.ORG, net@FreeBSD.ORG
Subject: Re: new zero copy sockets snapshot

Bosko Milekic writes:
 > > Years ago, I used Wollman's MCLBYTES > PAGE_SIZE support (introduced
 > > in rev 1.20 of uipc_mbuf.c) and it seemed to work OK then.  But having
 > > 16K clusters is a huge waste of space. ;).
 >
 >   Since then, the mbuf allocator in -CURRENT has totally changed.  It is
 > still possible to provide allocations of > PAGE_SIZE buffers, however
 > they will likely not map physically contiguous memory.  If you happen to
 > have a device that doesn't support scatter/gather for DMA, then these
 > buffers will be broken for it (I know that if_ti is not a problem).

Actually, it will be a problem for if_ti.  The original Tigon 1s didn't
support s/g DMA.  I think we should just not support jumbo frames on
Tigon 1s.

 >   The other issue is that the mbuf allocator then, as well as the new
 > mbuf allocator, uses the kmem_malloc() interface that was also used by
 > malloc() to perform allocations of wired-down pages.  I am not sure if
 > you'll be able to play those tricks where you unmap and remap the page
 > that is allocated for you once it comes out of the mbuf allocator.  Do
 > you think it would work?

I don't think so, but I haven't read the code carefully and I don't
know for certain.  However, my intent was to use a jumbo mbuf type for
copyin and to clean up the existing infrastructure for drivers with
brain-dead firmware, not to use a new 10K cluster as a framework for
zero-copy.
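Just to illustrate what "broken for devices without s/g DMA" means in
practice: a driver handed one of those > PAGE_SIZE buffers would have
to verify that the backing pages are physically adjacent before it
could program a single DMA segment.  Rough sketch only; the function
name is mine and nothing like it exists in the tree:

    /*
     * Sketch: walk a buffer page by page and check whether its
     * backing pages are physically contiguous.  A device that can't
     * do s/g DMA needs every page after the first to be physically
     * adjacent to the previous one.
     */
    #include <sys/param.h>
    #include <vm/vm.h>      /* for vtophys */
    #include <vm/pmap.h>    /* for vtophys */

    static int
    buf_is_phys_contig(vm_offset_t va, size_t len)
    {
            vm_offset_t v, prev, cur;

            prev = vtophys(trunc_page(va));
            for (v = trunc_page(va) + PAGE_SIZE; v < va + len;
                v += PAGE_SIZE) {
                    cur = vtophys(v);
                    if (cur != prev + PAGE_SIZE)
                            return (0);  /* needs s/g DMA (or a copy) */
                    prev = cur;
            }
            return (1);
    }

A driver without s/g support would have to fall back to a copy (or
refuse the buffer) whenever this returns 0.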
 > > Do you think it would be feasible to glue in a new jumbo (10K?)
 > > allocator on top of the existing mbuf and mcl allocators using the
 > > existing mechanisms and the existing MCLBYTES > PAGE_SIZE support
 > > (but broken out into separate functions and macros)?
 >
 >   Assuming that you can still play those VM tricks with the pages spit
 > out by mb_alloc (kern/subr_mbuf.c in -CURRENT), then this wouldn't be a
 > problem at all.  It's easy to add a new fixed-size type allocation to
 > mb_alloc.  In fact, it would be beneficial.  mb_alloc uses per-CPU
 > caches and also makes mbuf and cluster allocations share the same
 > per-CPU lock.  What could be done is that the jumbo buffer allocations
 > could share the same lock as well (since they will likely usually be
 > allocated right after an mbuf is).  This would give us jumbo-cluster
 > support, but it would only be useful for devices clued enough to break
 > up the cluster into PAGE_SIZE chunks and do scatter/gather.  For most
 > worthy gigE devices, I don't think this should be a problem.

I'm a bit worried about other devices.  Traditionally, mbufs have never
crossed page boundaries, so most drivers never bother to check for a
transmit mbuf crossing a page boundary.  Using physically discontiguous
mbufs could lead to a lot of subtle data corruption.

One question.  I've observed some really anomalous behaviour under
-stable with my Myricom GM driver (2Gb/s + 2Gb/s link speed, dual 1GHz
PIII).  When I use 4K mbufs for receives, the best speed I see is about
1300Mb/sec.  However, if I use private 9K physically contiguous buffers
I see 1850Mb/sec (iperf TCP).

The obvious conclusion is that there's a lot of overhead in setting up
the DMA engines, but that's not the case; we have a fairly quick chain
DMA engine.  I've provided a "control" by breaking my contiguous
buffers down into 4K chunks, so that I do the same number of DMAs in
both cases, and I still see ~1850Mb/sec for the 9K buffers.

A coworker suggested that the problem is that when doing copyouts to
userspace, the PIII does speculative reads and loads the cache with the
next page.  With discontiguous buffers, though, the next chunk lives at
a totally different address, so the prefetched page is wasted and we
effectively take 2x the number of cache misses we need to.  Does that
sound reasonable to you?

I need to try malloc'ing virtually contiguous but physically
discontiguous buffers and see if I get the same (good) performance (a
rough sketch of that test is appended below).

Cheers,

Drew
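P.S.  The test I have in mind is roughly the following; this is only a
sketch, and the function name, malloc type, and the 9K size are
illustrative rather than actual GM driver code:

    /*
     * Sketch: allocate the 9K receive buffers two ways and compare
     * iperf numbers.  contigmalloc() gives wired, physically
     * contiguous, page-aligned memory; plain malloc() gives wired
     * memory that is only guaranteed to be virtually contiguous.
     */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>
    #include <vm/vm.h>
    #include <vm/vm_extern.h>   /* for contigmalloc */

    #define JUMBO_LEN       (9 * 1024)

    static void *
    gm_alloc_rx_buf(int phys_contig)
    {
            if (phys_contig)
                    return (contigmalloc(JUMBO_LEN, M_DEVBUF, M_NOWAIT,
                        0, 0xffffffffUL, PAGE_SIZE, 0));
            return (malloc(JUMBO_LEN, M_DEVBUF, M_NOWAIT));
    }

The DMA chaining stays the same in both cases (same number of 4K
chunks), so any difference in throughput should come down to how the
pages lie physically.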