From owner-freebsd-net  Sun Jul  9 21:23:50 2000
Delivered-To: freebsd-net@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
	by hub.freebsd.org (Postfix) with ESMTP id B41C937B730;
	Sun, 9 Jul 2000 21:23:45 -0700 (PDT)
	(envelope-from ken@panzer.kdm.org)
Received: (from ken@localhost)
	by panzer.kdm.org (8.9.3/8.9.1) id WAA20554;
	Sun, 9 Jul 2000 22:23:42 -0600 (MDT)
	(envelope-from ken)
Date: Sun, 9 Jul 2000 22:23:41 -0600
From: "Kenneth D. Merry"
To: Alfred Perlstein
Cc: net@FreeBSD.ORG, dg@FreeBSD.ORG, wollman@FreeBSD.ORG
Subject: Re: argh! Re: weird things with M_EXT and large packets
Message-ID: <20000709222341.A20360@panzer.kdm.org>
References: <20000709140441.T25571@fw.wintelcom.net> <20000709205124.A25571@fw.wintelcom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <20000709205124.A25571@fw.wintelcom.net>; from bright@wintelcom.net on Sun, Jul 09, 2000 at 08:51:24PM -0700
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sun, Jul 09, 2000 at 20:51:24 -0700, Alfred Perlstein wrote:
> * Alfred Perlstein [000709 14:04] wrote:
> > I have some code here sending a mbuf via:
> >
> > error = (*so->so_proto->pr_usrreqs->pru_send)(so, 0, m, 0, 0, p);
> >
> > m is setup like so:
> >
> > m->m_ext.ext_free = kblob_mbuf_free;
> > m->m_ext.ext_ref = kblob_mbuf_ref;
> > m->m_ext.ext_buf = (void *)kb;
> > m->m_ext.ext_size = kb->kb_len;
> > m->m_data = (char *) kb->kb_data + uap->offset;
> > m->m_flags |= M_EXT;
> > m->m_pkthdr.len = m->m_len = uap->nbytes;
> >
> > uap->nbytes is 59499.
> >
> > It looks like the packet is being broken up or referenced to be sent,
> > but at a certain point it hangs.
>
> I'm 99.99% sure what's going on is that since I'm using normal kernel
> malloc for these external clusters, the device driver is failing to
> notice that the data crosses a page boundary and isn't breaking the
> data up properly.  Since the memory is fragmented, it's passing garbage
> over the wire that doesn't match the checksum (hence the resending of
> the data).
>
> Doing a transfer over localhost works fine.
>
> If I use contigmalloc to allocate the buffers then it works.  I would
> really rather not use contigmalloc because frankly it scares me.

I had the same problem earlier this year, except it was with pages passed
from userland into the kernel.  My solution was to walk each incoming
buffer and detect the boundaries between chunks of contiguous pages.  (So
I wound up with a set of physical pointers and lengths.)

> Is there a specific reason the network drivers (or at least fxp)
> don't seem to check page boundaries, so that discontig kmem can be
> passed to the drivers in large chunks?  I'd rather not have to
> allocate size/PAGE_SIZE mbuf headers for each send.
>
> This may just be fxp doing this incorrectly, or I may be totally off;
> does this all make sense?

It does make sense.  I would bet that most, if not all, network drivers
don't check for contiguous memory.  There are numerous reasons for this,
but I think the bottom line is that it's too much trouble for too little
gain.

Most network devices that FreeBSD supports have an MTU of 1500 bytes or
so, and at least with standard mbufs, the drivers don't have to worry
about the chunk of data they get crossing page boundaries.
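To illustrate what I mean, here is a rough sketch of a typical transmit
setup routine.  It isn't taken from any particular driver, and the names
(foo_encap, foo_sg_entry) are made up, but the pattern is what you will
find in most cards' S/G list setup code:

	/*
	 * Rough sketch only: build a scatter/gather list for the card
	 * from an mbuf chain.  One vtophys() per mbuf; the driver
	 * assumes the m_len bytes at m_data are physically contiguous
	 * and never checks for a page boundary inside the buffer.
	 */
	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/errno.h>
	#include <sys/mbuf.h>
	#include <vm/vm.h>
	#include <vm/pmap.h>		/* for vtophys() */

	struct foo_sg_entry {
		u_int32_t	sg_paddr;	/* physical address given to the card */
		u_int32_t	sg_len;		/* length of this segment */
	};

	static int
	foo_encap(struct mbuf *m_head, struct foo_sg_entry *sg, int maxsegs)
	{
		struct mbuf *m;
		int nsegs = 0;

		for (m = m_head; m != NULL; m = m->m_next) {
			if (m->m_len == 0)
				continue;
			if (nsegs == maxsegs)
				return (ENOBUFS);
			/* One translation per mbuf, no page boundary check. */
			sg[nsegs].sg_paddr = vtophys(mtod(m, vm_offset_t));
			sg[nsegs].sg_len = m->m_len;
			nsegs++;
		}
		/* ...hand sg[0..nsegs-1] to the card and kick the transmitter... */
		return (0);
	}

With a normal mbuf chain that assumption holds, so nobody notices.  With
one big M_EXT buffer from plain malloc(), the card gets a physical address
and a length that run past the first page boundary, and from there on it
is DMAing whatever happens to be in the neighboring physical pages, which
would explain the garbage on the wire and the retransmits you're seeing.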
Even with drivers for larger-MTU hardware, like the gigabit ethernet
drivers, the cards typically take chains of mbufs; the driver does a
separate vtophys() on each element in the chain and passes it down to the
card.  Again, it expects each mbuf to point to a physically contiguous
chunk of memory.

One thing to keep in mind about allocating huge chunks of memory and
passing them down the network stack is that the big chunks will get split
up, either by the TCP layer or the IP layer, into multiple pieces, each
with its own mbuf header.

The zero copy send code that Drew Gallatin wrote uses page-sized chunks to
pass things around.  With a gigabit ethernet jumbo MTU (9000 bytes), that
is very efficient on the Alpha, with its 8K page size, but less efficient
on the i386, with its 4K page size, since you end up with twice as many
chunks.  From the benchmarks I've done, increasing the chunk size from 4K
to 8K on the i386 would cut CPU utilization in half on sends over gigabit
ethernet.  The problem in that instance (according to Drew) is getting the
COW stuff right for chunks of data bigger than a page.

Another thing I learned from doing benchmarks is that increasing the chunk
size to something larger than your MTU doesn't help CPU utilization much,
if at all, since the larger chunks eventually get broken up into MTU-sized
pieces anyway.  The most efficient chunk size for a large-MTU adapter
(i.e. an MTU of more than 4K) is the largest multiple of the page size
that is less than the MTU, e.g. 8K chunks for a 9000-byte MTU.

I'm not sure whether this will work with what you're trying to do, but you
could use contigmalloc() to allocate a large chunk of memory (say multiple
megabytes) and then break it up into smaller chunks that are then tacked
onto mbufs.  The ti(4) driver uses that approach in its stock (i.e.
non-zero-copy) jumbo receive buffer allocation code, since the type of
jumbo receive buffer it uses by default is expected to be one physically
contiguous piece of memory.  (The Tigon firmware also supports another
type of jumbo receive buffer with 4 S/G entries.)

Ken
-- 
Kenneth Merry
ken@kdm.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message