From owner-freebsd-net  Mon Jul 10  0: 2:43 2000
Delivered-To: freebsd-net@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id B90E637B7A7; Mon, 10 Jul 2000 00:02:39 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id e6A72bc19546;
	Mon, 10 Jul 2000 00:02:37 -0700 (PDT)
Date: Mon, 10 Jul 2000 00:02:37 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: David Greenman <dg@root.com>
Cc: net@freebsd.org, dg@freebsd.org, wollman@freebsd.org, ken@kdm.org
Subject: Re: argh! Re: weird things with M_EXT and large packets
Message-ID: <20000710000237.C25571@fw.wintelcom.net>
References: <20000709205124.A25571@fw.wintelcom.net> <200007100553.WAA01544@implode.root.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <200007100553.WAA01544@implode.root.com>; from dg@root.com on Sun, Jul 09, 2000 at 10:53:46PM -0700
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

* David Greenman <dg@root.com> [000709 23:05] wrote:
> >* Alfred Perlstein <bright@wintelcom.net> [000709 14:04] wrote:
> >
> >I'm 99.99% sure what's going on: since I'm using normal kernel
> >malloc for these external clusters, the device driver is failing to
> >notice that the data crosses a page boundary and isn't breaking the
> >data up properly.  Since the memory is fragmented, it's passing
> >garbage over the wire that doesn't match the checksum (hence the
> >resending of the data).
> >
> >Doing a transfer over localhost works fine.
> >
> >If I use contigmalloc to allocate the buffers then it works, but I would
> >really rather not use contigmalloc because frankly it scares me.
> >
> >Is there a specific reason the network drivers (or at least fxp)
> >don't seem to check page boundaries so that discontig kmem can be
> >passed to the drivers in large chunks?  I'd rather not have to
> >allocate size/PAGE_SIZE mbuf headers for each send.
> >
> >This may only be fxp doing this incorrectly, or I may just be
> >totally off.  Does this all make sense?
> 
>    You're correct that the fxp driver doesn't try to detect page breaks. It
> (wrongfully in your case) assumes that buffers are 1 page or less in size and
> don't cross physical page boundaries. The driver was written to be fast, and
> doing multiple vtophys operations to detect a case which previously would
> never happen isn't a good thing for performance.

Actually, it does happen.  You still need to vtophys once per
mbuf/cluster, and allowing someone to present very large clusters
reduces both allocation in the mbuf system and cache utilization.

The detection of this situation is simply:

if ((m->m_flags & M_EXT) != 0 && m->m_ext.ext_free != NULL
	&& ((vm_offset_t)m->m_data % PAGE_SIZE) + m->m_len > PAGE_SIZE) {
	/* data runs past the end of its first page: split */
}

The mod operation acting on a power of two should reduce to a
bitmask, so the overhead of doing this isn't that great.

The test could probably dispense with the EXT and ext_free checks
and wind up faster than with them in.
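
To make the arithmetic concrete, here is a rough userland sketch of the
test without the EXT and ext_free checks.  SKETCH_PAGE_SIZE,
SKETCH_PAGE_MASK and crosses_page() are names made up for this example
and are not in the tree; the 4k page size is an assumption as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; the kernel supplies PAGE_SIZE itself. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1u)

/*
 * Hypothetical helper: return 1 if the len bytes starting at addr run
 * past the end of the page containing addr, 0 otherwise.
 */
static int
crosses_page(uintptr_t addr, size_t len)
{
	/* For a power-of-two page size, addr % size == addr & (size - 1). */
	assert((addr % SKETCH_PAGE_SIZE) == (addr & SKETCH_PAGE_MASK));
	return ((addr & SKETCH_PAGE_MASK) + len > SKETCH_PAGE_SIZE);
}
```

A buffer that ends exactly on the page boundary does not count as
crossing, so crosses_page(0x1ff0, 16) is 0 while crosses_page(0x1ff0, 17)
is 1.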

Mike Smith pointed me at some macros/functions that NetBSD uses to
accomplish this.  I'll be investigating them, as I'd rather make a
single allocation than one per 4k page.
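
Independent of whatever the NetBSD code actually does (I haven't read
it yet), the split itself could look something like the sketch below:
walk one virtually contiguous buffer and cut it at every page boundary.
struct sketch_seg, split_at_pages() and SKETCH_PAGE_SIZE are
hypothetical names for illustration only, and this is plain userland C,
not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for illustration */

/* Hypothetical descriptor for one piece that stays inside a single page. */
struct sketch_seg {
	uintptr_t addr;	/* virtual start of the piece */
	size_t    len;	/* bytes in the piece */
};

/*
 * Split [addr, addr + len) at page boundaries.  Returns the number of
 * segments written to segs, or -1 if more than maxsegs would be needed.
 */
static int
split_at_pages(uintptr_t addr, size_t len, struct sketch_seg *segs, int maxsegs)
{
	int n = 0;

	assert(segs != NULL && maxsegs >= 0);
	while (len > 0) {
		/* Bytes left between addr and the end of its page. */
		size_t room = SKETCH_PAGE_SIZE - (addr & (SKETCH_PAGE_SIZE - 1u));
		size_t chunk = len < room ? len : room;

		if (n == maxsegs)
			return (-1);
		segs[n].addr = addr;
		segs[n].len = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return (n);
}
```

In a driver each segment would presumably get its own vtophys() and
its own transmit descriptor, so the cost is one translation per page
touched, and only for buffers that actually cross a boundary.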

I would like to know ahead of time what your opinion on bringing
this in would be.  It looks to me like the overhead is so small that
the increase in functionality would be well worth it.

-Alfred

