From owner-cvs-all@FreeBSD.ORG  Fri Nov 14 15:31:48 2003
Return-Path: <owner-cvs-all@FreeBSD.ORG>
Delivered-To: cvs-all@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id E9FA116A4CE; Fri, 14 Nov 2003 15:31:48 -0800 (PST)
Received: from xorpc.icir.org (xorpc.icir.org [192.150.187.68])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 26AFA43FDF; Fri, 14 Nov 2003 15:31:46 -0800 (PST)
	(envelope-from rizzo@xorpc.icir.org)
Received: from xorpc.icir.org (localhost [127.0.0.1])
	by xorpc.icir.org (8.12.9p1/8.12.3) with ESMTP id hAENVjFw055712;
	Fri, 14 Nov 2003 15:31:45 -0800 (PST)
	(envelope-from rizzo@xorpc.icir.org)
Received: (from rizzo@localhost)
	by xorpc.icir.org (8.12.9p1/8.12.3/Submit) id hAENVjgP055711;
	Fri, 14 Nov 2003 15:31:45 -0800 (PST)
	(envelope-from rizzo)
Date: Fri, 14 Nov 2003 15:31:45 -0800
From: Luigi Rizzo <rizzo@icir.org>
To: Andre Oppermann <andre@FreeBSD.org>
Message-ID: <20031114153145.A54064@xorpc.icir.org>
References: <200311142102.hAEL2Nen073186@repoman.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <200311142102.hAEL2Nen073186@repoman.freebsd.org>;
	from andre@FreeBSD.org on Fri, Nov 14, 2003 at 01:02:23PM -0800
cc: cvs-src@FreeBSD.org
cc: src-committers@FreeBSD.org
cc: cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/netinet in_var.h ip_fastfwd.c ip_flow.c
	ip_flow.h ip_input.c ip_output.c src/sys/sys mbuf.h src/sys/conf
	files src/sys/net if_arcsubr.c if_ef.c if_ethersubr.c if_fddisubr.c
	if_iso88025subr.c if_ppp.c
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: CVS commit messages for the entire tree <cvs-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-all>
List-Post: <mailto:cvs-all@freebsd.org>
List-Help: <mailto:cvs-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 14 Nov 2003 23:31:49 -0000

On Fri, Nov 14, 2003 at 01:02:23PM -0800, Andre Oppermann wrote:
...
>   Introduce ip_fastforward and remove ip_flow.
>   
>   Short description of ip_fastforward:
>   
>    o adds full direct process-to-completion IPv4 forwarding code
>    o handles ip fragmentation incl. hw support (ip_flow did not)
>    o sends icmp needfrag to source if DF is set (ip_flow did not)
>    o supports ipfw and ipfilter (ip_flow did not)
>    o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
>    o returns anything it can't handle back to normal ip_input


Sorry if i missed part of the discussion leading to this commit...

Let me state first that I do not mean to criticise your work, but 
having spent quite a bit of time trying to optimize the forwarding 
path myself, and having found a number of surprises on what really     
consumes most of the time, i tend to be a bit skeptical when it
comes to optimizations.

Obviously I see it as a very good thing the creation of a fully
featured fast forwarding path. However I would like to understand
if it is really worthwhile doing it, and whether this new code
isn't a replica of already existing optimizations.

In particular, did you measure any significant performance difference
by replacing calls to ip_fastforward() with calls to ip_input() ?
(say in a kernel without "options IPSEC" which used to be highly
inefficient even for non-ipsec packets). 

The reason I am asking is that I see some expensive ops (such as
doing route lookups, scanning through lists of local addresses,  
input and output firewall processing, fragmentation) in both pieces
of code. Fragmentation may not be the common case, but all the rest
definitely is, and with large routing tables, multiple interfaces
and even modestly complex firewall rulesets, most of the processing
time goes there.

Also, in the initial comments you claim:

 * ip_fastforward gets its speed from processing the forwarded packet to
 * completion (if_output on the other side) without any queues or netisr's.

now, netisr_dispatch() (which is invoked if ip_fastforward() fails)
does an inline call to ip_input() if netisr_enable=1 (ip_input
is NETISR_MPSAFE). So this seems to fall into the same category
of optimizations.

So to summarize, could you identify what are the differences
that make ip_fastforward() so much faster than
ip_input()/ip_forward()/ip_output() ?

Given that there are large segments of common code between
ip_fastforward() and ip_input()/ip_output() (i am thinking of the
entire ipfw handling code, for one, and also some basic integrity
checks, the fragmentation code, etc.) I also wonder if it wouldn't
be beneficial to put the optimizations into the standard path rather
than create a new (partial) replica of the same code, with the
potential of introducing bugs, and with some substantial I-cache
pollution which might well destroy the benefits of minor optimizations.

-----------

Minor comments on the code:

  + one of the initial comments in the new code states

        ... The only part of the packet we touch with the CPU is the
        IP header. ...

    this is not true if you use ipfw because that code touches many
    places in the packet (and can also do some expensive computation
    like trying to locate the uid/gid of a packet; the fact that we
    only deal with packets not for us does not prevent the existence
    of such firewall rules).

  + The code at line 344:

        if (fw_enable && IPFW_LOADED) {
                bzero(&args, sizeof(args));
                args.m = m;
                ipfw = 0;

                ipfw = ip_fw_chk_ptr(&args);
                m = args.m;
   the bzero is probably more expensive than individually zeroing the
    fields of args as done in ip_input() (I think once i measured the
    two solutions). The first "ipfw = 0;" is redundant.

  + could you clarify the divert logic ? I am a bit rusty with that
    part of the code, but i am under the impression that in
    ip_fastforward() you are passing along args.divert_rule and
    losing track of divert_info which is instead what you need too.

-------------

	cheers
	luigi