Date:      Mon, 27 Dec 2004 02:05:14 -0500
From:      James <james@towardex.com>
To:        freebsd-net@freebsd.org
Subject:   Receive path for ip_fastforward
Message-ID:  <20041227070514.GA68890@scylla.towardex.com>


--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

As requested, here you go.

What is included in the email attachments:

1. Modified files in raw format (for easier reading)
   - ip_fastfwd.c (sys/netinet)
   - ip_input.c   (sys/netinet)
   - in.c         (sys/netinet)
   - ip_var.h     (sys/netinet)
   - inet.c       (usr.bin/netstat)

2. Unified diff files (.diff) for each of the above,
   so you can review the changes more easily.

Notes:
 - ip_fastfwd.c:
   Production code, proven to work very well; currently in use on
   production routers pushing 300Mb/s of traffic.

 - ip_input.c:
   No changes other than mbuf tagging for packets preprocessed by ip_fastfwd
   in Steps 1 and 2 (the basic sanity/fallback checks).

 - ip_var.h:
   Adds one additional variable (ipstat.ips_transit_re) to track packets
   forwarded to receive path by ip_fastforward.

 - netstat/inet.c:
   Adds tracking information for ipstat.ips_transit_re in netstat(1) program.

 - in.c:
   Quick hack (the code we use on production routers is vastly different,
   so this had to be hacked up quickly for this patch) that adds receive
   path routes to the routing table during SIOCSIFADDR. Tested so far without
   any problems: the network address, the broadcast address and our own
   addresses are installed with lo0/127.0.0.1 as next-hop during SIOCSIFADDR
   and are properly deleted during SIOCDIFADDR. Please change this as
   necessary, though, since it is a hack and may have bugs.
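
To make the in.c note concrete, here is a small user-space sketch (not part
of the patch) of which three addresses would end up as /32 receive path
routes for a single interface address. The names recv_routes, prefix_to_mask
and recv_routes_for are made up for illustration, and addresses are plain
host-byte-order integers rather than the kernel's sockaddr_in structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three /32 receive path destinations. */
struct recv_routes {
	uint32_t own;       /* the interface's own address */
	uint32_t network;   /* host part all zeros */
	uint32_t broadcast; /* host part all ones */
};

/* Build a netmask (host byte order) from a prefix length of 0..32. */
static uint32_t
prefix_to_mask(unsigned int prefix_len)
{
	assert(prefix_len <= 32);
	/* A shift by 32 is undefined in C, so handle /0 separately. */
	return (prefix_len == 0) ? 0 : (0xffffffffu << (32 - prefix_len));
}

/* Derive the own, network and broadcast addresses for addr/prefix_len. */
static struct recv_routes
recv_routes_for(uint32_t addr, unsigned int prefix_len)
{
	uint32_t mask = prefix_to_mask(prefix_len);
	struct recv_routes r;

	r.own = addr;
	r.network = addr & mask;
	r.broadcast = (addr & mask) | ~mask;
	return r;
}
```

For 192.0.2.1/24 this gives 192.0.2.1, 192.0.2.0 and 192.0.2.255; in the
patch each of these would be a host route pointing at lo0/127.0.0.1.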


What it is and what it does:

 - For more information about what Receive Path ACL is all about:
<http://www.cisco.com/en/US/tech/tk648/tk361/technologies_white_paper09186a00801a0a5e.shtml>

 - The receive path installs the IP addresses that should be delivered to the
   router's own control plane stack (ip_input and upwards) as /32 host routes
   in the routing table. During the ip_fastforward stage, if the route to the
   destination is a local/receive-path route (RTF_LOCAL), or if the packet
   needs to be punted to the slow ip_input processing path because further
   analysis is required, that packet is subject to the firewall rules that
   filter on the lo0 interface in the INBOUND direction before it is released
   to ip_input.

  The receive path work does _NOT_ actually forward the packet to lo0 driver.
  Doing so will actually break a number of protocols including OSPF and add
  further processing overhead for packets that need to be punted to ip_input.
  Instead, packets are simply subject to loopback filtering firewall rules
  before exiting ip_fastforward.
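
The decision described above can be sketched in user-space C as follows.
This is only a model of the logic, not the code in ip_fastfwd.c: the type and
function names (rp_packet, rp_verdict, rp_classify) are hypothetical, the
three booleans stand in for the route lookup, the sanity/fallback checks and
the pfil_hooks result, and it assumes that a packet blocked by the lo0 rules
is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What happens to a packet seen by the fast forwarding path. */
enum rp_verdict { RP_FAST_FORWARD, RP_TO_IP_INPUT, RP_DROP };

/* Hypothetical summary of the checks made on one packet. */
struct rp_packet {
	bool route_is_local;    /* route to destination carries RTF_LOCAL */
	bool needs_slow_path;   /* packet must be punted to ip_input */
	bool lo0_filter_passes; /* result of the lo0 inbound firewall rules */
};

static enum rp_verdict
rp_classify(const struct rp_packet *p)
{
	assert(p != NULL);

	/* Ordinary transit traffic never touches the lo0 rules. */
	if (!p->route_is_local && !p->needs_slow_path)
		return RP_FAST_FORWARD;

	/* Control plane bound: filter as if inbound on lo0 (assumed drop). */
	if (!p->lo0_filter_passes)
		return RP_DROP;

	/* Released to ip_input without going through the lo0 driver. */
	return RP_TO_IP_INPUT;
}
```

Note that the filter result only matters for packets that are leaving the
fast path, which is why transit traffic sees no extra overhead.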

User's Guide:

--> Caveat before you start:
   The receive path uses the pfil_hooks firewall API to subject control plane
   bound packets to loopback filtering rules. At this time, IPFW2 is *NOT*
   supported. pf(4) is fully supported and is proven to work fine for this
   application. IPFW does not work because it takes the ifnet from the mbuf
   header instead of using the ifnet provided by pfil_hooks.

 Step 1:
   sysctl -w net.inet.ip.fastforwarding=1
   Note: Fast forwarding MUST BE ENABLED in order for receive path to
   operate.

 Step 2:
   Set up pf(4) firewall rules to filter on lo0 in the inbound direction.
   Be sure to allow packets sourced from 127.0.0.0/8, as many routing protocol
   software packages (including Zebra and Quagga) use the loopback interface
   for their inter-process communication. Also be sure to allow OSPF or any
   other routing protocols your router is running.

   Example of loopback filtering rules from a production router. The
   example below assumes your router is an edge router with just BGP running:

cr1.walt# pfctl -sr
pass quick on ge-0/0/0.2 all
pass quick on ge-0/1/0.12 all
pass quick on ge-0/1/0.203 all
pass in quick on lo0 proto tcp from any to any port = ssh keep state
pass in quick on lo0 proto tcp from any to any port = bgp keep state
pass in quick on lo0 proto tcp from any port = ftp-data to any
pass in quick on lo0 proto tcp from any port = ftp to any
pass in quick on lo0 proto tcp from any port = http to any
pass in quick on lo0 proto udp from any to any port 33434:33534
pass in quick on lo0 proto udp from any port = domain to any
pass in quick on lo0 proto icmp all
pass in quick on lo0 inet from 127.0.0.0/24 to any
block drop in quick on lo0 all
pass quick all
cr1.walt#

  Step 3:
    Packets successfully punted to ip_input, either because they are too
    complex to be dealt with inside the fast forwarding path or because they
    are destined to the router's own addresses, can be tracked with the
    netstat(1) utility (after you patch it, of course). Example:

cr1.walt# netstat -sn -f inet | grep forward
        55647205 packets forwarded (52951423 packets fast forwarded)
        4927978 packets not forwardable
        345712 packets forwarded to receive path
cr1.walt#

As referenced by the BSD License, I am not liable for any damages arising
from your use of this feature submission.

Questions: let me know.

-J

-- 
James Jun                                            TowardEX Technologies, Inc.
Technical Lead                      Boston IPv4/IPv6 Web Hosting, Colocation and
james@towardex.com            Network design/consulting & configuration services
cell: 1(978)-394-2867           web: http://www.towardex.com , noc: www.twdx.net

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="in.c"

/*
 * Copyright (c) 1982, 1986, 1991, 1993
 *	The Regents of the University of California.  All rights reserved.
 * Copyright (C) 2001 WIDE Project.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)in.c	8.4 (Berkeley) 1/9/95
 * $FreeBSD: src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $
 */

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/sockio.h>
#include <sys/malloc.h>
#include <sys/socket.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

#include <net/if.h>
#include <net/if_types.h>
#include <net/route.h>

#include <netinet/in.h>
#include <netinet/in_var.h>
#include <netinet/in_pcb.h>

#include <netinet/igmp_var.h>

static MALLOC_DEFINE(M_IPMADDR, "in_multi", "internet multicast address");

static int in_mask2len(struct in_addr *);
static void in_len2mask(struct in_addr *, int);
static int in_lifaddr_ioctl(struct socket *, u_long, caddr_t,
	struct ifnet *, struct thread *);

static int	in_addprefix(struct in_ifaddr *, int);
static int	in_scrubprefix(struct in_ifaddr *);
static void	in_socktrim(struct sockaddr_in *);
static int	in_ifinit(struct ifnet *,
	    struct in_ifaddr *, struct sockaddr_in *, int);

static int subnetsarelocal = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, subnets_are_local, CTLFLAG_RW,
	&subnetsarelocal, 0, "Treat all subnets as directly connected");

struct in_multihead in_multihead; /* XXX BSS initialization */

extern struct inpcbinfo ripcbinfo;
extern struct inpcbinfo udbinfo;

/*
 * Return 1 if an internet address is for a ``local'' host
 * (one to which we have a connection).  If subnetsarelocal
 * is true, this includes other subnets of the local net.
 * Otherwise, it includes only the directly-connected (sub)nets.
 */
int
in_localaddr(in)
	struct in_addr in;
{
	register u_long i = ntohl(in.s_addr);
	register struct in_ifaddr *ia;

	if (subnetsarelocal) {
		TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link)
			if ((i & ia->ia_netmask) == ia->ia_net)
				return (1);
	} else {
		TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link)
			if ((i & ia->ia_subnetmask) == ia->ia_subnet)
				return (1);
	}
	return (0);
}

/*
 * Return 1 if an internet address is for the local host and configured
 * on one of its interfaces.
 */
int
in_localip(in)
	struct in_addr in;
{
	struct in_ifaddr *ia;

	LIST_FOREACH(ia, INADDR_HASH(in.s_addr), ia_hash) {
		if (IA_SIN(ia)->sin_addr.s_addr == in.s_addr)
			return 1;
	}
	return 0;
}

/*
 * Determine whether an IP address is in a reserved set of addresses
 * that may not be forwarded, or whether datagrams to that destination
 * may be forwarded.
 */
int
in_canforward(in)
	struct in_addr in;
{
	register u_long i = ntohl(in.s_addr);
	register u_long net;

	if (IN_EXPERIMENTAL(i) || IN_MULTICAST(i))
		return (0);
	if (IN_CLASSA(i)) {
		net = i & IN_CLASSA_NET;
		if (net == 0 || net == (IN_LOOPBACKNET << IN_CLASSA_NSHIFT))
			return (0);
	}
	return (1);
}

/*
 * Sub-routine for in_ifaddrecv() and in_ifremrecv().
 * --james@towardex.com 12/17/2004
 */
static void
in_ifrecv_request(int call, int cmd, struct in_ifaddr *ia)
{
	struct sockaddr_in all1_sa;
	struct rtentry *nrt = NULL;
	struct ifaddr *ifa;
	int e = 0;
	struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET };
	struct sockaddr_in loopback = { sizeof(struct sockaddr_in), AF_INET };

	ifa = &ia->ia_ifa;

	bzero(&all1_sa, sizeof(all1_sa));
	all1_sa.sin_family = AF_INET;
	all1_sa.sin_len = sizeof(struct sockaddr_in);
	all1_sa.sin_addr.s_addr = (u_int32_t)0xffffffff;

	/*
	 * We need to manually specify loopback as the gateway for network
	 * and broadcast addresses because we can't just let the L2 rtrequest
	 * handlers deal with ifa->ifa_addr set as the gateway address.
	 */
	loopback.sin_family = AF_INET;
	/* INADDR_LOOPBACK is in host byte order; s_addr wants network order. */
	loopback.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/*
	 * Set the rtflags to RTF_LLINFO so existing apps are happy
	 * with our changes.
	 */
	switch (call) {
	case 0:  /* own address request */
        	rtrequest(cmd, ifa->ifa_addr, sintosa(&loopback),
        	  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
		break;
	case 1:  /* ia_dstaddr request: broadcast (or p2p destination) address */
		rtrequest(cmd, sintosa(&ia->ia_dstaddr), sintosa(&loopback),
		  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
		break;
	case 2:  /* ia_subnet request: network address */
		subnet.sin_addr.s_addr = htonl(ia->ia_subnet);
		subnet.sin_family = AF_INET;

        	rtrequest(cmd, sintosa(&subnet), sintosa(&loopback),
        	  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
		break;
	default:
		break;
	}

        if (nrt) {
                RT_LOCK(nrt);
                /*
                 * Make sure rt_ifa be equal to IFA, the second argument of
                 * the function.  We need this because when we refer to
                 * rt_ifa->ia_flags, we assume that the rt_ifa points to
		 * the address, not the loopback.
                 */
                if (cmd == RTM_ADD && ifa != nrt->rt_ifa) {
                        IFAFREE(nrt->rt_ifa);
                        IFAREF(ifa);
                        nrt->rt_ifa = ifa;
                }
                /*
		 * Report to routing socket.
                 */
                rt_newaddrmsg(cmd, ifa, e, nrt);
                if (cmd == RTM_DELETE) {
                        rtfree(nrt);
                } else {
                        /* the cmd must be RTM_ADD here */
                        RT_REMREF(nrt);
                        RT_UNLOCK(nrt);
                }
        }
}


/*
 * Add own address as loopback rtentry (receive path). Previously we added
 * the route only if necessary (such as on a point-to-point circuit), or when
 * triggered by route cloning. However, a proper RIB and FIB implementation
 * must contain own-addrs as receive paths, allowing software to manage
 * its own addresses separately from prefixes. This is required for receive
 * adjacency/path in ip_fastforward() --james@towardex.com 2004/12/17
 */
static void
in_ifaddrecv(struct in_ifaddr *ia)
{
	struct rtentry *rt;
	int need_loop, need_netdst, need_bcast;
	struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET };

	/* If there is no loopback entry, allocate one */
	rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0);
	need_loop = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);

	/* If there is no broadcast (ia_dstaddr) entry, allocate one */
	if(rt) rtfree(rt);
	rt = rtalloc1(sintosa(&ia->ia_dstaddr), 0, 0);
	need_netdst = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);

	/* If there is no network (ia_subnet) entry, allocate one */
	subnet.sin_addr.s_addr = htonl(ia->ia_subnet);
	subnet.sin_family = AF_INET;
	if(rt) rtfree(rt);
	rt = rtalloc1(sintosa(&subnet), 0, 0);
	need_bcast = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);

	if(rt)
	  rtfree(rt);

	if(need_loop)
	  in_ifrecv_request(0, RTM_ADD, ia);
	if(need_netdst)
	  in_ifrecv_request(1, RTM_ADD, ia);
	if(need_bcast)
	  in_ifrecv_request(2, RTM_ADD, ia);
}


/*
 * Remove loopback rtentry's of receive path generated by in_ifaddrecv()
 * if they exist. -- james 12/17/2004
 */
static void
in_ifremrecv(struct in_ifaddr *ia)
{
        struct rtentry *rt;
        
	/*
	 * Delete the route for ownaddr if it really exists.
	 */ 
        rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0);
        if (rt != NULL && (rt->rt_flags & RTF_HOST) != 0 &&
             (rt->rt_ifp->if_flags & IFF_LOOPBACK) != 0) {
                  rtfree(rt);
                  in_ifrecv_request(0, RTM_DELETE, ia);
	}

	/* XXX
	 * Broadcast and network addresses are removed
	 * by the regular interface detach handlers, but we
	 * need to verify the design aspect of this more
	 * later.
	 */
}

/*
 * Trim a mask in a sockaddr
 */
static void
in_socktrim(ap)
struct sockaddr_in *ap;
{
    register char *cplim = (char *) &ap->sin_addr;
    register char *cp = (char *) (&ap->sin_addr + 1);

    ap->sin_len = 0;
    while (--cp >= cplim)
	if (*cp) {
	    (ap)->sin_len = cp - (char *) (ap) + 1;
	    break;
	}
}

static int
in_mask2len(mask)
	struct in_addr *mask;
{
	int x, y;
	u_char *p;

	p = (u_char *)mask;
	for (x = 0; x < sizeof(*mask); x++) {
		if (p[x] != 0xff)
			break;
	}
	y = 0;
	if (x < sizeof(*mask)) {
		for (y = 0; y < 8; y++) {
			if ((p[x] & (0x80 >> y)) == 0)
				break;
		}
	}
	return x * 8 + y;
}

static void
in_len2mask(mask, len)
	struct in_addr *mask;
	int len;
{
	int i;
	u_char *p;

	p = (u_char *)mask;
	bzero(mask, sizeof(*mask));
	for (i = 0; i < len / 8; i++)
		p[i] = 0xff;
	if (len % 8)
		p[i] = (0xff00 >> (len % 8)) & 0xff;
}

/*
 * Generic internet control operations (ioctl's).
 * Ifp is 0 if not an interface-specific ioctl.
 */
/* ARGSUSED */
int
in_control(so, cmd, data, ifp, td)
	struct socket *so;
	u_long cmd;
	caddr_t data;
	register struct ifnet *ifp;
	struct thread *td;
{
	register struct ifreq *ifr = (struct ifreq *)data;
	register struct in_ifaddr *ia = 0, *iap;
	register struct ifaddr *ifa;
	struct in_addr dst;
	struct in_ifaddr *oia;
	struct in_aliasreq *ifra = (struct in_aliasreq *)data;
	struct sockaddr_in oldaddr;
	int error, hostIsNew, iaIsNew, maskIsNew, s;

	iaIsNew = 0;

	switch (cmd) {
	case SIOCALIFADDR:
	case SIOCDLIFADDR:
		if (td && (error = suser(td)) != 0)
			return error;
		/*fall through*/
	case SIOCGLIFADDR:
		if (!ifp)
			return EINVAL;
		return in_lifaddr_ioctl(so, cmd, data, ifp, td);
	}

	/*
	 * Find address for this interface, if it exists.
	 *
	 * If an alias address was specified, find that one instead of
	 * the first one on the interface, if possible.
	 */
	if (ifp) {
		dst = ((struct sockaddr_in *)&ifr->ifr_addr)->sin_addr;
		LIST_FOREACH(iap, INADDR_HASH(dst.s_addr), ia_hash)
			if (iap->ia_ifp == ifp &&
			    iap->ia_addr.sin_addr.s_addr == dst.s_addr) {
				ia = iap;
				break;
			}
		if (ia == NULL)
			TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
				iap = ifatoia(ifa);
				if (iap->ia_addr.sin_family == AF_INET) {
					ia = iap;
					break;
				}
			}
	}

	switch (cmd) {

	case SIOCAIFADDR:
	case SIOCDIFADDR:
		if (ifp == 0)
			return (EADDRNOTAVAIL);
		if (ifra->ifra_addr.sin_family == AF_INET) {
			for (oia = ia; ia; ia = TAILQ_NEXT(ia, ia_link)) {
				if (ia->ia_ifp == ifp  &&
				    ia->ia_addr.sin_addr.s_addr ==
				    ifra->ifra_addr.sin_addr.s_addr)
					break;
			}
			if ((ifp->if_flags & IFF_POINTOPOINT)
			    && (cmd == SIOCAIFADDR)
			    && (ifra->ifra_dstaddr.sin_addr.s_addr
				== INADDR_ANY)) {
				return EDESTADDRREQ;
			}
		}
		if (cmd == SIOCDIFADDR && ia == 0)
			return (EADDRNOTAVAIL);
		/* FALLTHROUGH */
	case SIOCSIFADDR:
	case SIOCSIFNETMASK:
	case SIOCSIFDSTADDR:
		if (td && (error = suser(td)) != 0)
			return error;

		if (ifp == 0)
			return (EADDRNOTAVAIL);
		if (ia == (struct in_ifaddr *)0) {
			ia = (struct in_ifaddr *)
				malloc(sizeof *ia, M_IFADDR, M_WAITOK | M_ZERO);
			if (ia == (struct in_ifaddr *)NULL)
				return (ENOBUFS);
			/*
			 * Protect from ipintr() traversing address list
			 * while we're modifying it.
			 */
			s = splnet();
			TAILQ_INSERT_TAIL(&in_ifaddrhead, ia, ia_link);

			ifa = &ia->ia_ifa;
			IFA_LOCK_INIT(ifa);
			ifa->ifa_addr = (struct sockaddr *)&ia->ia_addr;
			ifa->ifa_dstaddr = (struct sockaddr *)&ia->ia_dstaddr;
			ifa->ifa_netmask = (struct sockaddr *)&ia->ia_sockmask;
			ifa->ifa_refcnt = 1;
			TAILQ_INSERT_TAIL(&ifp->if_addrhead, ifa, ifa_link);

			ia->ia_sockmask.sin_len = 8;
			ia->ia_sockmask.sin_family = AF_INET;
			if (ifp->if_flags & IFF_BROADCAST) {
				ia->ia_broadaddr.sin_len = sizeof(ia->ia_addr);
				ia->ia_broadaddr.sin_family = AF_INET;
			}
			ia->ia_ifp = ifp;
			splx(s);
			iaIsNew = 1;
		}
		break;

	case SIOCSIFBRDADDR:
		if (td && (error = suser(td)) != 0)
			return error;
		/* FALLTHROUGH */

	case SIOCGIFADDR:
	case SIOCGIFNETMASK:
	case SIOCGIFDSTADDR:
	case SIOCGIFBRDADDR:
		if (ia == (struct in_ifaddr *)0)
			return (EADDRNOTAVAIL);
		break;
	}
	switch (cmd) {

	case SIOCGIFADDR:
		*((struct sockaddr_in *)&ifr->ifr_addr) = ia->ia_addr;
		return (0);

	case SIOCGIFBRDADDR:
		if ((ifp->if_flags & IFF_BROADCAST) == 0)
			return (EINVAL);
		*((struct sockaddr_in *)&ifr->ifr_dstaddr) = ia->ia_broadaddr;
		return (0);

	case SIOCGIFDSTADDR:
		if ((ifp->if_flags & IFF_POINTOPOINT) == 0)
			return (EINVAL);
		*((struct sockaddr_in *)&ifr->ifr_dstaddr) = ia->ia_dstaddr;
		return (0);

	case SIOCGIFNETMASK:
		*((struct sockaddr_in *)&ifr->ifr_addr) = ia->ia_sockmask;
		return (0);

	case SIOCSIFDSTADDR:
		if ((ifp->if_flags & IFF_POINTOPOINT) == 0)
			return (EINVAL);
		oldaddr = ia->ia_dstaddr;
		ia->ia_dstaddr = *(struct sockaddr_in *)&ifr->ifr_dstaddr;
		if (ifp->if_ioctl && (error = (*ifp->if_ioctl)
					(ifp, SIOCSIFDSTADDR, (caddr_t)ia))) {
			ia->ia_dstaddr = oldaddr;
			return (error);
		}
		if (ia->ia_flags & IFA_ROUTE) {
			ia->ia_ifa.ifa_dstaddr = (struct sockaddr *)&oldaddr;
			rtinit(&(ia->ia_ifa), (int)RTM_DELETE, RTF_HOST);
			ia->ia_ifa.ifa_dstaddr =
					(struct sockaddr *)&ia->ia_dstaddr;
			rtinit(&(ia->ia_ifa), (int)RTM_ADD, RTF_HOST|RTF_UP);
		}
		return (0);

	case SIOCSIFBRDADDR:
		if ((ifp->if_flags & IFF_BROADCAST) == 0)
			return (EINVAL);
		ia->ia_broadaddr = *(struct sockaddr_in *)&ifr->ifr_broadaddr;
		return (0);

	case SIOCSIFADDR:
		error = in_ifinit(ifp, ia,
		    (struct sockaddr_in *) &ifr->ifr_addr, 1);
		if (error != 0 && iaIsNew)
			break;
		if (error == 0)
			EVENTHANDLER_INVOKE(ifaddr_event, ifp);
		return (0);

	case SIOCSIFNETMASK:
		ia->ia_sockmask.sin_addr = ifra->ifra_addr.sin_addr;
		ia->ia_subnetmask = ntohl(ia->ia_sockmask.sin_addr.s_addr);
		return (0);

	case SIOCAIFADDR:
		maskIsNew = 0;
		hostIsNew = 1;
		error = 0;
		if (ia->ia_addr.sin_family == AF_INET) {
			if (ifra->ifra_addr.sin_len == 0) {
				ifra->ifra_addr = ia->ia_addr;
				hostIsNew = 0;
			} else if (ifra->ifra_addr.sin_addr.s_addr ==
					       ia->ia_addr.sin_addr.s_addr)
				hostIsNew = 0;
		}
		if (ifra->ifra_mask.sin_len) {
			in_ifscrub(ifp, ia);
			ia->ia_sockmask = ifra->ifra_mask;
			ia->ia_sockmask.sin_family = AF_INET;
			ia->ia_subnetmask =
			     ntohl(ia->ia_sockmask.sin_addr.s_addr);
			maskIsNew = 1;
		}
		if ((ifp->if_flags & IFF_POINTOPOINT) &&
		    (ifra->ifra_dstaddr.sin_family == AF_INET)) {
			in_ifscrub(ifp, ia);
			ia->ia_dstaddr = ifra->ifra_dstaddr;
			maskIsNew  = 1; /* We lie; but the effect's the same */
		}
		if (ifra->ifra_addr.sin_family == AF_INET &&
		    (hostIsNew || maskIsNew))
			error = in_ifinit(ifp, ia, &ifra->ifra_addr, 0);
		if (error != 0 && iaIsNew)
			break;

		if ((ifp->if_flags & IFF_BROADCAST) &&
		    (ifra->ifra_broadaddr.sin_family == AF_INET))
			ia->ia_broadaddr = ifra->ifra_broadaddr;
		if (error == 0)
			EVENTHANDLER_INVOKE(ifaddr_event, ifp);
		return (error);

	case SIOCDIFADDR:
		/*
		 * in_ifscrub kills the interface route.
		 */
		in_ifscrub(ifp, ia);
		/*
		 * in_ifadown gets rid of all the rest of
		 * the routes.  This is not quite the right
		 * thing to do, but at least if we are running
		 * a routing process they will come back.
		 */
		in_ifadown(&ia->ia_ifa, 1);
		/*
		 * XXX horrible hack to detect that we are being called
		 * from if_detach()
		 */
		if (ifaddr_byindex(ifp->if_index) == NULL) {
			in_pcbpurgeif0(&ripcbinfo, ifp);
			in_pcbpurgeif0(&udbinfo, ifp);
		}
		EVENTHANDLER_INVOKE(ifaddr_event, ifp);
		error = 0;
		break;

	default:
		if (ifp == 0 || ifp->if_ioctl == 0)
			return (EOPNOTSUPP);
		return ((*ifp->if_ioctl)(ifp, cmd, data));
	}

	/*
	 * Protect from ipintr() traversing address list while we're modifying
	 * it.
	 */
	s = splnet();
	TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
	TAILQ_REMOVE(&in_ifaddrhead, ia, ia_link);
	LIST_REMOVE(ia, ia_hash);
	IFAFREE(&ia->ia_ifa);
	splx(s);

	return (error);
}

/*
 * SIOC[GAD]LIFADDR.
 *	SIOCGLIFADDR: get first address. (?!?)
 *	SIOCGLIFADDR with IFLR_PREFIX:
 *		get first address that matches the specified prefix.
 *	SIOCALIFADDR: add the specified address.
 *	SIOCALIFADDR with IFLR_PREFIX:
 *		EINVAL since we can't deduce hostid part of the address.
 *	SIOCDLIFADDR: delete the specified address.
 *	SIOCDLIFADDR with IFLR_PREFIX:
 *		delete the first address that matches the specified prefix.
 * return values:
 *	EINVAL on invalid parameters
 *	EADDRNOTAVAIL on prefix match failed/specified address not found
 *	other values may be returned from in_ioctl()
 */
static int
in_lifaddr_ioctl(so, cmd, data, ifp, td)
	struct socket *so;
	u_long cmd;
	caddr_t	data;
	struct ifnet *ifp;
	struct thread *td;
{
	struct if_laddrreq *iflr = (struct if_laddrreq *)data;
	struct ifaddr *ifa;

	/* sanity checks */
	if (!data || !ifp) {
		panic("invalid argument to in_lifaddr_ioctl");
		/*NOTREACHED*/
	}

	switch (cmd) {
	case SIOCGLIFADDR:
		/* address must be specified on GET with IFLR_PREFIX */
		if ((iflr->flags & IFLR_PREFIX) == 0)
			break;
		/*FALLTHROUGH*/
	case SIOCALIFADDR:
	case SIOCDLIFADDR:
		/* address must be specified on ADD and DELETE */
		if (iflr->addr.ss_family != AF_INET)
			return EINVAL;
		if (iflr->addr.ss_len != sizeof(struct sockaddr_in))
			return EINVAL;
		/* XXX need improvement */
		if (iflr->dstaddr.ss_family
		 && iflr->dstaddr.ss_family != AF_INET)
			return EINVAL;
		if (iflr->dstaddr.ss_family
		 && iflr->dstaddr.ss_len != sizeof(struct sockaddr_in))
			return EINVAL;
		break;
	default: /*shouldn't happen*/
		return EOPNOTSUPP;
	}
	if (sizeof(struct in_addr) * 8 < iflr->prefixlen)
		return EINVAL;

	switch (cmd) {
	case SIOCALIFADDR:
	    {
		struct in_aliasreq ifra;

		if (iflr->flags & IFLR_PREFIX)
			return EINVAL;

		/* copy args to in_aliasreq, perform ioctl(SIOCAIFADDR). */
		bzero(&ifra, sizeof(ifra));
		bcopy(iflr->iflr_name, ifra.ifra_name,
			sizeof(ifra.ifra_name));

		bcopy(&iflr->addr, &ifra.ifra_addr, iflr->addr.ss_len);

		if (iflr->dstaddr.ss_family) {	/*XXX*/
			bcopy(&iflr->dstaddr, &ifra.ifra_dstaddr,
				iflr->dstaddr.ss_len);
		}

		ifra.ifra_mask.sin_family = AF_INET;
		ifra.ifra_mask.sin_len = sizeof(struct sockaddr_in);
		in_len2mask(&ifra.ifra_mask.sin_addr, iflr->prefixlen);

		return in_control(so, SIOCAIFADDR, (caddr_t)&ifra, ifp, td);
	    }
	case SIOCGLIFADDR:
	case SIOCDLIFADDR:
	    {
		struct in_ifaddr *ia;
		struct in_addr mask, candidate, match;
		struct sockaddr_in *sin;
		int cmp;

		bzero(&mask, sizeof(mask));
		if (iflr->flags & IFLR_PREFIX) {
			/* lookup a prefix rather than address. */
			in_len2mask(&mask, iflr->prefixlen);

			sin = (struct sockaddr_in *)&iflr->addr;
			match.s_addr = sin->sin_addr.s_addr;
			match.s_addr &= mask.s_addr;

			/* if you set extra bits, that's wrong */
			if (match.s_addr != sin->sin_addr.s_addr)
				return EINVAL;

			cmp = 1;
		} else {
			if (cmd == SIOCGLIFADDR) {
				/* on getting an address, take the 1st match */
				cmp = 0;	/*XXX*/
			} else {
				/* on deleting an address, do exact match */
				in_len2mask(&mask, 32);
				sin = (struct sockaddr_in *)&iflr->addr;
				match.s_addr = sin->sin_addr.s_addr;

				cmp = 1;
			}
		}

		TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)	{
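			/* This is the IPv4 path, so skip everything but AF_INET. */
			if (ifa->ifa_addr->sa_family != AF_INET)
				continue;
			if (!cmp)
				break;
			/* ifa_addr is already a pointer; do not take its address. */
			candidate.s_addr = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr.s_addr;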
			candidate.s_addr &= mask.s_addr;
			if (candidate.s_addr == match.s_addr)
				break;
		}
		if (!ifa)
			return EADDRNOTAVAIL;
		ia = (struct in_ifaddr *)ifa;

		if (cmd == SIOCGLIFADDR) {
			/* fill in the if_laddrreq structure */
			bcopy(&ia->ia_addr, &iflr->addr, ia->ia_addr.sin_len);

			if ((ifp->if_flags & IFF_POINTOPOINT) != 0) {
				bcopy(&ia->ia_dstaddr, &iflr->dstaddr,
					ia->ia_dstaddr.sin_len);
			} else
				bzero(&iflr->dstaddr, sizeof(iflr->dstaddr));

			iflr->prefixlen =
				in_mask2len(&ia->ia_sockmask.sin_addr);

			iflr->flags = 0;	/*XXX*/

			return 0;
		} else {
			struct in_aliasreq ifra;

			/* fill in_aliasreq and do ioctl(SIOCDIFADDR) */
			bzero(&ifra, sizeof(ifra));
			bcopy(iflr->iflr_name, ifra.ifra_name,
				sizeof(ifra.ifra_name));

			bcopy(&ia->ia_addr, &ifra.ifra_addr,
				ia->ia_addr.sin_len);
			if ((ifp->if_flags & IFF_POINTOPOINT) != 0) {
				bcopy(&ia->ia_dstaddr, &ifra.ifra_dstaddr,
					ia->ia_dstaddr.sin_len);
			}
			/* The netmask belongs in ifra_mask, not ifra_dstaddr. */
			bcopy(&ia->ia_sockmask, &ifra.ifra_mask,
				ia->ia_sockmask.sin_len);

			return in_control(so, SIOCDIFADDR, (caddr_t)&ifra,
					  ifp, td);
		}
	    }
	}

	return EOPNOTSUPP;	/*just for safety*/
}

/*
 * Delete any existing route for an interface.
 */
void
in_ifscrub(ifp, ia)
	register struct ifnet *ifp;
	register struct in_ifaddr *ia;
{
	in_scrubprefix(ia);

	/*
	 * delete receive path rtentry's if they exist.
	 */
	in_ifremrecv(ia);
}

/*
 * Initialize an interface's internet address
 * and routing table entry.
 */
static int
in_ifinit(ifp, ia, sin, scrub)
	register struct ifnet *ifp;
	register struct in_ifaddr *ia;
	struct sockaddr_in *sin;
	int scrub;
{
	register u_long i = ntohl(sin->sin_addr.s_addr);
	struct sockaddr_in oldaddr;
	int s = splimp(), flags = RTF_UP, error = 0;

	oldaddr = ia->ia_addr;
	if (oldaddr.sin_family == AF_INET)
		LIST_REMOVE(ia, ia_hash);
	ia->ia_addr = *sin;
	if (ia->ia_addr.sin_family == AF_INET)
		LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr),
		    ia, ia_hash);
	/*
	 * Give the interface a chance to initialize
	 * if this is its first address,
	 * and to validate the address if necessary.
	 */
	if (ifp->if_ioctl &&
	    (error = (*ifp->if_ioctl)(ifp, SIOCSIFADDR, (caddr_t)ia))) {
		splx(s);
		/* LIST_REMOVE(ia, ia_hash) is done in in_control */
		ia->ia_addr = oldaddr;
		if (ia->ia_addr.sin_family == AF_INET)
			LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr),
			    ia, ia_hash);
		return (error);
	}
	splx(s);
	if (scrub) {
		ia->ia_ifa.ifa_addr = (struct sockaddr *)&oldaddr;
		in_ifscrub(ifp, ia);
		ia->ia_ifa.ifa_addr = (struct sockaddr *)&ia->ia_addr;
	}
	if (IN_CLASSA(i))
		ia->ia_netmask = IN_CLASSA_NET;
	else if (IN_CLASSB(i))
		ia->ia_netmask = IN_CLASSB_NET;
	else
		ia->ia_netmask = IN_CLASSC_NET;
	/*
	 * The subnet mask usually includes at least the standard network part,
	 * but may be smaller in the case of supernetting.
	 * If it is set, we believe it.
	 */
	if (ia->ia_subnetmask == 0) {
		ia->ia_subnetmask = ia->ia_netmask;
		ia->ia_sockmask.sin_addr.s_addr = htonl(ia->ia_subnetmask);
	} else
		ia->ia_netmask &= ia->ia_subnetmask;
	ia->ia_net = i & ia->ia_netmask;
	ia->ia_subnet = i & ia->ia_subnetmask;
	in_socktrim(&ia->ia_sockmask);
	/*
	 * Add route for the network.
	 */
	ia->ia_ifa.ifa_metric = ifp->if_metric;
	if (ifp->if_flags & IFF_BROADCAST) {
		ia->ia_broadaddr.sin_addr.s_addr =
			htonl(ia->ia_subnet | ~ia->ia_subnetmask);
		ia->ia_netbroadcast.s_addr =
			htonl(ia->ia_net | ~ ia->ia_netmask);
	} else if (ifp->if_flags & IFF_LOOPBACK) {
		ia->ia_dstaddr = ia->ia_addr;
		flags |= RTF_HOST;
	} else if (ifp->if_flags & IFF_POINTOPOINT) {
		if (ia->ia_dstaddr.sin_family != AF_INET)
			return (0);
		flags |= RTF_HOST;
	}
	if ((error = in_addprefix(ia, flags)) != 0)
		return (error);

	/*
	 * If the interface supports multicast, join the "all hosts"
	 * multicast group on that interface.
	 */
	if (ifp->if_flags & IFF_MULTICAST) {
		struct in_addr addr;

		addr.s_addr = htonl(INADDR_ALLHOSTS_GROUP);
		in_addmulti(&addr, ifp);
	}

	/*
	 * Bring online receive adjacency routes.
	 * -james 2004/12/17
	 *
	 * Deleted old 2004-09-09 kludge code; this is a cleaner
	 * approach, derived from KAME implementation for INET6.
	 */
	in_ifaddrecv(ia);

	return (error);
}

#define rtinitflags(x) \
	((((x)->ia_ifp->if_flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) != 0) \
	    ? RTF_HOST : 0)
/*
 * Check if we have a route for the given prefix already or add one
 * accordingly.
 */
static int
in_addprefix(target, flags)
	struct in_ifaddr *target;
	int flags;
{
	struct in_ifaddr *ia;
	struct in_addr prefix, mask, p;
	int error;

	if ((flags & RTF_HOST) != 0)
		prefix = target->ia_dstaddr.sin_addr;
	else {
		prefix = target->ia_addr.sin_addr;
		mask = target->ia_sockmask.sin_addr;
		prefix.s_addr &= mask.s_addr;
	}

	TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link) {
		if (rtinitflags(ia))
			p = ia->ia_dstaddr.sin_addr;
		else {
			p = ia->ia_addr.sin_addr;
			p.s_addr &= ia->ia_sockmask.sin_addr.s_addr;
		}

		if (prefix.s_addr != p.s_addr)
			continue;

		/*
		 * If we got a matching prefix route inserted by other
		 * interface address, we are done here.
		 */
		if (ia->ia_flags & IFA_ROUTE)
			return 0;
	}

	/*
	 * No one seems to have this prefix route, so we try to insert it.
	 */
	error = rtinit(&target->ia_ifa, (int)RTM_ADD, flags);
	if (!error)
		target->ia_flags |= IFA_ROUTE;
	return error;
}



/*
 * If there is no other address in the system that can serve a route to the
 * same prefix, remove the route.  Hand over the route to the new address
 * otherwise.
 */
static int
in_scrubprefix(target)
	struct in_ifaddr *target;
{
	struct in_ifaddr *ia;
	struct in_addr prefix, mask, p;
	int error;

	if ((target->ia_flags & IFA_ROUTE) == 0)
		return 0;

	if (rtinitflags(target))
		prefix = target->ia_dstaddr.sin_addr;
	else {
		prefix = target->ia_addr.sin_addr;
		mask = target->ia_sockmask.sin_addr;
		prefix.s_addr &= mask.s_addr;
	}

	TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link) {
		if (rtinitflags(ia))
			p = ia->ia_dstaddr.sin_addr;
		else {
			p = ia->ia_addr.sin_addr;
			p.s_addr &= ia->ia_sockmask.sin_addr.s_addr;
		}

		if (prefix.s_addr != p.s_addr)
			continue;

		/*
		 * If we got a matching prefix address, move IFA_ROUTE and
		 * the route itself to it.  Make sure that routing daemons
		 * get a heads-up.
		 */
		if ((ia->ia_flags & IFA_ROUTE) == 0) {
			rtinit(&(target->ia_ifa), (int)RTM_DELETE,
			    rtinitflags(target));
			target->ia_flags &= ~IFA_ROUTE;

			error = rtinit(&ia->ia_ifa, (int)RTM_ADD,
			    rtinitflags(ia) | RTF_UP);
			if (error == 0)
				ia->ia_flags |= IFA_ROUTE;
			return error;
		}
	}

	/*
	 * As no one seems to have this prefix, we can remove the route.
	 */
	rtinit(&(target->ia_ifa), (int)RTM_DELETE, rtinitflags(target));
	target->ia_flags &= ~IFA_ROUTE;
	return 0;
}

#undef rtinitflags

/*
 * Return 1 if the address might be a local broadcast address.
 */
int
in_broadcast(in, ifp)
	struct in_addr in;
	struct ifnet *ifp;
{
	register struct ifaddr *ifa;
	u_long t;

	if (in.s_addr == INADDR_BROADCAST ||
	    in.s_addr == INADDR_ANY)
		return 1;
	if ((ifp->if_flags & IFF_BROADCAST) == 0)
		return 0;
	t = ntohl(in.s_addr);
	/*
	 * Look through the list of addresses for a match
	 * with a broadcast address.
	 */
#define ia ((struct in_ifaddr *)ifa)
	TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
		if (ifa->ifa_addr->sa_family == AF_INET &&
		    (in.s_addr == ia->ia_broadaddr.sin_addr.s_addr ||
		     in.s_addr == ia->ia_netbroadcast.s_addr ||
		     /*
		      * Check for old-style (host 0) broadcast.
		      */
		     t == ia->ia_subnet || t == ia->ia_net) &&
		     /*
		      * Check for an all one subnetmask. These
		      * only exist when an interface gets a secondary
		      * address.
		      */
		     ia->ia_subnetmask != (u_long)0xffffffff)
			    return 1;
	return (0);
#undef ia
}
/*
 * Add an address to the list of IP multicast addresses for a given interface.
 */
struct in_multi *
in_addmulti(ap, ifp)
	register struct in_addr *ap;
	register struct ifnet *ifp;
{
	register struct in_multi *inm;
	int error;
	struct sockaddr_in sin;
	struct ifmultiaddr *ifma;
	int s = splnet();

	/*
	 * Call generic routine to add membership or increment
	 * refcount.  It wants addresses in the form of a sockaddr,
	 * so we build one here (being careful to zero the unused bytes).
	 */
	bzero(&sin, sizeof sin);
	sin.sin_family = AF_INET;
	sin.sin_len = sizeof sin;
	sin.sin_addr = *ap;
	error = if_addmulti(ifp, (struct sockaddr *)&sin, &ifma);
	if (error) {
		splx(s);
		return 0;
	}

	/*
	 * If ifma->ifma_protospec is null, then if_addmulti() created
	 * a new record.  Otherwise, we are done.
	 */
	if (ifma->ifma_protospec != 0) {
		splx(s);
		return ifma->ifma_protospec;
	}

	/* XXX - if_addmulti uses M_WAITOK.  Can this really be called
	   at interrupt time?  If so, need to fix if_addmulti. XXX */
	inm = (struct in_multi *)malloc(sizeof(*inm), M_IPMADDR,
	    M_NOWAIT | M_ZERO);
	if (inm == NULL) {
		splx(s);
		return (NULL);
	}

	inm->inm_addr = *ap;
	inm->inm_ifp = ifp;
	inm->inm_ifma = ifma;
	ifma->ifma_protospec = inm;
	LIST_INSERT_HEAD(&in_multihead, inm, inm_link);

	/*
	 * Let IGMP know that we have joined a new IP multicast group.
	 */
	igmp_joingroup(inm);
	splx(s);
	return (inm);
}

/*
 * Delete a multicast address record.
 */
void
in_delmulti(inm)
	register struct in_multi *inm;
{
	struct ifmultiaddr *ifma = inm->inm_ifma;
	struct in_multi my_inm;
	int s = splnet();

	my_inm.inm_ifp = NULL ; /* don't send the leave msg */
	if (ifma->ifma_refcount == 1) {
		/*
		 * No remaining claims to this record; let IGMP know that
		 * we are leaving the multicast group.
		 * But do it after the if_delmulti() which might reset
		 * the interface and nuke the packet.
		 */
		my_inm = *inm ;
		ifma->ifma_protospec = 0;
		LIST_REMOVE(inm, inm_link);
		free(inm, M_IPMADDR);
	}
	/* XXX - should be separate API for when we have an ifma? */
	if_delmulti(ifma->ifma_ifp, ifma->ifma_addr);
	if (my_inm.inm_ifp != NULL)
		igmp_leavegroup(&my_inm);
	splx(s);
}

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="in.c.diff"

--- in.org.c	Mon Dec 27 01:43:19 2004
+++ in.c	Mon Dec 27 01:42:40 2004
@@ -28,7 +28,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)in.c	8.4 (Berkeley) 1/9/95
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $
+ * $FreeBSD: src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $
  */
 
 #include <sys/param.h>
@@ -136,6 +136,159 @@
 }
 
 /*
+ * Sub-routine for in_ifaddrecv() and in_ifremrecv().
+ * --james@towardex.com 12/17/2004
+ */
+static void
+in_ifrecv_request(int call, int cmd, struct in_ifaddr *ia)
+{
+	struct sockaddr_in all1_sa;
+	struct rtentry *nrt = NULL;
+	struct ifaddr *ifa;
+	int e = 0;
+	struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET };
+	struct sockaddr_in loopback = { sizeof(struct sockaddr_in), AF_INET };
+
+	ifa = &ia->ia_ifa;
+
+       	bzero(&all1_sa, sizeof(all1_sa));
+        all1_sa.sin_family = AF_INET;
+        all1_sa.sin_len = sizeof(struct sockaddr_in);
+        all1_sa.sin_addr.s_addr = (u_int32_t)0xffffffff;
+
+	/* We must specify loopback explicitly for the network and broadcast
+	 * addresses because we can't just let the L2 rtrequest handlers
+	 * deal with ifa->ifa_addr set as the gateway address.
+	 */
+        loopback.sin_family = AF_INET;
+        loopback.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+	/*
+	 * Set the rtflags to RTF_LLINFO so existing apps are happy
+	 * with our changes.
+	 */
+	switch (call) {
+	case 0:  /* own address request */
+        	rtrequest(cmd, ifa->ifa_addr, sintosa(&loopback),
+        	  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
+		break;
+	case 1:  /* ia_dstaddr: broadcast (or p2p destination) address */
+        	rtrequest(cmd, sintosa(&ia->ia_dstaddr), sintosa(&loopback),
+        	  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
+		break;
+	case 2:  /* ia_subnet: network address */
+		subnet.sin_addr.s_addr = htonl(ia->ia_subnet);
+		subnet.sin_family = AF_INET;
+
+        	rtrequest(cmd, sintosa(&subnet), sintosa(&loopback),
+        	  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
+		break;
+	default:
+		break;
+	}
+
+        if (nrt) {
+                RT_LOCK(nrt);
+                /*
+                 * Make sure rt_ifa is equal to ifa, the interface address
+                 * taken from ia.  We need this because when we refer to
+                 * rt_ifa->ia_flags, we assume that rt_ifa points to
+		 * the address, not the loopback.
+                 */
+                if (cmd == RTM_ADD && ifa != nrt->rt_ifa) {
+                        IFAFREE(nrt->rt_ifa);
+                        IFAREF(ifa);
+                        nrt->rt_ifa = ifa;
+                }
+                /*
+		 * Report to routing socket.
+                 */
+                rt_newaddrmsg(cmd, ifa, e, nrt);
+                if (cmd == RTM_DELETE) {
+                        rtfree(nrt);
+                } else {
+                        /* the cmd must be RTM_ADD here */
+                        RT_REMREF(nrt);
+                        RT_UNLOCK(nrt);
+                }
+        }
+}
+
+
+/*
+ * Add own address as loopback rtentry (receive path). Previously we added
+ * the route only if necessary (such as a point-to-point circuit), or when
+ * triggered by route cloning. However, a proper RIB and FIB implementation
+ * must contain own-addrs as receive paths, allowing software to manage
+ * its own addresses separately from prefixes. This is required for receive
+ * adjacency/path in ip_fastforward() --james@towardex.com 2004/12/17
+ */
+static void
+in_ifaddrecv(struct in_ifaddr *ia)
+{
+	struct rtentry *rt;
+	int need_loop, need_netdst, need_bcast;
+	struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET };
+
+	/* If there is no loopback entry, allocate one */
+	rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0);
+	need_loop = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
+	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);
+
+	/* If there is no broadcast (ia_dstaddr) entry, allocate one */
+	if(rt) rtfree(rt);
+	rt = rtalloc1(sintosa(&ia->ia_dstaddr), 0, 0);
+	need_netdst = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
+	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);
+	
+	/* If there is no network (ia_subnet) entry, allocate one */
+	subnet.sin_addr.s_addr = htonl(ia->ia_subnet);
+	subnet.sin_family = AF_INET;
+	if(rt) rtfree(rt);
+	rt = rtalloc1(sintosa(&subnet), 0, 0);
+	need_bcast = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
+	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);
+
+	if(rt)
+	  rtfree(rt);
+
+	if(need_loop)
+	  in_ifrecv_request(0, RTM_ADD, ia);
+	if(need_netdst)
+	  in_ifrecv_request(1, RTM_ADD, ia);
+	if(need_bcast)
+	  in_ifrecv_request(2, RTM_ADD, ia);
+}
+
+
+/*
+ * Remove the loopback rtentries of the receive path generated by
+ * in_ifaddrecv() if they exist. -- james 12/17/2004
+ */
+static void
+in_ifremrecv(struct in_ifaddr *ia)
+{
+        struct rtentry *rt;
+        
+	/*
+	 * Delete the route for ownaddr if it really exists.
+	 */ 
+        rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0);
+        if (rt != NULL && (rt->rt_flags & RTF_HOST) != 0 &&
+             (rt->rt_ifp->if_flags & IFF_LOOPBACK) != 0) {
+                  rtfree(rt);
+                  in_ifrecv_request(0, RTM_DELETE, ia);
+	}
+
+	/* XXX
+	 * Broadcast and network addresses are removed by
+	 * regular interface detach handlers, but we
+	 * need to verify the design aspect of this more
+	 * later.
+	 */
+}
+
+/*
  * Trim a mask in a sockaddr
  */
 static void
@@ -658,6 +811,11 @@
 	register struct in_ifaddr *ia;
 {
 	in_scrubprefix(ia);
+
+	/*
+	 * Delete receive path rtentries if they exist.
+	 */
+	in_ifremrecv(ia);
 }
 
 /*
@@ -752,6 +910,16 @@
 		addr.s_addr = htonl(INADDR_ALLHOSTS_GROUP);
 		in_addmulti(&addr, ifp);
 	}
+
+	/*
+	 * Bring online receive adjacency routes.
+	 * -james 2004/12/17
+	 *
+	 * Deleted old 2004-09-09 kludge code; this is a cleaner
+	 * approach, derived from KAME implementation for INET6.
+	 */
+	in_ifaddrecv(ia);
+
 	return (error);
 }
 
@@ -806,6 +974,8 @@
 		target->ia_flags |= IFA_ROUTE;
 	return error;
 }
+
+
 
 /*
  * If there is no other address in the system that can serve a route to the

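Reader's note on the diff above: in_ifaddrecv() points three host
destinations per interface address at lo0: the address itself, the
network address and the broadcast address. The following is a minimal
userland sketch of how those three values follow from an address and its
netmask. It is not part of the patch and is not kernel code; the names
recv_targets and recv_targets_for are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical container for the three receive-path destinations. */
struct recv_targets {
	uint32_t own;		/* interface address, network byte order */
	uint32_t network;	/* host part all zeros, network byte order */
	uint32_t broadcast;	/* host part all ones, network byte order */
};

/*
 * Derive the three destinations from an address and a netmask, both
 * given in network byte order (as stored in struct in_addr).
 */
static struct recv_targets
recv_targets_for(uint32_t addr_n, uint32_t mask_n)
{
	struct recv_targets t;
	uint32_t addr = ntohl(addr_n);
	uint32_t mask = ntohl(mask_n);
	uint32_t inv = ~mask;

	/* This sketch assumes a contiguous netmask. */
	assert((inv & (inv + 1)) == 0);

	t.own = addr_n;
	t.network = htonl(addr & mask);		/* cf. ia_subnet */
	t.broadcast = htonl(addr | inv);	/* cf. ia_broadaddr */
	return (t);
}
```

For 192.0.2.10 with netmask 255.255.255.0 this gives 192.0.2.0 as the
network address and 192.0.2.255 as the broadcast address.
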
--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="inet.c"

/*
 * Copyright (c) 1983, 1988, 1993, 1995
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *	This product includes software developed by the University of
 *	California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#if 0
#ifndef lint
static char sccsid[] = "@(#)inet.c	8.5 (Berkeley) 5/24/95";
#endif /* not lint */
#endif

#include <sys/cdefs.h>
__FBSDID("$FreeBSD: src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $");

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <sys/sysctl.h>
#include <sys/protosw.h>

#include <net/route.h>
#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#ifdef INET6
#include <netinet/ip6.h>
#endif /* INET6 */
#include <netinet/in_pcb.h>
#include <netinet/ip_icmp.h>
#include <netinet/icmp_var.h>
#include <netinet/igmp_var.h>
#include <netinet/ip_var.h>
#include <netinet/pim_var.h>
#include <netinet/tcp.h>
#include <netinet/tcpip.h>
#include <netinet/tcp_seq.h>
#define TCPSTATES
#include <netinet/tcp_fsm.h>
#include <netinet/tcp_timer.h>
#include <netinet/tcp_var.h>
#include <netinet/tcp_debug.h>
#include <netinet/udp.h>
#include <netinet/udp_var.h>

#include <arpa/inet.h>
#include <err.h>
#include <errno.h>
#include <libutil.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "netstat.h"

char	*inetname (struct in_addr *);
void	inetprint (struct in_addr *, int, const char *, int);
#ifdef INET6
static int udp_done, tcp_done;
#endif /* INET6 */

/*
 * Print a summary of connections related to an Internet
 * protocol.  For TCP, also give state of connection.
 * Listening processes (aflag) are suppressed unless the
 * -a (all) flag is specified.
 */
void
protopr(u_long proto,		/* for sysctl version we pass proto # */
	const char *name, int af1)
{
	int istcp;
	static int first = 1;
	char *buf;
	const char *mibvar, *vchar;
	struct tcpcb *tp = NULL;
	struct inpcb *inp;
	struct xinpgen *xig, *oxig;
	struct xsocket *so;
	size_t len;

	istcp = 0;
	switch (proto) {
	case IPPROTO_TCP:
#ifdef INET6
		if (tcp_done != 0)
			return;
		else
			tcp_done = 1;
#endif
		istcp = 1;
		mibvar = "net.inet.tcp.pcblist";
		break;
	case IPPROTO_UDP:
#ifdef INET6
		if (udp_done != 0)
			return;
		else
			udp_done = 1;
#endif
		mibvar = "net.inet.udp.pcblist";
		break;
	case IPPROTO_DIVERT:
		mibvar = "net.inet.divert.pcblist";
		break;
	default:
		mibvar = "net.inet.raw.pcblist";
		break;
	}
	len = 0;
	if (sysctlbyname(mibvar, 0, &len, 0, 0) < 0) {
		if (errno != ENOENT)
			warn("sysctl: %s", mibvar);
		return;
	}
	if ((buf = malloc(len)) == 0) {
		warnx("malloc %lu bytes", (u_long)len);
		return;
	}
	if (sysctlbyname(mibvar, buf, &len, 0, 0) < 0) {
		warn("sysctl: %s", mibvar);
		free(buf);
		return;
	}

	oxig = xig = (struct xinpgen *)buf;
	for (xig = (struct xinpgen *)((char *)xig + xig->xig_len);
	     xig->xig_len > sizeof(struct xinpgen);
	     xig = (struct xinpgen *)((char *)xig + xig->xig_len)) {
		if (istcp) {
			tp = &((struct xtcpcb *)xig)->xt_tp;
			inp = &((struct xtcpcb *)xig)->xt_inp;
			so = &((struct xtcpcb *)xig)->xt_socket;
		} else {
			inp = &((struct xinpcb *)xig)->xi_inp;
			so = &((struct xinpcb *)xig)->xi_socket;
		}

		/* Ignore sockets for protocols other than the desired one. */
		if (so->xso_protocol != (int)proto)
			continue;

		/* Ignore PCBs which were freed during copyout. */
		if (inp->inp_gencnt > oxig->xig_gen)
			continue;

		if ((af1 == AF_INET && (inp->inp_vflag & INP_IPV4) == 0)
#ifdef INET6
		    || (af1 == AF_INET6 && (inp->inp_vflag & INP_IPV6) == 0)
#endif /* INET6 */
		    || (af1 == AF_UNSPEC && ((inp->inp_vflag & INP_IPV4) == 0
#ifdef INET6
					    && (inp->inp_vflag &
						INP_IPV6) == 0
#endif /* INET6 */
			))
		    )
			continue;
		if (!aflag &&
		    (
		     (istcp && tp->t_state == TCPS_LISTEN)
		     || (af1 == AF_INET &&
		      inet_lnaof(inp->inp_laddr) == INADDR_ANY)
#ifdef INET6
		     || (af1 == AF_INET6 &&
			 IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr))
#endif /* INET6 */
		     || (af1 == AF_UNSPEC &&
			 (((inp->inp_vflag & INP_IPV4) != 0 &&
			   inet_lnaof(inp->inp_laddr) == INADDR_ANY)
#ifdef INET6
			  || ((inp->inp_vflag & INP_IPV6) != 0 &&
			      IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr))
#endif
			  ))
		     ))
			continue;

		if (first) {
			if (!Lflag) {
				printf("Active Internet connections");
				if (aflag)
					printf(" (including servers)");
			} else
				printf(
	"Current listen queue sizes (qlen/incqlen/maxqlen)");
			putchar('\n');
			if (Aflag)
				printf("%-8.8s ", "Socket");
			if (Lflag)
				printf("%-5.5s %-14.14s %-22.22s\n",
					"Proto", "Listen", "Local Address");
			else
				printf((Aflag && !Wflag) ?
		"%-5.5s %-6.6s %-6.6s  %-18.18s %-18.18s %s\n" :
		"%-5.5s %-6.6s %-6.6s  %-22.22s %-22.22s %s\n",
					"Proto", "Recv-Q", "Send-Q",
					"Local Address", "Foreign Address",
					"(state)");
			first = 0;
		}
		if (Lflag && so->so_qlimit == 0)
			continue;
		if (Aflag) {
			if (istcp)
				printf("%8lx ", (u_long)inp->inp_ppcb);
			else
				printf("%8lx ", (u_long)so->so_pcb);
		}
#ifdef INET6
		if ((inp->inp_vflag & INP_IPV6) != 0)
			vchar = ((inp->inp_vflag & INP_IPV4) != 0)
				? "46" : "6 ";
		else
#endif
		vchar = ((inp->inp_vflag & INP_IPV4) != 0)
				? "4 " : "  ";
		printf("%-3.3s%-2.2s ", name, vchar);
		if (Lflag) {
			char buf1[15];

			snprintf(buf1, 15, "%d/%d/%d", so->so_qlen,
				 so->so_incqlen, so->so_qlimit);
			printf("%-14.14s ", buf1);
		} else {
			printf("%6u %6u  ",
			       so->so_rcv.sb_cc,
			       so->so_snd.sb_cc);
		}
		if (numeric_port) {
			if (inp->inp_vflag & INP_IPV4) {
				inetprint(&inp->inp_laddr, (int)inp->inp_lport,
					  name, 1);
				if (!Lflag)
					inetprint(&inp->inp_faddr,
						  (int)inp->inp_fport, name, 1);
			}
#ifdef INET6
			else if (inp->inp_vflag & INP_IPV6) {
				inet6print(&inp->in6p_laddr,
					   (int)inp->inp_lport, name, 1);
				if (!Lflag)
					inet6print(&inp->in6p_faddr,
						   (int)inp->inp_fport, name, 1);
			} /* else nothing printed now */
#endif /* INET6 */
		} else if (inp->inp_flags & INP_ANONPORT) {
			if (inp->inp_vflag & INP_IPV4) {
				inetprint(&inp->inp_laddr, (int)inp->inp_lport,
					  name, 1);
				if (!Lflag)
					inetprint(&inp->inp_faddr,
						  (int)inp->inp_fport, name, 0);
			}
#ifdef INET6
			else if (inp->inp_vflag & INP_IPV6) {
				inet6print(&inp->in6p_laddr,
					   (int)inp->inp_lport, name, 1);
				if (!Lflag)
					inet6print(&inp->in6p_faddr,
						   (int)inp->inp_fport, name, 0);
			} /* else nothing printed now */
#endif /* INET6 */
		} else {
			if (inp->inp_vflag & INP_IPV4) {
				inetprint(&inp->inp_laddr, (int)inp->inp_lport,
					  name, 0);
				if (!Lflag)
					inetprint(&inp->inp_faddr,
						  (int)inp->inp_fport, name,
						  inp->inp_lport !=
							inp->inp_fport);
			}
#ifdef INET6
			else if (inp->inp_vflag & INP_IPV6) {
				inet6print(&inp->in6p_laddr,
					   (int)inp->inp_lport, name, 0);
				if (!Lflag)
					inet6print(&inp->in6p_faddr,
						   (int)inp->inp_fport, name,
						   inp->inp_lport !=
							inp->inp_fport);
			} /* else nothing printed now */
#endif /* INET6 */
		}
		if (istcp && !Lflag) {
			if (tp->t_state < 0 || tp->t_state >= TCP_NSTATES)
				printf("%d", tp->t_state);
                      else {
				printf("%s", tcpstates[tp->t_state]);
#if defined(TF_NEEDSYN) && defined(TF_NEEDFIN)
                              /* Show T/TCP `hidden state' */
                              if (tp->t_flags & (TF_NEEDSYN|TF_NEEDFIN))
                                      putchar('*');
#endif /* defined(TF_NEEDSYN) && defined(TF_NEEDFIN) */
                      }
		}
		putchar('\n');
	}
	if (xig != oxig && xig->xig_gen != oxig->xig_gen) {
		if (oxig->xig_count > xig->xig_count) {
			printf("Some %s sockets may have been deleted.\n",
			       name);
		} else if (oxig->xig_count < xig->xig_count) {
			printf("Some %s sockets may have been created.\n",
			       name);
		} else {
			printf("Some %s sockets may have been created or deleted.\n",
			       name);
		}
	}
	free(buf);
}

/*
 * Dump TCP statistics structure.
 */
void
tcp_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct tcpstat tcpstat, zerostat;
	size_t len = sizeof tcpstat;
	
	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.tcp.stats", &tcpstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.tcp.stats");
		return;
	}

#ifdef INET6
	if (tcp_done != 0)
		return;
	else
		tcp_done = 1;
#endif

	printf ("%s:\n", name);

#define	p(f, m) if (tcpstat.f || sflag <= 1) \
    printf(m, tcpstat.f, plural(tcpstat.f))
#define	p1a(f, m) if (tcpstat.f || sflag <= 1) \
    printf(m, tcpstat.f)
#define	p2(f1, f2, m) if (tcpstat.f1 || tcpstat.f2 || sflag <= 1) \
    printf(m, tcpstat.f1, plural(tcpstat.f1), tcpstat.f2, plural(tcpstat.f2))
#define	p2a(f1, f2, m) if (tcpstat.f1 || tcpstat.f2 || sflag <= 1) \
    printf(m, tcpstat.f1, plural(tcpstat.f1), tcpstat.f2)
#define	p3(f, m) if (tcpstat.f || sflag <= 1) \
    printf(m, tcpstat.f, plurales(tcpstat.f))

	p(tcps_sndtotal, "\t%lu packet%s sent\n");
	p2(tcps_sndpack,tcps_sndbyte,
		"\t\t%lu data packet%s (%lu byte%s)\n");
	p2(tcps_sndrexmitpack, tcps_sndrexmitbyte,
		"\t\t%lu data packet%s (%lu byte%s) retransmitted\n");
	p(tcps_sndrexmitbad,
		"\t\t%lu data packet%s unnecessarily retransmitted\n");
	p(tcps_mturesent, "\t\t%lu resend%s initiated by MTU discovery\n");
	p2a(tcps_sndacks, tcps_delack,
		"\t\t%lu ack-only packet%s (%lu delayed)\n");
	p(tcps_sndurg, "\t\t%lu URG only packet%s\n");
	p(tcps_sndprobe, "\t\t%lu window probe packet%s\n");
	p(tcps_sndwinup, "\t\t%lu window update packet%s\n");
	p(tcps_sndctrl, "\t\t%lu control packet%s\n");
	p(tcps_rcvtotal, "\t%lu packet%s received\n");
	p2(tcps_rcvackpack, tcps_rcvackbyte, "\t\t%lu ack%s (for %lu byte%s)\n");
	p(tcps_rcvdupack, "\t\t%lu duplicate ack%s\n");
	p(tcps_rcvacktoomuch, "\t\t%lu ack%s for unsent data\n");
	p2(tcps_rcvpack, tcps_rcvbyte,
		"\t\t%lu packet%s (%lu byte%s) received in-sequence\n");
	p2(tcps_rcvduppack, tcps_rcvdupbyte,
		"\t\t%lu completely duplicate packet%s (%lu byte%s)\n");
	p(tcps_pawsdrop, "\t\t%lu old duplicate packet%s\n");
	p2(tcps_rcvpartduppack, tcps_rcvpartdupbyte,
		"\t\t%lu packet%s with some dup. data (%lu byte%s duped)\n");
	p2(tcps_rcvoopack, tcps_rcvoobyte,
		"\t\t%lu out-of-order packet%s (%lu byte%s)\n");
	p2(tcps_rcvpackafterwin, tcps_rcvbyteafterwin,
		"\t\t%lu packet%s (%lu byte%s) of data after window\n");
	p(tcps_rcvwinprobe, "\t\t%lu window probe%s\n");
	p(tcps_rcvwinupd, "\t\t%lu window update packet%s\n");
	p(tcps_rcvafterclose, "\t\t%lu packet%s received after close\n");
	p(tcps_rcvbadsum, "\t\t%lu discarded for bad checksum%s\n");
	p(tcps_rcvbadoff, "\t\t%lu discarded for bad header offset field%s\n");
	p1a(tcps_rcvshort, "\t\t%lu discarded because packet too short\n");
	p(tcps_connattempt, "\t%lu connection request%s\n");
	p(tcps_accepts, "\t%lu connection accept%s\n");
	p(tcps_badsyn, "\t%lu bad connection attempt%s\n");
	p(tcps_listendrop, "\t%lu listen queue overflow%s\n");
	p(tcps_badrst, "\t%lu ignored RSTs in the window%s\n");
	p(tcps_connects, "\t%lu connection%s established (including accepts)\n");
	p2(tcps_closed, tcps_drops,
		"\t%lu connection%s closed (including %lu drop%s)\n");
	p(tcps_cachedrtt, "\t\t%lu connection%s updated cached RTT on close\n");
	p(tcps_cachedrttvar, 
	  "\t\t%lu connection%s updated cached RTT variance on close\n");
	p(tcps_cachedssthresh,
	  "\t\t%lu connection%s updated cached ssthresh on close\n");
	p(tcps_conndrops, "\t%lu embryonic connection%s dropped\n");
	p2(tcps_rttupdated, tcps_segstimed,
		"\t%lu segment%s updated rtt (of %lu attempt%s)\n");
	p(tcps_rexmttimeo, "\t%lu retransmit timeout%s\n");
	p(tcps_timeoutdrop, "\t\t%lu connection%s dropped by rexmit timeout\n");
	p(tcps_persisttimeo, "\t%lu persist timeout%s\n");
	p(tcps_persistdrop, "\t\t%lu connection%s dropped by persist timeout\n");
	p(tcps_keeptimeo, "\t%lu keepalive timeout%s\n");
	p(tcps_keepprobe, "\t\t%lu keepalive probe%s sent\n");
	p(tcps_keepdrops, "\t\t%lu connection%s dropped by keepalive\n");
	p(tcps_predack, "\t%lu correct ACK header prediction%s\n");
	p(tcps_preddat, "\t%lu correct data packet header prediction%s\n");

	p(tcps_sc_added, "\t%lu syncache entrie%s added\n"); 
	p1a(tcps_sc_retransmitted, "\t\t%lu retransmitted\n"); 
	p1a(tcps_sc_dupsyn, "\t\t%lu dupsyn\n"); 
	p1a(tcps_sc_dropped, "\t\t%lu dropped\n"); 
	p1a(tcps_sc_completed, "\t\t%lu completed\n"); 
	p1a(tcps_sc_bucketoverflow, "\t\t%lu bucket overflow\n"); 
	p1a(tcps_sc_cacheoverflow, "\t\t%lu cache overflow\n"); 
	p1a(tcps_sc_reset, "\t\t%lu reset\n"); 
	p1a(tcps_sc_stale, "\t\t%lu stale\n"); 
	p1a(tcps_sc_aborted, "\t\t%lu aborted\n"); 
	p1a(tcps_sc_badack, "\t\t%lu badack\n"); 
	p1a(tcps_sc_unreach, "\t\t%lu unreach\n"); 
	p(tcps_sc_zonefail, "\t\t%lu zone failure%s\n"); 
	p(tcps_sc_sendcookie, "\t%lu cookie%s sent\n"); 
	p(tcps_sc_recvcookie, "\t%lu cookie%s received\n"); 

	p(tcps_sack_recovery_episode, "\t%lu SACK recovery episode%s\n"); 
	p(tcps_sack_rexmits,
		"\t%lu segment rexmit%s in SACK recovery episodes\n");
	p(tcps_sack_rexmit_bytes,
		"\t%lu byte rexmit%s in SACK recovery episodes\n"); 
	p(tcps_sack_rcv_blocks,
		"\t%lu SACK option%s (SACK blocks) received\n"); 
	p(tcps_sack_send_blocks, "\t%lu SACK option%s (SACK blocks) sent\n"); 

#undef p
#undef p1a
#undef p2
#undef p2a
#undef p3
}

/*
 * Dump UDP statistics structure.
 */
void
udp_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct udpstat udpstat, zerostat;
	size_t len = sizeof udpstat;
	u_long delivered;

	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.udp.stats", &udpstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.udp.stats");
		return;
	}

#ifdef INET6
	if (udp_done != 0)
		return;
	else
		udp_done = 1;
#endif

	printf("%s:\n", name);
#define	p(f, m) if (udpstat.f || sflag <= 1) \
    printf(m, udpstat.f, plural(udpstat.f))
#define	p1a(f, m) if (udpstat.f || sflag <= 1) \
    printf(m, udpstat.f)
	p(udps_ipackets, "\t%lu datagram%s received\n");
	p1a(udps_hdrops, "\t%lu with incomplete header\n");
	p1a(udps_badlen, "\t%lu with bad data length field\n");
	p1a(udps_badsum, "\t%lu with bad checksum\n");
	p1a(udps_nosum, "\t%lu with no checksum\n");
	p1a(udps_noport, "\t%lu dropped due to no socket\n");
	p(udps_noportbcast,
	    "\t%lu broadcast/multicast datagram%s dropped due to no socket\n");
	p1a(udps_fullsock, "\t%lu dropped due to full socket buffers\n");
	p1a(udpps_pcbhashmiss, "\t%lu not for hashed pcb\n");
	delivered = udpstat.udps_ipackets -
		    udpstat.udps_hdrops -
		    udpstat.udps_badlen -
		    udpstat.udps_badsum -
		    udpstat.udps_noport -
		    udpstat.udps_noportbcast -
		    udpstat.udps_fullsock;
	if (delivered || sflag <= 1)
		printf("\t%lu delivered\n", delivered);
	p(udps_opackets, "\t%lu datagram%s output\n");
#undef p
#undef p1a
}

/*
 * Dump IP statistics structure.
 */
void
ip_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct ipstat ipstat, zerostat;
	size_t len = sizeof ipstat;

	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.ip.stats", &ipstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.ip.stats");
		return;
	}

	printf("%s:\n", name);

#define	p(f, m) if (ipstat.f || sflag <= 1) \
    printf(m, ipstat.f, plural(ipstat.f))
#define	p1a(f, m) if (ipstat.f || sflag <= 1) \
    printf(m, ipstat.f)

	p(ips_total, "\t%lu total packet%s received\n");
	p(ips_badsum, "\t%lu bad header checksum%s\n");
	p1a(ips_toosmall, "\t%lu with size smaller than minimum\n");
	p1a(ips_tooshort, "\t%lu with data size < data length\n");
	p1a(ips_toolong, "\t%lu with ip length > max ip packet size\n");
	p1a(ips_badhlen, "\t%lu with header length < data size\n");
	p1a(ips_badlen, "\t%lu with data length < header length\n");
	p1a(ips_badoptions, "\t%lu with bad options\n");
	p1a(ips_badvers, "\t%lu with incorrect version number\n");
	p(ips_fragments, "\t%lu fragment%s received\n");
	p(ips_fragdropped, "\t%lu fragment%s dropped (dup or out of space)\n");
	p(ips_fragtimeout, "\t%lu fragment%s dropped after timeout\n");
	p(ips_reassembled, "\t%lu packet%s reassembled ok\n");
	p(ips_delivered, "\t%lu packet%s for this host\n");
	p(ips_noproto, "\t%lu packet%s for unknown/unsupported protocol\n");
	p(ips_forward, "\t%lu packet%s forwarded");
	p(ips_fastforward, " (%lu packet%s fast forwarded)");
	if (ipstat.ips_forward || sflag <= 1) 
		putchar('\n');
	p(ips_cantforward, "\t%lu packet%s not forwardable\n");
	p(ips_transit_re, "\t%lu packet%s forwarded to receive path\n");
	p(ips_notmember,
	  "\t%lu packet%s received for unknown multicast group\n");
	p(ips_redirectsent, "\t%lu redirect%s sent\n");
	p(ips_localout, "\t%lu packet%s sent from this host\n");
	p(ips_rawout, "\t%lu packet%s sent with fabricated ip header\n");
	p(ips_odropped,
	  "\t%lu output packet%s dropped due to no bufs, etc.\n");
	p(ips_noroute, "\t%lu output packet%s discarded due to no route\n");
	p(ips_fragmented, "\t%lu output datagram%s fragmented\n");
	p(ips_ofragments, "\t%lu fragment%s created\n");
	p(ips_cantfrag, "\t%lu datagram%s that can't be fragmented\n");
	p(ips_nogif, "\t%lu tunneling packet%s that can't find gif\n");
	p(ips_badaddr, "\t%lu datagram%s with bad address in header\n");
#undef p
#undef p1a
}

static	const char *icmpnames[] = {
	"echo reply",
	"#1",
	"#2",
	"destination unreachable",
	"source quench",
	"routing redirect",
	"#6",
	"#7",
	"echo",
	"router advertisement",
	"router solicitation",
	"time exceeded",
	"parameter problem",
	"time stamp",
	"time stamp reply",
	"information request",
	"information request reply",
	"address mask request",
	"address mask reply",
};

/*
 * Dump ICMP statistics.
 */
void
icmp_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct icmpstat icmpstat, zerostat;
	int i, first;
	int mib[4];		/* CTL_NET + PF_INET + IPPROTO_ICMP + req */
	size_t len;

	mib[0] = CTL_NET;
	mib[1] = PF_INET;
	mib[2] = IPPROTO_ICMP;
	mib[3] = ICMPCTL_STATS;

	len = sizeof icmpstat;
	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctl(mib, 4, &icmpstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.icmp.stats");
		return;
	}

	printf("%s:\n", name);

#define	p(f, m) if (icmpstat.f || sflag <= 1) \
    printf(m, icmpstat.f, plural(icmpstat.f))
#define	p1a(f, m) if (icmpstat.f || sflag <= 1) \
    printf(m, icmpstat.f)
#define	p2(f, m) if (icmpstat.f || sflag <= 1) \
    printf(m, icmpstat.f, plurales(icmpstat.f))

	p(icps_error, "\t%lu call%s to icmp_error\n");
	p(icps_oldicmp,
	    "\t%lu error%s not generated in response to an icmp message\n");
	for (first = 1, i = 0; i < ICMP_MAXTYPE + 1; i++)
		if (icmpstat.icps_outhist[i] != 0) {
			if (first) {
				printf("\tOutput histogram:\n");
				first = 0;
			}
			printf("\t\t%s: %lu\n", icmpnames[i],
				icmpstat.icps_outhist[i]);
		}
	p(icps_badcode, "\t%lu message%s with bad code fields\n");
	p(icps_tooshort, "\t%lu message%s < minimum length\n");
	p(icps_checksum, "\t%lu bad checksum%s\n");
	p(icps_badlen, "\t%lu message%s with bad length\n");
	p1a(icps_bmcastecho, "\t%lu multicast echo requests ignored\n");
	p1a(icps_bmcasttstamp, "\t%lu multicast timestamp requests ignored\n");
	for (first = 1, i = 0; i < ICMP_MAXTYPE + 1; i++)
		if (icmpstat.icps_inhist[i] != 0) {
			if (first) {
				printf("\tInput histogram:\n");
				first = 0;
			}
			printf("\t\t%s: %lu\n", icmpnames[i],
				icmpstat.icps_inhist[i]);
		}
	p(icps_reflect, "\t%lu message response%s generated\n");
	p2(icps_badaddr, "\t%lu invalid return address%s\n");
	p(icps_noroute, "\t%lu no return route%s\n");
#undef p
#undef p1a
#undef p2
	mib[3] = ICMPCTL_MASKREPL;
	len = sizeof i;
	if (sysctl(mib, 4, &i, &len, (void *)0, 0) < 0)
		return;
	printf("\tICMP address mask responses are %sabled\n", 
	       i ? "en" : "dis");
}

/*
 * Dump IGMP statistics structure.
 */
void
igmp_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct igmpstat igmpstat, zerostat;
	size_t len = sizeof igmpstat;

	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.igmp.stats", &igmpstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.igmp.stats");
		return;
	}

	printf("%s:\n", name);

#define	p(f, m) if (igmpstat.f || sflag <= 1) \
    printf(m, igmpstat.f, plural(igmpstat.f))
#define	py(f, m) if (igmpstat.f || sflag <= 1) \
    printf(m, igmpstat.f, igmpstat.f != 1 ? "ies" : "y")
	p(igps_rcv_total, "\t%u message%s received\n");
        p(igps_rcv_tooshort, "\t%u message%s received with too few bytes\n");
        p(igps_rcv_badsum, "\t%u message%s received with bad checksum\n");
        py(igps_rcv_queries, "\t%u membership quer%s received\n");
        py(igps_rcv_badqueries, "\t%u membership quer%s received with invalid field(s)\n");
        p(igps_rcv_reports, "\t%u membership report%s received\n");
        p(igps_rcv_badreports, "\t%u membership report%s received with invalid field(s)\n");
        p(igps_rcv_ourreports, "\t%u membership report%s received for groups to which we belong\n");
        p(igps_snd_reports, "\t%u membership report%s sent\n");
#undef p
#undef py
}

/*
 * Dump PIM statistics structure.
 */
void
pim_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct pimstat pimstat, zerostat;
	size_t len = sizeof pimstat;

	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.pim.stats", &pimstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		if (errno != ENOENT)
			warn("sysctl: net.inet.pim.stats");
		return;
	}

	printf("%s:\n", name);

#define	p(f, m) if (pimstat.f || sflag <= 1) \
    printf(m, pimstat.f, plural(pimstat.f))
#define	py(f, m) if (pimstat.f || sflag <= 1) \
    printf(m, pimstat.f, pimstat.f != 1 ? "ies" : "y")
	p(pims_rcv_total_msgs, "\t%llu message%s received\n");
	p(pims_rcv_total_bytes, "\t%llu byte%s received\n");
	p(pims_rcv_tooshort, "\t%llu message%s received with too few bytes\n");
        p(pims_rcv_badsum, "\t%llu message%s received with bad checksum\n");
	p(pims_rcv_badversion, "\t%llu message%s received with bad version\n");
	p(pims_rcv_registers_msgs, "\t%llu data register message%s received\n");
	p(pims_rcv_registers_bytes, "\t%llu data register byte%s received\n");
	p(pims_rcv_registers_wrongiif, "\t%llu data register message%s received on wrong iif\n");
	p(pims_rcv_badregisters, "\t%llu bad register%s received\n");
	p(pims_snd_registers_msgs, "\t%llu data register message%s sent\n");
	p(pims_snd_registers_bytes, "\t%llu data register byte%s sent\n");
#undef p
#undef py
}

/*
 * Pretty print an Internet address (net address + port).
 */
void
inetprint(struct in_addr *in, int port, const char *proto, int num_port)
{
	struct servent *sp = 0;
	char line[80], *cp;
	int width;

	if (Wflag)
	    sprintf(line, "%s.", inetname(in));
	else
	    sprintf(line, "%.*s.", (Aflag && !num_port) ? 12 : 16, inetname(in));
	cp = index(line, '\0');
	if (!num_port && port)
		sp = getservbyport((int)port, proto);
	if (sp || port == 0)
		sprintf(cp, "%.15s ", sp ? sp->s_name : "*");
	else
		sprintf(cp, "%d ", ntohs((u_short)port));
	width = (Aflag && !Wflag) ? 18 : 22;
	if (Wflag)
	    printf("%-*s ", width, line);
	else
	    printf("%-*.*s ", width, width, line);
}

/*
 * Construct an Internet address representation.
 * If numeric_addr has been supplied, give
 * numeric value, otherwise try for symbolic name.
 */
char *
inetname(struct in_addr *inp)
{
	char *cp;
	static char line[MAXHOSTNAMELEN];
	struct hostent *hp;
	struct netent *np;

	cp = 0;
	if (!numeric_addr && inp->s_addr != INADDR_ANY) {
		int net = inet_netof(*inp);
		int lna = inet_lnaof(*inp);

		if (lna == INADDR_ANY) {
			np = getnetbyaddr(net, AF_INET);
			if (np)
				cp = np->n_name;
		}
		if (cp == 0) {
			hp = gethostbyaddr((char *)inp, sizeof (*inp), AF_INET);
			if (hp) {
				cp = hp->h_name;
				trimdomain(cp, strlen(cp));
			}
		}
	}
	if (inp->s_addr == INADDR_ANY)
		strcpy(line, "*");
	else if (cp) {
		strncpy(line, cp, sizeof(line) - 1);
		line[sizeof(line) - 1] = '\0';
	} else {
		inp->s_addr = ntohl(inp->s_addr);
#define C(x)	((u_int)((x) & 0xff))
		sprintf(line, "%u.%u.%u.%u", C(inp->s_addr >> 24),
		    C(inp->s_addr >> 16), C(inp->s_addr >> 8), C(inp->s_addr));
	}
	return (line);
}
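
The numeric fallback at the end of inetname() can be tried in isolation.
The sketch below is an illustration only and is not part of the patch:
fmt_ipv4() is a hypothetical helper name. It formats a network-byte-order
address as a dotted quad, much as the C() macro above does, but it leaves
the caller's struct in_addr untouched (inetname() rewrites inp->s_addr in
place) and bounds the write with snprintf().

```c
#include <arpa/inet.h>		/* ntohl() */
#include <netinet/in.h>		/* struct in_addr */
#include <stdint.h>
#include <stdio.h>		/* snprintf() */

/* Extract the low byte of x, as the C() macro in inetname() does. */
#define OCTET(x)	((unsigned int)((x) & 0xff))

/*
 * Hypothetical helper: write the dotted-quad form of *inp into buf
 * (at most len bytes, always NUL-terminated when len > 0) and return buf.
 * The input address is read, never modified.
 */
static const char *
fmt_ipv4(const struct in_addr *inp, char *buf, size_t len)
{
	uint32_t a = ntohl(inp->s_addr);	/* host byte order copy */

	snprintf(buf, len, "%u.%u.%u.%u",
	    OCTET(a >> 24), OCTET(a >> 16), OCTET(a >> 8), OCTET(a));
	return (buf);
}
```

A 16-byte buffer is enough for any IPv4 address ("255.255.255.255" plus
the terminating NUL).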

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="inet.c.diff"

--- inet.org.c	Mon Dec 27 01:48:58 2004
+++ inet.c	Sun Dec 26 22:33:20 2004
@@ -38,7 +38,7 @@
 #endif
 
 #include <sys/cdefs.h>
-__FBSDID("$FreeBSD: /repoman/r/ncvs/src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $");
+__FBSDID("$FreeBSD: src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $");
 
 #include <sys/param.h>
 #include <sys/queue.h>
@@ -569,6 +569,7 @@
 	if (ipstat.ips_forward || sflag <= 1) 
 		putchar('\n');
 	p(ips_cantforward, "\t%lu packet%s not forwardable\n");
+	p(ips_transit_re, "\t%lu packet%s forwarded to receive path\n");
 	p(ips_notmember,
 	  "\t%lu packet%s received for unknown multicast group\n");
 	p(ips_redirectsent, "\t%lu redirect%s sent\n");

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_fastfwd.c"

/*
 * Copyright (c) 2003 Andre Oppermann, Internet Business Solutions AG
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote
 *    products derived from this software without specific prior written
 *    permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $FreeBSD: src/sys/netinet/ip_fastfwd.c,v 1.17.2.3 2004/10/03 17:04:40 mlaier Exp $
 * $Wolfowitz: snap5d/src/sys/netinet/apc_ip_fastfwd.c,v 1.35.2 2004/12/04 15:32:21 jenkins Exp $
 * $Wolfowitz: freebsd5/src/sys/netinet/ip_fastfwd.c,v 1.18.0.3 2004/12/15 17:04:40 blahdy Exp $
 */

/*
 * ip_fastforward gets its speed from processing the forwarded packet to
 * completion (if_output on the other side) without any queues or netisr's.
 * The receiving interface DMAs the packet into memory, the upper half of
 * driver calls ip_fastforward, we do our routing table lookup and directly
 * send it off to the outgoing interface which DMAs the packet to the
 * network card. The only part of the packet we touch with the CPU is the
 * IP header (unless there are complex firewall rules touching other parts
 * of the packet, but that is up to you). We are essentially limited by bus
 * bandwidth and how fast the network card/driver can set up receives and
 * transmits.
 *
 * We handle basic errors, ip header errors, checksum errors,
 * destination unreachable, fragmentation and fragmentation needed and
 * report them via icmp to the sender.
 *
 * Else if something is not pure IPv4 unicast forwarding we fall back to
 * the normal ip_input processing path. We should only be called from
 * interfaces connected to the outside world.
 *
 * Firewalling is fully supported including divert, ipfw fwd and ipfilter
 * ipnat and address rewrite.
 *
 * IPSEC is not supported if this host is a tunnel broker. IPSEC is
 * supported for connections to/from local host.
 *
 * We try to do the least expensive (in CPU ops) checks and operations
 * first to catch junk with as little overhead as possible.
 * 
 * We take full advantage of hardware support for ip checksum and
 * fragmentation offloading.
 *
 * We don't do ICMP redirect in the fast forwarding path. I have had my own
 * cases where two core routers with Zebra routing suite would send millions
 * of ICMP redirects to connected hosts if the router to dest was not the
 * default gateway. In one case it was filling the routing table of a host
 * with close to 300'000 cloned redirect entries until it ran out of kernel
 * memory. However the networking code proved very robust and it didn't
 * crash or otherwise misbehave.
 */

/*
 * Many thanks to Matt Thomas of NetBSD for basic structure of ip_flow.c which
 * is being followed here.
 */

#include "opt_ipfw.h"
#include "opt_ipstealth.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/sysctl.h>

#include <net/pfil.h>
#include <net/if.h>
#include <net/if_types.h>
#include <net/if_var.h>
#include <net/if_dl.h>
#include <net/route.h>
/* include <net/fib.h> */

#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/in_var.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <netinet/ip_icmp.h>

#include <machine/in_cksum.h>

static int ipfastforward_active = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, fastforwarding, CTLFLAG_RW,
    &ipfastforward_active, 0, "Enable fast IP forwarding");

static struct sockaddr_in *
ip_findroute(struct route *ro, struct in_addr dest, struct mbuf *m)
{
	struct sockaddr_in *dst;
	struct rtentry *rt;
 	/* struct mtrie *mt; */

	/*
	 * Find route to destination.
	 */
	bzero(ro, sizeof(*ro));
	dst = (struct sockaddr_in *)&ro->ro_dst;
	dst->sin_family = AF_INET;
	dst->sin_len = sizeof(*dst);
	dst->sin_addr.s_addr = dest.s_addr;
	rtalloc_ign(ro, RTF_CLONING);
	/* fiballoc(pfx, mt); */

	/*
	 * Prefix there and valid adjacency?
	 */
	rt = ro->ro_rt;
	if (rt && (rt->rt_flags & RTF_UP) &&
	    (rt->rt_ifp->if_flags & IFF_UP) &&
	    (rt->rt_ifp->if_flags & IFF_RUNNING)) {
		if (rt->rt_flags & RTF_GATEWAY)
			dst = (struct sockaddr_in *)rt->rt_gateway;
	} else {
		ipstat.ips_noroute++;
		ipstat.ips_cantforward++;
		if (rt)
			RTFREE(rt);

		/*
		 * The old ip_fastforward() violated RFC1812 by responding
		 * with !H instead of !N when there is no destination 
		 * route found. Behaviors observed from both Cisco Cat6509/Sup720
		 * and Juniper M20 result in !N (correctly complying to
		 * RFC1812) when there is no route available. --james 2004/09/17
		 */
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
		return NULL;
	}
	return dst;
}

/*
 * Try to forward a packet based on the destination address.
 * This is a fast path optimized for the plain forwarding case.
 * If the packet is handled (and consumed) here then we return 1;
 * otherwise 0 is returned and the packet should be delivered
 * to ip_input for full processing.
 */
int
ip_fastforward(struct mbuf *m)
{
	struct ip *ip;
	struct mbuf *m0 = NULL;
	struct route ro;
	/* struct fentry *pfx = NULL;  */
	struct sockaddr_in *dst = NULL;
	struct ifnet *ifp;
	struct in_addr odest, dest;
	u_short sum, ip_len;
	int error = 0;
	int hlen, mtu;
#ifdef IPFIREWALL_FORWARD
	struct m_tag *fwd_tag;
#endif

	/*
	 * Are we active and forwarding packets?
	 */
	if (!ipfastforward_active || !ipforwarding)
		return 0;

	M_ASSERTVALID(m);
	M_ASSERTPKTHDR(m);

	ro.ro_rt = NULL;

	/*
	 * Step 1: check for packet drop conditions (and sanity checks)
	 */

	ipstat.ips_total++;

	/*
	 * Is entire packet big enough?
	 */
	if (m->m_pkthdr.len < sizeof(struct ip)) {
		ipstat.ips_tooshort++;
		goto drop;
	}

	/*
	 * Is first mbuf large enough for ip header and is header present?
	 */
	if (m->m_len < sizeof (struct ip) &&
	   (m = m_pullup(m, sizeof (struct ip))) == 0) {
		ipstat.ips_toosmall++;
		return 1;
	}

	ip = mtod(m, struct ip *);

	/*
	 * Is it IPv4?
	 */
	if (ip->ip_v != IPVERSION) {
		ipstat.ips_badvers++;
		goto drop;
	}

	/*
	 * Is IP header length correct and is it in first mbuf?
	 */
	hlen = ip->ip_hl << 2;
	if (hlen < sizeof(struct ip)) {	/* minimum header length */
		ipstat.ips_badlen++;
		goto drop;
	}
	if (hlen > m->m_len) {
		if ((m = m_pullup(m, hlen)) == 0) {
			ipstat.ips_badhlen++;
			return 1;
		}
		ip = mtod(m, struct ip *);
	}

	/*
	 * Checksum correct?
	 */
	if (m->m_pkthdr.csum_flags & CSUM_IP_CHECKED)
		sum = !(m->m_pkthdr.csum_flags & CSUM_IP_VALID);
	else {
		if (hlen == sizeof(struct ip))
			sum = in_cksum_hdr(ip);
		else
			sum = in_cksum(m, hlen);
	}
	if (sum) {
		ipstat.ips_badsum++;
		goto drop;
	}
	m->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID);

	ip_len = ntohs(ip->ip_len);

	/*
	 * Is IP length longer than packet we have got?
	 */
	if (m->m_pkthdr.len < ip_len) {
		ipstat.ips_tooshort++;
		goto drop;
	}

	/*
	 * Is packet longer than IP header tells us? If yes, truncate packet.
	 */
	if (m->m_pkthdr.len > ip_len) {
		if (m->m_len == m->m_pkthdr.len) {
			m->m_len = ip_len;
			m->m_pkthdr.len = ip_len;
		} else
			m_adj(m, ip_len - m->m_pkthdr.len);
	}

	/*
	 * Is packet from or to 127/8?
	 */
	if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET ||
	    (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) {
		ipstat.ips_badaddr++;
		goto drop;
	}

#ifdef ALTQ
	/*
	 * Is packet dropped by traffic conditioner?
	 */
	if (altq_input != NULL && (*altq_input)(m, AF_INET) == 0)
		return 1;
#endif

	/*
	 * Step 2: fallback conditions to normal ip_input path processing
	 */

	/*
	 * Only IP packets without options
	 */
	if (ip->ip_hl != (sizeof(struct ip) >> 2)) {
		if (ip_doopts == 1){
			goto prercvpath;
		} else if (ip_doopts == 2) {
			icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_FILTER_PROHIB,
				0, NULL);
			return 1;
		}
		/* else ignore IP options and continue */
	}

	/*
	 * Only unicast IP, not from loopback, no L2 or IP broadcast,
	 * no multicast, no INADDR_ANY
	 *
	 * XXX: Probably some of these checks could be direct drop
	 * conditions.  However it is not clear whether there are some
	 * hacks or obscure behaviours which make it necessary to
	 * let ip_input handle it.  We play safe here and let ip_input
	 * deal with it until it is proven that we can directly drop it.
	 *
	 * If packet originated from loopback interface, don't even
	 * bother with receive path. Receive acl must only validate
	 * "From-Wire -> To-ControlPlane" destined traffic, not the
	 * packets we created on our own.
	 */
	if (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) 
		return 0;  

	if (ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
	    ntohl(ip->ip_dst.s_addr) == (u_long)INADDR_BROADCAST ||
	    IN_MULTICAST(ntohl(ip->ip_src.s_addr)) ||
	    IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) ||
	    ip->ip_dst.s_addr == INADDR_ANY )
		goto prercvpath;


	/*
	 * Step 3: incoming packet firewall processing
	 */

	/*
	 * Convert to host representation
	 */
	ip->ip_len = ntohs(ip->ip_len);
	ip->ip_off = ntohs(ip->ip_off);

	odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;

	/*
	 * Run through list of ipfilter hooks for input packets
	 */
	if (inet_pfil_hook.ph_busy_count == -1)
		goto passin;

	if (pfil_run_hooks(&inet_pfil_hook, &m, m->m_pkthdr.rcvif, PFIL_IN, NULL) ||
	    m == NULL)
		return 1;

	M_ASSERTVALID(m);
	M_ASSERTPKTHDR(m);

	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
	dest.s_addr = ip->ip_dst.s_addr;

passin:
	/*
	 * Step 4: Look up and analyze route then decrement TTL.
	 */

	/*
	 * Find route to destination.
	 * Note: If firewall call above changed destination to another
	 * address, lookup of kernel RIB will be acted upon the new
	 * destination address -- hence saving us a hash lookup here.
	 */
	if ((dst = ip_findroute(&ro, dest, m)) == NULL)
		return 1;	/* icmp unreach already sent */
	ifp = ro.ro_rt->rt_ifp;

	/*
	 * Destination address changed by firewall? (policy routing)
	 */
	if (odest.s_addr != dest.s_addr) {
		/*
		 * Is the new destination for a local address on this host?
		 */
		if (ro.ro_rt->rt_flags & RTF_LOCAL)
			goto forwardlocal;
		/*
		 * Go on with new destination address
		 */
	}
#ifdef IPFIREWALL_FORWARD
	if (m->m_flags & M_FASTFWD_OURS) {
		/*
		 * ipfw changed it for a local address on this host.
		 */
		goto forwardlocal;
	}
#endif /* IPFIREWALL_FORWARD */

	/*
	 * Is packet destined to us or broadcast address(es)?
	 * SIOCSIFADDR installs /32 lo0 routes so let's check if
	 * this is a route that is bound to loopback.
	 */
	if (ro.ro_rt->rt_flags & RTF_LOCAL)
		goto rcvpath;

	/*
	 * Drop blackhole and reject routes while we are in the
	 * fast forwarding path.
	 */
	if (ro.ro_rt->rt_flags & RTF_BLACKHOLE)
		goto drop;

	/*
	 * XXX Need L2 info off the kernel routing table.. This is a
	 * makeshift kludge, so please use 2nd consideration before
	 * committing the line below into main cvs tree.
	 *
	 * Administratively installed reject routes should have 
	 * rmx_expire unset.
	 */
	if ((ro.ro_rt->rt_flags & RTF_REJECT) && 
            ro.ro_rt->rt_rmx.rmx_expire == 0){
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
		goto consumed;
	}

	/*
	 * Check TTL
	 */
#ifdef IPSTEALTH
	if (!ipstealth) {
#endif
	if (ip->ip_ttl <= IPTTLDEC) {
		icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0, NULL);
		goto consumed;
	}

	/*
	 * Decrement the TTL and incrementally change the checksum.
	 * Don't bother doing this with hw checksum offloading.
	 */
	ip->ip_ttl -= IPTTLDEC;
	if (ip->ip_sum >= (u_int16_t) ~htons(IPTTLDEC << 8))
		ip->ip_sum -= ~htons(IPTTLDEC << 8);
	else
		ip->ip_sum += htons(IPTTLDEC << 8);
#ifdef IPSTEALTH
	}
#endif

	/*
	 * Step 5: outgoing firewall packet processing
	 */

	/*
	 * Run through list of hooks for output packets.
	 */
	if (inet_pfil_hook.ph_busy_count == -1)
		goto passout;

	if (pfil_run_hooks(&inet_pfil_hook, &m, ifp, PFIL_OUT, NULL) || m == NULL) {
		goto consumed;
	}

	M_ASSERTVALID(m);
	M_ASSERTPKTHDR(m);

	ip = mtod(m, struct ip *);
	dest.s_addr = ip->ip_dst.s_addr;

	/*
	 * Destination address changed?
	 */
#ifndef IPFIREWALL_FORWARD
	if (odest.s_addr != dest.s_addr) {
#else
	fwd_tag = m_tag_find(m, PACKET_TAG_IPFORWARD, NULL);
	if (odest.s_addr != dest.s_addr || fwd_tag != NULL) {
#endif /* IPFIREWALL_FORWARD */
		/*
		 * Is it now for a local address on this host?
		 *
		 * We'll simply rely on in_localip() to determine whether
		 * address is destined to us this time around -- because
		 * I really don't think running radix lookup two more
		 * times in the outbound sections will outperform hash
		 * lookup of system interface addrs.
		 *
		 * In the above ingress checks, we were able to get rid
		 * of a hash lookup (in_localip() call that is) because
		 * we are doing a radix lookup after the initial firewall
		 * operation.
		 */
#ifndef IPFIREWALL_FORWARD
		if (in_localip(dest)) {
#else
		if (in_localip(dest) || m->m_flags & M_FASTFWD_OURS) {
#endif /* IPFIREWALL_FORWARD */
forwardlocal:
			/*
			 * Return packet for processing by ip_input().
			 * Keep host byte order as expected at ip_input's
			 * "ours"-label.
			 */
			m->m_flags |= M_FASTFWD_OURS;
			goto rcvpath;
		}
		/*
		 * Redo route lookup with new destination address
		 */
#ifdef IPFIREWALL_FORWARD
		if (fwd_tag) {
			if (!in_localip(ip->ip_src) && !in_localaddr(ip->ip_dst))
				dest.s_addr = ((struct sockaddr_in *)(fwd_tag+1))->sin_addr.s_addr;
			m_tag_delete(m, fwd_tag);
		}
#endif /* IPFIREWALL_FORWARD */
		RTFREE(ro.ro_rt);
		if ((dst = ip_findroute(&ro, dest, m)) == NULL)
			return 1;	/* icmp unreach already sent */
		ifp = ro.ro_rt->rt_ifp;
	}

passout:
	/*
	 * Step 6: send off the packet
	 */

#ifndef ALTQ
	/*
	 * Check if there is enough space in the interface queue
	 */
	if ((ifp->if_snd.ifq_len + ip->ip_len / ifp->if_mtu + 1) >=
	    ifp->if_snd.ifq_maxlen) {
		ipstat.ips_odropped++;
		/* would send source quench here but that is deprecated */
		goto drop;
	}
#endif

	/*
	 * Check if media link state of interface is not down
	 */
	if (ifp->if_link_state == LINK_STATE_DOWN) {
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL);
		goto consumed;
	}

	/*
	 * Check if packet fits MTU or if hardware will fragment for us
	 */
	if (ro.ro_rt->rt_rmx.rmx_mtu)
		mtu = min(ro.ro_rt->rt_rmx.rmx_mtu, ifp->if_mtu);
	else
		mtu = ifp->if_mtu;

	if (ip->ip_len <= mtu ||
	    (ifp->if_hwassist & CSUM_FRAGMENT && (ip->ip_off & IP_DF) == 0)) {
		/*
		 * Restore packet header fields to original values
		 */
		ip->ip_len = htons(ip->ip_len);
		ip->ip_off = htons(ip->ip_off);
		/*
		 * Send off the packet via outgoing interface
		 */
		error = (*ifp->if_output)(ifp, m,
				(struct sockaddr *)dst, ro.ro_rt);
	} else {
		/*
		 * Handle EMSGSIZE with icmp reply needfrag for TCP MTU discovery
		 */
		if (ip->ip_off & IP_DF) {
			ipstat.ips_cantfrag++;
			icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
				0, ifp);
			goto consumed;
		} else {
			/*
			 * We have to fragment the packet
			 */
			m->m_pkthdr.csum_flags |= CSUM_IP;
			/*
			 * ip_fragment expects ip_len and ip_off in host byte
			 * order but returns all packets in network byte order
			 */
			if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
					(~ifp->if_hwassist & CSUM_DELAY_IP))) {
				goto drop;
			}
			KASSERT(m != NULL, ("null mbuf and no error"));
			/*
			 * Send off the fragments via outgoing interface
			 */
			error = 0;
			do {
				m0 = m->m_nextpkt;
				m->m_nextpkt = NULL;

				error = (*ifp->if_output)(ifp, m,
					(struct sockaddr *)dst, ro.ro_rt);
				if (error)
					break;
			} while ((m = m0) != NULL);
			if (error) {
				/* Reclaim remaining fragments */
				for (; m; m = m0) {
					m0 = m->m_nextpkt;
					m->m_nextpkt = NULL;
					m_freem(m);
				}
			} else
				ipstat.ips_fragmented++;
		}
	}

	if (error != 0)
		ipstat.ips_odropped++;
	else {
		ipstat.ips_forward++;
		ipstat.ips_fastforward++;
	}
consumed:
	RTFREE(ro.ro_rt);
	return 1;
prercvpath:
	/*
	 * Convert to host representation
	 */
	ip->ip_len = ntohs(ip->ip_len);
	ip->ip_off = ntohs(ip->ip_off);

	odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;
rcvpath:
	/*
	 * Receive adjacency. If the packet needs to be punted up to
	 * ip_input path for further analysis or because it is destined to
	 * one of our own addresses, run it through the receive-path
	 * firewall. To actually use this, the user must set up a firewall
	 * rule using pf(4), ipfw(8), etc. that matches on the lo0 interface
	 * in the INBOUND direction (e.g. `<action> in quick on lo0` in pf).
	 *
	 * Cisco calls this Receive Path ACL, Juniper calls this Loopback
	 * Filter. The fact that this is FreeBSD makes us behave like
	 * Juniper (filtering on lo0) instead of Cisco (filtering via
	 * "ip receive <acl number>" command).  --james 2004/10/23
	 */

	/*
	 * Set coordinates to loopback interface, inbound direction,
	 * then call in the pfil_hooks.
	 */

	if (ro.ro_rt)
		RTFREE(ro.ro_rt);

	if (inet_pfil_hook.ph_busy_count == -1)
		goto punt;

	if (pfil_run_hooks(&inet_pfil_hook, &m, loif, PFIL_IN, NULL) ||
	    m == NULL)
		return 1;

	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
	dest.s_addr = ip->ip_dst.s_addr;

	/* We do not support policy routing inside the receive path.
	 * If the user requests it, drop the packet. Ensure that this
	 * is documented in the user manual.
	 */
	if (odest.s_addr != dest.s_addr) 
		goto drop;

punt:
	/* 
	 * Packet has been pre-processed by ip_fastforward for 
	 * control plane evaluations.
	 */
	m->m_flags |= M_FASTFWD_PREPROC;

	ipstat.ips_transit_re++;
	return 0;
drop:
	if (m)
		m_freem(m);
	if (ro.ro_rt)
		RTFREE(ro.ro_rt);
	return 1;
}
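
The incremental checksum update in Step 4 (decrement the TTL, then adjust
ip_sum instead of recomputing it) can be checked outside the kernel. The
sketch below is a userland illustration under stated assumptions, not the
kernel code: the function names and the sample header are made up, and it
works on host-order values of the header's big-endian 16-bit words rather
than on network-order fields with htons(). That is fine for a check
because one's-complement sums are independent of byte order. Decrementing
the TTL lowers word 4 by 0x0100, so the checksum rises by 0x0100 with an
end-around carry, which is the branch on ~htons(IPTTLDEC << 8) above.

```c
#include <stddef.h>
#include <stdint.h>

#define IPHDR_WORDS	10	/* 20-byte IPv4 header, no options */
#define TTL_WORD	4	/* word 4 holds (TTL << 8) | protocol */
#define SUM_WORD	5	/* word 5 holds the header checksum */

/* Full one's-complement checksum over the header, checksum word skipped. */
static uint16_t
cksum_full(const uint16_t *w)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < IPHDR_WORDS; i++)
		if (i != SUM_WORD)
			sum += w[i];
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/* Incremental update of the checksum for a TTL decrement of one. */
static uint16_t
cksum_ttl_dec(uint16_t old)
{
	if (old >= 0xfeff)		/* 0xfeff == (uint16_t)~0x0100 */
		return ((uint16_t)(old - 0xfeff));	/* end-around carry */
	return ((uint16_t)(old + 0x0100));
}

/*
 * Walk a made-up header from TTL 255 down to TTL 1 and count the steps
 * at which the incremental result differs from a full recomputation.
 */
static int
ttl_cksum_selfcheck(void)
{
	uint16_t h[IPHDR_WORDS] = {
		0x4500, 0x0073, 0x0000, 0x4000, 0xff11,
		0x0000, 0xc0a8, 0x0001, 0xc0a8, 0x00c7
	};
	int bad = 0;

	h[SUM_WORD] = cksum_full(h);
	while ((h[TTL_WORD] >> 8) > 1) {
		h[TTL_WORD] -= 0x0100;		/* TTL is the high byte */
		h[SUM_WORD] = cksum_ttl_dec(h[SUM_WORD]);
		if (h[SUM_WORD] != cksum_full(h))
			bad++;
	}
	return (bad);
}
```

Starting at TTL 255 makes the walk cross the wrap-around case at least
once, so both branches of cksum_ttl_dec() are exercised; a return value of
zero from ttl_cksum_selfcheck() means the two methods agreed at every step.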

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_fastfwd.c.diff"

--- ip_fastfwd.org.c	Mon Dec 27 01:42:27 2004
+++ ip_fastfwd.c	Sun Dec 26 22:33:15 2004
@@ -26,7 +26,9 @@
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_fastfwd.c,v 1.25 2004/11/09 09:40:32 andre Exp $
+ * $FreeBSD: src/sys/netinet/ip_fastfwd.c,v 1.17.2.3 2004/10/03 17:04:40 mlaier Exp $
+ * $Wolfowitz: snap5d/src/sys/netinet/apc_ip_fastfwd.c,v 1.35.2 2004/12/04 15:32:21 jenkins Exp $
+ * $Wolfowitz: freebsd5/src/sys/netinet/ip_fastfwd.c,v 1.18.0.3 2004/12/15 17:04:40 blahdy Exp $
  */
 
 /*
@@ -93,6 +95,7 @@
 #include <net/if_var.h>
 #include <net/if_dl.h>
 #include <net/route.h>
+/* include <net/fib.h> */
 
 #include <netinet/in.h>
 #include <netinet/in_systm.h>
@@ -112,6 +115,7 @@
 {
 	struct sockaddr_in *dst;
 	struct rtentry *rt;
+ 	/* struct mtrie *mt; */
 
 	/*
 	 * Find route to destination.
@@ -122,9 +126,10 @@
 	dst->sin_len = sizeof(*dst);
 	dst->sin_addr.s_addr = dest.s_addr;
 	rtalloc_ign(ro, RTF_CLONING);
+	/* fiballoc(pfx, mt); */
 
 	/*
-	 * Route there and interface still up?
+	 * Prefix there and valid adjacency?
 	 */
 	rt = ro->ro_rt;
 	if (rt && (rt->rt_flags & RTF_UP) &&
@@ -137,7 +142,15 @@
 		ipstat.ips_cantforward++;
 		if (rt)
 			RTFREE(rt);
-		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL);
+
+		/*
+		 * The old ip_fastforward() violated RFC1812 by responding
+		 * with !H instead of !N when there is no destination 
+		 * route found. Behaviors observed from both Cisco Cat6509/Sup720
+		 * and Juniper M20 result in !N (correctly complying to
+		 * RFC1812) when there is no route available. --james 2004/09/17
+		 */
+		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
 		return NULL;
 	}
 	return dst;
@@ -156,9 +169,8 @@
 	struct ip *ip;
 	struct mbuf *m0 = NULL;
 	struct route ro;
+	/* struct fentry *pfx = NULL;  */
 	struct sockaddr_in *dst = NULL;
-	struct in_ifaddr *ia = NULL;
-	struct ifaddr *ifa = NULL;
 	struct ifnet *ifp;
 	struct in_addr odest, dest;
 	u_short sum, ip_len;
@@ -183,6 +195,8 @@
 	 * Step 1: check for packet drop conditions (and sanity checks)
 	 */
 
+	ipstat.ips_total++;
+
 	/*
 	 * Is entire packet big enough?
 	 */
@@ -195,9 +209,9 @@
 	 * Is first mbuf large enough for ip header and is header present?
 	 */
 	if (m->m_len < sizeof (struct ip) &&
-	   (m = m_pullup(m, sizeof (struct ip))) == NULL) {
+	   (m = m_pullup(m, sizeof (struct ip))) == 0) {
 		ipstat.ips_toosmall++;
-		return 1;	/* mbuf already free'd */
+		return 1;
 	}
 
 	ip = mtod(m, struct ip *);
@@ -241,10 +255,6 @@
 		ipstat.ips_badsum++;
 		goto drop;
 	}
-
-	/*
-	 * Remeber that we have checked the IP header and found it valid.
-	 */
 	m->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID);
 
 	ip_len = ntohs(ip->ip_len);
@@ -293,9 +303,9 @@
 	 * Only IP packets without options
 	 */
 	if (ip->ip_hl != (sizeof(struct ip) >> 2)) {
-		if (ip_doopts == 1)
-			return 0;
-		else if (ip_doopts == 2) {
+		if (ip_doopts == 1){
+			goto prercvpath;
+		} else if (ip_doopts == 2) {
 			icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_FILTER_PROHIB,
 				0, NULL);
 			return 1;
@@ -312,38 +322,22 @@
 	 * hacks or obscure behaviours which make it neccessary to
 	 * let ip_input handle it.  We play safe here and let ip_input
 	 * deal with it until it is proven that we can directly drop it.
+	 *
+	 * If packet originated from loopback interface, don't even
+	 * bother with receive path. Receive acl must only validate
+	 * "From-Wire -> To-ControlPlane" destined traffic, not the
+	 * packets we created on our own.
 	 */
-	if ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) ||
-	    ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
+	if (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) 
+		return 0;  
+
+	if (ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
 	    ntohl(ip->ip_dst.s_addr) == (u_long)INADDR_BROADCAST ||
 	    IN_MULTICAST(ntohl(ip->ip_src.s_addr)) ||
 	    IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) ||
 	    ip->ip_dst.s_addr == INADDR_ANY )
-		return 0;
-
-	/*
-	 * Is it for a local address on this host?
-	 */
-	if (in_localip(ip->ip_dst))
-		return 0;
+		goto prercvpath;
 
-	/*
-	 * Or is it for a local IP broadcast address on this host?
-	 */
-	if ((m->m_flags & M_BCAST) &&
-	    (m->m_pkthdr.rcvif->if_flags & IFF_BROADCAST)) {
-	        TAILQ_FOREACH(ifa, &m->m_pkthdr.rcvif->if_addrhead, ifa_link) {
-			if (ifa->ifa_addr->sa_family != AF_INET)
-				continue;
-			ia = ifatoia(ifa);
-			if (ia->ia_netbroadcast.s_addr == ip->ip_dst.s_addr)
-				return 0;
-			if (satosin(&ia->ia_broadaddr)->sin_addr.s_addr ==
-			    ip->ip_dst.s_addr)
-				return 0;
-		}
-	}
-	ipstat.ips_total++;
 
 	/*
 	 * Step 3: incoming packet firewall processing
@@ -373,14 +367,29 @@
 	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
 	dest.s_addr = ip->ip_dst.s_addr;
 
+passin:
 	/*
-	 * Destination address changed?
+	 * Step 4: Look up and analyze route then decrement TTL.
+	 */
+
+	/*
+	 * Find route to destination.
+	 * Note: If firewall call above changed destination to another
+	 * address, lookup of kernel RIB will be acted upon the new
+	 * destination address -- hence saving us a hash lookup here.
+	 */
+	if ((dst = ip_findroute(&ro, dest, m)) == NULL)
+		return 1;	/* icmp unreach already sent */
+	ifp = ro.ro_rt->rt_ifp;
+
+	/*
+	 * Destination address changed by firewall? (policy routing)
 	 */
 	if (odest.s_addr != dest.s_addr) {
 		/*
-		 * Is it now for a local address on this host?
+		 * Is the new destination for a local address on this host?
 		 */
-		if (in_localip(dest))
+		if (ro.ro_rt->rt_flags & RTF_LOCAL)
 			goto forwardlocal;
 		/*
 		 * Go on with new destination address
@@ -395,10 +404,34 @@
 	}
 #endif /* IPFIREWALL_FORWARD */
 
-passin:
 	/*
-	 * Step 4: decrement TTL and look up route
+	 * Is packet destined to us or broadcast address(es)?
+	 * SIOCSIFADDR installs /32 lo0 routes so let's check if
+	 * this is a route that is bound to loopback.
 	 */
+	if (ro.ro_rt->rt_flags & RTF_LOCAL)
+		goto rcvpath;
+
+	/*
+	 * Drop blackhole and reject routes while we are in the
+	 * fast forwarding path.
+	 */
+	if (ro.ro_rt->rt_flags & RTF_BLACKHOLE)
+		goto drop;
+
+	/*
+	 * XXX Need L2 info off the kernel routing table.. This is a
+	 * makeshift kludge, so please use 2nd consideration before
+	 * committing the line below into main cvs tree.
+	 *
+	 * Administratively installed reject routes should have 
+	 * rmx_expire unset.
+	 */
+	if ((ro.ro_rt->rt_flags & RTF_REJECT) && 
+            ro.ro_rt->rt_rmx.rmx_expire == 0){
+		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
+		goto consumed;
+	}
 
 	/*
 	 * Check TTL
@@ -408,13 +441,12 @@
 #endif
 	if (ip->ip_ttl <= IPTTLDEC) {
 		icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0, NULL);
-		return 1;
+		goto consumed;
 	}
 
 	/*
-	 * Decrement the TTL and incrementally change the IP header checksum.
-	 * Don't bother doing this with hw checksum offloading, it's faster
-	 * doing it right here.
+	 * Decrement the TTL and incrementally change the checksum.
+	 * Don't bother doing this with hw checksum offloading.
 	 */
 	ip->ip_ttl -= IPTTLDEC;
 	if (ip->ip_sum >= (u_int16_t) ~htons(IPTTLDEC << 8))
@@ -426,19 +458,6 @@
 #endif
 
 	/*
-	 * Find route to destination.
-	 */
-	if ((dst = ip_findroute(&ro, dest, m)) == NULL)
-		return 1;	/* icmp unreach already sent */
-	ifp = ro.ro_rt->rt_ifp;
-
-	/*
-	 * Immediately drop blackholed traffic.
-	 */
-	if (ro.ro_rt->rt_flags & RTF_BLACKHOLE)
-		goto drop;
-
-	/*
 	 * Step 5: outgoing firewall packet processing
 	 */
 
@@ -469,11 +488,22 @@
 #endif /* IPFIREWALL_FORWARD */
 		/*
 		 * Is it now for a local address on this host?
+		 *
+		 * We'll simply rely on in_localip() to determine whether
+		 * address is destined to us this time around -- because
+		 * I really don't think running radix lookup two more
+		 * times in the outbound sections will outperform hash
+		 * lookup of system interface addrs.
+		 *
+		 * In the above ingress checks, we were able to get rid
+		 * of a hash lookup (in_localip() call that is) because
+		 * we are doing a radix lookup after the initial firewall
+		 * operation.
 		 */
 #ifndef IPFIREWALL_FORWARD
 		if (in_localip(dest)) {
 #else
-		if (m->m_flags & M_FASTFWD_OURS || in_localip(dest)) {
+		if (in_localip(dest) || m->m_flags & M_FASTFWD_OURS) {
 #endif /* IPFIREWALL_FORWARD */
 forwardlocal:
 			/*
@@ -482,9 +512,7 @@
 			 * "ours"-label.
 			 */
 			m->m_flags |= M_FASTFWD_OURS;
-			if (ro.ro_rt)
-				RTFREE(ro.ro_rt);
-			return 0;
+			goto rcvpath;
 		}
 		/*
 		 * Redo route lookup with new destination address
@@ -507,15 +535,6 @@
 	 * Step 6: send off the packet
 	 */
 
-	/*
-	 * Check if route is dampned (when ARP is unable to resolve)
-	 */
-	if ((ro.ro_rt->rt_flags & RTF_REJECT) &&
-	    ro.ro_rt->rt_rmx.rmx_expire >= time_second) {
-		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL);
-		goto consumed;
-	}
-
 #ifndef ALTQ
 	/*
 	 * Check if there is enough space in the interface queue
@@ -607,13 +626,69 @@
 	if (error != 0)
 		ipstat.ips_odropped++;
 	else {
-		ro.ro_rt->rt_rmx.rmx_pksent++;
 		ipstat.ips_forward++;
 		ipstat.ips_fastforward++;
 	}
 consumed:
 	RTFREE(ro.ro_rt);
 	return 1;
+prercvpath:
+	/*
+	 * Convert to host representation
+	 */
+	ip->ip_len = ntohs(ip->ip_len);
+	ip->ip_off = ntohs(ip->ip_off);
+
+	odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;
+rcvpath:
+	/*
+	 * Receive adjacency. If the packet needs to be punted up to
+	 * ip_input path for further analysis or because it is destined to
+	 * one of our own addresses, run it through the receive-path
+	 * firewall. To actually use this, the user must set up a firewall
+	 * rule using pf(4), ipfw(8), etc. that matches on the lo0 interface
+	 * in the INBOUND direction (e.g. `<action> in quick on lo0` in pf).
+	 *
+	 * Cisco calls this Receive Path ACL, Juniper calls this Loopback
+	 * Filter. The fact that this is FreeBSD makes us behave like
+	 * Juniper (filtering on lo0) instead of Cisco (filtering via
+	 * "ip receive <acl number>" command).  --james 2004/10/23
+	 */
+
+	/*
+	 * Set coordinates to loopback interface, inbound direction,
+	 * then call in the pfil_hooks.
+	 */
+	if (ro.ro_rt) {
+		RTFREE(ro.ro_rt);
+		ro.ro_rt = NULL;	/* released; don't free again */
+	}
+	if (inet_pfil_hook.ph_busy_count == -1)
+		goto punt;
+
+	if (pfil_run_hooks(&inet_pfil_hook, &m, loif, PFIL_IN, NULL) ||
+	    m == NULL)
+		return 1;
+
+	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
+	dest.s_addr = ip->ip_dst.s_addr;
+
+	/* We do not support policy routing inside the receive path.
+	 * If the user requests it, drop the packet. Ensure that this
+	 * is documented in the user manual.
+	 */
+	if (odest.s_addr != dest.s_addr) 
+		goto drop;
+
+punt:
+	/* 
+	 * Packet has been pre-processed by ip_fastforward for 
+	 * control plane evaluations.
+	 */
+	m->m_flags |= M_FASTFWD_PREPROC;
+
+	ipstat.ips_transit_re++;
+	return 0;
 drop:
 	if (m)
 		m_freem(m);

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_input.c"

/*
 * Copyright (c) 1982, 1986, 1988, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)ip_input.c	8.2 (Berkeley) 1/4/94
 * $FreeBSD: src/sys/netinet/ip_input.c,v 1.283.2.7 2004/10/03 17:04:40 mlaier Exp $
 */

#include "opt_bootp.h"
#include "opt_ipfw.h"
#include "opt_ipstealth.h"
#include "opt_ipsec.h"
#include "opt_mac.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mac.h>
#include <sys/mbuf.h>
#include <sys/malloc.h>
#include <sys/domain.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/kernel.h>
#include <sys/syslog.h>
#include <sys/sysctl.h>

#include <net/pfil.h>
#include <net/if.h>
#include <net/if_types.h>
#include <net/if_var.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <net/netisr.h>

#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/in_var.h>
#include <netinet/ip.h>
#include <netinet/in_pcb.h>
#include <netinet/ip_var.h>
#include <netinet/ip_icmp.h>
#include <machine/in_cksum.h>

#include <sys/socketvar.h>

/* XXX: Temporary until ipfw_ether and ipfw_bridge are converted. */
#include <netinet/ip_fw.h>
#include <netinet/ip_dummynet.h>

#ifdef IPSEC
#include <netinet6/ipsec.h>
#include <netkey/key.h>
#endif

#ifdef FAST_IPSEC
#include <netipsec/ipsec.h>
#include <netipsec/key.h>
#endif

int rsvp_on = 0;

int	ipforwarding = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_FORWARDING, forwarding, CTLFLAG_RW,
    &ipforwarding, 0, "Enable IP forwarding between interfaces");

static int	ipsendredirects = 1; /* XXX */
SYSCTL_INT(_net_inet_ip, IPCTL_SENDREDIRECTS, redirect, CTLFLAG_RW,
    &ipsendredirects, 0, "Enable sending IP redirects");

int	ip_defttl = IPDEFTTL;
SYSCTL_INT(_net_inet_ip, IPCTL_DEFTTL, ttl, CTLFLAG_RW,
    &ip_defttl, 0, "Maximum TTL on IP packets");

static int	ip_dosourceroute = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_SOURCEROUTE, sourceroute, CTLFLAG_RW,
    &ip_dosourceroute, 0, "Enable forwarding source routed IP packets");

static int	ip_acceptsourceroute = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_ACCEPTSOURCEROUTE, accept_sourceroute, 
    CTLFLAG_RW, &ip_acceptsourceroute, 0, 
    "Enable accepting source routed IP packets");

int		ip_doopts = 1;	/* 0 = ignore, 1 = process, 2 = reject */
SYSCTL_INT(_net_inet_ip, OID_AUTO, process_options, CTLFLAG_RW,
    &ip_doopts, 0, "Enable IP options processing ([LS]SRR, RR, TS)");

static int	ip_keepfaith = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_KEEPFAITH, keepfaith, CTLFLAG_RW,
	&ip_keepfaith,	0,
	"Enable packet capture for FAITH IPv4->IPv6 translator daemon");

static int    nipq = 0;         /* total # of reass queues */
static int    maxnipq;
SYSCTL_INT(_net_inet_ip, OID_AUTO, maxfragpackets, CTLFLAG_RW,
	&maxnipq, 0,
	"Maximum number of IPv4 fragment reassembly queue entries");

static int    maxfragsperpacket;
SYSCTL_INT(_net_inet_ip, OID_AUTO, maxfragsperpacket, CTLFLAG_RW,
	&maxfragsperpacket, 0,
	"Maximum number of IPv4 fragments allowed per packet");

static int	ip_sendsourcequench = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, sendsourcequench, CTLFLAG_RW,
	&ip_sendsourcequench, 0,
	"Enable the transmission of source quench packets");

int	ip_do_randomid = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, random_id, CTLFLAG_RW,
	&ip_do_randomid, 0,
	"Assign random ip_id values");

/*
 * XXX - Setting ip_checkinterface mostly implements the receive side of
 * the Strong ES model described in RFC 1122, but since the routing table
 * and transmit implementation do not implement the Strong ES model,
 * setting this to 1 results in an odd hybrid.
 *
 * XXX - ip_checkinterface currently must be disabled if you use ipnat
 * to translate the destination address to another local interface.
 *
 * XXX - ip_checkinterface must be disabled if you add IP aliases
 * to the loopback interface instead of the interface where the
 * packets for those addresses are received.
 */
static int	ip_checkinterface = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, check_interface, CTLFLAG_RW,
    &ip_checkinterface, 0, "Verify packet arrives on correct interface");

#ifdef DIAGNOSTIC
static int	ipprintfs = 0;
#endif

struct pfil_head inet_pfil_hook;

static struct	ifqueue ipintrq;
static int	ipqmaxlen = IFQ_MAXLEN;

extern	struct domain inetdomain;
extern	struct protosw inetsw[];
u_char	ip_protox[IPPROTO_MAX];
struct	in_ifaddrhead in_ifaddrhead; 		/* first inet address */
struct	in_ifaddrhashhead *in_ifaddrhashtbl;	/* inet addr hash table  */
u_long 	in_ifaddrhmask;				/* mask for hash table */

SYSCTL_INT(_net_inet_ip, IPCTL_INTRQMAXLEN, intr_queue_maxlen, CTLFLAG_RW,
    &ipintrq.ifq_maxlen, 0, "Maximum size of the IP input queue");
SYSCTL_INT(_net_inet_ip, IPCTL_INTRQDROPS, intr_queue_drops, CTLFLAG_RD,
    &ipintrq.ifq_drops, 0, "Number of packets dropped from the IP input queue");

struct ipstat ipstat;
SYSCTL_STRUCT(_net_inet_ip, IPCTL_STATS, stats, CTLFLAG_RW,
    &ipstat, ipstat, "IP statistics (struct ipstat, netinet/ip_var.h)");

/* Packet reassembly stuff */
#define IPREASS_NHASH_LOG2      6
#define IPREASS_NHASH           (1 << IPREASS_NHASH_LOG2)
#define IPREASS_HMASK           (IPREASS_NHASH - 1)
#define IPREASS_HASH(x,y) \
	(((((x) & 0xF) | ((((x) >> 8) & 0xF) << 4)) ^ (y)) & IPREASS_HMASK)

static TAILQ_HEAD(ipqhead, ipq) ipq[IPREASS_NHASH];
struct mtx ipqlock;

#define	IPQ_LOCK()	mtx_lock(&ipqlock)
#define	IPQ_UNLOCK()	mtx_unlock(&ipqlock)
#define	IPQ_LOCK_INIT()	mtx_init(&ipqlock, "ipqlock", NULL, MTX_DEF)
#define	IPQ_LOCK_ASSERT()	mtx_assert(&ipqlock, MA_OWNED)

#ifdef IPCTL_DEFMTU
SYSCTL_INT(_net_inet_ip, IPCTL_DEFMTU, mtu, CTLFLAG_RW,
    &ip_mtu, 0, "Default MTU");
#endif

#ifdef IPSTEALTH
int	ipstealth = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, stealth, CTLFLAG_RW,
    &ipstealth, 0, "");
#endif

/*
 * ipfw_ether and ipfw_bridge hooks.
 * XXX: Temporary until those are converted to pfil_hooks as well.
 */
ip_fw_chk_t *ip_fw_chk_ptr = NULL;
ip_dn_io_t *ip_dn_io_ptr = NULL;
int fw_enable = 1;
int fw_one_pass = 1;

/*
 * XXX this is ugly.  IP options source routing magic.
 */
struct ipoptrt {
	struct	in_addr dst;			/* final destination */
	char	nop;				/* one NOP to align */
	char	srcopt[IPOPT_OFFSET + 1];	/* OPTVAL, OLEN and OFFSET */
	struct	in_addr route[MAX_IPOPTLEN/sizeof(struct in_addr)];
};

struct ipopt_tag {
	struct	m_tag tag;
	int	ip_nhops;
	struct	ipoptrt ip_srcrt;
};

static void	save_rte(struct mbuf *, u_char *, struct in_addr);
static int	ip_dooptions(struct mbuf *m, int);
static void	ip_forward(struct mbuf *m, int srcrt);
static void	ip_freef(struct ipqhead *, struct ipq *);

/*
 * IP initialization: fill in IP protocol switch table.
 * All protocols not implemented in kernel go to raw IP protocol handler.
 */
void
ip_init()
{
	register struct protosw *pr;
	register int i;

	TAILQ_INIT(&in_ifaddrhead);
	in_ifaddrhashtbl = hashinit(INADDR_NHASH, M_IFADDR, &in_ifaddrhmask);
	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
	if (pr == 0)
		panic("ip_init: PF_INET not found");

	/* Initialize the entire ip_protox[] array to IPPROTO_RAW. */
	for (i = 0; i < IPPROTO_MAX; i++)
		ip_protox[i] = pr - inetsw;
	/*
	 * Cycle through IP protocols and put them into the appropriate place
	 * in ip_protox[].
	 */
	for (pr = inetdomain.dom_protosw;
	    pr < inetdomain.dom_protoswNPROTOSW; pr++)
		if (pr->pr_domain->dom_family == PF_INET &&
		    pr->pr_protocol && pr->pr_protocol != IPPROTO_RAW) {
			/* Be careful to only index valid IP protocols. */
			if (pr->pr_protocol && pr->pr_protocol < IPPROTO_MAX)
				ip_protox[pr->pr_protocol] = pr - inetsw;
		}

	/* Initialize packet filter hooks. */
	inet_pfil_hook.ph_type = PFIL_TYPE_AF;
	inet_pfil_hook.ph_af = AF_INET;
	if ((i = pfil_head_register(&inet_pfil_hook)) != 0)
		printf("%s: WARNING: unable to register pfil hook, "
			"error %d\n", __func__, i);

	/* Initialize IP reassembly queue. */
	IPQ_LOCK_INIT();
	for (i = 0; i < IPREASS_NHASH; i++)
	    TAILQ_INIT(&ipq[i]);
	maxnipq = nmbclusters / 32;
	maxfragsperpacket = 16;

	/* Initialize various other remaining things. */
	ip_id = time_second & 0xffff;
	ipintrq.ifq_maxlen = ipqmaxlen;
	mtx_init(&ipintrq.ifq_mtx, "ip_inq", NULL, MTX_DEF);
	netisr_register(NETISR_IP, ip_input, &ipintrq, NETISR_MPSAFE);
}

/*
 * Ip input routine.  Checksum and byte swap header.  If fragmented
 * try to reassemble.  Process options.  Pass to next level.
 */
void
ip_input(struct mbuf *m)
{
	struct ip *ip = NULL;
	struct in_ifaddr *ia = NULL;
	struct ifaddr *ifa;
	int    checkif, hlen = 0;
	u_short sum;
	int dchg = 0;				/* dest changed after fw */
	struct in_addr odst;			/* original dst address */
#ifdef FAST_IPSEC
	struct m_tag *mtag;
	struct tdb_ident *tdbi;
	struct secpolicy *sp;
	int s, error;
#endif /* FAST_IPSEC */

  	M_ASSERTPKTHDR(m);
  	
	if (m->m_flags & M_FASTFWD_OURS) {
		/*
		 * ip_fastforward firewall changed dest to local.
		 * We expect ip_len and ip_off in host byte order.
		 */
		/* rcvpath also sets PREPROC; clear both for reflected mbufs */
		m->m_flags &= ~(M_FASTFWD_OURS | M_FASTFWD_PREPROC);
		/* Set up some basic stuff */
		ip = mtod(m, struct ip *);
		hlen = ip->ip_hl << 2;
  		goto ours;
  	}

	if (m->m_flags & M_FASTFWD_PREPROC) {
		/*
		 * ip_fastforward punted this packet to us, either for
		 * further analysis or because it is destined to one of
		 * our own addresses.
		 * We expect ip_len and ip_off in host byte order.
		 */
		m->m_flags &= ~M_FASTFWD_PREPROC; /* for reflected mbufs */
		/* Set up some basic stuff */
		ip = mtod(m, struct ip *);
		hlen = ip->ip_hl << 2;
		goto preprocessed;
	}

	ipstat.ips_total++;

	if (m->m_pkthdr.len < sizeof(struct ip))
		goto tooshort;

	if (m->m_len < sizeof (struct ip) &&
	    (m = m_pullup(m, sizeof (struct ip))) == NULL) {
		ipstat.ips_toosmall++;
		return;
	}
	ip = mtod(m, struct ip *);

	if (ip->ip_v != IPVERSION) {
		ipstat.ips_badvers++;
		goto bad;
	}

	hlen = ip->ip_hl << 2;
	if (hlen < sizeof(struct ip)) {	/* minimum header length */
		ipstat.ips_badhlen++;
		goto bad;
	}
	if (hlen > m->m_len) {
		if ((m = m_pullup(m, hlen)) == NULL) {
			ipstat.ips_badhlen++;
			return;
		}
		ip = mtod(m, struct ip *);
	}

	/* 127/8 must not appear on wire - RFC1122 */
	if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET ||
	    (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) {
		if ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) == 0) {
			ipstat.ips_badaddr++;
			goto bad;
		}
	}

	if (m->m_pkthdr.csum_flags & CSUM_IP_CHECKED) {
		sum = !(m->m_pkthdr.csum_flags & CSUM_IP_VALID);
	} else {
		if (hlen == sizeof(struct ip)) {
			sum = in_cksum_hdr(ip);
		} else {
			sum = in_cksum(m, hlen);
		}
	}
	if (sum) {
		ipstat.ips_badsum++;
		goto bad;
	}

#ifdef ALTQ
	if (altq_input != NULL && (*altq_input)(m, AF_INET) == 0)
		/* packet is dropped by traffic conditioner */
		return;
#endif

	/*
	 * Convert fields to host representation.
	 */
	ip->ip_len = ntohs(ip->ip_len);
	if (ip->ip_len < hlen) {
		ipstat.ips_badlen++;
		goto bad;
	}
	ip->ip_off = ntohs(ip->ip_off);

	/*
	 * Check that the amount of data in the buffers
	 * is at least as much as the IP header would have us expect.
	 * Trim mbufs if longer than we expect.
	 * Drop packet if shorter than we expect.
	 */
	if (m->m_pkthdr.len < ip->ip_len) {
tooshort:
		ipstat.ips_tooshort++;
		goto bad;
	}
	if (m->m_pkthdr.len > ip->ip_len) {
		if (m->m_len == m->m_pkthdr.len) {
			m->m_len = ip->ip_len;
			m->m_pkthdr.len = ip->ip_len;
		} else
			m_adj(m, ip->ip_len - m->m_pkthdr.len);
	}

preprocessed:

#if defined(IPSEC) && !defined(IPSEC_FILTERGIF)
	/*
	 * Bypass packet filtering for packets from a tunnel (gif).
	 */
	if (ipsec_getnhist(m))
		goto passin;
#endif
#if defined(FAST_IPSEC) && !defined(IPSEC_FILTERGIF)
	/*
	 * Bypass packet filtering for packets from a tunnel (gif).
	 */
	if (m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL) != NULL)
		goto passin;
#endif

	/*
	 * Run through list of hooks for input packets.
	 *
	 * NB: Beware of the destination address changing (e.g.
	 *     by NAT rewriting).  When this happens, tell
	 *     ip_forward to do the right thing.
	 */

	/* Jump over all PFIL processing if hooks are not active. */
	if (inet_pfil_hook.ph_busy_count == -1)
		goto passin;

	odst = ip->ip_dst;
	if (pfil_run_hooks(&inet_pfil_hook, &m, m->m_pkthdr.rcvif,
	    PFIL_IN, NULL) != 0)
		return;
	if (m == NULL)			/* consumed by filter */
		return;

	ip = mtod(m, struct ip *);
	dchg = (odst.s_addr != ip->ip_dst.s_addr);

#ifdef IPFIREWALL_FORWARD
	if (m->m_flags & M_FASTFWD_OURS) {
		m->m_flags &= ~M_FASTFWD_OURS;
		goto ours;
	}
	dchg = (m_tag_find(m, PACKET_TAG_IPFORWARD, NULL) != NULL);
#endif /* IPFIREWALL_FORWARD */

passin:
	/*
	 * Process options and, if not destined for us,
	 * ship it on.  ip_dooptions returns 1 when an
	 * error was detected (causing an icmp message
	 * to be sent and the original packet to be freed).
	 */
	if (hlen > sizeof (struct ip) && ip_dooptions(m, 0))
		return;

	/*
	 * Greedy RSVP: snatch any PATH packet of the RSVP protocol, no
	 * matter whether it is destined to another node or is a
	 * multicast one. RSVP wants it, so prevent it from being
	 * forwarded anywhere else. Also check that the rsvp daemon is
	 * running before grabbing the packet.
	 */
	if (rsvp_on && ip->ip_p == IPPROTO_RSVP)
		goto ours;

	/*
	 * Check our list of addresses, to see if the packet is for us.
	 * If we don't have any addresses, assume any unicast packet
	 * we receive might be for us (and let the upper layers deal
	 * with it).
	 */
	if (TAILQ_EMPTY(&in_ifaddrhead) &&
	    (m->m_flags & (M_MCAST|M_BCAST)) == 0)
		goto ours;

	/*
	 * Enable a consistency check between the destination address
	 * and the arrival interface for a unicast packet (the RFC 1122
	 * strong ES model) if IP forwarding is disabled and the packet
	 * is not locally generated and the packet is not subject to
	 * 'ipfw fwd'.
	 *
	 * XXX - Checking also should be disabled if the destination
	 * address is ipnat'ed to a different interface.
	 *
	 * XXX - Checking is incompatible with IP aliases added
	 * to the loopback interface instead of the interface where
	 * the packets are received.
	 */
	checkif = ip_checkinterface && (ipforwarding == 0) && 
	    m->m_pkthdr.rcvif != NULL &&
	    ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) == 0) &&
	    (dchg == 0);

	/*
	 * Check for exact addresses in the hash bucket.
	 */
	LIST_FOREACH(ia, INADDR_HASH(ip->ip_dst.s_addr), ia_hash) {
		/*
		 * If the address matches, verify that the packet
		 * arrived via the correct interface if checking is
		 * enabled.
		 */
		if (IA_SIN(ia)->sin_addr.s_addr == ip->ip_dst.s_addr && 
		    (!checkif || ia->ia_ifp == m->m_pkthdr.rcvif))
			goto ours;
	}
	/*
	 * Check for broadcast addresses.
	 *
	 * Only accept broadcast packets that arrive via the matching
	 * interface.  Reception of forwarded directed broadcasts would
	 * be handled via ip_forward() and ether_output() with the loopback
	 * into the stack for SIMPLEX interfaces handled by ether_output().
	 */
	if (m->m_pkthdr.rcvif != NULL &&
	    m->m_pkthdr.rcvif->if_flags & IFF_BROADCAST) {
	        TAILQ_FOREACH(ifa, &m->m_pkthdr.rcvif->if_addrhead, ifa_link) {
			if (ifa->ifa_addr->sa_family != AF_INET)
				continue;
			ia = ifatoia(ifa);
			if (satosin(&ia->ia_broadaddr)->sin_addr.s_addr ==
			    ip->ip_dst.s_addr)
				goto ours;
			if (ia->ia_netbroadcast.s_addr == ip->ip_dst.s_addr)
				goto ours;
#ifdef BOOTP_COMPAT
			if (IA_SIN(ia)->sin_addr.s_addr == INADDR_ANY)
				goto ours;
#endif
		}
	}
	if (IN_MULTICAST(ntohl(ip->ip_dst.s_addr))) {
		struct in_multi *inm;
		if (ip_mrouter) {
			/*
			 * If we are acting as a multicast router, all
			 * incoming multicast packets are passed to the
			 * kernel-level multicast forwarding function.
			 * The packet is returned (relatively) intact; if
			 * ip_mforward() returns a non-zero value, the packet
			 * must be discarded, else it may be accepted below.
			 */
			if (ip_mforward &&
			    ip_mforward(ip, m->m_pkthdr.rcvif, m, 0) != 0) {
				ipstat.ips_cantforward++;
				m_freem(m);
				return;
			}

			/*
			 * The process-level routing daemon needs to receive
			 * all multicast IGMP packets, whether or not this
			 * host belongs to their destination groups.
			 */
			if (ip->ip_p == IPPROTO_IGMP)
				goto ours;
			ipstat.ips_forward++;
		}
		/*
		 * See if we belong to the destination multicast group on the
		 * arrival interface.
		 */
		IN_LOOKUP_MULTI(ip->ip_dst, m->m_pkthdr.rcvif, inm);
		if (inm == NULL) {
			ipstat.ips_notmember++;
			m_freem(m);
			return;
		}
		goto ours;
	}
	if (ip->ip_dst.s_addr == (u_long)INADDR_BROADCAST)
		goto ours;
	if (ip->ip_dst.s_addr == INADDR_ANY)
		goto ours;

	/*
	 * FAITH(Firewall Aided Internet Translator)
	 */
	if (m->m_pkthdr.rcvif && m->m_pkthdr.rcvif->if_type == IFT_FAITH) {
		if (ip_keepfaith) {
			if (ip->ip_p == IPPROTO_TCP || ip->ip_p == IPPROTO_ICMP) 
				goto ours;
		}
		m_freem(m);
		return;
	}

	/*
	 * Not for us; forward if possible and desirable.
	 */
	if (ipforwarding == 0) {
		ipstat.ips_cantforward++;
		m_freem(m);
	} else {
#ifdef IPSEC
		/*
		 * Enforce inbound IPsec SPD.
		 */
		if (ipsec4_in_reject(m, NULL)) {
			ipsecstat.in_polvio++;
			goto bad;
		}
#endif /* IPSEC */
#ifdef FAST_IPSEC
		mtag = m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL);
		s = splnet();
		if (mtag != NULL) {
			tdbi = (struct tdb_ident *)(mtag + 1);
			sp = ipsec_getpolicy(tdbi, IPSEC_DIR_INBOUND);
		} else {
			sp = ipsec_getpolicybyaddr(m, IPSEC_DIR_INBOUND,
						   IP_FORWARDING, &error);   
		}
		if (sp == NULL) {	/* NB: can happen if error */
			splx(s);
			/*XXX error stat???*/
			DPRINTF(("ip_input: no SP for forwarding\n"));	/*XXX*/
			goto bad;
		}

		/*
		 * Check security policy against packet attributes.
		 */
		error = ipsec_in_reject(sp, m);
		KEY_FREESP(&sp);
		splx(s);
		if (error) {
			ipstat.ips_cantforward++;
			goto bad;
		}
#endif /* FAST_IPSEC */
		ip_forward(m, dchg);
	}
	return;

ours:
#ifdef IPSTEALTH
	/*
	 * IPSTEALTH: Process non-routing options only
	 * if the packet is destined for us.
	 */
	if (ipstealth && hlen > sizeof (struct ip) &&
	    ip_dooptions(m, 1))
		return;
#endif /* IPSTEALTH */

	/* Count the packet in the ip address stats */
	if (ia != NULL) {
		ia->ia_ifa.if_ipackets++;
		ia->ia_ifa.if_ibytes += m->m_pkthdr.len;
	}

	/*
	 * Attempt reassembly; if it succeeds, proceed.
	 * ip_reass() will return a different mbuf.
	 */
	if (ip->ip_off & (IP_MF | IP_OFFMASK)) {
		m = ip_reass(m);
		if (m == NULL)
			return;
		ip = mtod(m, struct ip *);
		/* Get the header length of the reassembled packet */
		hlen = ip->ip_hl << 2;
	}

	/*
	 * Further protocols expect the packet length to be w/o the
	 * IP header.
	 */
	ip->ip_len -= hlen;

#ifdef IPSEC
	/*
	 * enforce IPsec policy checking if we are seeing last header.
	 * note that we do not visit this with protocols with pcb layer
	 * code - like udp/tcp/raw ip.
	 */
	if ((inetsw[ip_protox[ip->ip_p]].pr_flags & PR_LASTHDR) != 0 &&
	    ipsec4_in_reject(m, NULL)) {
		ipsecstat.in_polvio++;
		goto bad;
	}
#endif
#ifdef FAST_IPSEC
	/*
	 * enforce IPsec policy checking if we are seeing last header.
	 * note that we do not visit this with protocols with pcb layer
	 * code - like udp/tcp/raw ip.
	 */
	if ((inetsw[ip_protox[ip->ip_p]].pr_flags & PR_LASTHDR) != 0) {
		/*
		 * Check if the packet has already had IPsec processing
		 * done.  If so, then just pass it along.  This tag gets
		 * set during AH, ESP, etc. input handling, before the
		 * packet is returned to the ip input queue for delivery.
		 */ 
		mtag = m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL);
		s = splnet();
		if (mtag != NULL) {
			tdbi = (struct tdb_ident *)(mtag + 1);
			sp = ipsec_getpolicy(tdbi, IPSEC_DIR_INBOUND);
		} else {
			sp = ipsec_getpolicybyaddr(m, IPSEC_DIR_INBOUND,
						   IP_FORWARDING, &error);   
		}
		if (sp != NULL) {
			/*
			 * Check security policy against packet attributes.
			 */
			error = ipsec_in_reject(sp, m);
			KEY_FREESP(&sp);
		} else {
			/* XXX error stat??? */
			error = EINVAL;
			splx(s);	/* restore spl before bailing out */
			DPRINTF(("ip_input: no SP, packet discarded\n"));/*XXX*/
			goto bad;
		}
		splx(s);
		if (error)
			goto bad;
	}
#endif /* FAST_IPSEC */

	/*
	 * Switch out to protocol's input routine.
	 */
	ipstat.ips_delivered++;

	(*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen);
	return;
bad:
	m_freem(m);
}

/*
 * Take incoming datagram fragment and try to reassemble it into
 * whole datagram.  If the argument is the first fragment or one
 * in between the function will return NULL and store the mbuf
 * in the fragment chain.  If the argument is the last fragment
 * the packet will be reassembled and the pointer to the new
 * mbuf returned for further processing.  Only m_tags attached
 * to the first packet/fragment are preserved.
 * The IP header is *NOT* adjusted out of iplen.
 */

struct mbuf *
ip_reass(struct mbuf *m)
{
	struct ip *ip;
	struct mbuf *p, *q, *nq, *t;
	struct ipq *fp = NULL;
	struct ipqhead *head;
	int i, hlen, next;
	u_int8_t ecn, ecn0;
	u_short hash;

	/* If maxnipq is 0, never accept fragments. */
	if (maxnipq == 0) {
		ipstat.ips_fragments++;
		ipstat.ips_fragdropped++;
		m_freem(m);
		return (NULL);
	}

	ip = mtod(m, struct ip *);
	hlen = ip->ip_hl << 2;

	hash = IPREASS_HASH(ip->ip_src.s_addr, ip->ip_id);
	head = &ipq[hash];
	IPQ_LOCK();

	/*
	 * Look for queue of fragments
	 * of this datagram.
	 */
	TAILQ_FOREACH(fp, head, ipq_list)
		if (ip->ip_id == fp->ipq_id &&
		    ip->ip_src.s_addr == fp->ipq_src.s_addr &&
		    ip->ip_dst.s_addr == fp->ipq_dst.s_addr &&
#ifdef MAC
		    mac_fragment_match(m, fp) &&
#endif
		    ip->ip_p == fp->ipq_p)
			goto found;

	fp = NULL;

	/*
	 * Enforce upper bound on number of fragmented packets
	 * for which we attempt reassembly;
	 * If maxnipq is -1, accept all fragments without limitation.
	 */
	if ((nipq > maxnipq) && (maxnipq > 0)) {
		/*
		 * drop something from the tail of the current queue
		 * before proceeding further
		 */
		struct ipq *q = TAILQ_LAST(head, ipqhead);
		if (q == NULL) {   /* gak */
			for (i = 0; i < IPREASS_NHASH; i++) {
				struct ipq *r = TAILQ_LAST(&ipq[i], ipqhead);
				if (r) {
					ipstat.ips_fragtimeout += r->ipq_nfrags;
					ip_freef(&ipq[i], r);
					break;
				}
			}
		} else {
			ipstat.ips_fragtimeout += q->ipq_nfrags;
			ip_freef(head, q);
		}
	}

found:
	/*
	 * Adjust ip_len to not reflect header,
	 * convert offset of this to bytes.
	 */
	ip->ip_len -= hlen;
	if (ip->ip_off & IP_MF) {
		/*
		 * Make sure that fragments have a data length
		 * that's a non-zero multiple of 8 bytes.
		 */
		if (ip->ip_len == 0 || (ip->ip_len & 0x7) != 0) {
			ipstat.ips_toosmall++; /* XXX */
			goto dropfrag;
		}
		m->m_flags |= M_FRAG;
	} else
		m->m_flags &= ~M_FRAG;
	ip->ip_off <<= 3;


	/*
	 * Attempt reassembly; if it succeeds, proceed.
	 * ip_reass() will return a different mbuf.
	 */
	ipstat.ips_fragments++;
	m->m_pkthdr.header = ip;

	/* Previous ip_reass() started here. */
	/*
	 * Presence of header sizes in mbufs
	 * would confuse code below.
	 */
	m->m_data += hlen;
	m->m_len -= hlen;

	/*
	 * If first fragment to arrive, create a reassembly queue.
	 */
	if (fp == NULL) {
		if ((t = m_get(M_DONTWAIT, MT_FTABLE)) == NULL)
			goto dropfrag;
		fp = mtod(t, struct ipq *);
#ifdef MAC
		if (mac_init_ipq(fp, M_NOWAIT) != 0) {
			m_free(t);
			goto dropfrag;
		}
		mac_create_ipq(m, fp);
#endif
		TAILQ_INSERT_HEAD(head, fp, ipq_list);
		nipq++;
		fp->ipq_nfrags = 1;
		fp->ipq_ttl = IPFRAGTTL;
		fp->ipq_p = ip->ip_p;
		fp->ipq_id = ip->ip_id;
		fp->ipq_src = ip->ip_src;
		fp->ipq_dst = ip->ip_dst;
		fp->ipq_frags = m;
		m->m_nextpkt = NULL;
		goto inserted;
	} else {
		fp->ipq_nfrags++;
#ifdef MAC
		mac_update_ipq(m, fp);
#endif
	}

#define GETIP(m)	((struct ip*)((m)->m_pkthdr.header))

	/*
	 * Handle ECN by comparing this segment with the first one;
	 * if CE is set, do not lose CE.
	 * drop if CE and not-ECT are mixed for the same packet.
	 */
	ecn = ip->ip_tos & IPTOS_ECN_MASK;
	ecn0 = GETIP(fp->ipq_frags)->ip_tos & IPTOS_ECN_MASK;
	if (ecn == IPTOS_ECN_CE) {
		if (ecn0 == IPTOS_ECN_NOTECT)
			goto dropfrag;
		if (ecn0 != IPTOS_ECN_CE)
			GETIP(fp->ipq_frags)->ip_tos |= IPTOS_ECN_CE;
	}
	if (ecn == IPTOS_ECN_NOTECT && ecn0 != IPTOS_ECN_NOTECT)
		goto dropfrag;

	/*
	 * Find a segment which begins after this one does.
	 */
	for (p = NULL, q = fp->ipq_frags; q; p = q, q = q->m_nextpkt)
		if (GETIP(q)->ip_off > ip->ip_off)
			break;

	/*
	 * If there is a preceding segment, it may provide some of
	 * our data already.  If so, drop the data from the incoming
	 * segment.  If it provides all of our data, drop us, otherwise
	 * stick new segment in the proper place.
	 *
	 * If some of the data is dropped from the preceding
	 * segment, then its checksum is invalidated.
	 */
	if (p) {
		i = GETIP(p)->ip_off + GETIP(p)->ip_len - ip->ip_off;
		if (i > 0) {
			if (i >= ip->ip_len)
				goto dropfrag;
			m_adj(m, i);
			m->m_pkthdr.csum_flags = 0;
			ip->ip_off += i;
			ip->ip_len -= i;
		}
		m->m_nextpkt = p->m_nextpkt;
		p->m_nextpkt = m;
	} else {
		m->m_nextpkt = fp->ipq_frags;
		fp->ipq_frags = m;
	}

	/*
	 * While we overlap succeeding segments trim them or,
	 * if they are completely covered, dequeue them.
	 */
	for (; q != NULL && ip->ip_off + ip->ip_len > GETIP(q)->ip_off;
	     q = nq) {
		i = (ip->ip_off + ip->ip_len) - GETIP(q)->ip_off;
		if (i < GETIP(q)->ip_len) {
			GETIP(q)->ip_len -= i;
			GETIP(q)->ip_off += i;
			m_adj(q, i);
			q->m_pkthdr.csum_flags = 0;
			break;
		}
		nq = q->m_nextpkt;
		m->m_nextpkt = nq;
		ipstat.ips_fragdropped++;
		fp->ipq_nfrags--;
		m_freem(q);
	}

inserted:

	/*
	 * Check for complete reassembly and perform frag per packet
	 * limiting.
	 *
	 * Frag limiting is performed here so that the nth frag has
	 * a chance to complete the packet before we drop the packet.
	 * As a result, n+1 frags are actually allowed per packet, but
	 * only n will ever be stored. (n = maxfragsperpacket.)
	 *
	 */
	next = 0;
	for (p = NULL, q = fp->ipq_frags; q; p = q, q = q->m_nextpkt) {
		if (GETIP(q)->ip_off != next) {
			if (fp->ipq_nfrags > maxfragsperpacket) {
				ipstat.ips_fragdropped += fp->ipq_nfrags;
				ip_freef(head, fp);
			}
			goto done;
		}
		next += GETIP(q)->ip_len;
	}
	/* Make sure the last packet didn't have the IP_MF flag */
	if (p->m_flags & M_FRAG) {
		if (fp->ipq_nfrags > maxfragsperpacket) {
			ipstat.ips_fragdropped += fp->ipq_nfrags;
			ip_freef(head, fp);
		}
		goto done;
	}

	/*
	 * Reassembly is complete.  Make sure the packet is a sane size.
	 */
	q = fp->ipq_frags;
	ip = GETIP(q);
	if (next + (ip->ip_hl << 2) > IP_MAXPACKET) {
		ipstat.ips_toolong++;
		ipstat.ips_fragdropped += fp->ipq_nfrags;
		ip_freef(head, fp);
		goto done;
	}

	/*
	 * Concatenate fragments.
	 */
	m = q;
	t = m->m_next;
	m->m_next = 0;
	m_cat(m, t);
	nq = q->m_nextpkt;
	q->m_nextpkt = 0;
	for (q = nq; q != NULL; q = nq) {
		nq = q->m_nextpkt;
		q->m_nextpkt = NULL;
		m->m_pkthdr.csum_flags &= q->m_pkthdr.csum_flags;
		m->m_pkthdr.csum_data += q->m_pkthdr.csum_data;
		m_cat(m, q);
	}
#ifdef MAC
	mac_create_datagram_from_ipq(fp, m);
	mac_destroy_ipq(fp);
#endif

	/*
	 * Create header for new ip packet by modifying header of first
	 * packet;  dequeue and discard fragment reassembly header.
	 * Make header visible.
	 */
	ip->ip_len = (ip->ip_hl << 2) + next;
	ip->ip_src = fp->ipq_src;
	ip->ip_dst = fp->ipq_dst;
	TAILQ_REMOVE(head, fp, ipq_list);
	nipq--;
	(void) m_free(dtom(fp));
	m->m_len += (ip->ip_hl << 2);
	m->m_data -= (ip->ip_hl << 2);
	/* some debugging cruft by sklower, below, will go away soon */
	if (m->m_flags & M_PKTHDR)	/* XXX this should be done elsewhere */
		m_fixhdr(m);
	ipstat.ips_reassembled++;
	IPQ_UNLOCK();
	return (m);

dropfrag:
	ipstat.ips_fragdropped++;
	if (fp != NULL)
		fp->ipq_nfrags--;
	m_freem(m);
done:
	IPQ_UNLOCK();
	return (NULL);

#undef GETIP
}
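
The overlap handling in ip_reass above reduces to arithmetic on (offset,
length) pairs. What follows is a minimal user-space sketch of that
arithmetic only, not the kernel code: struct frag, trim_front and trim_next
are hypothetical names, and the mbuf trimming (m_adj) and checksum-flag
reset are left out.

```c
/* Hypothetical stand-in for the offset/length kept in a fragment's header. */
struct frag {
	int off;	/* byte offset of the fragment's payload */
	int len;	/* payload length in bytes */
};

/*
 * Trim the front of `cur` so it no longer overlaps `prev`.
 * Returns 0 if `cur` is entirely covered by `prev` (the caller would
 * drop it), 1 otherwise.
 */
static int
trim_front(const struct frag *prev, struct frag *cur)
{
	int i = prev->off + prev->len - cur->off;

	if (i > 0) {
		if (i >= cur->len)
			return (0);
		cur->off += i;
		cur->len -= i;
	}
	return (1);
}

/*
 * Trim the front of `next` where `cur` overlaps it.
 * Returns 0 if `next` is completely covered (the caller would dequeue
 * it), 1 otherwise.
 */
static int
trim_next(const struct frag *cur, struct frag *next)
{
	int i = (cur->off + cur->len) - next->off;

	if (i <= 0)
		return (1);
	if (i >= next->len)
		return (0);
	next->off += i;
	next->len -= i;
	return (1);
}
```

With prev = {0, 16}, cur = {8, 24} and next = {24, 16}, trim_front leaves
cur at offset 16 with length 16, and trim_next then leaves next at offset 32
with length 8.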

/*
 * Free a fragment reassembly header and all
 * associated datagrams.
 */
static void
ip_freef(fhp, fp)
	struct ipqhead *fhp;
	struct ipq *fp;
{
	register struct mbuf *q;

	IPQ_LOCK_ASSERT();

	while (fp->ipq_frags) {
		q = fp->ipq_frags;
		fp->ipq_frags = q->m_nextpkt;
		m_freem(q);
	}
	TAILQ_REMOVE(fhp, fp, ipq_list);
	(void) m_free(dtom(fp));
	nipq--;
}

/*
 * IP timer processing;
 * if a timer expires on a reassembly
 * queue, discard it.
 */
void
ip_slowtimo()
{
	register struct ipq *fp;
	int s = splnet();
	int i;

	IPQ_LOCK();
	for (i = 0; i < IPREASS_NHASH; i++) {
		for(fp = TAILQ_FIRST(&ipq[i]); fp;) {
			struct ipq *fpp;

			fpp = fp;
			fp = TAILQ_NEXT(fp, ipq_list);
			if(--fpp->ipq_ttl == 0) {
				ipstat.ips_fragtimeout += fpp->ipq_nfrags;
				ip_freef(&ipq[i], fpp);
			}
		}
	}
	/*
	 * If we are over the maximum number of fragments
	 * (due to the limit being lowered), drain off
	 * enough to get down to the new limit.
	 */
	if (maxnipq >= 0 && nipq > maxnipq) {
		for (i = 0; i < IPREASS_NHASH; i++) {
			while (nipq > maxnipq && !TAILQ_EMPTY(&ipq[i])) {
				ipstat.ips_fragdropped +=
				    TAILQ_FIRST(&ipq[i])->ipq_nfrags;
				ip_freef(&ipq[i], TAILQ_FIRST(&ipq[i]));
			}
		}
	}
	IPQ_UNLOCK();
	splx(s);
}

/*
 * Drain off all datagram fragments.
 */
void
ip_drain()
{
	int     i;

	IPQ_LOCK();
	for (i = 0; i < IPREASS_NHASH; i++) {
		while(!TAILQ_EMPTY(&ipq[i])) {
			ipstat.ips_fragdropped +=
			    TAILQ_FIRST(&ipq[i])->ipq_nfrags;
			ip_freef(&ipq[i], TAILQ_FIRST(&ipq[i]));
		}
	}
	IPQ_UNLOCK();
	in_rtqdrain();
}

/*
 * Do option processing on a datagram,
 * possibly discarding it if bad options are encountered,
 * or forwarding it if source-routed.
 * The pass argument is used when operating in the IPSTEALTH
 * mode to tell what options to process:
 * [LS]SRR (pass 0) or the others (pass 1).
 * The reason for as many as two passes is that when doing IPSTEALTH,
 * non-routing options should be processed only if the packet is for us.
 * Returns 1 if packet has been forwarded/freed,
 * 0 if the packet should be processed further.
 */
static int
ip_dooptions(struct mbuf *m, int pass)
{
	struct ip *ip = mtod(m, struct ip *);
	u_char *cp;
	struct in_ifaddr *ia;
	int opt, optlen, cnt, off, code, type = ICMP_PARAMPROB, forward = 0;
	struct in_addr *sin, dst;
	n_time ntime;
	struct	sockaddr_in ipaddr = { sizeof(ipaddr), AF_INET };

	/* ignore or reject packets with IP options */
	if (ip_doopts == 0)
		return 0;
	else if (ip_doopts == 2) {
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_FILTER_PROHIB;
		goto bad;
	}

	dst = ip->ip_dst;
	cp = (u_char *)(ip + 1);
	cnt = (ip->ip_hl << 2) - sizeof (struct ip);
	for (; cnt > 0; cnt -= optlen, cp += optlen) {
		opt = cp[IPOPT_OPTVAL];
		if (opt == IPOPT_EOL)
			break;
		if (opt == IPOPT_NOP)
			optlen = 1;
		else {
			if (cnt < IPOPT_OLEN + sizeof(*cp)) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
			optlen = cp[IPOPT_OLEN];
			if (optlen < IPOPT_OLEN + sizeof(*cp) || optlen > cnt) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
		}
		switch (opt) {

		default:
			break;

		/*
		 * Source routing with record.
		 * Find interface with current destination address.
		 * If none on this machine then drop if strictly routed,
		 * or do nothing if loosely routed.
		 * Record interface address and bring up next address
		 * component.  If strictly routed make sure next
		 * address is on directly accessible net.
		 */
		case IPOPT_LSRR:
		case IPOPT_SSRR:
#ifdef IPSTEALTH
			if (ipstealth && pass > 0)
				break;
#endif
			if (optlen < IPOPT_OFFSET + sizeof(*cp)) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
			if ((off = cp[IPOPT_OFFSET]) < IPOPT_MINOFF) {
				code = &cp[IPOPT_OFFSET] - (u_char *)ip;
				goto bad;
			}
			ipaddr.sin_addr = ip->ip_dst;
			ia = (struct in_ifaddr *)
				ifa_ifwithaddr((struct sockaddr *)&ipaddr);
			if (ia == NULL) {
				if (opt == IPOPT_SSRR) {
					type = ICMP_UNREACH;
					code = ICMP_UNREACH_SRCFAIL;
					goto bad;
				}
				if (!ip_dosourceroute)
					goto nosourcerouting;
				/*
				 * Loose routing, and not at next destination
				 * yet; nothing to do except forward.
				 */
				break;
			}
			off--;			/* 0 origin */
			if (off > optlen - (int)sizeof(struct in_addr)) {
				/*
				 * End of source route.  Should be for us.
				 */
				if (!ip_acceptsourceroute)
					goto nosourcerouting;
				save_rte(m, cp, ip->ip_src);
				break;
			}
#ifdef IPSTEALTH
			if (ipstealth)
				goto dropit;
#endif
			if (!ip_dosourceroute) {
				if (ipforwarding) {
					char buf[16]; /* aaa.bbb.ccc.ddd\0 */
					/*
					 * Acting as a router, so generate ICMP
					 */
nosourcerouting:
					strcpy(buf, inet_ntoa(ip->ip_dst));
					log(LOG_WARNING, 
					    "attempted source route from %s to %s\n",
					    inet_ntoa(ip->ip_src), buf);
					type = ICMP_UNREACH;
					code = ICMP_UNREACH_SRCFAIL;
					goto bad;
				} else {
					/*
					 * Not acting as a router, so silently drop.
					 */
#ifdef IPSTEALTH
dropit:
#endif
					ipstat.ips_cantforward++;
					m_freem(m);
					return (1);
				}
			}

			/*
			 * locate outgoing interface
			 */
			(void)memcpy(&ipaddr.sin_addr, cp + off,
			    sizeof(ipaddr.sin_addr));

			if (opt == IPOPT_SSRR) {
#define	INA	struct in_ifaddr *
#define	SA	struct sockaddr *
			    if ((ia = (INA)ifa_ifwithdstaddr((SA)&ipaddr)) == NULL)
				ia = (INA)ifa_ifwithnet((SA)&ipaddr);
			} else
				ia = ip_rtaddr(ipaddr.sin_addr);
			if (ia == NULL) {
				type = ICMP_UNREACH;
				code = ICMP_UNREACH_SRCFAIL;
				goto bad;
			}
			ip->ip_dst = ipaddr.sin_addr;
			(void)memcpy(cp + off, &(IA_SIN(ia)->sin_addr),
			    sizeof(struct in_addr));
			cp[IPOPT_OFFSET] += sizeof(struct in_addr);
			/*
			 * Let ip_intr's mcast routing check handle mcast pkts
			 */
			forward = !IN_MULTICAST(ntohl(ip->ip_dst.s_addr));
			break;

		case IPOPT_RR:
#ifdef IPSTEALTH
			if (ipstealth && pass == 0)
				break;
#endif
			if (optlen < IPOPT_OFFSET + sizeof(*cp)) {
				code = &cp[IPOPT_OFFSET] - (u_char *)ip;
				goto bad;
			}
			if ((off = cp[IPOPT_OFFSET]) < IPOPT_MINOFF) {
				code = &cp[IPOPT_OFFSET] - (u_char *)ip;
				goto bad;
			}
			/*
			 * If no space remains, ignore.
			 */
			off--;			/* 0 origin */
			if (off > optlen - (int)sizeof(struct in_addr))
				break;
			(void)memcpy(&ipaddr.sin_addr, &ip->ip_dst,
			    sizeof(ipaddr.sin_addr));
			/*
			 * locate outgoing interface; if we're the destination,
			 * use the incoming interface (should be same).
			 */
			if ((ia = (INA)ifa_ifwithaddr((SA)&ipaddr)) == NULL &&
			    (ia = ip_rtaddr(ipaddr.sin_addr)) == NULL) {
				type = ICMP_UNREACH;
				code = ICMP_UNREACH_HOST;
				goto bad;
			}
			(void)memcpy(cp + off, &(IA_SIN(ia)->sin_addr),
			    sizeof(struct in_addr));
			cp[IPOPT_OFFSET] += sizeof(struct in_addr);
			break;

		case IPOPT_TS:
#ifdef IPSTEALTH
			if (ipstealth && pass == 0)
				break;
#endif
			code = cp - (u_char *)ip;
			if (optlen < 4 || optlen > 40) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
			if ((off = cp[IPOPT_OFFSET]) < 5) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
			if (off > optlen - (int)sizeof(int32_t)) {
				cp[IPOPT_OFFSET + 1] += (1 << 4);
				if ((cp[IPOPT_OFFSET + 1] & 0xf0) == 0) {
					code = &cp[IPOPT_OFFSET] - (u_char *)ip;
					goto bad;
				}
				break;
			}
			off--;				/* 0 origin */
			sin = (struct in_addr *)(cp + off);
			switch (cp[IPOPT_OFFSET + 1] & 0x0f) {

			case IPOPT_TS_TSONLY:
				break;

			case IPOPT_TS_TSANDADDR:
				if (off + sizeof(n_time) +
				    sizeof(struct in_addr) > optlen) {
					code = &cp[IPOPT_OFFSET] - (u_char *)ip;
					goto bad;
				}
				ipaddr.sin_addr = dst;
				ia = (INA)ifaof_ifpforaddr((SA)&ipaddr,
							    m->m_pkthdr.rcvif);
				if (ia == NULL)
					continue;
				(void)memcpy(sin, &IA_SIN(ia)->sin_addr,
				    sizeof(struct in_addr));
				cp[IPOPT_OFFSET] += sizeof(struct in_addr);
				off += sizeof(struct in_addr);
				break;

			case IPOPT_TS_PRESPEC:
				if (off + sizeof(n_time) +
				    sizeof(struct in_addr) > optlen) {
					code = &cp[IPOPT_OFFSET] - (u_char *)ip;
					goto bad;
				}
				(void)memcpy(&ipaddr.sin_addr, sin,
				    sizeof(struct in_addr));
				if (ifa_ifwithaddr((SA)&ipaddr) == NULL)
					continue;
				cp[IPOPT_OFFSET] += sizeof(struct in_addr);
				off += sizeof(struct in_addr);
				break;

			default:
				code = &cp[IPOPT_OFFSET + 1] - (u_char *)ip;
				goto bad;
			}
			ntime = iptime();
			(void)memcpy(cp + off, &ntime, sizeof(n_time));
			cp[IPOPT_OFFSET] += sizeof(n_time);
		}
	}
	if (forward && ipforwarding) {
		ip_forward(m, 1);
		return (1);
	}
	return (0);
bad:
	icmp_error(m, type, code, 0, 0);
	ipstat.ips_badoptions++;
	return (1);
}
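
The length checks at the top of the option loop in ip_dooptions can be
tried outside the kernel. The sketch below walks an option buffer the same
way and only counts options; OPT_EOL, OPT_NOP, OPT_OLEN and count_options
are names made up for this illustration (the kernel uses the IPOPT_*
constants), and none of the per-option processing is reproduced.

```c
#define OPT_EOL		0	/* end of option list */
#define OPT_NOP		1	/* one-byte padding option */
#define OPT_OLEN	1	/* index of the length byte within an option */

/*
 * Count the options in buf[0..cnt) using the same length checks as the
 * loop in ip_dooptions.  Returns the number of options seen before EOL
 * or the end of the buffer, or -1 if an option length is malformed.
 */
static int
count_options(const unsigned char *buf, int cnt)
{
	const unsigned char *cp = buf;
	int optlen, n = 0;

	for (; cnt > 0; cnt -= optlen, cp += optlen) {
		if (cp[0] == OPT_EOL)
			break;
		if (cp[0] == OPT_NOP)
			optlen = 1;
		else {
			/* The length byte itself must be inside the buffer. */
			if (cnt < OPT_OLEN + 1)
				return (-1);
			optlen = cp[OPT_OLEN];
			/* An option covers at least type+length and must fit. */
			if (optlen < OPT_OLEN + 1 || optlen > cnt)
				return (-1);
		}
		n++;
	}
	return (n);
}
```

A buffer holding a NOP, a 7-byte record-route option and an EOL yields 2.
A length byte of 1, or one larger than the remaining bytes, yields -1,
which is where the kernel code jumps to "bad".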

/*
 * Given address of next destination (final or next hop),
 * return internet address info of interface to be used to get there.
 */
struct in_ifaddr *
ip_rtaddr(dst)
	struct in_addr dst;
{
	struct route sro;
	struct sockaddr_in *sin;
	struct in_ifaddr *ifa;

	bzero(&sro, sizeof(sro));
	sin = (struct sockaddr_in *)&sro.ro_dst;
	sin->sin_family = AF_INET;
	sin->sin_len = sizeof(*sin);
	sin->sin_addr = dst;
	rtalloc_ign(&sro, RTF_CLONING);

	if (sro.ro_rt == NULL)
		return ((struct in_ifaddr *)0);

	ifa = ifatoia(sro.ro_rt->rt_ifa);
	RTFREE(sro.ro_rt);
	return ifa;
}

/*
 * Save incoming source route for use in replies,
 * to be picked up later by ip_srcroute if the receiver is interested.
 */
static void
save_rte(m, option, dst)
	struct mbuf *m;
	u_char *option;
	struct in_addr dst;
{
	unsigned olen;
	struct ipopt_tag *opts;

	opts = (struct ipopt_tag *)m_tag_get(PACKET_TAG_IPOPTIONS,
					sizeof(struct ipopt_tag), M_NOWAIT);
	if (opts == NULL)
		return;

	olen = option[IPOPT_OLEN];
#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf("save_rte: olen %d\n", olen);
#endif
	if (olen > sizeof(opts->ip_srcrt) - (1 + sizeof(dst)))
		return;
	bcopy(option, opts->ip_srcrt.srcopt, olen);
	opts->ip_nhops = (olen - IPOPT_OFFSET - 1) / sizeof(struct in_addr);
	opts->ip_srcrt.dst = dst;
	m_tag_prepend(m, (struct m_tag *)opts);
}

/*
 * Retrieve incoming source route for use in replies,
 * in the same form used by setsockopt.
 * The first hop is placed before the options and will be removed later.
 */
struct mbuf *
ip_srcroute(m0)
	struct mbuf *m0;
{
	register struct in_addr *p, *q;
	register struct mbuf *m;
	struct ipopt_tag *opts;

	opts = (struct ipopt_tag *)m_tag_find(m0, PACKET_TAG_IPOPTIONS, NULL);
	if (opts == NULL)
		return ((struct mbuf *)0);

	if (opts->ip_nhops == 0)
		return ((struct mbuf *)0);
	m = m_get(M_DONTWAIT, MT_HEADER);
	if (m == NULL)
		return ((struct mbuf *)0);

#define OPTSIZ	(sizeof(opts->ip_srcrt.nop) + sizeof(opts->ip_srcrt.srcopt))

	/* length is (nhops+1)*sizeof(addr) + sizeof(nop + srcrt header) */
	m->m_len = opts->ip_nhops * sizeof(struct in_addr) +
	    sizeof(struct in_addr) + OPTSIZ;
#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf("ip_srcroute: nhops %d mlen %d", opts->ip_nhops, m->m_len);
#endif

	/*
	 * First save first hop for return route
	 */
	p = &(opts->ip_srcrt.route[opts->ip_nhops - 1]);
	*(mtod(m, struct in_addr *)) = *p--;
#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf(" hops %lx", (u_long)ntohl(mtod(m, struct in_addr *)->s_addr));
#endif

	/*
	 * Copy option fields and padding (nop) to mbuf.
	 */
	opts->ip_srcrt.nop = IPOPT_NOP;
	opts->ip_srcrt.srcopt[IPOPT_OFFSET] = IPOPT_MINOFF;
	(void)memcpy(mtod(m, caddr_t) + sizeof(struct in_addr),
	    &(opts->ip_srcrt.nop), OPTSIZ);
	q = (struct in_addr *)(mtod(m, caddr_t) +
	    sizeof(struct in_addr) + OPTSIZ);
#undef OPTSIZ
	/*
	 * Record return path as an IP source route,
	 * reversing the path (pointers are now aligned).
	 */
	while (p >= opts->ip_srcrt.route) {
#ifdef DIAGNOSTIC
		if (ipprintfs)
			printf(" %lx", (u_long)ntohl(q->s_addr));
#endif
		*q++ = *p--;
	}
	/*
	 * Last hop goes to final destination.
	 */
	*q = opts->ip_srcrt.dst;
#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf(" %lx\n", (u_long)ntohl(q->s_addr));
#endif
	m_tag_delete(m0, (struct m_tag *)opts);
	return (m);
}
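
The reversal done by ip_srcroute can be shown on a plain array. This is a
simplified sketch under stated assumptions: reverse_route is a hypothetical
helper, addresses are bare uint32_t values, and the NOP plus option header
that the kernel places between the first hop and the rest of the route are
omitted, so the result is one flat list.

```c
#include <stdint.h>

/*
 * Build the return path from a recorded source route.
 * out must have room for nhops + 1 entries.  out[0] receives the first
 * hop of the return path, the remaining recorded hops follow in reverse
 * order, and the last slot receives the original sender.
 * Returns the number of entries written, or 0 if there is no route.
 */
static int
reverse_route(const uint32_t *route, int nhops, uint32_t sender,
    uint32_t *out)
{
	int i, n = 0;

	if (nhops == 0)
		return (0);
	for (i = nhops - 1; i >= 0; i--)
		out[n++] = route[i];
	out[n++] = sender;
	return (n);
}
```

For a recorded route {1, 2, 3} and sender 9 the output is {3, 2, 1, 9}.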

/*
 * Strip out IP options, at higher
 * level protocol in the kernel.
 * Second argument is the buffer to which options
 * were meant to be moved; the function returns nothing.
 * XXX should be deleted; last arg currently ignored.
 */
void
ip_stripoptions(m, mopt)
	register struct mbuf *m;
	struct mbuf *mopt;
{
	register int i;
	struct ip *ip = mtod(m, struct ip *);
	register caddr_t opts;
	int olen;

	olen = (ip->ip_hl << 2) - sizeof (struct ip);
	opts = (caddr_t)(ip + 1);
	i = m->m_len - (sizeof (struct ip) + olen);
	bcopy(opts + olen, opts, (unsigned)i);
	m->m_len -= olen;
	if (m->m_flags & M_PKTHDR)
		m->m_pkthdr.len -= olen;
	ip->ip_v = IPVERSION;
	ip->ip_hl = sizeof(struct ip) >> 2;
}

u_char inetctlerrmap[PRC_NCMDS] = {
	0,		0,		0,		0,
	0,		EMSGSIZE,	EHOSTDOWN,	EHOSTUNREACH,
	EHOSTUNREACH,	EHOSTUNREACH,	ECONNREFUSED,	ECONNREFUSED,
	EMSGSIZE,	EHOSTUNREACH,	0,		0,
	0,		0,		EHOSTUNREACH,	0,
	ENOPROTOOPT,	ECONNREFUSED
};

/*
 * Forward a packet.  If some error occurs return the sender
 * an icmp packet.  Note we can't always generate a meaningful
 * icmp message because icmp doesn't have a large enough repertoire
 * of codes and types.
 *
 * If not forwarding, just drop the packet.  This could be confusing
 * if ipforwarding was zero but some routing protocol was advertising
 * us as a gateway to somewhere.  However, we must let the routing
 * protocol deal with that.
 *
 * The srcrt parameter indicates whether the packet is being forwarded
 * via a source route.
 */
void
ip_forward(struct mbuf *m, int srcrt)
{
	struct ip *ip = mtod(m, struct ip *);
	struct in_ifaddr *ia = NULL;
	int error, type = 0, code = 0;
	struct mbuf *mcopy;
	struct in_addr dest;
	struct ifnet *destifp, dummyifp;

#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf("forward: src %lx dst %lx ttl %x\n",
		    (u_long)ip->ip_src.s_addr, (u_long)ip->ip_dst.s_addr,
		    ip->ip_ttl);
#endif


	if (m->m_flags & (M_BCAST|M_MCAST) || in_canforward(ip->ip_dst) == 0) {
		ipstat.ips_cantforward++;
		m_freem(m);
		return;
	}
#ifdef IPSTEALTH
	if (!ipstealth) {
#endif
		if (ip->ip_ttl <= IPTTLDEC) {
			icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS,
			    0, 0);
			return;
		}
#ifdef IPSTEALTH
	}
#endif

	if (!srcrt && (ia = ip_rtaddr(ip->ip_dst)) == NULL) {
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, 0);
		return;
	}

	/*
	 * Save the IP header and at most 8 bytes of the payload,
	 * in case we need to generate an ICMP message to the src.
	 *
	 * XXX this can be optimized a lot by saving the data in a local
	 * buffer on the stack (72 bytes at most), and only allocating the
	 * mbuf if really necessary. The vast majority of the packets
	 * are forwarded without having to send an ICMP back (either
	 * because unnecessary, or because rate limited), so we are
	 * really wasting a lot of work here.
	 *
	 * We don't use m_copy() because it might return a reference
	 * to a shared cluster. Both this function and ip_output()
	 * assume exclusive access to the IP header in `m', so any
	 * data in a cluster may change before we reach icmp_error().
	 */
	MGET(mcopy, M_DONTWAIT, m->m_type);
	if (mcopy != NULL && !m_dup_pkthdr(mcopy, m, M_DONTWAIT)) {
		/*
		 * It's probably ok if the pkthdr dup fails (because
		 * the deep copy of the tag chain failed), but for now
		 * be conservative and just discard the copy since
		 * code below may some day want the tags.
		 */
		m_free(mcopy);
		mcopy = NULL;
	}
	if (mcopy != NULL) {
		mcopy->m_len = imin((ip->ip_hl << 2) + 8,
		    (int)ip->ip_len);
		mcopy->m_pkthdr.len = mcopy->m_len;
		m_copydata(m, 0, mcopy->m_len, mtod(mcopy, caddr_t));
	}

#ifdef IPSTEALTH
	if (!ipstealth) {
#endif
		ip->ip_ttl -= IPTTLDEC;
#ifdef IPSTEALTH
	}
#endif

	/*
	 * If forwarding packet using same interface that it came in on,
	 * perhaps should send a redirect to sender to shortcut a hop.
	 * Only send redirect if source is sending directly to us,
	 * and if packet was not source routed (or has any options).
	 * Also, don't send redirect if forwarding using a default route
	 * or a route modified by a redirect.
	 */
	dest.s_addr = 0;
	if (!srcrt && ipsendredirects && ia->ia_ifp == m->m_pkthdr.rcvif) {
		struct sockaddr_in *sin;
		struct route ro;
		struct rtentry *rt;

		bzero(&ro, sizeof(ro));
		sin = (struct sockaddr_in *)&ro.ro_dst;
		sin->sin_family = AF_INET;
		sin->sin_len = sizeof(*sin);
		sin->sin_addr = ip->ip_dst;
		rtalloc_ign(&ro, RTF_CLONING);

		rt = ro.ro_rt;

		if (rt && (rt->rt_flags & (RTF_DYNAMIC|RTF_MODIFIED)) == 0 &&
		    satosin(rt_key(rt))->sin_addr.s_addr != 0) {
#define	RTA(rt)	((struct in_ifaddr *)(rt->rt_ifa))
			u_long src = ntohl(ip->ip_src.s_addr);

			if (RTA(rt) &&
			    (src & RTA(rt)->ia_subnetmask) == RTA(rt)->ia_subnet) {
				if (rt->rt_flags & RTF_GATEWAY)
					dest.s_addr = satosin(rt->rt_gateway)->sin_addr.s_addr;
				else
					dest.s_addr = ip->ip_dst.s_addr;
				/* Router requirements say to send only host redirects */
				type = ICMP_REDIRECT;
				code = ICMP_REDIRECT_HOST;
#ifdef DIAGNOSTIC
				if (ipprintfs)
					printf("redirect (%d) to %lx\n", code, (u_long)dest.s_addr);
#endif
			}
		}
		if (rt)
			RTFREE(rt);
	}

	error = ip_output(m, (struct mbuf *)0, NULL, IP_FORWARDING, 0, NULL);
	if (error)
		ipstat.ips_cantforward++;
	else {
		ipstat.ips_forward++;
		if (type)
			ipstat.ips_redirectsent++;
		else {
			if (mcopy)
				m_freem(mcopy);
			return;
		}
	}
	if (mcopy == NULL)
		return;
	destifp = NULL;

	switch (error) {

	case 0:				/* forwarded, but need redirect */
		/* type, code set above */
		break;

	case ENETUNREACH:		/* shouldn't happen, checked above */
	case EHOSTUNREACH:
	case ENETDOWN:
	case EHOSTDOWN:
	default:
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_HOST;
		break;

	case EMSGSIZE:
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_NEEDFRAG;
#if defined(IPSEC) || defined(FAST_IPSEC)
		/*
		 * If the packet is routed over IPsec tunnel, tell the
		 * originator the tunnel MTU.
		 *	tunnel MTU = if MTU - sizeof(IP) - ESP/AH hdrsiz
		 * XXX quickhack!!!
		 */
		{
			struct secpolicy *sp = NULL;
			int ipsecerror;
			int ipsechdr;
			struct route *ro;

#ifdef IPSEC
			sp = ipsec4_getpolicybyaddr(mcopy,
						    IPSEC_DIR_OUTBOUND,
						    IP_FORWARDING,
						    &ipsecerror);
#else /* FAST_IPSEC */
			sp = ipsec_getpolicybyaddr(mcopy,
						   IPSEC_DIR_OUTBOUND,
						   IP_FORWARDING,
						   &ipsecerror);
#endif
			if (sp != NULL) {
				/* count IPsec header size */
				ipsechdr = ipsec4_hdrsiz(mcopy,
							 IPSEC_DIR_OUTBOUND,
							 NULL);

				/*
				 * find the correct route for outer IPv4
				 * header, compute tunnel MTU.
				 *
				 * XXX BUG ALERT
				 * The "dummyifp" code relies upon the fact
				 * that icmp_error() touches only ifp->if_mtu.
				 */
				/*XXX*/
				destifp = NULL;
				if (sp->req != NULL
				 && sp->req->sav != NULL
				 && sp->req->sav->sah != NULL) {
					ro = &sp->req->sav->sah->sa_route;
					if (ro->ro_rt && ro->ro_rt->rt_ifp) {
						dummyifp.if_mtu =
						    ro->ro_rt->rt_rmx.rmx_mtu ?
						    ro->ro_rt->rt_rmx.rmx_mtu :
						    ro->ro_rt->rt_ifp->if_mtu;
						dummyifp.if_mtu -= ipsechdr;
						destifp = &dummyifp;
					}
				}

#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				ipstat.ips_cantfrag++;
				break;
			} else 
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * When doing source routing 'ia' can be NULL.  Fall back
		 * to the minimum guaranteed routable packet size and use
		 * the same hack as IPSEC to set up a dummyifp for icmp.
		 */
		if (ia == NULL) {
			dummyifp.if_mtu = IP_MSS;
			destifp = &dummyifp;
		} else
			destifp = ia->ia_ifp;
#if defined(IPSEC) || defined(FAST_IPSEC)
		}
#endif /*IPSEC || FAST_IPSEC*/
		ipstat.ips_cantfrag++;
		break;

	case ENOBUFS:
		/*
		 * A router should not generate ICMP_SOURCEQUENCH as
		 * required in RFC1812 Requirements for IP Version 4 Routers.
		 * Source quench could be a big problem under DoS attacks,
		 * or if the underlying interface is rate-limited.
		 * Those who need source quench packets may re-enable them
		 * via the net.inet.ip.sendsourcequench sysctl.
		 */
		if (ip_sendsourcequench == 0) {
			m_freem(mcopy);
			return;
		} else {
			type = ICMP_SOURCEQUENCH;
			code = 0;
		}
		break;

	case EACCES:			/* ipfw denied packet */
		m_freem(mcopy);
		return;
	}
	icmp_error(mcopy, type, code, dest.s_addr, destifp);
}

void
ip_savecontrol(inp, mp, ip, m)
	register struct inpcb *inp;
	register struct mbuf **mp;
	register struct ip *ip;
	register struct mbuf *m;
{
	if (inp->inp_socket->so_options & (SO_BINTIME | SO_TIMESTAMP)) {
		struct bintime bt;

		bintime(&bt);
		if (inp->inp_socket->so_options & SO_BINTIME) {
			*mp = sbcreatecontrol((caddr_t) &bt, sizeof(bt),
			    SCM_BINTIME, SOL_SOCKET);
			if (*mp)
				mp = &(*mp)->m_next;
		}
		if (inp->inp_socket->so_options & SO_TIMESTAMP) {
			struct timeval tv;

			bintime2timeval(&bt, &tv);
			*mp = sbcreatecontrol((caddr_t) &tv, sizeof(tv),
				SCM_TIMESTAMP, SOL_SOCKET);
			if (*mp)
				mp = &(*mp)->m_next;
		}
	}
	if (inp->inp_flags & INP_RECVDSTADDR) {
		*mp = sbcreatecontrol((caddr_t) &ip->ip_dst,
		    sizeof(struct in_addr), IP_RECVDSTADDR, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
	if (inp->inp_flags & INP_RECVTTL) {
		*mp = sbcreatecontrol((caddr_t) &ip->ip_ttl,
		    sizeof(u_char), IP_RECVTTL, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
#ifdef notyet
	/* XXX
	 * Moving these out of udp_input() made them even more broken
	 * than they already were.
	 */
	/* options were tossed already */
	if (inp->inp_flags & INP_RECVOPTS) {
		*mp = sbcreatecontrol((caddr_t) opts_deleted_above,
		    sizeof(struct in_addr), IP_RECVOPTS, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
	/* ip_srcroute doesn't do what we want here, need to fix */
	if (inp->inp_flags & INP_RECVRETOPTS) {
		*mp = sbcreatecontrol((caddr_t) ip_srcroute(m),
		    sizeof(struct in_addr), IP_RECVRETOPTS, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
#endif
	if (inp->inp_flags & INP_RECVIF) {
		struct ifnet *ifp;
		struct sdlbuf {
			struct sockaddr_dl sdl;
			u_char	pad[32];
		} sdlbuf;
		struct sockaddr_dl *sdp;
		struct sockaddr_dl *sdl2 = &sdlbuf.sdl;

		if (((ifp = m->m_pkthdr.rcvif)) 
		&& ( ifp->if_index && (ifp->if_index <= if_index))) {
			sdp = (struct sockaddr_dl *)
			    (ifaddr_byindex(ifp->if_index)->ifa_addr);
			/*
			 * Change our mind and don't try to copy.
			 */
			if ((sdp->sdl_family != AF_LINK)
			|| (sdp->sdl_len > sizeof(sdlbuf))) {
				goto makedummy;
			}
			bcopy(sdp, sdl2, sdp->sdl_len);
		} else {
makedummy:	
			sdl2->sdl_len
				= offsetof(struct sockaddr_dl, sdl_data[0]);
			sdl2->sdl_family = AF_LINK;
			sdl2->sdl_index = 0;
			sdl2->sdl_nlen = sdl2->sdl_alen = sdl2->sdl_slen = 0;
		}
		*mp = sbcreatecontrol((caddr_t) sdl2, sdl2->sdl_len,
			IP_RECVIF, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
}

/*
 * XXX these routines are called from the upper part of the kernel.
 * They need to be locked when we remove Giant.
 *
 * They could also be moved to ip_mroute.c, since all the RSVP
 *  handling is done there already.
 */
static int ip_rsvp_on;
struct socket *ip_rsvpd;
int
ip_rsvp_init(struct socket *so)
{
	if (so->so_type != SOCK_RAW ||
	    so->so_proto->pr_protocol != IPPROTO_RSVP)
		return EOPNOTSUPP;

	if (ip_rsvpd != NULL)
		return EADDRINUSE;

	ip_rsvpd = so;
	/*
	 * This may seem silly, but we need to be sure we don't over-increment
	 * the RSVP counter, in case something slips up.
	 */
	if (!ip_rsvp_on) {
		ip_rsvp_on = 1;
		rsvp_on++;
	}

	return 0;
}

int
ip_rsvp_done(void)
{
	ip_rsvpd = NULL;
	/*
	 * This may seem silly, but we need to be sure we don't over-decrement
	 * the RSVP counter, in case something slips up.
	 */
	if (ip_rsvp_on) {
		ip_rsvp_on = 0;
		rsvp_on--;
	}
	return 0;
}

void
rsvp_input(struct mbuf *m, int off)	/* XXX must fixup manually */
{
	if (rsvp_input_p) { /* call the real one if loaded */
		rsvp_input_p(m, off);
		return;
	}

	/* Can still get packets with rsvp_on = 0 if there is a local member
	 * of the group to which the RSVP packet is addressed.  But in this
	 * case we want to throw the packet away.
	 */
	
	if (!rsvp_on) {
		m_freem(m);
		return;
	}

	if (ip_rsvpd != NULL) { 
		rip_input(m, off);
		return;
	}
	/* Drop the packet */
	m_freem(m);
}

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_input.c.diff"

--- ip_input.org.c	Mon Dec 27 01:53:29 2004
+++ ip_input.c	Mon Dec 27 01:51:55 2004
@@ -27,7 +27,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)ip_input.c	8.2 (Berkeley) 1/4/94
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_input.c,v 1.292 2004/10/19 15:45:57 andre Exp $
+ * $FreeBSD: src/sys/netinet/ip_input.c,v 1.283.2.7 2004/10/03 17:04:40 mlaier Exp $
  */
 
 #include "opt_bootp.h"
@@ -156,7 +156,7 @@
 static int	ipprintfs = 0;
 #endif
 
-struct pfil_head inet_pfil_hook;	/* Packet filter hooks */
+struct pfil_head inet_pfil_hook;
 
 static struct	ifqueue ipintrq;
 static int	ipqmaxlen = IFQ_MAXLEN;
@@ -261,7 +261,7 @@
 		if (pr->pr_domain->dom_family == PF_INET &&
 		    pr->pr_protocol && pr->pr_protocol != IPPROTO_RAW) {
 			/* Be careful to only index valid IP protocols. */
-			if (pr->pr_protocol <= IPPROTO_MAX)
+			if (pr->pr_protocol && pr->pr_protocol < IPPROTO_MAX)
 				ip_protox[pr->pr_protocol] = pr - inetsw;
 		}
 
@@ -311,16 +311,29 @@
   	
 	if (m->m_flags & M_FASTFWD_OURS) {
 		/*
-		 * Firewall or NAT changed destination to local.
-		 * We expect ip_len and ip_off to be in host byte order.
+		 * ip_fastforward firewall changed dest to local.
+		 * We expect ip_len and ip_off in host byte order.
 		 */
-		m->m_flags &= ~M_FASTFWD_OURS;
-		/* Set up some basics that will be used later. */
+		m->m_flags &= ~M_FASTFWD_OURS;	/* for reflected mbufs */
+		/* Set up some basic stuff */
 		ip = mtod(m, struct ip *);
 		hlen = ip->ip_hl << 2;
   		goto ours;
   	}
 
+	if (m->m_flags & M_FASTFWD_PREPROC){
+		/*
+		 * Packets that require further analysis or destined
+		 * to our own addresses in ip_fastforward.
+		 * We expect ip_len and ip_off in host byte order.
+		 */
+		m->m_flags &= ~M_FASTFWD_PREPROC; /* for reflected mbufs */
+		/* Setup some basic stuff */
+		ip = mtod(m, struct ip *);
+		hlen = ip->ip_hl << 2;
+		goto preprocessed;
+	}
+
 	ipstat.ips_total++;
 
 	if (m->m_pkthdr.len < sizeof(struct ip))
@@ -408,6 +421,9 @@
 		} else
 			m_adj(m, ip->ip_len - m->m_pkthdr.len);
 	}
+
+preprocessed:
+
 #if defined(IPSEC) && !defined(IPSEC_FILTERGIF)
 	/*
 	 * Bypass packet filtering for packets from a tunnel (gif).
@@ -1143,67 +1159,6 @@
 	IPQ_UNLOCK();
 	in_rtqdrain();
 }
-
-/*
- * The protocol to be inserted into ip_protox[] must be already registered
- * in inetsw[], either statically or through pf_proto_register().
- */
-int
-ipproto_register(u_char ipproto)
-{
-	struct protosw *pr;
-
-	/* Sanity checks. */
-	if (ipproto == 0)
-		return (EPROTONOSUPPORT);
-
-	/*
-	 * The protocol slot must not be occupied by another protocol
-	 * already.  An index pointing to IPPROTO_RAW is unused.
-	 */
-	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
-	if (pr == NULL)
-		return (EPFNOSUPPORT);
-	if (ip_protox[ipproto] != pr - inetsw)	/* IPPROTO_RAW */
-		return (EEXIST);
-
-	/* Find the protocol position in inetsw[] and set the index. */
-	for (pr = inetdomain.dom_protosw;
-	     pr < inetdomain.dom_protoswNPROTOSW; pr++) {
-		if (pr->pr_domain->dom_family == PF_INET &&
-		    pr->pr_protocol && pr->pr_protocol == ipproto) {
-			/* Be careful to only index valid IP protocols. */
-			if (pr->pr_protocol <= IPPROTO_MAX) {
-				ip_protox[pr->pr_protocol] = pr - inetsw;
-				return (0);
-			} else
-				return (EINVAL);
-		}
-	}
-	return (EPROTONOSUPPORT);
-}
-
-int
-ipproto_unregister(u_char ipproto)
-{
-	struct protosw *pr;
-
-	/* Sanity checks. */
-	if (ipproto == 0)
-		return (EPROTONOSUPPORT);
-
-	/* Check if the protocol was indeed registered. */
-	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
-	if (pr == NULL)
-		return (EPFNOSUPPORT);
-	if (ip_protox[ipproto] == pr - inetsw)  /* IPPROTO_RAW */
-		return (ENOENT);
-
-	/* Reset the protocol slot to IPPROTO_RAW. */
-	ip_protox[ipproto] = pr - inetsw;
-	return (0);
-}
-
 
 /*
  * Do option processing on a datagram,
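
The entry-point change in the diff above amounts to a three-way dispatch on
mbuf flags. The sketch below models only that decision. The flag values,
enum entry and classify are assumptions made for illustration; the real
flags are M_FASTFWD_OURS and the new M_FASTFWD_PREPROC, and their values
are not shown in this attachment.

```c
/* Hypothetical flag values; the real definitions are not in this diff. */
#define FASTFWD_OURS	0x1
#define FASTFWD_PREPROC	0x2

/* Where input processing resumes for a given packet. */
enum entry { ENTRY_FULL, ENTRY_PREPROCESSED, ENTRY_OURS };

/*
 * Decide where input processing resumes and clear the flag that was
 * consumed, so a reflected packet is not misclassified later.
 */
static enum entry
classify(int *flags)
{
	if (*flags & FASTFWD_OURS) {
		*flags &= ~FASTFWD_OURS;
		return (ENTRY_OURS);
	}
	if (*flags & FASTFWD_PREPROC) {
		*flags &= ~FASTFWD_PREPROC;
		return (ENTRY_PREPROCESSED);
	}
	return (ENTRY_FULL);
}
```

ENTRY_OURS corresponds to "goto ours", ENTRY_PREPROCESSED to the new
"goto preprocessed" label placed after the header sanity checks, and
ENTRY_FULL to the normal path that runs every check from the top.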

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_var.h"

/*
 * Copyright (c) 1982, 1986, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)ip_var.h	8.2 (Berkeley) 1/9/95
 * $FreeBSD: src/sys/netinet/ip_var.h,v 1.89.2.2 2004/09/23 16:38:53 andre Exp $
 */

#ifndef _NETINET_IP_VAR_H_
#define	_NETINET_IP_VAR_H_

#include <sys/queue.h>

/*
 * Overlay for ip header used by other protocols (tcp, udp).
 */
struct ipovly {
	u_char	ih_x1[9];		/* (unused) */
	u_char	ih_pr;			/* protocol */
	u_short	ih_len;			/* protocol length */
	struct	in_addr ih_src;		/* source internet address */
	struct	in_addr ih_dst;		/* destination internet address */
};

#ifdef _KERNEL
/*
 * Ip reassembly queue structure.  Each fragment
 * being reassembled is attached to one of these structures.
 * They are timed out after ipq_ttl drops to 0, and may also
 * be reclaimed if memory becomes tight.
 */
struct ipq {
	TAILQ_ENTRY(ipq) ipq_list;	/* to other reass headers */
	u_char	ipq_ttl;		/* time for reass q to live */
	u_char	ipq_p;			/* protocol of this fragment */
	u_short	ipq_id;			/* sequence id for reassembly */
	struct mbuf *ipq_frags;		/* to ip headers of fragments */
	struct	in_addr ipq_src,ipq_dst;
	u_char	ipq_nfrags;		/* # frags in this packet */
	struct label *ipq_label;		/* MAC label */
};
#endif /* _KERNEL */

/*
 * Structure stored in mbuf in inpcb.ip_options
 * and passed to ip_output when ip options are in use.
 * The actual length of the options (including ipopt_dst)
 * is in m_len.
 */
#define MAX_IPOPTLEN	40

struct ipoption {
	struct	in_addr ipopt_dst;	/* first-hop dst if source routed */
	char	ipopt_list[MAX_IPOPTLEN];	/* options proper */
};

/*
 * Structure attached to inpcb.ip_moptions and
 * passed to ip_output when IP multicast options are in use.
 */
struct ip_moptions {
	struct	ifnet *imo_multicast_ifp; /* ifp for outgoing multicasts */
	struct in_addr imo_multicast_addr; /* ifindex/addr on MULTICAST_IF */
	u_char	imo_multicast_ttl;	/* TTL for outgoing multicasts */
	u_char	imo_multicast_loop;	/* 1 => hear sends if a member */
	u_short	imo_num_memberships;	/* no. memberships this socket */
	struct	in_multi *imo_membership[IP_MAX_MEMBERSHIPS];
	u_long	imo_multicast_vif;	/* vif num outgoing multicasts */
};

struct	ipstat {
	u_long	ips_total;		/* total packets received */
	u_long	ips_badsum;		/* checksum bad */
	u_long	ips_tooshort;		/* packet too short */
	u_long	ips_toosmall;		/* not enough data */
	u_long	ips_badhlen;		/* ip header length < data size */
	u_long	ips_badlen;		/* ip length < ip header length */
	u_long	ips_fragments;		/* fragments received */
	u_long	ips_fragdropped;	/* frags dropped (dups, out of space) */
	u_long	ips_fragtimeout;	/* fragments timed out */
	u_long	ips_forward;		/* packets forwarded */
	u_long	ips_fastforward;	/* packets fast forwarded */
	u_long	ips_transit_re;		/* packets sent to receive path from fastfwd */
	u_long	ips_cantforward;	/* packets rcvd for unreachable dest */
	u_long	ips_redirectsent;	/* packets forwarded on same net */
	u_long	ips_noproto;		/* unknown or unsupported protocol */
	u_long	ips_delivered;		/* datagrams delivered to upper level*/
	u_long	ips_localout;		/* total ip packets generated here */
	u_long	ips_odropped;		/* lost packets due to nobufs, etc. */
	u_long	ips_reassembled;	/* total packets reassembled ok */
	u_long	ips_fragmented;		/* datagrams successfully fragmented */
	u_long	ips_ofragments;		/* output fragments created */
	u_long	ips_cantfrag;		/* don't fragment flag was set, etc. */
	u_long	ips_badoptions;		/* error in option processing */
	u_long	ips_noroute;		/* packets discarded due to no route */
	u_long	ips_badvers;		/* ip version != 4 */
	u_long	ips_rawout;		/* total raw ip packets generated */
	u_long	ips_toolong;		/* ip length > max ip packet size */
	u_long	ips_notmember;		/* multicasts for unregistered grps */
	u_long	ips_nogif;		/* no match gif found */
	u_long	ips_badaddr;		/* invalid address on header */
};

#ifdef _KERNEL

/* flags passed to ip_output as last parameter */
#define	IP_FORWARDING		0x1		/* most of ip header exists */
#define	IP_RAWOUTPUT		0x2		/* raw ip header exists */
#define	IP_SENDONES		0x4		/* send all-ones broadcast */
#define	IP_ROUTETOIF		SO_DONTROUTE	/* bypass routing tables */
#define	IP_ALLOWBROADCAST	SO_BROADCAST	/* can send broadcast packets */

/* mbuf flag used by ip_fastfwd */
#define	M_FASTFWD_OURS		M_PROTO1	/* changed dst to local */
#define	M_FASTFWD_PREPROC	M_PROTO2	/* bypass pre processing */

struct ip;
struct inpcb;
struct route;
struct sockopt;

extern struct	ipstat	ipstat;
extern u_short	ip_id;				/* ip packet ctr, for ids */
extern int	ip_defttl;			/* default IP ttl */
extern int	ipforwarding;			/* ip forwarding */
extern int	ip_doopts;			/* process or ignore IP options */
#ifdef IPSTEALTH
extern int	ipstealth;			/* stealth forwarding */
#endif

extern u_char	ip_protox[];
extern struct socket *ip_rsvpd;	/* reservation protocol daemon */
extern struct socket *ip_mrouter; /* multicast routing daemon */
extern int	(*legal_vif_num)(int);
extern u_long	(*ip_mcast_src)(int);
extern int rsvp_on;
extern struct	pr_usrreqs rip_usrreqs;

int	 ip_ctloutput(struct socket *, struct sockopt *sopt);
void	 ip_drain(void);
int	 ip_fragment(struct ip *ip, struct mbuf **m_frag, int mtu,
	    u_long if_hwassist_flags, int sw_csum);
void	 ip_freemoptions(struct ip_moptions *);
void	 ip_init(void);
extern int	 (*ip_mforward)(struct ip *, struct ifnet *, struct mbuf *,
			  struct ip_moptions *);
int	 ip_output(struct mbuf *,
	    struct mbuf *, struct route *, int, struct ip_moptions *,
	    struct inpcb *);
struct mbuf *
	 ip_reass(struct mbuf *);
struct in_ifaddr *
	 ip_rtaddr(struct in_addr);
void	 ip_savecontrol(struct inpcb *, struct mbuf **, struct ip *,
		struct mbuf *);
void	 ip_slowtimo(void);
struct mbuf *
	 ip_srcroute(struct mbuf *);
void	 ip_stripoptions(struct mbuf *, struct mbuf *);
u_int16_t	ip_randomid(void);
int	rip_ctloutput(struct socket *, struct sockopt *);
void	rip_ctlinput(int, struct sockaddr *, void *);
void	rip_init(void);
void	rip_input(struct mbuf *, int);
int	rip_output(struct mbuf *, struct socket *, u_long);
void	ipip_input(struct mbuf *, int);
void	rsvp_input(struct mbuf *, int);
int	ip_rsvp_init(struct socket *);
int	ip_rsvp_done(void);
extern int	(*ip_rsvp_vif)(struct socket *, struct sockopt *);
extern void	(*ip_rsvp_force_done)(struct socket *);
extern void	(*rsvp_input_p)(struct mbuf *m, int off);

extern	struct pfil_head inet_pfil_hook;	/* packet filter hooks */

void	in_delayed_cksum(struct mbuf *m);

static __inline uint16_t ip_newid(void);
extern int ip_do_randomid;

static __inline uint16_t
ip_newid(void)
{
	if (ip_do_randomid)
		return ip_randomid();

	return htons(ip_id++);
}

#endif /* _KERNEL */

#endif /* !_NETINET_IP_VAR_H_ */

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_var.h.diff"

--- ip_var.org.h	Mon Dec 27 01:48:09 2004
+++ ip_var.h	Sun Dec 26 22:32:58 2004
@@ -27,7 +27,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)ip_var.h	8.2 (Berkeley) 1/9/95
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_var.h,v 1.92 2004/10/19 15:45:57 andre Exp $
+ * $FreeBSD: src/sys/netinet/ip_var.h,v 1.89.2.2 2004/09/23 16:38:53 andre Exp $
  */
 
 #ifndef _NETINET_IP_VAR_H_
@@ -104,6 +104,7 @@
 	u_long	ips_fragtimeout;	/* fragments timed out */
 	u_long	ips_forward;		/* packets forwarded */
 	u_long	ips_fastforward;	/* packets fast forwarded */
+	u_long	ips_transit_re;		/* packets sent to receive path from fastfwd */
 	u_long	ips_cantforward;	/* packets rcvd for unreachable dest */
 	u_long	ips_redirectsent;	/* packets forwarded on same net */
 	u_long	ips_noproto;		/* unknown or unsupported protocol */
@@ -135,6 +136,7 @@
 
 /* mbuf flag used by ip_fastfwd */
 #define	M_FASTFWD_OURS		M_PROTO1	/* changed dst to local */
+#define	M_FASTFWD_PREPROC	M_PROTO2	/* bypass pre processing */
 
 struct ip;
 struct inpcb;
@@ -149,6 +151,7 @@
 #ifdef IPSTEALTH
 extern int	ipstealth;			/* stealth forwarding */
 #endif
+
 extern u_char	ip_protox[];
 extern struct socket *ip_rsvpd;	/* reservation protocol daemon */
 extern struct socket *ip_mrouter; /* multicast routing daemon */
@@ -168,8 +171,6 @@
 int	 ip_output(struct mbuf *,
 	    struct mbuf *, struct route *, int, struct ip_moptions *,
 	    struct inpcb *);
-int	 ipproto_register(u_char);
-int	 ipproto_unregister(u_char);
 struct mbuf *
 	 ip_reass(struct mbuf *);
 struct in_ifaddr *

--FCuugMFkClbJLl1L--


