From owner-freebsd-net@FreeBSD.ORG  Sun Dec 26 02:19:42 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 569EA16A4CE; Sun, 26 Dec 2004 02:19:42 +0000 (GMT)
Received: from smtp.uol.com.br (smtpout1.uol.com.br [200.221.4.192])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 2C1CE43D1F; Sun, 26 Dec 2004 02:19:41 +0000 (GMT)
	(envelope-from jonny@jonny.eng.br)
Received: from [200.164.27.103] (200164027103.user.veloxzone.com.br
	[200.164.27.103])
	by scorpion1.uol.com.br (Postfix) with ESMTP id 79E8D774E;
	Sun, 26 Dec 2004 00:19:32 -0200 (BRST)
Message-ID: <41CE1FB5.4080401@jonny.eng.br>
Date: Sun, 26 Dec 2004 00:19:33 -0200
From: =?ISO-8859-1?Q?Jo=E3o_Carlos_Mendes_Lu=EDs?= <jonny@jonny.eng.br>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Robert Watson <rwatson@freebsd.org>
References: <Pine.NEB.3.96L.1041225121903.27724E-100000@fledge.watson.org>
In-Reply-To: <Pine.NEB.3.96L.1041225121903.27724E-100000@fledge.watson.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
cc: Jeff Behl <jbehl@fastclick.com>
cc: freebsd-performance@freebsd.org
cc: freebsd-net@freebsd.org
Subject: Re: %cpu in system - squid performance in FreeBSD 5.3
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Dec 2004 02:19:42 -0000

Robert Watson wrote:
> On Thu, 23 Dec 2004, Jeff Behl wrote:
> 
>>As a follow up to the below (original message at the very bottom), I
>>installed a load balancer in front of the machines which terminates the
>>tcp connections from clients and opens up a few, persistent connections
>>to each server over which requests are pipelined.  In this scenario
>>everything is copasetic: 
> 
> I'm not very familiar with Squid's architecture, but I would anticipate
> that what you're seeing is that the cost of additional connections served
> in parallel is pretty high due to the use of processes.  Specifically: if
> each TCP connection being served gets its own process, and there are a lot
> of TCP connections, you'll be doing a lot of process forking, context
> switching, exceeding cache sizes, etc.  With just a couple of connections,
> even if they're doing the same "work", the overhead is much lower. 
> Depending on how much time you're willing to invest in this, we can
> probably do quite a bit to diagnose where the cost is coming from and look
> for any specific problems or areas we could optimize.

     It cannot be this.  Squid is mostly a single-process system, with
scheduling based on descriptors and select/poll.  Recent versions added
some parallelism in other processes, but just for file reading/writing
(diskd) and regular expression processing for ACLs.  Even DNS, which
previously ran on blocking I/O in secondary processes, now runs internally
in the select/poll scheduler.
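
     A minimal sketch of the descriptor-driven model described above,
assuming plain select(2).  This is not Squid code; wait_readable() is a
hypothetical helper that only shows the shape of one scheduler pass: build
the descriptor set, block in select, then scan the set again to see which
descriptors fired.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from Squid: wait up to timeout_ms for any
 * of the nfds descriptors in fds[] to become readable.  Returns the number
 * of ready descriptors, 0 on timeout, or -1 on error.
 */
int
wait_readable(const int *fds, int nfds, int timeout_ms)
{
	fd_set rset;
	struct timeval tv;
	int i, maxfd = -1, n, ready = 0;

	FD_ZERO(&rset);
	for (i = 0; i < nfds; i++) {
		/* select(2) cannot handle descriptors >= FD_SETSIZE. */
		assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
		FD_SET(fds[i], &rset);
		if (fds[i] > maxfd)
			maxfd = fds[i];
	}
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	/* The descriptor range 0..maxfd is examined on every call. */
	n = select(maxfd + 1, &rset, NULL, NULL, &tv);
	if (n <= 0)
		return (n);

	/* The caller has to scan the set again to find what is ready. */
	for (i = 0; i < nfds; i++)
		if (FD_ISSET(fds[i], &rset))
			ready++;
	return (ready);
}
```

Both the set construction and the readiness scan are linear in the number
of descriptors, so with many simultaneous client connections each pass
through the loop costs more even when few descriptors are active.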

     I also have some experience with older versions of squid, in which,
on the same machine running the same version of squid, changing Linux
for FreeBSD raised the maximum simultaneous connection limit.

> I might start by turning on kernel profiling and doing a profile dump
> under load.  Be aware that turning on profiling uses up a lot of CPU
> itself, so will reduce the capacity of the system.  There's probably
> documentation elsewhere, but the process I use to set up profiling is
> here:

     I did not make any tests on this, but I would expect profiling to 
fail, since every step of the scheduler is very small, and deals with 
the smallest I/O available at that time.

     Indeed, based on the original report I would look for some
optimization of descriptor searching in poll or select, whichever squid
has chosen to use on FreeBSD (probably select, looking at the top
output).  This is one of the crucial points in squid performance.  The
other one is disk access, for sure, but the experiment described would
not change disk access patterns, would it?

>   http://www.watson.org/~robert/freebsd/netperf/profile/
> 
> Note that it warns the some results may be incorrect on SMP.  I think it
> would be useful to give it a try anyway just to see if we get something
> useful.

     As I said before, being a single-process scheduler, squid does not
gain much from SMP.  The secondary processes would benefit from the
extra CPU, though.  Maybe interrupt processing also, if the Giant lock
does not interfere in any part of the processing path.

> As a final question: other than CPU consumption, do you have a reliable
> way to measure how efficiently the system is operating -- in particular,
> how fast it is able to serve data?  Having some sort of metric for
> performance can be quite useful in optimizing, as it can tell us whether

     One thing I fail to measure in FreeBSD is the reason for delays in
disk access times.  How can I prove that the delay is on disk, and
determine how to optimize it?  systat -v is very useful, but does not
give me all the answers.

>>last pid:  3377;  load averages:  0.12,  0.09,  0.08
>>up 0+17:24:53  10:02:13
>>31 processes:  1 running, 30 sleeping
>>CPU states:  5.1% user,  0.0% nice,  1.8% system,  1.2% interrupt, 92.0%
>>idle
>>Mem: 75M Active, 187M Inact, 168M Wired, 40K Cache, 214M Buf, 1482M Free
>>Swap: 4069M Total, 4069M Free
>>
>>  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU
>>COMMAND
>>  474 squid     96    0 68276K 62480K select 0  53:38 16.80% 16.80%
>>squid
>>  311 bind      20    0 10628K  6016K kserel 0  12:28  0.00%  0.00%
>>named


                                         Jonny

-- 
João Carlos Mendes Luís - Networking Engineer - jonny@jonny.eng.br

From owner-freebsd-net@FreeBSD.ORG  Sun Dec 26 07:14:16 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id A480216A4CE; Sun, 26 Dec 2004 07:14:16 +0000 (GMT)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 3FEBD43D2D; Sun, 26 Dec 2004 07:14:16 +0000 (GMT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (localhost [127.0.0.1])
	by fledge.watson.org (8.13.1/8.13.1) with ESMTP id iBQ7AxKU055168;
	Sun, 26 Dec 2004 02:10:59 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Received: from localhost (robert@localhost)iBQ7Axnk055164;
	Sun, 26 Dec 2004 07:10:59 GMT
	(envelope-from robert@fledge.watson.org)
Date: Sun, 26 Dec 2004 07:10:59 +0000 (GMT)
From: Robert Watson <rwatson@freebsd.org>
X-Sender: robert@fledge.watson.org
To: =?ISO-8859-1?Q?Jo=E3o_Carlos_Mendes_Lu=EDs?= <jonny@jonny.eng.br>
In-Reply-To: <41CE1FB5.4080401@jonny.eng.br>
Message-ID: <Pine.NEB.3.96L.1041226070126.45272F-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
cc: Jeff Behl <jbehl@fastclick.com>
cc: freebsd-performance@freebsd.org
cc: freebsd-net@freebsd.org
Subject: Re: %cpu in system - squid performance in FreeBSD 5.3
X-List-Received-Date: Sun, 26 Dec 2004 07:14:16 -0000


On Sun, 26 Dec 2004, João Carlos Mendes Luís wrote:

>      It must not be this.  Squid is mostly a single process system, with
> scheduling based on descriptors and select/poll.  Recent versions added
> some parallelism in other processes, but just for file reading/writing
> (diskd) and regular expression processing for ACLs.  Even DNS, which
> previously ran on blocking I/O in secondary processes now run internally
> in the select/poll scheduler.

Thanks for this information.

> > I might start by turning on kernel profiling and doing a profile dump
> > under load.  Be aware that turning on profiling uses up a lot of CPU
> > itself, so will reduce the capacity of the system.  There's probably
> > documentation elsewhere, but the process I use to set up profiling is
> > here:
>
>      I did not make any tests on this, but I would expect profiling to
> fail, since every step of the scheduler is very small, and deals with
> the smallest I/O available at that time.

This is kernel profiling, not application profiling, and would hopefully
give us information on where the kernel was spending most of its time,
since in the environment in question system time appears to be dominant.
If SMP in theory makes little difference to Squid performance, then
switching to a UP kernel may well make kernel profiling more reliable and
hence more useful in tracking system time.

>      Indeed, based on the original report I would search for some
> optimization on descriptor searching in poll or select, whichever squid
> has chosen to use on FreeBSD (probably select, looking at the top
> output).  This is one of the crucial points on squid performance.  The
> other one is disk access, for sure, but the experiment described would
> not change disk access patterns, would it?

The reporter described a very high percentage of system time -- time spent
blocked on disk I/O isn't billed to system time; if spending lots of time
waiting on disk I/O for a single process, you'd see idle time rather than
system time predominating, I believe.
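
The distinction can be checked per process with getrusage(2), which
reports user and system CPU time separately; time spent blocked waiting
for disk I/O is counted in neither.  A minimal sketch follows, where
system_share() is a hypothetical helper introduced only for illustration:

```c
#include <assert.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Convert a struct timeval to seconds. */
static double
tv_to_sec(struct timeval tv)
{
	return ((double)tv.tv_sec + (double)tv.tv_usec / 1e6);
}

/*
 * Hypothetical helper: fraction of the consumed CPU time that was system
 * time.  Returns 0.0 if no CPU time has been consumed at all.
 */
double
system_share(const struct rusage *ru)
{
	double user = tv_to_sec(ru->ru_utime);
	double sys = tv_to_sec(ru->ru_stime);

	assert(user >= 0.0 && sys >= 0.0);
	if (user + sys == 0.0)
		return (0.0);
	return (sys / (user + sys));
}

/* Same measurement for the calling process; returns -1.0 on error. */
double
self_system_share(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return (-1.0);
	return (system_share(&ru));
}
```

A process that mostly sleeps on disk I/O accumulates little of either
counter, so a high share here points at kernel work done on behalf of the
process rather than at waiting.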

> > As a final question: other than CPU consumption, do you have a reliable
> > way to measure how efficiently the system is operating -- in particular,
> > how fast it is able to serve data?  Having some sort of metric for
> > performance can be quite useful in optimizing, as it can tell us whether
>
>      One thing I fail to measure in FreeBSD is the reason for delays in
> disk access times.  How can I prove that the delay is on disk, and
> determine how to optimize it?  systat -v is very useful, but does not
> give me all answers.

I'm not sure there are useful summary tools at a system-wide level for
this, but it is possible to use KTR(9) to trace the associated scheduler
and disk events.  In particular, I recently added high level tracing of
g_down and g_up GEOM events to KTR.  Jeff Roberson is about to commit a
scheduler visualization tool that interprets KTR events relating to the
scheduler that may also be useful.  It would certainly be extremely useful
to have a tool for normal system operation that could be pointed at a
process to say "show me the percent of time spent on various wait channels
for pid 50".  ktrace(1) has the ability to track context switches but
currently appears not to provide enough information to figure out why a
context switch took place.  I'll investigate this in the next couple of
days -- the trick is to gather this sort of statistic without too much
additional overhead.  If that's not easily possible, then simply
post-processing KTR may be the right approach.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research


From owner-freebsd-net@FreeBSD.ORG  Sun Dec 26 18:10:48 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B421216A4CE
	for <freebsd-net@freebsd.org>; Sun, 26 Dec 2004 18:10:48 +0000 (GMT)
Received: from borgtech.ca (borgtech.ca [216.187.106.216])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 735DE43D39
	for <freebsd-net@freebsd.org>; Sun, 26 Dec 2004 18:10:48 +0000 (GMT)
	(envelope-from asegu@borgtech.ca)
Received: from asegulaptop (ao3-m223.net.t-com.hr [195.29.34.223])
	by borgtech.ca (Postfix) with ESMTP id 3C55654A5;
	Sun, 26 Dec 2004 18:12:40 +0000 (GMT)
From: "Andrew Seguin" <asegu@borgtech.ca>
To: <freebsd-net@freebsd.org>
Date: Sun, 26 Dec 2004 19:10:29 +0100
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook, Build 11.0.5510
Thread-Index: AcTmwms23PKinut7T5aclTXOWMtivgEsaavA
In-Reply-To: <8510784015.20041220213227@star-sw.com>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180
Message-Id: <20041226181240.3C55654A5@borgtech.ca>
cc: "'Nickolay A. Kritsky'" <nkritsky@star-sw.com>
Subject: RE: FW: Curiosity in IPFW/Freebsd bridge. [more] 802.1q VLAN at
	fault?
X-List-Received-Date: Sun, 26 Dec 2004 18:10:48 -0000

My apologies for not replying sooner.

However, a few days before Christmas, I got the time to make the test and
the news is... it works.

A small curiosity, however, is that I had a problem with the 'promisc' flag
being turned off. I ended up creating a small startup script to set the
sysctl and configure the network cards manually.

I thank all who helped me get this working!
Andrew

-----Original Message-----
From: Nickolay A. Kritsky [mailto:nkritsky@star-sw.com] 
Sent: Monday, December 20, 2004 7:32 PM
To: asegu@borgtech.ca
Cc: freebsd-net@freebsd.org
Subject: RE: FW: Curiosity in IPFW/Freebsd bridge. [more] 802.1q VLAN at
fault?

Hello asegu,

This one should work OK. But do not forget to put parent interfaces in
up and promisc mode in your rc.conf, otherwise you will not see any
vlan-bridging. 

Sunday, December 19, 2004, 11:33:57 PM, asegu@borgtech.ca wrote:

abc> Ok, the whole discussion to date led to how VLAN traffic wasn't being
abc> registered by IPFW in my system. I think that it'll probably be too late
abc> for a code change to fix my problem, so I'm going to go the route of
abc> changing the network configuration.

abc> I've rebuilt to 4.10 and... I had no luck there (IPFW _really_ doesn't
abc> see the traffic now!). On the other hand, I've read about the vlan
abc> pseudo-dev and gotten myself access to the switch's configuration.

abc> So tomorrow evening I plan on changing the vlan id used to 3, and then in
abc> FreeBSD, use the following configuration (and I post this to the list to
abc> see if anybody knows that this is going to fail)

fxp1 -->> router (uses ID 2)
fxp0 -->> switch (uses ID 2, will switch to ID 3)
abc> ifconfig vlan1 vlan 3 vlandev fxp0
abc> ifconfig vlan0 vlan 2 vlandev fxp1

abc> sysctl net.link.ether.bridge_cfg=vlan1,vlan0
abc> sysctl net.link.ether.bridge_ipfw=1


abc> Does anybody think this will allow IPFW to see the packets? or that this
abc> will outright fail?


abc> Thank you everybody,
abc> Andrew


-- 
Best regards,
;  Nickolay A. Kritsky
; SysAdmin STAR Software LLC
; mailto:nkritsky@star-sw.com



From owner-freebsd-net@FreeBSD.ORG  Mon Dec 27 07:05:16 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E3E1516A4CE
	for <freebsd-net@freebsd.org>; Mon, 27 Dec 2004 07:05:16 +0000 (GMT)
Received: from mx01.bos.ma.towardex.com (mx01.bos.ma.towardex.com
	[65.124.16.9])	by mx1.FreeBSD.org (Postfix) with ESMTP id C2E0A43D31
	for <freebsd-net@freebsd.org>; Mon, 27 Dec 2004 07:05:14 +0000 (GMT)
	(envelope-from haesu@mx01.bos.ma.towardex.com)
Received: by mx01.bos.ma.towardex.com (TowardEX ESMTP 3.0p11_DAKN, from userid
	1001)	id 59CD82F946; Mon, 27 Dec 2004 02:05:14 -0500 (EST)
Date: Mon, 27 Dec 2004 02:05:14 -0500
From: James <james@towardex.com>
To: freebsd-net@freebsd.org
Message-ID: <20041227070514.GA68890@scylla.towardex.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="FCuugMFkClbJLl1L"
Content-Disposition: inline
User-Agent: Mutt/1.4.1i
Subject: Receive path for ip_fastforward
X-List-Received-Date: Mon, 27 Dec 2004 07:05:17 -0000


--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

As requested, here you go.

What is included in the email attachments:

1. Modified files in raw format (for easier reads)
   - ip_fastfwd.c (sys/netinet)
   - ip_input.c   (sys/netinet)
   - in.c         (sys/netinet)
   - ip_var.h     (sys/netinet)
   - inet.c       (usr.bin/netstat)

2. Unified diff files for each above in .diff format
   so you can see the changes better with developer's eyes.

Notes:
  - ip_fastfwd.c:
    Production code, proven to work very well; currently in use on actual
    production routers pushing 300Mb/s traffic on the network.

 - ip_input.c:
   No changes other than mbuf tagging for packets preprocessed by ip_fastfwd
   in Steps 1 and 2 (the basic sanity/fallback checks).

 - ip_var.h:
   Adds one additional variable (ipstat.ips_transit_re) to track packets
   forwarded to receive path by ip_fastforward.

 - netstat/inet.c:
   Adds tracking information for ipstat.ips_transit_re in netstat(1) program.

 - in.c:
   Quick hack (the code we are using on production routers is vastly
   different, so this had to be hacked up quickly for this patch) to add
   receive path routes to the routing table during SIOCSIFADDR. Tested so far
   without any problems -- network address, broadcast address and our own
   addresses are installed with lo0/127.0.0.1 as next-hop during SIOCSIFADDR
   and are properly deleted during SIOCDIFADDR. Please make changes to this
   as necessary, as it is a hack that may still have some issues.
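
   As an illustration of which three addresses end up as receive path host
   routes for one configured interface address, here is a small userland
   sketch. It is not part of the patch: the patch reads these values from
   struct in_ifaddr, while recv_addrs_for() and struct recv_addrs below are
   hypothetical names that derive them from an address and prefix length in
   host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* The three /32 entries installed per interface address (host order). */
struct recv_addrs {
	uint32_t own;		/* the configured address itself */
	uint32_t network;	/* all host bits zero */
	uint32_t broadcast;	/* all host bits one */
};

/*
 * Hypothetical helper: derive the own, network and broadcast addresses
 * from an IPv4 address and prefix length, both in host byte order.
 */
struct recv_addrs
recv_addrs_for(uint32_t addr, int prefixlen)
{
	struct recv_addrs r;
	uint32_t mask;

	assert(prefixlen >= 0 && prefixlen <= 32);
	/* A shift by 32 is undefined in C, so treat /0 separately. */
	mask = prefixlen == 0 ? 0 : 0xffffffffu << (32 - prefixlen);

	r.own = addr;
	r.network = addr & mask;
	r.broadcast = r.network | ~mask;
	return (r);
}
```

   For 192.0.2.10/24 (0xC000020A) this gives 192.0.2.0 as the network
   address and 192.0.2.255 as the broadcast address, in addition to
   192.0.2.10 itself.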


What it is and what it does:

 - For more information about what Receive Path ACL is all about:
<http://www.cisco.com/en/US/tech/tk648/tk361/technologies_white_paper09186a00801a0a5e.shtml>

 - The receive path installs IP addresses that should be forwarded to the
  router's own control plane stack (ip_input and upwards) as /32 host routes
  in the routing table. During the ip_fastforward stage, if the route to the
  destination is a local/receive-path route (RTF_LOCAL), or if the packet
  needs to be punted to the slow ip_input processing path because further
  analysis is required, that packet is subject to firewall rules that filter
  on the lo0 interface in the INBOUND direction, before being released to
  ip_input.

  The receive path work does _NOT_ actually forward the packet to lo0 driver.
  Doing so will actually break a number of protocols including OSPF and add
  further processing overhead for packets that need to be punted to ip_input.
  Instead, packets are simply subject to loopback filtering firewall rules
  before exiting ip_fastforward.
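
  The decision described above can be sketched as a small pure function.
  This is an illustration only and not code from the patch: ff_classify()
  and enum ff_verdict are hypothetical names, and treating a packet rejected
  by the lo0 rules as dropped is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the fast-forward stage for one packet. */
enum ff_verdict {
	FF_FORWARD,	/* stays on the fast path and is forwarded */
	FF_PUNT,	/* passed the lo0 inbound rules, goes to ip_input */
	FF_DROP		/* rejected by the lo0 inbound rules (assumption) */
};

/*
 * route_is_local:    destination matched a receive-path (RTF_LOCAL) route
 * needs_slow_path:   packet is too complex for the fast path
 * lo0_filter_passes: result of running the lo0 inbound firewall rules
 */
enum ff_verdict
ff_classify(bool route_is_local, bool needs_slow_path, bool lo0_filter_passes)
{
	/* Ordinary transit traffic never touches the loopback rules. */
	if (!route_is_local && !needs_slow_path)
		return (FF_FORWARD);

	/* Control-plane bound traffic is filtered, not sent through lo0. */
	return (lo0_filter_passes ? FF_PUNT : FF_DROP);
}
```

  Note that the loopback rules are only consulted for packets that are
  about to leave the fast path, so transit traffic pays no extra cost.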

User's Guide:

--> Caveat before you start:
   The receive path uses pfil_hooks firewall API to subject control plane
   bound packets to loopback filtering rules. At this time, IPFW2 is *NOT*
   supported. pf(4) is fully supported and is proven to work fine for this
   application. IPFW does not work since it captures ifnet variable out of
   mbuf header instead of the ifnet provided by pfil_hooks.

 Step 1:
   sysctl -w net.inet.ip.fastforwarding=1
   Note: Fast forwarding MUST BE ENABLED in order for receive path to
   operate.

 Step 2:
   Set up pf(4) firewall rules to filter on lo0 in the inbound direction.
   Be sure to allow packets sourced from 127.0.0.0/8, as many routing protocol
   software packages (including Zebra and Quagga) use the loopback interface
   for their inter-process communications. Also be sure to allow OSPF or any
   other routing protocols your router is running.

   Example of loopback filtering firewall out of a production router. The
   example below assumes your router is an edge router with just BGP running:

cr1.walt# pfctl -sr
pass quick on ge-0/0/0.2 all
pass quick on ge-0/1/0.12 all
pass quick on ge-0/1/0.203 all
pass in quick on lo0 proto tcp from any to any port = ssh keep state
pass in quick on lo0 proto tcp from any to any port = bgp keep state
pass in quick on lo0 proto tcp from any port = ftp-data to any
pass in quick on lo0 proto tcp from any port = ftp to any
pass in quick on lo0 proto tcp from any port = http to any
pass in quick on lo0 proto udp from any to any port 33434:33534
pass in quick on lo0 proto udp from any port = domain to any
pass in quick on lo0 proto icmp all
pass in quick on lo0 inet from 127.0.0.0/24 to any
block drop in quick on lo0 all
pass quick all
cr1.walt#

  Step 3:
    Packets successfully punted to ip_input, either because they are too
    complex to be dealt with inside the fast forwarding path or because they
    are destined to the router's own addresses, can be tracked using the
    netstat(1) utility (after you patch it, of course). Example:

cr1.walt# netstat -sn -f inet | grep forward
        55647205 packets forwarded (52951423 packets fast forwarded)
        4927978 packets not forwardable
        345712 packets forwarded to receive path
cr1.walt#

As referenced by the BSD License, I am not liable for any damages arising
from your use of this feature submission.

Questions: let me know.

-J

-- 
James Jun                                            TowardEX Technologies, Inc.
Technical Lead                      Boston IPv4/IPv6 Web Hosting, Colocation and
james@towardex.com            Network design/consulting & configuration services
cell: 1(978)-394-2867           web: http://www.towardex.com , noc: www.twdx.net

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="in.c"

/*
 * Copyright (c) 1982, 1986, 1991, 1993
 *	The Regents of the University of California.  All rights reserved.
 * Copyright (C) 2001 WIDE Project.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)in.c	8.4 (Berkeley) 1/9/95
 * $FreeBSD: src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $
 */

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/sockio.h>
#include <sys/malloc.h>
#include <sys/socket.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

#include <net/if.h>
#include <net/if_types.h>
#include <net/route.h>

#include <netinet/in.h>
#include <netinet/in_var.h>
#include <netinet/in_pcb.h>

#include <netinet/igmp_var.h>

static MALLOC_DEFINE(M_IPMADDR, "in_multi", "internet multicast address");

static int in_mask2len(struct in_addr *);
static void in_len2mask(struct in_addr *, int);
static int in_lifaddr_ioctl(struct socket *, u_long, caddr_t,
	struct ifnet *, struct thread *);

static int	in_addprefix(struct in_ifaddr *, int);
static int	in_scrubprefix(struct in_ifaddr *);
static void	in_socktrim(struct sockaddr_in *);
static int	in_ifinit(struct ifnet *,
	    struct in_ifaddr *, struct sockaddr_in *, int);

static int subnetsarelocal = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, subnets_are_local, CTLFLAG_RW,
	&subnetsarelocal, 0, "Treat all subnets as directly connected");

struct in_multihead in_multihead; /* XXX BSS initialization */

extern struct inpcbinfo ripcbinfo;
extern struct inpcbinfo udbinfo;

/*
 * Return 1 if an internet address is for a ``local'' host
 * (one to which we have a connection).  If subnetsarelocal
 * is true, this includes other subnets of the local net.
 * Otherwise, it includes only the directly-connected (sub)nets.
 */
int
in_localaddr(in)
	struct in_addr in;
{
	register u_long i = ntohl(in.s_addr);
	register struct in_ifaddr *ia;

	if (subnetsarelocal) {
		TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link)
			if ((i & ia->ia_netmask) == ia->ia_net)
				return (1);
	} else {
		TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link)
			if ((i & ia->ia_subnetmask) == ia->ia_subnet)
				return (1);
	}
	return (0);
}

/*
 * Return 1 if an internet address is for the local host and configured
 * on one of its interfaces.
 */
int
in_localip(in)
	struct in_addr in;
{
	struct in_ifaddr *ia;

	LIST_FOREACH(ia, INADDR_HASH(in.s_addr), ia_hash) {
		if (IA_SIN(ia)->sin_addr.s_addr == in.s_addr)
			return 1;
	}
	return 0;
}

/*
 * Determine whether an IP address is in a reserved set of addresses
 * that may not be forwarded, or whether datagrams to that destination
 * may be forwarded.
 */
int
in_canforward(in)
	struct in_addr in;
{
	register u_long i = ntohl(in.s_addr);
	register u_long net;

	if (IN_EXPERIMENTAL(i) || IN_MULTICAST(i))
		return (0);
	if (IN_CLASSA(i)) {
		net = i & IN_CLASSA_NET;
		if (net == 0 || net == (IN_LOOPBACKNET << IN_CLASSA_NSHIFT))
			return (0);
	}
	return (1);
}

/*
 * Sub-routine for in_ifaddrecv() and in_ifremrecv().
 * --james@towardex.com 12/17/2004
 */
static void
in_ifrecv_request(int call, int cmd, struct in_ifaddr *ia)
{
	struct sockaddr_in all1_sa;
	struct rtentry *nrt = NULL;
	struct ifaddr *ifa;
	int e = 0;
	struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET };
	struct sockaddr_in loopback = { sizeof(struct sockaddr_in), AF_INET };

	ifa = &ia->ia_ifa;

	bzero(&all1_sa, sizeof(all1_sa));
	all1_sa.sin_family = AF_INET;
	all1_sa.sin_len = sizeof(struct sockaddr_in);
	all1_sa.sin_addr.s_addr = (u_int32_t)0xffffffff;

	/*
	 * We need to manually specify loopback for network and broadcast
	 * addresses because we can't just let the L2 rtrequest handlers
	 * deal with ifa->ifa_addr set as the gateway address.
	 */
	loopback.sin_family = AF_INET;
	/* s_addr is kept in network byte order, so convert with htonl(). */
	loopback.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/*
	 * Set the rtflags to RTF_LLINFO so existing apps are happy
	 * with our changes.
	 */
	switch (call) {
	case 0:  /* own address request */
		rtrequest(cmd, ifa->ifa_addr, sintosa(&loopback),
		    (struct sockaddr *)&all1_sa,
		    RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
		break;
	case 1:  /* network address request */
		rtrequest(cmd, sintosa(&ia->ia_dstaddr), sintosa(&loopback),
		    (struct sockaddr *)&all1_sa,
		    RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
		break;
	case 2:  /* broadcast address request */
		subnet.sin_addr.s_addr = htonl(ia->ia_subnet);
		subnet.sin_family = AF_INET;

		rtrequest(cmd, sintosa(&subnet), sintosa(&loopback),
		    (struct sockaddr *)&all1_sa,
		    RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
		break;
	default:
		break;
	}

	if (nrt) {
		RT_LOCK(nrt);
		/*
		 * Make sure rt_ifa is equal to ifa.  We need this because
		 * when we refer to rt_ifa->ia_flags, we assume that rt_ifa
		 * points to the address, not the loopback.
		 */
		if (cmd == RTM_ADD && ifa != nrt->rt_ifa) {
			IFAFREE(nrt->rt_ifa);
			IFAREF(ifa);
			nrt->rt_ifa = ifa;
		}
		/*
		 * Report to routing socket.
		 */
		rt_newaddrmsg(cmd, ifa, e, nrt);
		if (cmd == RTM_DELETE) {
			rtfree(nrt);
		} else {
			/* the cmd must be RTM_ADD here */
			RT_REMREF(nrt);
			RT_UNLOCK(nrt);
		}
	}
}


/*
 * Add own address as loopback rtentry (receive path). We previously add
 * the route only if necessary (such as point to point circuit), or when
 * triggered by route cloning. However, a proper RIB and FIB implementation
 * must contain own-addrs as receive paths, allowing software to manage
 * its own addresses separately from prefixes. This is required for receive
 * adjacency/path in ip_fastforward() --james@towardex.com 2004/12/17
 */
static void
in_ifaddrecv(struct in_ifaddr *ia)
{
	struct rtentry *rt;
	int need_loop, need_netdst, need_bcast;
	struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET };

	/* If there is no loopback entry, allocate one */
	rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0);
	need_loop = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);

	/* If there is no network entry, allocate one */
	if(rt) rtfree(rt);
	rt = rtalloc1(sintosa(&ia->ia_dstaddr), 0, 0);
	need_netdst = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);
	
	/* If there is no broadcast entry, allocate one */
	subnet.sin_addr.s_addr = htonl(ia->ia_subnet);
	subnet.sin_family = AF_INET;
	if(rt) rtfree(rt);
	rt = rtalloc1(sintosa(&subnet), 0, 0);
	need_bcast = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);

	if(rt)
	  rtfree(rt);

	if(need_loop)
	  in_ifrecv_request(0, RTM_ADD, ia);
	if(need_netdst)
	  in_ifrecv_request(1, RTM_ADD, ia);
	if(need_bcast)
	  in_ifrecv_request(2, RTM_ADD, ia);
}


/*
 * Remove loopback rtentry's of receive path generated by in_ifaddrecv()
 * if they exist. -- james 12/17/2004
 */
static void
in_ifremrecv(struct in_ifaddr *ia)
{
        struct rtentry *rt;
        
	/* Delete the route for ownaddr if it really exists. */
	rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0);
	if (rt != NULL) {
		int is_recv = (rt->rt_flags & RTF_HOST) != 0 &&
		    (rt->rt_ifp->if_flags & IFF_LOOPBACK) != 0;
		rtfree(rt);	/* always release the lookup reference */
		if (is_recv)
			in_ifrecv_request(0, RTM_DELETE, ia);
	}

	/* XXX
	 * Broadcast and network addresses are removed by
	 * regular interface detach handlers, but we
	 * need to verify the design aspect of this more
	 * later.
	 */
}

/*
 * Trim a mask in a sockaddr
 */
static void
in_socktrim(ap)
struct sockaddr_in *ap;
{
    register char *cplim = (char *) &ap->sin_addr;
    register char *cp = (char *) (&ap->sin_addr + 1);

    ap->sin_len = 0;
    while (--cp >= cplim)
	if (*cp) {
	    (ap)->sin_len = cp - (char *) (ap) + 1;
	    break;
	}
}

static int
in_mask2len(mask)
	struct in_addr *mask;
{
	int x, y;
	u_char *p;

	p = (u_char *)mask;
	for (x = 0; x < sizeof(*mask); x++) {
		if (p[x] != 0xff)
			break;
	}
	y = 0;
	if (x < sizeof(*mask)) {
		for (y = 0; y < 8; y++) {
			if ((p[x] & (0x80 >> y)) == 0)
				break;
		}
	}
	return x * 8 + y;
}

static void
in_len2mask(mask, len)
	struct in_addr *mask;
	int len;
{
	int i;
	u_char *p;

	p = (u_char *)mask;
	bzero(mask, sizeof(*mask));
	for (i = 0; i < len / 8; i++)
		p[i] = 0xff;
	if (len % 8)
		p[i] = (0xff00 >> (len % 8)) & 0xff;
}

/*
 * Generic internet control operations (ioctl's).
 * Ifp is 0 if not an interface-specific ioctl.
 */
/* ARGSUSED */
int
in_control(so, cmd, data, ifp, td)
	struct socket *so;
	u_long cmd;
	caddr_t data;
	register struct ifnet *ifp;
	struct thread *td;
{
	register struct ifreq *ifr = (struct ifreq *)data;
	register struct in_ifaddr *ia = 0, *iap;
	register struct ifaddr *ifa;
	struct in_addr dst;
	struct in_ifaddr *oia;
	struct in_aliasreq *ifra = (struct in_aliasreq *)data;
	struct sockaddr_in oldaddr;
	int error, hostIsNew, iaIsNew, maskIsNew, s;

	iaIsNew = 0;

	switch (cmd) {
	case SIOCALIFADDR:
	case SIOCDLIFADDR:
		if (td && (error = suser(td)) != 0)
			return error;
		/*fall through*/
	case SIOCGLIFADDR:
		if (!ifp)
			return EINVAL;
		return in_lifaddr_ioctl(so, cmd, data, ifp, td);
	}

	/*
	 * Find address for this interface, if it exists.
	 *
	 * If an alias address was specified, find that one instead of
	 * the first one on the interface, if possible.
	 */
	if (ifp) {
		dst = ((struct sockaddr_in *)&ifr->ifr_addr)->sin_addr;
		LIST_FOREACH(iap, INADDR_HASH(dst.s_addr), ia_hash)
			if (iap->ia_ifp == ifp &&
			    iap->ia_addr.sin_addr.s_addr == dst.s_addr) {
				ia = iap;
				break;
			}
		if (ia == NULL)
			TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
				iap = ifatoia(ifa);
				if (iap->ia_addr.sin_family == AF_INET) {
					ia = iap;
					break;
				}
			}
	}

	switch (cmd) {

	case SIOCAIFADDR:
	case SIOCDIFADDR:
		if (ifp == 0)
			return (EADDRNOTAVAIL);
		if (ifra->ifra_addr.sin_family == AF_INET) {
			for (oia = ia; ia; ia = TAILQ_NEXT(ia, ia_link)) {
				if (ia->ia_ifp == ifp  &&
				    ia->ia_addr.sin_addr.s_addr ==
				    ifra->ifra_addr.sin_addr.s_addr)
					break;
			}
			if ((ifp->if_flags & IFF_POINTOPOINT)
			    && (cmd == SIOCAIFADDR)
			    && (ifra->ifra_dstaddr.sin_addr.s_addr
				== INADDR_ANY)) {
				return EDESTADDRREQ;
			}
		}
		if (cmd == SIOCDIFADDR && ia == 0)
			return (EADDRNOTAVAIL);
		/* FALLTHROUGH */
	case SIOCSIFADDR:
	case SIOCSIFNETMASK:
	case SIOCSIFDSTADDR:
		if (td && (error = suser(td)) != 0)
			return error;

		if (ifp == 0)
			return (EADDRNOTAVAIL);
		if (ia == (struct in_ifaddr *)0) {
			ia = (struct in_ifaddr *)
				malloc(sizeof *ia, M_IFADDR, M_WAITOK | M_ZERO);
			if (ia == (struct in_ifaddr *)NULL)
				return (ENOBUFS);
			/*
			 * Protect from ipintr() traversing address list
			 * while we're modifying it.
			 */
			s = splnet();
			TAILQ_INSERT_TAIL(&in_ifaddrhead, ia, ia_link);

			ifa = &ia->ia_ifa;
			IFA_LOCK_INIT(ifa);
			ifa->ifa_addr = (struct sockaddr *)&ia->ia_addr;
			ifa->ifa_dstaddr = (struct sockaddr *)&ia->ia_dstaddr;
			ifa->ifa_netmask = (struct sockaddr *)&ia->ia_sockmask;
			ifa->ifa_refcnt = 1;
			TAILQ_INSERT_TAIL(&ifp->if_addrhead, ifa, ifa_link);

			ia->ia_sockmask.sin_len = 8;
			ia->ia_sockmask.sin_family = AF_INET;
			if (ifp->if_flags & IFF_BROADCAST) {
				ia->ia_broadaddr.sin_len = sizeof(ia->ia_addr);
				ia->ia_broadaddr.sin_family = AF_INET;
			}
			ia->ia_ifp = ifp;
			splx(s);
			iaIsNew = 1;
		}
		break;

	case SIOCSIFBRDADDR:
		if (td && (error = suser(td)) != 0)
			return error;
		/* FALLTHROUGH */

	case SIOCGIFADDR:
	case SIOCGIFNETMASK:
	case SIOCGIFDSTADDR:
	case SIOCGIFBRDADDR:
		if (ia == (struct in_ifaddr *)0)
			return (EADDRNOTAVAIL);
		break;
	}
	switch (cmd) {

	case SIOCGIFADDR:
		*((struct sockaddr_in *)&ifr->ifr_addr) = ia->ia_addr;
		return (0);

	case SIOCGIFBRDADDR:
		if ((ifp->if_flags & IFF_BROADCAST) == 0)
			return (EINVAL);
		*((struct sockaddr_in *)&ifr->ifr_dstaddr) = ia->ia_broadaddr;
		return (0);

	case SIOCGIFDSTADDR:
		if ((ifp->if_flags & IFF_POINTOPOINT) == 0)
			return (EINVAL);
		*((struct sockaddr_in *)&ifr->ifr_dstaddr) = ia->ia_dstaddr;
		return (0);

	case SIOCGIFNETMASK:
		*((struct sockaddr_in *)&ifr->ifr_addr) = ia->ia_sockmask;
		return (0);

	case SIOCSIFDSTADDR:
		if ((ifp->if_flags & IFF_POINTOPOINT) == 0)
			return (EINVAL);
		oldaddr = ia->ia_dstaddr;
		ia->ia_dstaddr = *(struct sockaddr_in *)&ifr->ifr_dstaddr;
		if (ifp->if_ioctl && (error = (*ifp->if_ioctl)
					(ifp, SIOCSIFDSTADDR, (caddr_t)ia))) {
			ia->ia_dstaddr = oldaddr;
			return (error);
		}
		if (ia->ia_flags & IFA_ROUTE) {
			ia->ia_ifa.ifa_dstaddr = (struct sockaddr *)&oldaddr;
			rtinit(&(ia->ia_ifa), (int)RTM_DELETE, RTF_HOST);
			ia->ia_ifa.ifa_dstaddr =
					(struct sockaddr *)&ia->ia_dstaddr;
			rtinit(&(ia->ia_ifa), (int)RTM_ADD, RTF_HOST|RTF_UP);
		}
		return (0);

	case SIOCSIFBRDADDR:
		if ((ifp->if_flags & IFF_BROADCAST) == 0)
			return (EINVAL);
		ia->ia_broadaddr = *(struct sockaddr_in *)&ifr->ifr_broadaddr;
		return (0);

	case SIOCSIFADDR:
		error = in_ifinit(ifp, ia,
		    (struct sockaddr_in *) &ifr->ifr_addr, 1);
		if (error != 0 && iaIsNew)
			break;
		if (error == 0)
			EVENTHANDLER_INVOKE(ifaddr_event, ifp);
		return (0);

	case SIOCSIFNETMASK:
		ia->ia_sockmask.sin_addr = ifra->ifra_addr.sin_addr;
		ia->ia_subnetmask = ntohl(ia->ia_sockmask.sin_addr.s_addr);
		return (0);

	case SIOCAIFADDR:
		maskIsNew = 0;
		hostIsNew = 1;
		error = 0;
		if (ia->ia_addr.sin_family == AF_INET) {
			if (ifra->ifra_addr.sin_len == 0) {
				ifra->ifra_addr = ia->ia_addr;
				hostIsNew = 0;
			} else if (ifra->ifra_addr.sin_addr.s_addr ==
					       ia->ia_addr.sin_addr.s_addr)
				hostIsNew = 0;
		}
		if (ifra->ifra_mask.sin_len) {
			in_ifscrub(ifp, ia);
			ia->ia_sockmask = ifra->ifra_mask;
			ia->ia_sockmask.sin_family = AF_INET;
			ia->ia_subnetmask =
			     ntohl(ia->ia_sockmask.sin_addr.s_addr);
			maskIsNew = 1;
		}
		if ((ifp->if_flags & IFF_POINTOPOINT) &&
		    (ifra->ifra_dstaddr.sin_family == AF_INET)) {
			in_ifscrub(ifp, ia);
			ia->ia_dstaddr = ifra->ifra_dstaddr;
			maskIsNew  = 1; /* We lie; but the effect's the same */
		}
		if (ifra->ifra_addr.sin_family == AF_INET &&
		    (hostIsNew || maskIsNew))
			error = in_ifinit(ifp, ia, &ifra->ifra_addr, 0);
		if (error != 0 && iaIsNew)
			break;

		if ((ifp->if_flags & IFF_BROADCAST) &&
		    (ifra->ifra_broadaddr.sin_family == AF_INET))
			ia->ia_broadaddr = ifra->ifra_broadaddr;
		if (error == 0)
			EVENTHANDLER_INVOKE(ifaddr_event, ifp);
		return (error);

	case SIOCDIFADDR:
		/*
		 * in_ifscrub kills the interface route.
		 */
		in_ifscrub(ifp, ia);
		/*
		 * in_ifadown gets rid of all the rest of
		 * the routes.  This is not quite the right
		 * thing to do, but at least if we are running
		 * a routing process they will come back.
		 */
		in_ifadown(&ia->ia_ifa, 1);
		/*
		 * XXX horrible hack to detect that we are being called
		 * from if_detach()
		 */
		if (ifaddr_byindex(ifp->if_index) == NULL) {
			in_pcbpurgeif0(&ripcbinfo, ifp);
			in_pcbpurgeif0(&udbinfo, ifp);
		}
		EVENTHANDLER_INVOKE(ifaddr_event, ifp);
		error = 0;
		break;

	default:
		if (ifp == 0 || ifp->if_ioctl == 0)
			return (EOPNOTSUPP);
		return ((*ifp->if_ioctl)(ifp, cmd, data));
	}

	/*
	 * Protect from ipintr() traversing address list while we're modifying
	 * it.
	 */
	s = splnet();
	TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
	TAILQ_REMOVE(&in_ifaddrhead, ia, ia_link);
	LIST_REMOVE(ia, ia_hash);
	IFAFREE(&ia->ia_ifa);
	splx(s);

	return (error);
}

/*
 * SIOC[GAD]LIFADDR.
 *	SIOCGLIFADDR: get first address. (?!?)
 *	SIOCGLIFADDR with IFLR_PREFIX:
 *		get first address that matches the specified prefix.
 *	SIOCALIFADDR: add the specified address.
 *	SIOCALIFADDR with IFLR_PREFIX:
 *		EINVAL since we can't deduce hostid part of the address.
 *	SIOCDLIFADDR: delete the specified address.
 *	SIOCDLIFADDR with IFLR_PREFIX:
 *		delete the first address that matches the specified prefix.
 * return values:
 *	EINVAL on invalid parameters
 *	EADDRNOTAVAIL on prefix match failed/specified address not found
 *	other values may be returned from in_control()
 */
static int
in_lifaddr_ioctl(so, cmd, data, ifp, td)
	struct socket *so;
	u_long cmd;
	caddr_t	data;
	struct ifnet *ifp;
	struct thread *td;
{
	struct if_laddrreq *iflr = (struct if_laddrreq *)data;
	struct ifaddr *ifa;

	/* sanity checks */
	if (!data || !ifp) {
		panic("invalid argument to in_lifaddr_ioctl");
		/*NOTREACHED*/
	}

	switch (cmd) {
	case SIOCGLIFADDR:
		/* address must be specified on GET with IFLR_PREFIX */
		if ((iflr->flags & IFLR_PREFIX) == 0)
			break;
		/*FALLTHROUGH*/
	case SIOCALIFADDR:
	case SIOCDLIFADDR:
		/* address must be specified on ADD and DELETE */
		if (iflr->addr.ss_family != AF_INET)
			return EINVAL;
		if (iflr->addr.ss_len != sizeof(struct sockaddr_in))
			return EINVAL;
		/* XXX need improvement */
		if (iflr->dstaddr.ss_family
		 && iflr->dstaddr.ss_family != AF_INET)
			return EINVAL;
		if (iflr->dstaddr.ss_family
		 && iflr->dstaddr.ss_len != sizeof(struct sockaddr_in))
			return EINVAL;
		break;
	default: /*shouldn't happen*/
		return EOPNOTSUPP;
	}
	if (sizeof(struct in_addr) * 8 < iflr->prefixlen)
		return EINVAL;

	switch (cmd) {
	case SIOCALIFADDR:
	    {
		struct in_aliasreq ifra;

		if (iflr->flags & IFLR_PREFIX)
			return EINVAL;

		/* copy args to in_aliasreq, perform ioctl(SIOCAIFADDR). */
		bzero(&ifra, sizeof(ifra));
		bcopy(iflr->iflr_name, ifra.ifra_name,
			sizeof(ifra.ifra_name));

		bcopy(&iflr->addr, &ifra.ifra_addr, iflr->addr.ss_len);

		if (iflr->dstaddr.ss_family) {	/*XXX*/
			bcopy(&iflr->dstaddr, &ifra.ifra_dstaddr,
				iflr->dstaddr.ss_len);
		}

		ifra.ifra_mask.sin_family = AF_INET;
		ifra.ifra_mask.sin_len = sizeof(struct sockaddr_in);
		in_len2mask(&ifra.ifra_mask.sin_addr, iflr->prefixlen);

		return in_control(so, SIOCAIFADDR, (caddr_t)&ifra, ifp, td);
	    }
	case SIOCGLIFADDR:
	case SIOCDLIFADDR:
	    {
		struct in_ifaddr *ia;
		struct in_addr mask, candidate, match;
		struct sockaddr_in *sin;
		int cmp;

		bzero(&mask, sizeof(mask));
		if (iflr->flags & IFLR_PREFIX) {
			/* lookup a prefix rather than address. */
			in_len2mask(&mask, iflr->prefixlen);

			sin = (struct sockaddr_in *)&iflr->addr;
			match.s_addr = sin->sin_addr.s_addr;
			match.s_addr &= mask.s_addr;

			/* if you set extra bits, that's wrong */
			if (match.s_addr != sin->sin_addr.s_addr)
				return EINVAL;

			cmp = 1;
		} else {
			if (cmd == SIOCGLIFADDR) {
				/* on getting an address, take the 1st match */
				cmp = 0;	/*XXX*/
			} else {
				/* on deleting an address, do exact match */
				in_len2mask(&mask, 32);
				sin = (struct sockaddr_in *)&iflr->addr;
				match.s_addr = sin->sin_addr.s_addr;

				cmp = 1;
			}
		}

		TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)	{
			if (ifa->ifa_addr->sa_family != AF_INET)
				continue;
			if (!cmp)
				break;
			candidate.s_addr = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr.s_addr;
			candidate.s_addr &= mask.s_addr;
			if (candidate.s_addr == match.s_addr)
				break;
		}
		if (!ifa)
			return EADDRNOTAVAIL;
		ia = (struct in_ifaddr *)ifa;

		if (cmd == SIOCGLIFADDR) {
			/* fill in the if_laddrreq structure */
			bcopy(&ia->ia_addr, &iflr->addr, ia->ia_addr.sin_len);

			if ((ifp->if_flags & IFF_POINTOPOINT) != 0) {
				bcopy(&ia->ia_dstaddr, &iflr->dstaddr,
					ia->ia_dstaddr.sin_len);
			} else
				bzero(&iflr->dstaddr, sizeof(iflr->dstaddr));

			iflr->prefixlen =
				in_mask2len(&ia->ia_sockmask.sin_addr);

			iflr->flags = 0;	/*XXX*/

			return 0;
		} else {
			struct in_aliasreq ifra;

			/* fill in_aliasreq and do ioctl(SIOCDIFADDR) */
			bzero(&ifra, sizeof(ifra));
			bcopy(iflr->iflr_name, ifra.ifra_name,
				sizeof(ifra.ifra_name));

			bcopy(&ia->ia_addr, &ifra.ifra_addr,
				ia->ia_addr.sin_len);
			if ((ifp->if_flags & IFF_POINTOPOINT) != 0) {
				bcopy(&ia->ia_dstaddr, &ifra.ifra_dstaddr,
					ia->ia_dstaddr.sin_len);
			}
			bcopy(&ia->ia_sockmask, &ifra.ifra_mask,
				ia->ia_sockmask.sin_len);

			return in_control(so, SIOCDIFADDR, (caddr_t)&ifra,
					  ifp, td);
		}
	    }
	}

	return EOPNOTSUPP;	/*just for safety*/
}

/*
 * Delete any existing route for an interface.
 */
void
in_ifscrub(ifp, ia)
	register struct ifnet *ifp;
	register struct in_ifaddr *ia;
{
	in_scrubprefix(ia);

	/*
	 * delete receive path rtentry's if they exist.
	 */
	in_ifremrecv(ia);
}

/*
 * Initialize an interface's internet address
 * and routing table entry.
 */
static int
in_ifinit(ifp, ia, sin, scrub)
	register struct ifnet *ifp;
	register struct in_ifaddr *ia;
	struct sockaddr_in *sin;
	int scrub;
{
	register u_long i = ntohl(sin->sin_addr.s_addr);
	struct sockaddr_in oldaddr;
	int s = splimp(), flags = RTF_UP, error = 0;

	oldaddr = ia->ia_addr;
	if (oldaddr.sin_family == AF_INET)
		LIST_REMOVE(ia, ia_hash);
	ia->ia_addr = *sin;
	if (ia->ia_addr.sin_family == AF_INET)
		LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr),
		    ia, ia_hash);
	/*
	 * Give the interface a chance to initialize
	 * if this is its first address,
	 * and to validate the address if necessary.
	 */
	if (ifp->if_ioctl &&
	    (error = (*ifp->if_ioctl)(ifp, SIOCSIFADDR, (caddr_t)ia))) {
		splx(s);
		/* LIST_REMOVE(ia, ia_hash) is done in in_control */
		ia->ia_addr = oldaddr;
		if (ia->ia_addr.sin_family == AF_INET)
			LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr),
			    ia, ia_hash);
		return (error);
	}
	splx(s);
	if (scrub) {
		ia->ia_ifa.ifa_addr = (struct sockaddr *)&oldaddr;
		in_ifscrub(ifp, ia);
		ia->ia_ifa.ifa_addr = (struct sockaddr *)&ia->ia_addr;
	}
	if (IN_CLASSA(i))
		ia->ia_netmask = IN_CLASSA_NET;
	else if (IN_CLASSB(i))
		ia->ia_netmask = IN_CLASSB_NET;
	else
		ia->ia_netmask = IN_CLASSC_NET;
	/*
	 * The subnet mask usually includes at least the standard network part,
	 * but may be smaller in the case of supernetting.
	 * If it is set, we believe it.
	 */
	if (ia->ia_subnetmask == 0) {
		ia->ia_subnetmask = ia->ia_netmask;
		ia->ia_sockmask.sin_addr.s_addr = htonl(ia->ia_subnetmask);
	} else
		ia->ia_netmask &= ia->ia_subnetmask;
	ia->ia_net = i & ia->ia_netmask;
	ia->ia_subnet = i & ia->ia_subnetmask;
	in_socktrim(&ia->ia_sockmask);
	/*
	 * Add route for the network.
	 */
	ia->ia_ifa.ifa_metric = ifp->if_metric;
	if (ifp->if_flags & IFF_BROADCAST) {
		ia->ia_broadaddr.sin_addr.s_addr =
			htonl(ia->ia_subnet | ~ia->ia_subnetmask);
		ia->ia_netbroadcast.s_addr =
			htonl(ia->ia_net | ~ ia->ia_netmask);
	} else if (ifp->if_flags & IFF_LOOPBACK) {
		ia->ia_dstaddr = ia->ia_addr;
		flags |= RTF_HOST;
	} else if (ifp->if_flags & IFF_POINTOPOINT) {
		if (ia->ia_dstaddr.sin_family != AF_INET)
			return (0);
		flags |= RTF_HOST;
	}
	if ((error = in_addprefix(ia, flags)) != 0)
		return (error);

	/*
	 * If the interface supports multicast, join the "all hosts"
	 * multicast group on that interface.
	 */
	if (ifp->if_flags & IFF_MULTICAST) {
		struct in_addr addr;

		addr.s_addr = htonl(INADDR_ALLHOSTS_GROUP);
		in_addmulti(&addr, ifp);
	}

	/*
	 * Bring online receive adjacency routes.
	 * -james 2004/12/17
	 *
	 * Deleted old 2004-09-09 kludge code; this is a cleaner
	 * approach, derived from KAME implementation for INET6.
	 */
	in_ifaddrecv(ia);

	return (error);
}

#define rtinitflags(x) \
	((((x)->ia_ifp->if_flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) != 0) \
	    ? RTF_HOST : 0)
/*
 * Check if we have a route for the given prefix already, or add one
 * accordingly.
 */
static int
in_addprefix(target, flags)
	struct in_ifaddr *target;
	int flags;
{
	struct in_ifaddr *ia;
	struct in_addr prefix, mask, p;
	int error;

	if ((flags & RTF_HOST) != 0)
		prefix = target->ia_dstaddr.sin_addr;
	else {
		prefix = target->ia_addr.sin_addr;
		mask = target->ia_sockmask.sin_addr;
		prefix.s_addr &= mask.s_addr;
	}

	TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link) {
		if (rtinitflags(ia))
			p = ia->ia_dstaddr.sin_addr;
		else {
			p = ia->ia_addr.sin_addr;
			p.s_addr &= ia->ia_sockmask.sin_addr.s_addr;
		}

		if (prefix.s_addr != p.s_addr)
			continue;

		/*
		 * If we got a matching prefix route inserted by other
		 * interface address, we are done here.
		 */
		if (ia->ia_flags & IFA_ROUTE)
			return 0;
	}

	/*
	 * No one seems to have this prefix route, so we try to insert it.
	 */
	error = rtinit(&target->ia_ifa, (int)RTM_ADD, flags);
	if (!error)
		target->ia_flags |= IFA_ROUTE;
	return error;
}


/*
 * If there is no other address in the system that can serve a route to the
 * same prefix, remove the route.  Hand over the route to the new address
 * otherwise.
 */
static int
in_scrubprefix(target)
	struct in_ifaddr *target;
{
	struct in_ifaddr *ia;
	struct in_addr prefix, mask, p;
	int error;

	if ((target->ia_flags & IFA_ROUTE) == 0)
		return 0;

	if (rtinitflags(target))
		prefix = target->ia_dstaddr.sin_addr;
	else {
		prefix = target->ia_addr.sin_addr;
		mask = target->ia_sockmask.sin_addr;
		prefix.s_addr &= mask.s_addr;
	}

	TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link) {
		if (rtinitflags(ia))
			p = ia->ia_dstaddr.sin_addr;
		else {
			p = ia->ia_addr.sin_addr;
			p.s_addr &= ia->ia_sockmask.sin_addr.s_addr;
		}

		if (prefix.s_addr != p.s_addr)
			continue;

		/*
		 * If we got a matching prefix address, move IFA_ROUTE and
		 * the route itself to it.  Make sure that routing daemons
		 * get a heads-up.
		 */
		if ((ia->ia_flags & IFA_ROUTE) == 0) {
			rtinit(&(target->ia_ifa), (int)RTM_DELETE,
			    rtinitflags(target));
			target->ia_flags &= ~IFA_ROUTE;

			error = rtinit(&ia->ia_ifa, (int)RTM_ADD,
			    rtinitflags(ia) | RTF_UP);
			if (error == 0)
				ia->ia_flags |= IFA_ROUTE;
			return error;
		}
	}

	/*
	 * As no one seems to have this prefix, we can remove the route.
	 */
	rtinit(&(target->ia_ifa), (int)RTM_DELETE, rtinitflags(target));
	target->ia_flags &= ~IFA_ROUTE;
	return 0;
}

#undef rtinitflags

/*
 * Return 1 if the address might be a local broadcast address.
 */
int
in_broadcast(in, ifp)
	struct in_addr in;
	struct ifnet *ifp;
{
	register struct ifaddr *ifa;
	u_long t;

	if (in.s_addr == INADDR_BROADCAST ||
	    in.s_addr == INADDR_ANY)
		return 1;
	if ((ifp->if_flags & IFF_BROADCAST) == 0)
		return 0;
	t = ntohl(in.s_addr);
	/*
	 * Look through the list of addresses for a match
	 * with a broadcast address.
	 */
#define ia ((struct in_ifaddr *)ifa)
	TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link)
		if (ifa->ifa_addr->sa_family == AF_INET &&
		    (in.s_addr == ia->ia_broadaddr.sin_addr.s_addr ||
		     in.s_addr == ia->ia_netbroadcast.s_addr ||
		     /*
		      * Check for old-style (host 0) broadcast.
		      */
		     t == ia->ia_subnet || t == ia->ia_net) &&
		     /*
		      * Check for an all one subnetmask. These
		      * only exist when an interface gets a secondary
		      * address.
		      */
		     ia->ia_subnetmask != (u_long)0xffffffff)
			    return 1;
	return (0);
#undef ia
}
/*
 * Add an address to the list of IP multicast addresses for a given interface.
 */
struct in_multi *
in_addmulti(ap, ifp)
	register struct in_addr *ap;
	register struct ifnet *ifp;
{
	register struct in_multi *inm;
	int error;
	struct sockaddr_in sin;
	struct ifmultiaddr *ifma;
	int s = splnet();

	/*
	 * Call generic routine to add membership or increment
	 * refcount.  It wants addresses in the form of a sockaddr,
	 * so we build one here (being careful to zero the unused bytes).
	 */
	bzero(&sin, sizeof sin);
	sin.sin_family = AF_INET;
	sin.sin_len = sizeof sin;
	sin.sin_addr = *ap;
	error = if_addmulti(ifp, (struct sockaddr *)&sin, &ifma);
	if (error) {
		splx(s);
		return 0;
	}

	/*
	 * If ifma->ifma_protospec is null, then if_addmulti() created
	 * a new record.  Otherwise, we are done.
	 */
	if (ifma->ifma_protospec != 0) {
		splx(s);
		return ifma->ifma_protospec;
	}

	/* XXX - if_addmulti uses M_WAITOK.  Can this really be called
	   at interrupt time?  If so, need to fix if_addmulti. XXX */
	inm = (struct in_multi *)malloc(sizeof(*inm), M_IPMADDR,
	    M_NOWAIT | M_ZERO);
	if (inm == NULL) {
		splx(s);
		return (NULL);
	}

	inm->inm_addr = *ap;
	inm->inm_ifp = ifp;
	inm->inm_ifma = ifma;
	ifma->ifma_protospec = inm;
	LIST_INSERT_HEAD(&in_multihead, inm, inm_link);

	/*
	 * Let IGMP know that we have joined a new IP multicast group.
	 */
	igmp_joingroup(inm);
	splx(s);
	return (inm);
}

/*
 * Delete a multicast address record.
 */
void
in_delmulti(inm)
	register struct in_multi *inm;
{
	struct ifmultiaddr *ifma = inm->inm_ifma;
	struct in_multi my_inm;
	int s = splnet();

	my_inm.inm_ifp = NULL ; /* don't send the leave msg */
	if (ifma->ifma_refcount == 1) {
		/*
		 * No remaining claims to this record; let IGMP know that
		 * we are leaving the multicast group.
		 * But do it after the if_delmulti() which might reset
		 * the interface and nuke the packet.
		 */
		my_inm = *inm ;
		ifma->ifma_protospec = 0;
		LIST_REMOVE(inm, inm_link);
		free(inm, M_IPMADDR);
	}
	/* XXX - should be separate API for when we have an ifma? */
	if_delmulti(ifma->ifma_ifp, ifma->ifma_addr);
	if (my_inm.inm_ifp != NULL)
		igmp_leavegroup(&my_inm);
	splx(s);
}
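
The prefix-length helpers in_len2mask() and in_mask2len() in the file above can be exercised outside the kernel. What follows is only a minimal userland sketch, assuming standard C headers in place of the kernel ones; len2mask(), mask2len() and check_roundtrip() are illustrative names, not kernel API. The mask is held as four bytes in network byte order, the same layout as struct in_addr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a netmask (4 bytes, network byte order) from a prefix length. */
static void
len2mask(uint8_t mask[4], int len)
{
	int i;

	memset(mask, 0, 4);
	for (i = 0; i < len / 8; i++)
		mask[i] = 0xff;
	if (len % 8)
		mask[i] = (0xff00 >> (len % 8)) & 0xff;
}

/* Count the leading one bits of a netmask. */
static int
mask2len(const uint8_t mask[4])
{
	int x, y;

	/* Whole bytes of ones first... */
	for (x = 0; x < 4; x++)
		if (mask[x] != 0xff)
			break;
	/* ...then the leading ones of the first partial byte, if any. */
	y = 0;
	if (x < 4)
		for (y = 0; y < 8; y++)
			if ((mask[x] & (0x80 >> y)) == 0)
				break;
	return (x * 8 + y);
}

/* Every prefix length from 0 to 32 should survive a round trip. */
static void
check_roundtrip(void)
{
	uint8_t m[4];
	int len;

	for (len = 0; len <= 32; len++) {
		len2mask(m, len);
		assert(mask2len(m) == len);
	}
}
```

Calling len2mask(m, 20) leaves m as ff ff f0 00, and mask2len() on that buffer gives back 20. Note that mask2len() only counts leading ones, so a non-contiguous mask is reported by its contiguous prefix.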

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="in.c.diff"

--- in.org.c	Mon Dec 27 01:43:19 2004
+++ in.c	Mon Dec 27 01:42:40 2004
@@ -28,7 +28,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)in.c	8.4 (Berkeley) 1/9/95
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $
+ * $FreeBSD: src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $
  */
 
 #include <sys/param.h>
@@ -136,6 +136,159 @@
 }
 
 /*
+ * Sub-routine for in_ifaddrecv() and in_ifremrecv().
+ * --james@towardex.com 12/17/2004
+ */
+static void
+in_ifrecv_request(int call, int cmd, struct in_ifaddr *ia)
+{
+	struct sockaddr_in all1_sa;
+	struct rtentry *nrt = NULL;
+	struct ifaddr *ifa;
+	int e = 0;
+	struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET };
+	struct sockaddr_in loopback = { sizeof(struct sockaddr_in), AF_INET };
+
+	ifa = &ia->ia_ifa;
+
+       	bzero(&all1_sa, sizeof(all1_sa));
+        all1_sa.sin_family = AF_INET;
+        all1_sa.sin_len = sizeof(struct sockaddr_in);
+        all1_sa.sin_addr.s_addr = (u_int32_t)0xffffffff;
+
+	/* We need to manually specify loopback for network and broadcast
+	 * addresses because we can't just let L2 rtrequest handlers
+	 * deal with ifa->ifa_addr set as gateway address.
+	 */
+        loopback.sin_family = AF_INET;
+        loopback.sin_addr.s_addr = ntohl(INADDR_LOOPBACK);
+
+	/*
+	 * Set the rtflags to RTF_LLINFO so existing apps are happy
+	 * with our changes.
+	 */
+	switch (call) {
+	case 0:  /* own address request */
+        	rtrequest(cmd, ifa->ifa_addr, sintosa(&loopback),
+        	  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
+		break;
+	case 1:  /* network address request */
+        	rtrequest(cmd, sintosa(&ia->ia_dstaddr), sintosa(&loopback),
+        	  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
+		break;
+	case 2:  /* broadcast address request */
+		subnet.sin_addr.s_addr = htonl(ia->ia_subnet);
+		subnet.sin_family = AF_INET;
+
+        	rtrequest(cmd, sintosa(&subnet), sintosa(&loopback),
+        	  (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt);
+		break;
+	default:
+		break;
+	}
+
+        if (nrt) {
+                RT_LOCK(nrt);
+                /*
+                 * Make sure rt_ifa be equal to IFA, the second argument of
+                 * the function.  We need this because when we refer to
+                 * rt_ifa->ia_flags, we assume that the rt_ifa points to
+		 * the address, not the loopback.
+                 */
+                if (cmd == RTM_ADD && ifa != nrt->rt_ifa) {
+                        IFAFREE(nrt->rt_ifa);
+                        IFAREF(ifa);
+                        nrt->rt_ifa = ifa;
+                }
+                /*
+		 * Report to routing socket.
+                 */
+                rt_newaddrmsg(cmd, ifa, e, nrt);
+                if (cmd == RTM_DELETE) {
+                        rtfree(nrt);
+                } else {
+                        /* the cmd must be RTM_ADD here */
+                        RT_REMREF(nrt);
+                        RT_UNLOCK(nrt);
+                }
+        }
+}
+
+
+/*
+ * Add own address as loopback rtentry (receive path). We previously added
+ * the route only if necessary (such as on a point-to-point circuit), or when
+ * triggered by route cloning. However, a proper RIB and FIB implementation
+ * must contain own-addrs as receive paths, allowing software to manage
+ * its own addresses separately from prefixes. This is required for receive
+ * adjacency/path in ip_fastforward() --james@towardex.com 2004/12/17
+ */
+static void
+in_ifaddrecv(struct in_ifaddr *ia)
+{
+	struct rtentry *rt;
+	int need_loop, need_netdst, need_bcast;
+	struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET };
+
+	/* If there is no loopback entry, allocate one */
+	rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0);
+	need_loop = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
+	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);
+
+	/* If there is no network entry, allocate one */
+	if(rt) rtfree(rt);
+	rt = rtalloc1(sintosa(&ia->ia_dstaddr), 0, 0);
+	need_netdst = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
+	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);
+	
+	/* If there is no broadcast entry, allocate one */
+	subnet.sin_addr.s_addr = htonl(ia->ia_subnet);
+	subnet.sin_family = AF_INET;
+	if(rt) rtfree(rt);
+	rt = rtalloc1(sintosa(&subnet), 0, 0);
+	need_bcast = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 ||
+	  (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0);
+
+	if(rt)
+	  rtfree(rt);
+
+	if(need_loop)
+	  in_ifrecv_request(0, RTM_ADD, ia);
+	if(need_netdst)
+	  in_ifrecv_request(1, RTM_ADD, ia);
+	if(need_bcast)
+	  in_ifrecv_request(2, RTM_ADD, ia);
+}
+
+
+/*
+ * Remove loopback rtentry's of receive path generated by in_ifaddrecv()
+ * if they exist. -- james 12/17/2004
+ */
+static void
+in_ifremrecv(struct in_ifaddr *ia)
+{
+        struct rtentry *rt;
+        
+	/* Delete the route for ownaddr if it really exists. */
+	rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0);
+	if (rt != NULL) {
+		int is_recv = (rt->rt_flags & RTF_HOST) != 0 &&
+		    (rt->rt_ifp->if_flags & IFF_LOOPBACK) != 0;
+		rtfree(rt);	/* always release the lookup reference */
+		if (is_recv)
+			in_ifrecv_request(0, RTM_DELETE, ia);
+	}
+
+	/* XXX
+	 * Broadcast and network addresses are removed by
+	 * regular interface detach handlers, but we
+	 * need to verify the design aspect of this more
+	 * later.
+	 */
+}
+
+/*
  * Trim a mask in a sockaddr
  */
 static void
@@ -658,6 +811,11 @@
 	register struct in_ifaddr *ia;
 {
 	in_scrubprefix(ia);
+
+	/*
+	 * delete receive path rtentry's if they exist.
+	 */
+	in_ifremrecv(ia);
 }
 
 /*
@@ -752,6 +910,16 @@
 		addr.s_addr = htonl(INADDR_ALLHOSTS_GROUP);
 		in_addmulti(&addr, ifp);
 	}
+
+	/*
+	 * Bring online receive adjacency routes.
+	 * -james 2004/12/17
+	 *
+	 * Deleted old 2004-09-09 kludge code; this is a cleaner
+	 * approach, derived from KAME implementation for INET6.
+	 */
+	in_ifaddrecv(ia);
+
 	return (error);
 }
 
@@ -806,6 +974,8 @@
 		target->ia_flags |= IFA_ROUTE;
 	return error;
 }
+
+
 
 /*
  * If there is no other address in the system that can serve a route to the
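
In in_ifrecv_request() the patch stores ntohl(INADDR_LOOPBACK) as the gateway address and 0xffffffff as the host mask. On any given platform htonl() and ntohl() perform the same operation, so the stored value matches htonl(INADDR_LOOPBACK), which is the usual spelling for a host-to-network conversion. A minimal userland sketch of the two addresses, assuming POSIX headers; make_sin(), sin_to_text() and check_loopback_order() are hypothetical helpers, not kernel functions, and sin_len is left out because not every libc provides it:

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill a sockaddr_in from an IPv4 address given in host byte order. */
static struct sockaddr_in
make_sin(uint32_t host_order_addr)
{
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(host_order_addr);
	return (sin);
}

/* Render the address as a dotted quad into buf; returns buf or NULL. */
static const char *
sin_to_text(const struct sockaddr_in *sin, char *buf, size_t buflen)
{
	return (inet_ntop(AF_INET, &sin->sin_addr, buf, (socklen_t)buflen));
}

/* Both conversions yield the same stored value for the loopback constant. */
static void
check_loopback_order(void)
{
	assert(htonl(INADDR_LOOPBACK) == ntohl(INADDR_LOOPBACK));
}
```

With these helpers, make_sin(INADDR_LOOPBACK) renders as 127.0.0.1 and make_sin(0xffffffff) as 255.255.255.255, which correspond to the gateway and mask arguments the patch passes to rtrequest().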

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="inet.c"

/*
 * Copyright (c) 1983, 1988, 1993, 1995
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *	This product includes software developed by the University of
 *	California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#if 0
#ifndef lint
static char sccsid[] = "@(#)inet.c	8.5 (Berkeley) 5/24/95";
#endif /* not lint */
#endif

#include <sys/cdefs.h>
__FBSDID("$FreeBSD: src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $");

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/socket.h>
#include <sys/socketvar.h>
#include <sys/sysctl.h>
#include <sys/protosw.h>

#include <net/route.h>
#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/ip.h>
#ifdef INET6
#include <netinet/ip6.h>
#endif /* INET6 */
#include <netinet/in_pcb.h>
#include <netinet/ip_icmp.h>
#include <netinet/icmp_var.h>
#include <netinet/igmp_var.h>
#include <netinet/ip_var.h>
#include <netinet/pim_var.h>
#include <netinet/tcp.h>
#include <netinet/tcpip.h>
#include <netinet/tcp_seq.h>
#define TCPSTATES
#include <netinet/tcp_fsm.h>
#include <netinet/tcp_timer.h>
#include <netinet/tcp_var.h>
#include <netinet/tcp_debug.h>
#include <netinet/udp.h>
#include <netinet/udp_var.h>

#include <arpa/inet.h>
#include <err.h>
#include <errno.h>
#include <libutil.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "netstat.h"

char	*inetname (struct in_addr *);
void	inetprint (struct in_addr *, int, const char *, int);
#ifdef INET6
static int udp_done, tcp_done;
#endif /* INET6 */

/*
 * Print a summary of connections related to an Internet
 * protocol.  For TCP, also give state of connection.
 * Listening processes (aflag) are suppressed unless the
 * -a (all) flag is specified.
 */
void
protopr(u_long proto,		/* for sysctl version we pass proto # */
	const char *name, int af1)
{
	int istcp;
	static int first = 1;
	char *buf;
	const char *mibvar, *vchar;
	struct tcpcb *tp = NULL;
	struct inpcb *inp;
	struct xinpgen *xig, *oxig;
	struct xsocket *so;
	size_t len;

	istcp = 0;
	switch (proto) {
	case IPPROTO_TCP:
#ifdef INET6
		if (tcp_done != 0)
			return;
		else
			tcp_done = 1;
#endif
		istcp = 1;
		mibvar = "net.inet.tcp.pcblist";
		break;
	case IPPROTO_UDP:
#ifdef INET6
		if (udp_done != 0)
			return;
		else
			udp_done = 1;
#endif
		mibvar = "net.inet.udp.pcblist";
		break;
	case IPPROTO_DIVERT:
		mibvar = "net.inet.divert.pcblist";
		break;
	default:
		mibvar = "net.inet.raw.pcblist";
		break;
	}
	len = 0;
	if (sysctlbyname(mibvar, 0, &len, 0, 0) < 0) {
		if (errno != ENOENT)
			warn("sysctl: %s", mibvar);
		return;
	}
	if ((buf = malloc(len)) == 0) {
		warnx("malloc %lu bytes", (u_long)len);
		return;
	}
	if (sysctlbyname(mibvar, buf, &len, 0, 0) < 0) {
		warn("sysctl: %s", mibvar);
		free(buf);
		return;
	}

	oxig = xig = (struct xinpgen *)buf;
	for (xig = (struct xinpgen *)((char *)xig + xig->xig_len);
	     xig->xig_len > sizeof(struct xinpgen);
	     xig = (struct xinpgen *)((char *)xig + xig->xig_len)) {
		if (istcp) {
			tp = &((struct xtcpcb *)xig)->xt_tp;
			inp = &((struct xtcpcb *)xig)->xt_inp;
			so = &((struct xtcpcb *)xig)->xt_socket;
		} else {
			inp = &((struct xinpcb *)xig)->xi_inp;
			so = &((struct xinpcb *)xig)->xi_socket;
		}

		/* Ignore sockets for protocols other than the desired one. */
		if (so->xso_protocol != (int)proto)
			continue;

		/* Ignore PCBs which were freed during copyout. */
		if (inp->inp_gencnt > oxig->xig_gen)
			continue;

		if ((af1 == AF_INET && (inp->inp_vflag & INP_IPV4) == 0)
#ifdef INET6
		    || (af1 == AF_INET6 && (inp->inp_vflag & INP_IPV6) == 0)
#endif /* INET6 */
		    || (af1 == AF_UNSPEC && ((inp->inp_vflag & INP_IPV4) == 0
#ifdef INET6
					    && (inp->inp_vflag &
						INP_IPV6) == 0
#endif /* INET6 */
			))
		    )
			continue;
		if (!aflag &&
		    (
		     (istcp && tp->t_state == TCPS_LISTEN)
		     || (af1 == AF_INET &&
		      inet_lnaof(inp->inp_laddr) == INADDR_ANY)
#ifdef INET6
		     || (af1 == AF_INET6 &&
			 IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr))
#endif /* INET6 */
		     || (af1 == AF_UNSPEC &&
			 (((inp->inp_vflag & INP_IPV4) != 0 &&
			   inet_lnaof(inp->inp_laddr) == INADDR_ANY)
#ifdef INET6
			  || ((inp->inp_vflag & INP_IPV6) != 0 &&
			      IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr))
#endif
			  ))
		     ))
			continue;

		if (first) {
			if (!Lflag) {
				printf("Active Internet connections");
				if (aflag)
					printf(" (including servers)");
			} else
				printf(
	"Current listen queue sizes (qlen/incqlen/maxqlen)");
			putchar('\n');
			if (Aflag)
				printf("%-8.8s ", "Socket");
			if (Lflag)
				printf("%-5.5s %-14.14s %-22.22s\n",
					"Proto", "Listen", "Local Address");
			else
				printf((Aflag && !Wflag) ?
		"%-5.5s %-6.6s %-6.6s  %-18.18s %-18.18s %s\n" :
		"%-5.5s %-6.6s %-6.6s  %-22.22s %-22.22s %s\n",
					"Proto", "Recv-Q", "Send-Q",
					"Local Address", "Foreign Address",
					"(state)");
			first = 0;
		}
		if (Lflag && so->so_qlimit == 0)
			continue;
		if (Aflag) {
			if (istcp)
				printf("%8lx ", (u_long)inp->inp_ppcb);
			else
				printf("%8lx ", (u_long)so->so_pcb);
		}
#ifdef INET6
		if ((inp->inp_vflag & INP_IPV6) != 0)
			vchar = ((inp->inp_vflag & INP_IPV4) != 0)
				? "46" : "6 ";
		else
#endif
		vchar = ((inp->inp_vflag & INP_IPV4) != 0)
				? "4 " : "  ";
		printf("%-3.3s%-2.2s ", name, vchar);
		if (Lflag) {
			char buf1[15];

			snprintf(buf1, 15, "%d/%d/%d", so->so_qlen,
				 so->so_incqlen, so->so_qlimit);
			printf("%-14.14s ", buf1);
		} else {
			printf("%6u %6u  ",
			       so->so_rcv.sb_cc,
			       so->so_snd.sb_cc);
		}
		if (numeric_port) {
			if (inp->inp_vflag & INP_IPV4) {
				inetprint(&inp->inp_laddr, (int)inp->inp_lport,
					  name, 1);
				if (!Lflag)
					inetprint(&inp->inp_faddr,
						  (int)inp->inp_fport, name, 1);
			}
#ifdef INET6
			else if (inp->inp_vflag & INP_IPV6) {
				inet6print(&inp->in6p_laddr,
					   (int)inp->inp_lport, name, 1);
				if (!Lflag)
					inet6print(&inp->in6p_faddr,
						   (int)inp->inp_fport, name, 1);
			} /* else nothing printed now */
#endif /* INET6 */
		} else if (inp->inp_flags & INP_ANONPORT) {
			if (inp->inp_vflag & INP_IPV4) {
				inetprint(&inp->inp_laddr, (int)inp->inp_lport,
					  name, 1);
				if (!Lflag)
					inetprint(&inp->inp_faddr,
						  (int)inp->inp_fport, name, 0);
			}
#ifdef INET6
			else if (inp->inp_vflag & INP_IPV6) {
				inet6print(&inp->in6p_laddr,
					   (int)inp->inp_lport, name, 1);
				if (!Lflag)
					inet6print(&inp->in6p_faddr,
						   (int)inp->inp_fport, name, 0);
			} /* else nothing printed now */
#endif /* INET6 */
		} else {
			if (inp->inp_vflag & INP_IPV4) {
				inetprint(&inp->inp_laddr, (int)inp->inp_lport,
					  name, 0);
				if (!Lflag)
					inetprint(&inp->inp_faddr,
						  (int)inp->inp_fport, name,
						  inp->inp_lport !=
							inp->inp_fport);
			}
#ifdef INET6
			else if (inp->inp_vflag & INP_IPV6) {
				inet6print(&inp->in6p_laddr,
					   (int)inp->inp_lport, name, 0);
				if (!Lflag)
					inet6print(&inp->in6p_faddr,
						   (int)inp->inp_fport, name,
						   inp->inp_lport !=
							inp->inp_fport);
			} /* else nothing printed now */
#endif /* INET6 */
		}
		if (istcp && !Lflag) {
			if (tp->t_state < 0 || tp->t_state >= TCP_NSTATES)
				printf("%d", tp->t_state);
			else {
				printf("%s", tcpstates[tp->t_state]);
#if defined(TF_NEEDSYN) && defined(TF_NEEDFIN)
				/* Show T/TCP `hidden state' */
				if (tp->t_flags & (TF_NEEDSYN|TF_NEEDFIN))
					putchar('*');
#endif /* defined(TF_NEEDSYN) && defined(TF_NEEDFIN) */
			}
		}
		putchar('\n');
	}
	if (xig != oxig && xig->xig_gen != oxig->xig_gen) {
		if (oxig->xig_count > xig->xig_count) {
			printf("Some %s sockets may have been deleted.\n",
			       name);
		} else if (oxig->xig_count < xig->xig_count) {
			printf("Some %s sockets may have been created.\n",
			       name);
		} else {
			printf("Some %s sockets may have been created or deleted.\n",
			       name);
		}
	}
	free(buf);
}
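
#if 0	/* Illustrative sketch only; not part of the original inet.c. */
/*
 * The pcblist buffer that protopr() walks is a run of variable-length
 * records, each starting with its own length, bracketed by a header and
 * a trailer record that carry no payload.  A minimal, portable sketch of
 * that traversal follows.  The names rec_hdr and count_records are
 * hypothetical stand-ins (not the kernel's struct xinpgen), and the
 * explicit bounds checks are an addition that protopr() itself does not
 * make.
 */
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the length field that leads every record. */
struct rec_hdr {
	size_t	len;	/* total size of this record, header included */
};

/*
 * Count the payload records in buf.  The first record is a header and is
 * skipped; the walk stops at the trailer, i.e. the first record that is
 * no larger than a bare rec_hdr.  Returns -1 if a record would run past
 * the end of the buffer.
 */
static int
count_records(const char *buf, size_t buflen)
{
	struct rec_hdr h;
	size_t off;
	int n = 0;

	if (buflen < sizeof(h))
		return (-1);
	memcpy(&h, buf, sizeof(h));	/* header record */
	if (h.len < sizeof(h) || h.len > buflen)
		return (-1);
	off = h.len;
	for (;;) {
		if (buflen - off < sizeof(h))
			return (-1);	/* trailer missing */
		memcpy(&h, buf + off, sizeof(h));
		if (h.len <= sizeof(h))
			break;		/* trailer reached */
		if (h.len > buflen - off)
			return (-1);	/* truncated record */
		n++;
		off += h.len;
	}
	return (n);
}
```
#endif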

/*
 * Dump TCP statistics structure.
 */
void
tcp_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct tcpstat tcpstat, zerostat;
	size_t len = sizeof tcpstat;
	
	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.tcp.stats", &tcpstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.tcp.stats");
		return;
	}

#ifdef INET6
	if (tcp_done != 0)
		return;
	else
		tcp_done = 1;
#endif

	printf ("%s:\n", name);

#define	p(f, m) if (tcpstat.f || sflag <= 1) \
    printf(m, tcpstat.f, plural(tcpstat.f))
#define	p1a(f, m) if (tcpstat.f || sflag <= 1) \
    printf(m, tcpstat.f)
#define	p2(f1, f2, m) if (tcpstat.f1 || tcpstat.f2 || sflag <= 1) \
    printf(m, tcpstat.f1, plural(tcpstat.f1), tcpstat.f2, plural(tcpstat.f2))
#define	p2a(f1, f2, m) if (tcpstat.f1 || tcpstat.f2 || sflag <= 1) \
    printf(m, tcpstat.f1, plural(tcpstat.f1), tcpstat.f2)
#define	p3(f, m) if (tcpstat.f || sflag <= 1) \
    printf(m, tcpstat.f, plurales(tcpstat.f))

	p(tcps_sndtotal, "\t%lu packet%s sent\n");
	p2(tcps_sndpack,tcps_sndbyte,
		"\t\t%lu data packet%s (%lu byte%s)\n");
	p2(tcps_sndrexmitpack, tcps_sndrexmitbyte,
		"\t\t%lu data packet%s (%lu byte%s) retransmitted\n");
	p(tcps_sndrexmitbad,
		"\t\t%lu data packet%s unnecessarily retransmitted\n");
	p(tcps_mturesent, "\t\t%lu resend%s initiated by MTU discovery\n");
	p2a(tcps_sndacks, tcps_delack,
		"\t\t%lu ack-only packet%s (%lu delayed)\n");
	p(tcps_sndurg, "\t\t%lu URG only packet%s\n");
	p(tcps_sndprobe, "\t\t%lu window probe packet%s\n");
	p(tcps_sndwinup, "\t\t%lu window update packet%s\n");
	p(tcps_sndctrl, "\t\t%lu control packet%s\n");
	p(tcps_rcvtotal, "\t%lu packet%s received\n");
	p2(tcps_rcvackpack, tcps_rcvackbyte, "\t\t%lu ack%s (for %lu byte%s)\n");
	p(tcps_rcvdupack, "\t\t%lu duplicate ack%s\n");
	p(tcps_rcvacktoomuch, "\t\t%lu ack%s for unsent data\n");
	p2(tcps_rcvpack, tcps_rcvbyte,
		"\t\t%lu packet%s (%lu byte%s) received in-sequence\n");
	p2(tcps_rcvduppack, tcps_rcvdupbyte,
		"\t\t%lu completely duplicate packet%s (%lu byte%s)\n");
	p(tcps_pawsdrop, "\t\t%lu old duplicate packet%s\n");
	p2(tcps_rcvpartduppack, tcps_rcvpartdupbyte,
		"\t\t%lu packet%s with some dup. data (%lu byte%s duped)\n");
	p2(tcps_rcvoopack, tcps_rcvoobyte,
		"\t\t%lu out-of-order packet%s (%lu byte%s)\n");
	p2(tcps_rcvpackafterwin, tcps_rcvbyteafterwin,
		"\t\t%lu packet%s (%lu byte%s) of data after window\n");
	p(tcps_rcvwinprobe, "\t\t%lu window probe%s\n");
	p(tcps_rcvwinupd, "\t\t%lu window update packet%s\n");
	p(tcps_rcvafterclose, "\t\t%lu packet%s received after close\n");
	p(tcps_rcvbadsum, "\t\t%lu discarded for bad checksum%s\n");
	p(tcps_rcvbadoff, "\t\t%lu discarded for bad header offset field%s\n");
	p1a(tcps_rcvshort, "\t\t%lu discarded because packet too short\n");
	p(tcps_connattempt, "\t%lu connection request%s\n");
	p(tcps_accepts, "\t%lu connection accept%s\n");
	p(tcps_badsyn, "\t%lu bad connection attempt%s\n");
	p(tcps_listendrop, "\t%lu listen queue overflow%s\n");
	p(tcps_badrst, "\t%lu ignored RSTs in the window%s\n");
	p(tcps_connects, "\t%lu connection%s established (including accepts)\n");
	p2(tcps_closed, tcps_drops,
		"\t%lu connection%s closed (including %lu drop%s)\n");
	p(tcps_cachedrtt, "\t\t%lu connection%s updated cached RTT on close\n");
	p(tcps_cachedrttvar, 
	  "\t\t%lu connection%s updated cached RTT variance on close\n");
	p(tcps_cachedssthresh,
	  "\t\t%lu connection%s updated cached ssthresh on close\n");
	p(tcps_conndrops, "\t%lu embryonic connection%s dropped\n");
	p2(tcps_rttupdated, tcps_segstimed,
		"\t%lu segment%s updated rtt (of %lu attempt%s)\n");
	p(tcps_rexmttimeo, "\t%lu retransmit timeout%s\n");
	p(tcps_timeoutdrop, "\t\t%lu connection%s dropped by rexmit timeout\n");
	p(tcps_persisttimeo, "\t%lu persist timeout%s\n");
	p(tcps_persistdrop, "\t\t%lu connection%s dropped by persist timeout\n");
	p(tcps_keeptimeo, "\t%lu keepalive timeout%s\n");
	p(tcps_keepprobe, "\t\t%lu keepalive probe%s sent\n");
	p(tcps_keepdrops, "\t\t%lu connection%s dropped by keepalive\n");
	p(tcps_predack, "\t%lu correct ACK header prediction%s\n");
	p(tcps_preddat, "\t%lu correct data packet header prediction%s\n");

	p(tcps_sc_added, "\t%lu syncache entrie%s added\n"); 
	p1a(tcps_sc_retransmitted, "\t\t%lu retransmitted\n"); 
	p1a(tcps_sc_dupsyn, "\t\t%lu dupsyn\n"); 
	p1a(tcps_sc_dropped, "\t\t%lu dropped\n"); 
	p1a(tcps_sc_completed, "\t\t%lu completed\n"); 
	p1a(tcps_sc_bucketoverflow, "\t\t%lu bucket overflow\n"); 
	p1a(tcps_sc_cacheoverflow, "\t\t%lu cache overflow\n"); 
	p1a(tcps_sc_reset, "\t\t%lu reset\n"); 
	p1a(tcps_sc_stale, "\t\t%lu stale\n"); 
	p1a(tcps_sc_aborted, "\t\t%lu aborted\n"); 
	p1a(tcps_sc_badack, "\t\t%lu badack\n"); 
	p1a(tcps_sc_unreach, "\t\t%lu unreach\n"); 
	p(tcps_sc_zonefail, "\t\t%lu zone failure%s\n"); 
	p(tcps_sc_sendcookie, "\t%lu cookie%s sent\n"); 
	p(tcps_sc_recvcookie, "\t%lu cookie%s received\n"); 

	p(tcps_sack_recovery_episode, "\t%lu SACK recovery episode%s\n"); 
	p(tcps_sack_rexmits,
		"\t%lu segment rexmit%s in SACK recovery episodes\n");
	p(tcps_sack_rexmit_bytes,
		"\t%lu byte rexmit%s in SACK recovery episodes\n"); 
	p(tcps_sack_rcv_blocks,
		"\t%lu SACK option%s (SACK blocks) received\n"); 
	p(tcps_sack_send_blocks, "\t%lu SACK option%s (SACK blocks) sent\n"); 

#undef p
#undef p1a
#undef p2
#undef p2a
#undef p3
}

/*
 * Dump UDP statistics structure.
 */
void
udp_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct udpstat udpstat, zerostat;
	size_t len = sizeof udpstat;
	u_long delivered;

	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.udp.stats", &udpstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.udp.stats");
		return;
	}

#ifdef INET6
	if (udp_done != 0)
		return;
	else
		udp_done = 1;
#endif

	printf("%s:\n", name);
#define	p(f, m) if (udpstat.f || sflag <= 1) \
    printf(m, udpstat.f, plural(udpstat.f))
#define	p1a(f, m) if (udpstat.f || sflag <= 1) \
    printf(m, udpstat.f)
	p(udps_ipackets, "\t%lu datagram%s received\n");
	p1a(udps_hdrops, "\t%lu with incomplete header\n");
	p1a(udps_badlen, "\t%lu with bad data length field\n");
	p1a(udps_badsum, "\t%lu with bad checksum\n");
	p1a(udps_nosum, "\t%lu with no checksum\n");
	p1a(udps_noport, "\t%lu dropped due to no socket\n");
	p(udps_noportbcast,
	    "\t%lu broadcast/multicast datagram%s dropped due to no socket\n");
	p1a(udps_fullsock, "\t%lu dropped due to full socket buffers\n");
	p1a(udpps_pcbhashmiss, "\t%lu not for hashed pcb\n");
	delivered = udpstat.udps_ipackets -
		    udpstat.udps_hdrops -
		    udpstat.udps_badlen -
		    udpstat.udps_badsum -
		    udpstat.udps_noport -
		    udpstat.udps_noportbcast -
		    udpstat.udps_fullsock;
	if (delivered || sflag <= 1)
		printf("\t%lu delivered\n", delivered);
	p(udps_opackets, "\t%lu datagram%s output\n");
#undef p
#undef p1a
}

/*
 * Dump IP statistics structure.
 */
void
ip_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct ipstat ipstat, zerostat;
	size_t len = sizeof ipstat;

	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.ip.stats", &ipstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.ip.stats");
		return;
	}

	printf("%s:\n", name);

#define	p(f, m) if (ipstat.f || sflag <= 1) \
    printf(m, ipstat.f, plural(ipstat.f))
#define	p1a(f, m) if (ipstat.f || sflag <= 1) \
    printf(m, ipstat.f)

	p(ips_total, "\t%lu total packet%s received\n");
	p(ips_badsum, "\t%lu bad header checksum%s\n");
	p1a(ips_toosmall, "\t%lu with size smaller than minimum\n");
	p1a(ips_tooshort, "\t%lu with data size < data length\n");
	p1a(ips_toolong, "\t%lu with ip length > max ip packet size\n");
	p1a(ips_badhlen, "\t%lu with header length < data size\n");
	p1a(ips_badlen, "\t%lu with data length < header length\n");
	p1a(ips_badoptions, "\t%lu with bad options\n");
	p1a(ips_badvers, "\t%lu with incorrect version number\n");
	p(ips_fragments, "\t%lu fragment%s received\n");
	p(ips_fragdropped, "\t%lu fragment%s dropped (dup or out of space)\n");
	p(ips_fragtimeout, "\t%lu fragment%s dropped after timeout\n");
	p(ips_reassembled, "\t%lu packet%s reassembled ok\n");
	p(ips_delivered, "\t%lu packet%s for this host\n");
	p(ips_noproto, "\t%lu packet%s for unknown/unsupported protocol\n");
	p(ips_forward, "\t%lu packet%s forwarded");
	p(ips_fastforward, " (%lu packet%s fast forwarded)");
	if (ipstat.ips_forward || sflag <= 1) 
		putchar('\n');
	p(ips_cantforward, "\t%lu packet%s not forwardable\n");
	p(ips_transit_re, "\t%lu packet%s forwarded to receive path\n");
	p(ips_notmember,
	  "\t%lu packet%s received for unknown multicast group\n");
	p(ips_redirectsent, "\t%lu redirect%s sent\n");
	p(ips_localout, "\t%lu packet%s sent from this host\n");
	p(ips_rawout, "\t%lu packet%s sent with fabricated ip header\n");
	p(ips_odropped,
	  "\t%lu output packet%s dropped due to no bufs, etc.\n");
	p(ips_noroute, "\t%lu output packet%s discarded due to no route\n");
	p(ips_fragmented, "\t%lu output datagram%s fragmented\n");
	p(ips_ofragments, "\t%lu fragment%s created\n");
	p(ips_cantfrag, "\t%lu datagram%s that can't be fragmented\n");
	p(ips_nogif, "\t%lu tunneling packet%s that can't find gif\n");
	p(ips_badaddr, "\t%lu datagram%s with bad address in header\n");
#undef p
#undef p1a
}

static	const char *icmpnames[] = {
	"echo reply",
	"#1",
	"#2",
	"destination unreachable",
	"source quench",
	"routing redirect",
	"#6",
	"#7",
	"echo",
	"router advertisement",
	"router solicitation",
	"time exceeded",
	"parameter problem",
	"time stamp",
	"time stamp reply",
	"information request",
	"information request reply",
	"address mask request",
	"address mask reply",
};

/*
 * Dump ICMP statistics.
 */
void
icmp_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct icmpstat icmpstat, zerostat;
	int i, first;
	int mib[4];		/* CTL_NET + PF_INET + IPPROTO_ICMP + req */
	size_t len;

	mib[0] = CTL_NET;
	mib[1] = PF_INET;
	mib[2] = IPPROTO_ICMP;
	mib[3] = ICMPCTL_STATS;

	len = sizeof icmpstat;
	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctl(mib, 4, &icmpstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.icmp.stats");
		return;
	}

	printf("%s:\n", name);

#define	p(f, m) if (icmpstat.f || sflag <= 1) \
    printf(m, icmpstat.f, plural(icmpstat.f))
#define	p1a(f, m) if (icmpstat.f || sflag <= 1) \
    printf(m, icmpstat.f)
#define	p2(f, m) if (icmpstat.f || sflag <= 1) \
    printf(m, icmpstat.f, plurales(icmpstat.f))

	p(icps_error, "\t%lu call%s to icmp_error\n");
	p(icps_oldicmp,
	    "\t%lu error%s not generated in response to an icmp message\n");
	for (first = 1, i = 0; i < ICMP_MAXTYPE + 1; i++)
		if (icmpstat.icps_outhist[i] != 0) {
			if (first) {
				printf("\tOutput histogram:\n");
				first = 0;
			}
			printf("\t\t%s: %lu\n", icmpnames[i],
				icmpstat.icps_outhist[i]);
		}
	p(icps_badcode, "\t%lu message%s with bad code fields\n");
	p(icps_tooshort, "\t%lu message%s < minimum length\n");
	p(icps_checksum, "\t%lu bad checksum%s\n");
	p(icps_badlen, "\t%lu message%s with bad length\n");
	p1a(icps_bmcastecho, "\t%lu multicast echo requests ignored\n");
	p1a(icps_bmcasttstamp, "\t%lu multicast timestamp requests ignored\n");
	for (first = 1, i = 0; i < ICMP_MAXTYPE + 1; i++)
		if (icmpstat.icps_inhist[i] != 0) {
			if (first) {
				printf("\tInput histogram:\n");
				first = 0;
			}
			printf("\t\t%s: %lu\n", icmpnames[i],
				icmpstat.icps_inhist[i]);
		}
	p(icps_reflect, "\t%lu message response%s generated\n");
	p2(icps_badaddr, "\t%lu invalid return address%s\n");
	p(icps_noroute, "\t%lu no return route%s\n");
#undef p
#undef p1a
#undef p2
	mib[3] = ICMPCTL_MASKREPL;
	len = sizeof i;
	if (sysctl(mib, 4, &i, &len, (void *)0, 0) < 0)
		return;
	printf("\tICMP address mask responses are %sabled\n", 
	       i ? "en" : "dis");
}

/*
 * Dump IGMP statistics structure.
 */
void
igmp_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct igmpstat igmpstat, zerostat;
	size_t len = sizeof igmpstat;

	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.igmp.stats", &igmpstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		warn("sysctl: net.inet.igmp.stats");
		return;
	}

	printf("%s:\n", name);

#define	p(f, m) if (igmpstat.f || sflag <= 1) \
    printf(m, igmpstat.f, plural(igmpstat.f))
#define	py(f, m) if (igmpstat.f || sflag <= 1) \
    printf(m, igmpstat.f, igmpstat.f != 1 ? "ies" : "y")
	p(igps_rcv_total, "\t%u message%s received\n");
	p(igps_rcv_tooshort, "\t%u message%s received with too few bytes\n");
	p(igps_rcv_badsum, "\t%u message%s received with bad checksum\n");
	py(igps_rcv_queries, "\t%u membership quer%s received\n");
	py(igps_rcv_badqueries,
	    "\t%u membership quer%s received with invalid field(s)\n");
	p(igps_rcv_reports, "\t%u membership report%s received\n");
	p(igps_rcv_badreports,
	    "\t%u membership report%s received with invalid field(s)\n");
	p(igps_rcv_ourreports,
	    "\t%u membership report%s received for groups to which we belong\n");
	p(igps_snd_reports, "\t%u membership report%s sent\n");
#undef p
#undef py
}

/*
 * Dump PIM statistics structure.
 */
void
pim_stats(u_long off __unused, const char *name, int af1 __unused)
{
	struct pimstat pimstat, zerostat;
	size_t len = sizeof pimstat;

	if (zflag)
		memset(&zerostat, 0, len);
	if (sysctlbyname("net.inet.pim.stats", &pimstat, &len,
	    zflag ? &zerostat : NULL, zflag ? len : 0) < 0) {
		if (errno != ENOENT)
			warn("sysctl: net.inet.pim.stats");
		return;
	}

	printf("%s:\n", name);

#define	p(f, m) if (pimstat.f || sflag <= 1) \
    printf(m, pimstat.f, plural(pimstat.f))
#define	py(f, m) if (pimstat.f || sflag <= 1) \
    printf(m, pimstat.f, pimstat.f != 1 ? "ies" : "y")
	p(pims_rcv_total_msgs, "\t%llu message%s received\n");
	p(pims_rcv_total_bytes, "\t%llu byte%s received\n");
	p(pims_rcv_tooshort, "\t%llu message%s received with too few bytes\n");
	p(pims_rcv_badsum, "\t%llu message%s received with bad checksum\n");
	p(pims_rcv_badversion, "\t%llu message%s received with bad version\n");
	p(pims_rcv_registers_msgs, "\t%llu data register message%s received\n");
	p(pims_rcv_registers_bytes, "\t%llu data register byte%s received\n");
	p(pims_rcv_registers_wrongiif, "\t%llu data register message%s received on wrong iif\n");
	p(pims_rcv_badregisters, "\t%llu bad register%s received\n");
	p(pims_snd_registers_msgs, "\t%llu data register message%s sent\n");
	p(pims_snd_registers_bytes, "\t%llu data register byte%s sent\n");
#undef p
#undef py
}

/*
 * Pretty print an Internet address (net address + port).
 */
void
inetprint(struct in_addr *in, int port, const char *proto, int num_port)
{
	struct servent *sp = 0;
	char line[80], *cp;
	int width;

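	/*
	 * inetname() can return up to MAXHOSTNAMELEN bytes, which is more
	 * than line[] holds, so bound every write to the buffer.
	 */
	if (Wflag)
	    snprintf(line, sizeof(line), "%s.", inetname(in));
	else
	    snprintf(line, sizeof(line), "%.*s.",
		(Aflag && !num_port) ? 12 : 16, inetname(in));
	cp = index(line, '\0');
	if (!num_port && port)
		sp = getservbyport((int)port, proto);
	if (sp || port == 0)
		snprintf(cp, sizeof(line) - (cp - line), "%.15s ",
		    sp ? sp->s_name : "*");
	else
		snprintf(cp, sizeof(line) - (cp - line), "%d ",
		    ntohs((u_short)port));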
	width = (Aflag && !Wflag) ? 18 : 22;
	if (Wflag)
	    printf("%-*s ", width, line);
	else
	    printf("%-*.*s ", width, width, line);
}

/*
 * Construct an Internet address representation.
 * If numeric_addr has been supplied, give
 * numeric value, otherwise try for symbolic name.
 */
char *
inetname(struct in_addr *inp)
{
	char *cp;
	static char line[MAXHOSTNAMELEN];
	struct hostent *hp;
	struct netent *np;

	cp = 0;
	if (!numeric_addr && inp->s_addr != INADDR_ANY) {
		int net = inet_netof(*inp);
		int lna = inet_lnaof(*inp);

		if (lna == INADDR_ANY) {
			np = getnetbyaddr(net, AF_INET);
			if (np)
				cp = np->n_name;
		}
		if (cp == 0) {
			hp = gethostbyaddr((char *)inp, sizeof (*inp), AF_INET);
			if (hp) {
				cp = hp->h_name;
				trimdomain(cp, strlen(cp));
			}
		}
	}
	if (inp->s_addr == INADDR_ANY)
		strcpy(line, "*");
	else if (cp) {
		strncpy(line, cp, sizeof(line) - 1);
		line[sizeof(line) - 1] = '\0';
	} else {
		inp->s_addr = ntohl(inp->s_addr);
#define C(x)	((u_int)((x) & 0xff))
		sprintf(line, "%u.%u.%u.%u", C(inp->s_addr >> 24),
		    C(inp->s_addr >> 16), C(inp->s_addr >> 8), C(inp->s_addr));
	}
	return (line);
}
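
#if 0	/* Illustrative sketch only; not part of the original inet.c. */
/*
 * When no name is found, inetname() falls back to printing the address as
 * a dotted quad, and in doing so byte-swaps the caller's in_addr in place.
 * A minimal sketch of the same formatting that leaves its input untouched
 * and bounds the write with snprintf() follows.  format_ipv4 is a
 * hypothetical helper name, and it assumes the address has already been
 * converted to host byte order (for example with ntohl()).
 */
```c
#include <stdint.h>
#include <stdio.h>

/* Format a host-byte-order IPv4 address as "a.b.c.d" into out. */
static const char *
format_ipv4(uint32_t host_addr, char *out, size_t outlen)
{
	snprintf(out, outlen, "%u.%u.%u.%u",
	    (unsigned)((host_addr >> 24) & 0xff),
	    (unsigned)((host_addr >> 16) & 0xff),
	    (unsigned)((host_addr >> 8) & 0xff),
	    (unsigned)(host_addr & 0xff));
	return (out);
}
```
#endif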

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="inet.c.diff"

--- inet.org.c	Mon Dec 27 01:48:58 2004
+++ inet.c	Sun Dec 26 22:33:20 2004
@@ -38,7 +38,7 @@
 #endif
 
 #include <sys/cdefs.h>
-__FBSDID("$FreeBSD: /repoman/r/ncvs/src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $");
+__FBSDID("$FreeBSD: src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $");
 
 #include <sys/param.h>
 #include <sys/queue.h>
@@ -569,6 +569,7 @@
 	if (ipstat.ips_forward || sflag <= 1) 
 		putchar('\n');
 	p(ips_cantforward, "\t%lu packet%s not forwardable\n");
+	p(ips_transit_re, "\t%lu packet%s forwarded to receive path\n");
 	p(ips_notmember,
 	  "\t%lu packet%s received for unknown multicast group\n");
 	p(ips_redirectsent, "\t%lu redirect%s sent\n");

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_fastfwd.c"

/*
 * Copyright (c) 2003 Andre Oppermann, Internet Business Solutions AG
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote
 *    products derived from this software without specific prior written
 *    permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $FreeBSD: src/sys/netinet/ip_fastfwd.c,v 1.17.2.3 2004/10/03 17:04:40 mlaier Exp $
 * $Wolfowitz: snap5d/src/sys/netinet/apc_ip_fastfwd.c,v 1.35.2 2004/12/04 15:32:21 jenkins Exp $
 * $Wolfowitz: freebsd5/src/sys/netinet/ip_fastfwd.c,v 1.18.0.3 2004/12/15 17:04:40 blahdy Exp $
 */

/*
 * ip_fastforward gets its speed from processing the forwarded packet to
 * completion (if_output on the other side) without any queues or netisr's.
 * The receiving interface DMAs the packet into memory, the upper half of
 * the driver calls ip_fastforward, we do our routing table lookup and
 * directly send it off to the outgoing interface, which DMAs the packet to
 * the network card. The only part of the packet we touch with the CPU is the
 * IP header (unless there are complex firewall rules touching other parts
 * of the packet, but that is up to you). We are essentially limited by bus
 * bandwidth and how fast the network card/driver can set up receives and
 * transmits.
 *
 * We handle basic errors, ip header errors, checksum errors,
 * destination unreachable, fragmentation and fragmentation needed and
 * report them via icmp to the sender.
 *
 * Otherwise, if something is not pure IPv4 unicast forwarding, we fall
 * back to the normal ip_input processing path. We should only be called
 * from interfaces connected to the outside world.
 *
 * Firewalling is fully supported including divert, ipfw fwd and ipfilter
 * ipnat and address rewrite.
 *
 * IPSEC is not supported if this host is a tunnel broker. IPSEC is
 * supported for connections to/from local host.
 *
 * We try to do the least expensive (in CPU ops) checks and operations
 * first to catch junk with as little overhead as possible.
 * 
 * We take full advantage of hardware support for ip checksum and
 * fragmentation offloading.
 *
 * We don't do ICMP redirect in the fast forwarding path. I have had my own
 * cases where two core routers running the Zebra routing suite would send
 * millions of ICMP redirects to connected hosts if the router to dest was
 * not the default gateway. In one case it was filling the routing table of
 * a host with close to 300'000 cloned redirect entries until it ran out of
 * kernel memory. However, the networking code proved very robust and it
 * did not crash or otherwise misbehave.
 */
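
#if 0	/* Illustrative sketch only; not part of the original ip_fastfwd.c. */
/*
 * The "cheapest checks first" ordering described above can be sketched in
 * user space on a raw byte buffer instead of an mbuf chain: packet size,
 * then version, then header length, then the header checksum, then the
 * total-length field.  hdr_cksum, check_ipv4_header and the HDR_* verdicts
 * are hypothetical names for this sketch; the kernel code uses in_cksum()
 * and the ipstat counters, and honours hardware checksum flags, which
 * this sketch does not model.
 */
```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over len bytes, folded to 16 bits and inverted. */
static uint16_t
hdr_cksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (i < len)
		sum += (uint32_t)p[i] << 8;	/* odd trailing byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

enum hdr_verdict { HDR_OK, HDR_SHORT, HDR_BADVERS, HDR_BADHLEN, HDR_BADSUM };

/* Run the header sanity checks in increasing order of cost. */
static enum hdr_verdict
check_ipv4_header(const uint8_t *pkt, size_t pktlen)
{
	size_t hlen, ip_len;

	if (pktlen < 20)			/* smaller than a bare header */
		return (HDR_SHORT);
	if ((pkt[0] >> 4) != 4)			/* not IPv4 */
		return (HDR_BADVERS);
	hlen = (size_t)(pkt[0] & 0x0f) << 2;	/* header length in bytes */
	if (hlen < 20 || hlen > pktlen)
		return (HDR_BADHLEN);
	if (hdr_cksum(pkt, hlen) != 0)		/* a valid header sums to 0 */
		return (HDR_BADSUM);
	ip_len = ((size_t)pkt[2] << 8) | pkt[3];
	if (pktlen < ip_len)			/* header claims more data */
		return (HDR_SHORT);
	return (HDR_OK);
}
```
#endif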

/*
 * Many thanks to Matt Thomas of NetBSD for the basic structure of ip_flow.c,
 * which is being followed here.
 */

#include "opt_ipfw.h"
#include "opt_ipstealth.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/sysctl.h>

#include <net/pfil.h>
#include <net/if.h>
#include <net/if_types.h>
#include <net/if_var.h>
#include <net/if_dl.h>
#include <net/route.h>
/* include <net/fib.h> */

#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/in_var.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <netinet/ip_icmp.h>

#include <machine/in_cksum.h>

static int ipfastforward_active = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, fastforwarding, CTLFLAG_RW,
    &ipfastforward_active, 0, "Enable fast IP forwarding");

static struct sockaddr_in *
ip_findroute(struct route *ro, struct in_addr dest, struct mbuf *m)
{
	struct sockaddr_in *dst;
	struct rtentry *rt;
 	/* struct mtrie *mt; */

	/*
	 * Find route to destination.
	 */
	bzero(ro, sizeof(*ro));
	dst = (struct sockaddr_in *)&ro->ro_dst;
	dst->sin_family = AF_INET;
	dst->sin_len = sizeof(*dst);
	dst->sin_addr.s_addr = dest.s_addr;
	rtalloc_ign(ro, RTF_CLONING);
	/* fiballoc(pfx, mt); */

	/*
	 * Prefix there and valid adjacency?
	 */
	rt = ro->ro_rt;
	if (rt && (rt->rt_flags & RTF_UP) &&
	    (rt->rt_ifp->if_flags & IFF_UP) &&
	    (rt->rt_ifp->if_flags & IFF_RUNNING)) {
		if (rt->rt_flags & RTF_GATEWAY)
			dst = (struct sockaddr_in *)rt->rt_gateway;
	} else {
		ipstat.ips_noroute++;
		ipstat.ips_cantforward++;
		if (rt)
			RTFREE(rt);

		/*
		 * The old ip_fastforward() violated RFC1812 by responding
		 * with !H instead of !N when no route to the destination
		 * is found. Both a Cisco Cat6509/Sup720 and a Juniper M20
		 * were observed to return !N (correctly complying with
		 * RFC1812) when there is no route available. --james 2004/09/17
		 */
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
		return NULL;
	}
	return dst;
}

/*
 * Try to forward a packet based on the destination address.
 * This is a fast path optimized for the plain forwarding case.
 * If the packet is handled (and consumed) here then we return 1;
 * otherwise 0 is returned and the packet should be delivered
 * to ip_input for full processing.
 */
int
ip_fastforward(struct mbuf *m)
{
	struct ip *ip;
	struct mbuf *m0 = NULL;
	struct route ro;
	/* struct fentry *pfx = NULL;  */
	struct sockaddr_in *dst = NULL;
	struct ifnet *ifp;
	struct in_addr odest, dest;
	u_short sum, ip_len;
	int error = 0;
	int hlen, mtu;
#ifdef IPFIREWALL_FORWARD
	struct m_tag *fwd_tag;
#endif

	/*
	 * Are we active and forwarding packets?
	 */
	if (!ipfastforward_active || !ipforwarding)
		return 0;

	M_ASSERTVALID(m);
	M_ASSERTPKTHDR(m);

	ro.ro_rt = NULL;

	/*
	 * Step 1: check for packet drop conditions (and sanity checks)
	 */

	ipstat.ips_total++;

	/*
	 * Is entire packet big enough?
	 */
	if (m->m_pkthdr.len < sizeof(struct ip)) {
		ipstat.ips_tooshort++;
		goto drop;
	}

	/*
	 * Is first mbuf large enough for ip header and is header present?
	 */
	if (m->m_len < sizeof (struct ip) &&
	   (m = m_pullup(m, sizeof (struct ip))) == 0) {
		ipstat.ips_toosmall++;
		return 1;
	}

	ip = mtod(m, struct ip *);

	/*
	 * Is it IPv4?
	 */
	if (ip->ip_v != IPVERSION) {
		ipstat.ips_badvers++;
		goto drop;
	}

	/*
	 * Is IP header length correct and is it in first mbuf?
	 */
	hlen = ip->ip_hl << 2;
	if (hlen < sizeof(struct ip)) {	/* minimum header length */
		ipstat.ips_badlen++;
		goto drop;
	}
	if (hlen > m->m_len) {
		if ((m = m_pullup(m, hlen)) == 0) {
			ipstat.ips_badhlen++;
			return 1;
		}
		ip = mtod(m, struct ip *);
	}

	/*
	 * Checksum correct?
	 */
	if (m->m_pkthdr.csum_flags & CSUM_IP_CHECKED)
		sum = !(m->m_pkthdr.csum_flags & CSUM_IP_VALID);
	else {
		if (hlen == sizeof(struct ip))
			sum = in_cksum_hdr(ip);
		else
			sum = in_cksum(m, hlen);
	}
	if (sum) {
		ipstat.ips_badsum++;
		goto drop;
	}
	m->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID);

	ip_len = ntohs(ip->ip_len);

	/*
	 * Is IP length longer than packet we have got?
	 */
	if (m->m_pkthdr.len < ip_len) {
		ipstat.ips_tooshort++;
		goto drop;
	}

	/*
	 * Is packet longer than IP header tells us? If yes, truncate packet.
	 */
	if (m->m_pkthdr.len > ip_len) {
		if (m->m_len == m->m_pkthdr.len) {
			m->m_len = ip_len;
			m->m_pkthdr.len = ip_len;
		} else
			m_adj(m, ip_len - m->m_pkthdr.len);
	}

	/*
	 * Is packet from or to 127/8?
	 */
	if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET ||
	    (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) {
		ipstat.ips_badaddr++;
		goto drop;
	}

#ifdef ALTQ
	/*
	 * Is packet dropped by traffic conditioner?
	 */
	if (altq_input != NULL && (*altq_input)(m, AF_INET) == 0)
		return 1;
#endif

	/*
	 * Step 2: fallback conditions to normal ip_input path processing
	 */

	/*
	 * Only IP packets without options
	 */
	if (ip->ip_hl != (sizeof(struct ip) >> 2)) {
		if (ip_doopts == 1){
			goto prercvpath;
		} else if (ip_doopts == 2) {
			icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_FILTER_PROHIB,
				0, NULL);
			return 1;
		}
		/* else ignore IP options and continue */
	}

	/*
	 * Only unicast IP, not from loopback, no L2 or IP broadcast,
	 * no multicast, no INADDR_ANY
	 *
	 * XXX: Probably some of these checks could be direct drop
	 * conditions.  However it is not clear whether there are some
	 * hacks or obscure behaviours which make it necessary to
	 * let ip_input handle it.  We play safe here and let ip_input
	 * deal with it until it is proven that we can directly drop it.
	 *
	 * If packet originated from loopback interface, don't even
	 * bother with receive path. Receive acl must only validate
	 * "From-Wire -> To-ControlPlane" destined traffic, not the
	 * packets we created on our own.
	 */
	if (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) 
		return 0;  

	if (ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
	    ntohl(ip->ip_dst.s_addr) == (u_long)INADDR_BROADCAST ||
	    IN_MULTICAST(ntohl(ip->ip_src.s_addr)) ||
	    IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) ||
	    ip->ip_dst.s_addr == INADDR_ANY )
		goto prercvpath;


	/*
	 * Step 3: incoming packet firewall processing
	 */

	/*
	 * Convert to host representation
	 */
	ip->ip_len = ntohs(ip->ip_len);
	ip->ip_off = ntohs(ip->ip_off);

	odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;

	/*
	 * Run through list of ipfilter hooks for input packets
	 */
	if (inet_pfil_hook.ph_busy_count == -1)
		goto passin;

	if (pfil_run_hooks(&inet_pfil_hook, &m, m->m_pkthdr.rcvif, PFIL_IN, NULL) ||
	    m == NULL)
		return 1;

	M_ASSERTVALID(m);
	M_ASSERTPKTHDR(m);

	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
	dest.s_addr = ip->ip_dst.s_addr;

passin:
	/*
	 * Step 4: Look up and analyze route then decrement TTL.
	 */

	/*
	 * Find route to destination.
	 * Note: If the firewall call above changed the destination to
	 * another address, the kernel RIB lookup is done on the new
	 * destination address, which saves us a hash lookup here.
	 */
	if ((dst = ip_findroute(&ro, dest, m)) == NULL)
		return 1;	/* icmp unreach already sent */
	ifp = ro.ro_rt->rt_ifp;

	/*
	 * Destination address changed by firewall? (policy routing)
	 */
	if (odest.s_addr != dest.s_addr) {
		/*
		 * Is the new destination for a local address on this host?
		 */
		if (ro.ro_rt->rt_flags & RTF_LOCAL)
			goto forwardlocal;
		/*
		 * Go on with new destination address
		 */
	}
#ifdef IPFIREWALL_FORWARD
	if (m->m_flags & M_FASTFWD_OURS) {
		/*
		 * ipfw changed it for a local address on this host.
		 */
		goto forwardlocal;
	}
#endif /* IPFIREWALL_FORWARD */

	/*
	 * Is packet destined to us or broadcast address(es)?
	 * SIOCSIFADDR installs /32 lo0 routes so let's check if
	 * this is a route that is bound to loopback.
	 */
	if (ro.ro_rt->rt_flags & RTF_LOCAL)
		goto rcvpath;

	/*
	 * Drop blackhole and reject routes while we are in the
	 * fast forwarding path.
	 */
	if (ro.ro_rt->rt_flags & RTF_BLACKHOLE)
		goto drop;

	/*
	 * XXX Need L2 info from the kernel routing table. This is a
	 * makeshift kludge, so please think twice before committing
	 * the check below to the main CVS tree.
	 *
	 * Administratively installed reject routes should have
	 * rmx_expire unset.
	 */
	if ((ro.ro_rt->rt_flags & RTF_REJECT) &&
	    ro.ro_rt->rt_rmx.rmx_expire == 0) {
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
		goto consumed;
	}

	/*
	 * Check TTL
	 */
#ifdef IPSTEALTH
	if (!ipstealth) {
#endif
	if (ip->ip_ttl <= IPTTLDEC) {
		icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0, NULL);
		goto consumed;
	}

	/*
	 * Decrement the TTL and incrementally change the checksum.
	 * Don't bother doing this with hw checksum offloading.
	 */
	ip->ip_ttl -= IPTTLDEC;
	if (ip->ip_sum >= (u_int16_t) ~htons(IPTTLDEC << 8))
		ip->ip_sum -= ~htons(IPTTLDEC << 8);
	else
		ip->ip_sum += htons(IPTTLDEC << 8);
#ifdef IPSTEALTH
	}
#endif

	/*
	 * Step 5: outgoing firewall packet processing
	 */

	/*
	 * Run through list of hooks for output packets.
	 */
	if (inet_pfil_hook.ph_busy_count == -1)
		goto passout;

	if (pfil_run_hooks(&inet_pfil_hook, &m, ifp, PFIL_OUT, NULL) || m == NULL) {
		goto consumed;
	}

	M_ASSERTVALID(m);
	M_ASSERTPKTHDR(m);

	ip = mtod(m, struct ip *);
	dest.s_addr = ip->ip_dst.s_addr;

	/*
	 * Destination address changed?
	 */
#ifndef IPFIREWALL_FORWARD
	if (odest.s_addr != dest.s_addr) {
#else
	fwd_tag = m_tag_find(m, PACKET_TAG_IPFORWARD, NULL);
	if (odest.s_addr != dest.s_addr || fwd_tag != NULL) {
#endif /* IPFIREWALL_FORWARD */
		/*
		 * Is it now for a local address on this host?
		 *
		 * We'll simply rely on in_localip() to determine whether
		 * address is destined to us this time around -- because
		 * I really don't think running radix lookup two more
		 * times in the outbound sections will outperform hash
		 * lookup of system interface addrs.
		 *
		 * In the above ingress checks, we were able to get rid
		 * of a hash lookup (in_localip() call that is) because
		 * we are doing a radix lookup after the initial firewall
		 * operation.
		 */
#ifndef IPFIREWALL_FORWARD
		if (in_localip(dest)) {
#else
		if (in_localip(dest) || m->m_flags & M_FASTFWD_OURS) {
#endif /* IPFIREWALL_FORWARD */
forwardlocal:
			/*
			 * Return packet for processing by ip_input().
			 * Keep host byte order as expected at ip_input's
			 * "ours"-label.
			 */
			m->m_flags |= M_FASTFWD_OURS;
			goto rcvpath;
		}
		/*
		 * Redo route lookup with new destination address
		 */
#ifdef IPFIREWALL_FORWARD
		if (fwd_tag) {
			if (!in_localip(ip->ip_src) && !in_localaddr(ip->ip_dst))
				dest.s_addr = ((struct sockaddr_in *)(fwd_tag+1))->sin_addr.s_addr;
			m_tag_delete(m, fwd_tag);
		}
#endif /* IPFIREWALL_FORWARD */
		RTFREE(ro.ro_rt);
		if ((dst = ip_findroute(&ro, dest, m)) == NULL)
			return 1;	/* icmp unreach already sent */
		ifp = ro.ro_rt->rt_ifp;
	}

passout:
	/*
	 * Step 6: send off the packet
	 */

#ifndef ALTQ
	/*
	 * Check if there is enough space in the interface queue
	 */
	if ((ifp->if_snd.ifq_len + ip->ip_len / ifp->if_mtu + 1) >=
	    ifp->if_snd.ifq_maxlen) {
		ipstat.ips_odropped++;
		/* would send source quench here but that is deprecated */
		goto drop;
	}
#endif

	/*
	 * Check if media link state of interface is not down
	 */
	if (ifp->if_link_state == LINK_STATE_DOWN) {
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL);
		goto consumed;
	}

	/*
	 * Check if packet fits MTU or if hardware will fragment for us
	 */
	if (ro.ro_rt->rt_rmx.rmx_mtu)
		mtu = min(ro.ro_rt->rt_rmx.rmx_mtu, ifp->if_mtu);
	else
		mtu = ifp->if_mtu;

	if (ip->ip_len <= mtu ||
	    (ifp->if_hwassist & CSUM_FRAGMENT && (ip->ip_off & IP_DF) == 0)) {
		/*
		 * Restore packet header fields to original values
		 */
		ip->ip_len = htons(ip->ip_len);
		ip->ip_off = htons(ip->ip_off);
		/*
		 * Send off the packet via outgoing interface
		 */
		error = (*ifp->if_output)(ifp, m,
				(struct sockaddr *)dst, ro.ro_rt);
	} else {
		/*
		 * Handle EMSGSIZE with icmp reply needfrag for TCP MTU discovery
		 */
		if (ip->ip_off & IP_DF) {
			ipstat.ips_cantfrag++;
			icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
				0, ifp);
			goto consumed;
		} else {
			/*
			 * We have to fragment the packet
			 */
			m->m_pkthdr.csum_flags |= CSUM_IP;
			/*
			 * ip_fragment expects ip_len and ip_off in host byte
			 * order but returns all packets in network byte order
			 */
			if (ip_fragment(ip, &m, mtu, ifp->if_hwassist,
					(~ifp->if_hwassist & CSUM_DELAY_IP))) {
				goto drop;
			}
			KASSERT(m != NULL, ("null mbuf and no error"));
			/*
			 * Send off the fragments via outgoing interface
			 */
			error = 0;
			do {
				m0 = m->m_nextpkt;
				m->m_nextpkt = NULL;

				error = (*ifp->if_output)(ifp, m,
					(struct sockaddr *)dst, ro.ro_rt);
				if (error)
					break;
			} while ((m = m0) != NULL);
			if (error) {
				/* Reclaim remaining fragments */
				for (; m; m = m0) {
					m0 = m->m_nextpkt;
					m->m_nextpkt = NULL;
					m_freem(m);
				}
			} else
				ipstat.ips_fragmented++;
		}
	}

	if (error != 0)
		ipstat.ips_odropped++;
	else {
		ipstat.ips_forward++;
		ipstat.ips_fastforward++;
	}
consumed:
	RTFREE(ro.ro_rt);
	return 1;
prercvpath:
	/*
	 * Convert to host representation
	 */
	ip->ip_len = ntohs(ip->ip_len);
	ip->ip_off = ntohs(ip->ip_off);

	odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;
rcvpath:
	/*
	 * Receive adjacency. If the packet needs to be punted up to
	 * ip_input path for further analysis or because it is destined to
	 * one of our own addresses, run it through the receive-path
	 * firewall. To actually use this, the user must set up a firewall
	 * rule using pf(4), ipfw(8), etc. that checks on the lo0 interface
	 * under INBOUND direction (e.g. `<action> in quick on lo0` in pf)
	 *
	 * Cisco calls this Receive Path ACL, Juniper calls this Loopback
	 * Filter. The fact that this is FreeBSD makes us behave like
	 * Juniper (filtering on lo0) instead of Cisco (filtering via
	 * "ip receive <acl number>" command).  --james 2004/10/23
	 */

	/*
	 * Set coordinates to loopback interface, inbound direction,
	 * then call in the pfil_hooks.
	 */

	if (ro.ro_rt)
	  RTFREE(ro.ro_rt);

	if (inet_pfil_hook.ph_busy_count == -1)
		goto punt;

	if (pfil_run_hooks(&inet_pfil_hook, &m, loif, PFIL_IN, NULL) ||
	    m == NULL)
		return 1;

	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
	dest.s_addr = ip->ip_dst.s_addr;

	/* We do not support policy routing inside the receive path.
	 * If the user requests it, drop the packet. Ensure that this
	 * is documented in the user manual.
	 */
	if (odest.s_addr != dest.s_addr) 
		goto drop;

punt:
	/* 
	 * Packet has been pre-processed by ip_fastforward for 
	 * control plane evaluations.
	 */
	m->m_flags |= M_FASTFWD_PREPROC;

	ipstat.ips_transit_re++;
	return 0;
drop:
	if (m)
		m_freem(m);
	if (ro.ro_rt)
		RTFREE(ro.ro_rt);
	return 1;
}
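
A note on the "decrement the TTL and incrementally change the checksum"
step above: the branch on ip_sum is the end-around carry of one's-complement
arithmetic. Below is a minimal userland sketch, not kernel code, that applies
the same update to a raw 20-byte IPv4 header and compares it with a full
recomputation. TTLDEC, the sample header bytes and all function names are
assumptions made up for this illustration; only the byte offsets (TTL at 8,
checksum at 10-11) come from the IPv4 header layout.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TTLDEC 1	/* stand-in for the kernel's IPTTLDEC */

/* Full one's-complement checksum over an IPv4 header of len bytes. */
static uint16_t
hdr_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	uint16_t w;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		memcpy(&w, hdr + i, sizeof(w));	/* 16-bit word, as stored */
		sum += w;
	}
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) and patch the checksum (bytes 10-11). */
static void
ttl_dec_incremental(uint8_t *hdr)
{
	uint16_t sum;

	memcpy(&sum, hdr + 10, sizeof(sum));
	hdr[8] -= TTLDEC;
	if (sum >= (uint16_t)~htons(TTLDEC << 8))
		sum -= ~htons(TTLDEC << 8);	/* add with wrap-around */
	else
		sum += htons(TTLDEC << 8);
	memcpy(hdr + 10, &sum, sizeof(sum));
}

/* Return 1 if the patched checksum equals a full recomputation. */
static int
ttl_dec_matches_full(uint8_t ttl)
{
	/* Made-up sample header: ICMP, 192.0.2.1 -> 198.51.100.7 */
	uint8_t h[20] = { 0x45, 0, 0, 84, 0x12, 0x34, 0, 0, 0, 1, 0, 0,
	    192, 0, 2, 1, 198, 51, 100, 7 };
	uint16_t full, patched;

	h[8] = ttl;
	full = hdr_cksum(h, sizeof(h));	/* checksum field is zero here */
	memcpy(h + 10, &full, sizeof(full));
	ttl_dec_incremental(h);
	memcpy(&patched, h + 10, sizeof(patched));
	h[10] = h[11] = 0;		/* zero the field, then recompute */
	return patched == hdr_cksum(h, sizeof(h));
}

/* Check every non-zero starting TTL against a full recomputation. */
static void
ttl_dec_selfcheck(void)
{
	int ttl;

	for (ttl = 1; ttl <= 255; ttl++)
		assert(ttl_dec_matches_full((uint8_t)ttl));
}
```

Calling ttl_dec_selfcheck() from a small main() exercises every starting TTL
from 1 to 255. A TTL of 0 is left out on purpose, since the forwarding path
never decrements a packet whose TTL is already at or below IPTTLDEC.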

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_fastfwd.c.diff"

--- ip_fastfwd.org.c	Mon Dec 27 01:42:27 2004
+++ ip_fastfwd.c	Sun Dec 26 22:33:15 2004
@@ -26,7 +26,9 @@
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_fastfwd.c,v 1.25 2004/11/09 09:40:32 andre Exp $
+ * $FreeBSD: src/sys/netinet/ip_fastfwd.c,v 1.17.2.3 2004/10/03 17:04:40 mlaier Exp $
+ * $Wolfowitz: snap5d/src/sys/netinet/apc_ip_fastfwd.c,v 1.35.2 2004/12/04 15:32:21 jenkins Exp $
+ * $Wolfowitz: freebsd5/src/sys/netinet/ip_fastfwd.c,v 1.18.0.3 2004/12/15 17:04:40 blahdy Exp $
  */
 
 /*
@@ -93,6 +95,7 @@
 #include <net/if_var.h>
 #include <net/if_dl.h>
 #include <net/route.h>
+/* include <net/fib.h> */
 
 #include <netinet/in.h>
 #include <netinet/in_systm.h>
@@ -112,6 +115,7 @@
 {
 	struct sockaddr_in *dst;
 	struct rtentry *rt;
+ 	/* struct mtrie *mt; */
 
 	/*
 	 * Find route to destination.
@@ -122,9 +126,10 @@
 	dst->sin_len = sizeof(*dst);
 	dst->sin_addr.s_addr = dest.s_addr;
 	rtalloc_ign(ro, RTF_CLONING);
+	/* fiballoc(pfx, mt); */
 
 	/*
-	 * Route there and interface still up?
+	 * Prefix there and valid adjacency?
 	 */
 	rt = ro->ro_rt;
 	if (rt && (rt->rt_flags & RTF_UP) &&
@@ -137,7 +142,15 @@
 		ipstat.ips_cantforward++;
 		if (rt)
 			RTFREE(rt);
-		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL);
+
+		/*
+		 * The old ip_fastforward() violated RFC1812 by responding
+		 * with !H instead of !N when no route to the destination
+		 * is found. Both the Cisco Cat6509/Sup720 and the Juniper
+		 * M20 were observed to answer with !N (correctly complying
+		 * with RFC1812) when no route is available. --james 2004/09/17
+		 */
+		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
 		return NULL;
 	}
 	return dst;
@@ -156,9 +169,8 @@
 	struct ip *ip;
 	struct mbuf *m0 = NULL;
 	struct route ro;
+	/* struct fentry *pfx = NULL;  */
 	struct sockaddr_in *dst = NULL;
-	struct in_ifaddr *ia = NULL;
-	struct ifaddr *ifa = NULL;
 	struct ifnet *ifp;
 	struct in_addr odest, dest;
 	u_short sum, ip_len;
@@ -183,6 +195,8 @@
 	 * Step 1: check for packet drop conditions (and sanity checks)
 	 */
 
+	ipstat.ips_total++;
+
 	/*
 	 * Is entire packet big enough?
 	 */
@@ -195,9 +209,9 @@
 	 * Is first mbuf large enough for ip header and is header present?
 	 */
 	if (m->m_len < sizeof (struct ip) &&
-	   (m = m_pullup(m, sizeof (struct ip))) == NULL) {
+	   (m = m_pullup(m, sizeof (struct ip))) == 0) {
 		ipstat.ips_toosmall++;
-		return 1;	/* mbuf already free'd */
+		return 1;
 	}
 
 	ip = mtod(m, struct ip *);
@@ -241,10 +255,6 @@
 		ipstat.ips_badsum++;
 		goto drop;
 	}
-
-	/*
-	 * Remeber that we have checked the IP header and found it valid.
-	 */
 	m->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID);
 
 	ip_len = ntohs(ip->ip_len);
@@ -293,9 +303,9 @@
 	 * Only IP packets without options
 	 */
 	if (ip->ip_hl != (sizeof(struct ip) >> 2)) {
-		if (ip_doopts == 1)
-			return 0;
-		else if (ip_doopts == 2) {
+		if (ip_doopts == 1){
+			goto prercvpath;
+		} else if (ip_doopts == 2) {
 			icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_FILTER_PROHIB,
 				0, NULL);
 			return 1;
@@ -312,38 +322,22 @@
 	 * hacks or obscure behaviours which make it neccessary to
 	 * let ip_input handle it.  We play safe here and let ip_input
 	 * deal with it until it is proven that we can directly drop it.
+	 *
+	 * If packet originated from loopback interface, don't even
+	 * bother with receive path. Receive acl must only validate
+	 * "From-Wire -> To-ControlPlane" destined traffic, not the
+	 * packets we created on our own.
 	 */
-	if ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) ||
-	    ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
+	if (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) 
+		return 0;  
+
+	if (ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
 	    ntohl(ip->ip_dst.s_addr) == (u_long)INADDR_BROADCAST ||
 	    IN_MULTICAST(ntohl(ip->ip_src.s_addr)) ||
 	    IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) ||
 	    ip->ip_dst.s_addr == INADDR_ANY )
-		return 0;
-
-	/*
-	 * Is it for a local address on this host?
-	 */
-	if (in_localip(ip->ip_dst))
-		return 0;
+		goto prercvpath;
 
-	/*
-	 * Or is it for a local IP broadcast address on this host?
-	 */
-	if ((m->m_flags & M_BCAST) &&
-	    (m->m_pkthdr.rcvif->if_flags & IFF_BROADCAST)) {
-	        TAILQ_FOREACH(ifa, &m->m_pkthdr.rcvif->if_addrhead, ifa_link) {
-			if (ifa->ifa_addr->sa_family != AF_INET)
-				continue;
-			ia = ifatoia(ifa);
-			if (ia->ia_netbroadcast.s_addr == ip->ip_dst.s_addr)
-				return 0;
-			if (satosin(&ia->ia_broadaddr)->sin_addr.s_addr ==
-			    ip->ip_dst.s_addr)
-				return 0;
-		}
-	}
-	ipstat.ips_total++;
 
 	/*
 	 * Step 3: incoming packet firewall processing
@@ -373,14 +367,29 @@
 	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
 	dest.s_addr = ip->ip_dst.s_addr;
 
+passin:
 	/*
-	 * Destination address changed?
+	 * Step 4: Look up and analyze route then decrement TTL.
+	 */
+
+	/*
+	 * Find route to destination.
+	 * Note: If the firewall call above changed the destination to
+	 * another address, the kernel RIB lookup is done on the new
+	 * destination address, which saves us a hash lookup here.
+	 */
+	if ((dst = ip_findroute(&ro, dest, m)) == NULL)
+		return 1;	/* icmp unreach already sent */
+	ifp = ro.ro_rt->rt_ifp;
+
+	/*
+	 * Destination address changed by firewall? (policy routing)
 	 */
 	if (odest.s_addr != dest.s_addr) {
 		/*
-		 * Is it now for a local address on this host?
+		 * Is the new destination for a local address on this host?
 		 */
-		if (in_localip(dest))
+		if (ro.ro_rt->rt_flags & RTF_LOCAL)
 			goto forwardlocal;
 		/*
 		 * Go on with new destination address
@@ -395,10 +404,34 @@
 	}
 #endif /* IPFIREWALL_FORWARD */
 
-passin:
 	/*
-	 * Step 4: decrement TTL and look up route
+	 * Is packet destined to us or broadcast address(es)?
+	 * SIOCSIFADDR installs /32 lo0 routes so let's check if
+	 * this is a route that is bound to loopback.
 	 */
+	if (ro.ro_rt->rt_flags & RTF_LOCAL)
+		goto rcvpath;
+
+	/*
+	 * Drop blackhole and reject routes while we are in the
+	 * fast forwarding path.
+	 */
+	if (ro.ro_rt->rt_flags & RTF_BLACKHOLE)
+		goto drop;
+
+	/*
+	 * XXX Need L2 info from the kernel routing table. This is a
+	 * makeshift kludge, so please think twice before committing
+	 * the check below to the main CVS tree.
+	 *
+	 * Administratively installed reject routes should have
+	 * rmx_expire unset.
+	 */
+	if ((ro.ro_rt->rt_flags & RTF_REJECT) &&
+	    ro.ro_rt->rt_rmx.rmx_expire == 0) {
+		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
+		goto consumed;
+	}
 
 	/*
 	 * Check TTL
@@ -408,13 +441,12 @@
 #endif
 	if (ip->ip_ttl <= IPTTLDEC) {
 		icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0, NULL);
-		return 1;
+		goto consumed;
 	}
 
 	/*
-	 * Decrement the TTL and incrementally change the IP header checksum.
-	 * Don't bother doing this with hw checksum offloading, it's faster
-	 * doing it right here.
+	 * Decrement the TTL and incrementally change the checksum.
+	 * Don't bother doing this with hw checksum offloading.
 	 */
 	ip->ip_ttl -= IPTTLDEC;
 	if (ip->ip_sum >= (u_int16_t) ~htons(IPTTLDEC << 8))
@@ -426,19 +458,6 @@
 #endif
 
 	/*
-	 * Find route to destination.
-	 */
-	if ((dst = ip_findroute(&ro, dest, m)) == NULL)
-		return 1;	/* icmp unreach already sent */
-	ifp = ro.ro_rt->rt_ifp;
-
-	/*
-	 * Immediately drop blackholed traffic.
-	 */
-	if (ro.ro_rt->rt_flags & RTF_BLACKHOLE)
-		goto drop;
-
-	/*
 	 * Step 5: outgoing firewall packet processing
 	 */
 
@@ -469,11 +488,22 @@
 #endif /* IPFIREWALL_FORWARD */
 		/*
 		 * Is it now for a local address on this host?
+		 *
+		 * We'll simply rely on in_localip() to determine whether
+		 * address is destined to us this time around -- because
+		 * I really don't think running radix lookup two more
+		 * times in the outbound sections will outperform hash
+		 * lookup of system interface addrs.
+		 *
+		 * In the above ingress checks, we were able to get rid
+		 * of a hash lookup (in_localip() call that is) because
+		 * we are doing a radix lookup after the initial firewall
+		 * operation.
 		 */
 #ifndef IPFIREWALL_FORWARD
 		if (in_localip(dest)) {
 #else
-		if (m->m_flags & M_FASTFWD_OURS || in_localip(dest)) {
+		if (in_localip(dest) || m->m_flags & M_FASTFWD_OURS) {
 #endif /* IPFIREWALL_FORWARD */
 forwardlocal:
 			/*
@@ -482,9 +512,7 @@
 			 * "ours"-label.
 			 */
 			m->m_flags |= M_FASTFWD_OURS;
-			if (ro.ro_rt)
-				RTFREE(ro.ro_rt);
-			return 0;
+			goto rcvpath;
 		}
 		/*
 		 * Redo route lookup with new destination address
@@ -507,15 +535,6 @@
 	 * Step 6: send off the packet
 	 */
 
-	/*
-	 * Check if route is dampned (when ARP is unable to resolve)
-	 */
-	if ((ro.ro_rt->rt_flags & RTF_REJECT) &&
-	    ro.ro_rt->rt_rmx.rmx_expire >= time_second) {
-		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL);
-		goto consumed;
-	}
-
 #ifndef ALTQ
 	/*
 	 * Check if there is enough space in the interface queue
@@ -607,13 +626,69 @@
 	if (error != 0)
 		ipstat.ips_odropped++;
 	else {
-		ro.ro_rt->rt_rmx.rmx_pksent++;
 		ipstat.ips_forward++;
 		ipstat.ips_fastforward++;
 	}
 consumed:
 	RTFREE(ro.ro_rt);
 	return 1;
+prercvpath:
+	/*
+	 * Convert to host representation
+	 */
+	ip->ip_len = ntohs(ip->ip_len);
+	ip->ip_off = ntohs(ip->ip_off);
+
+	odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;
+rcvpath:
+	/*
+	 * Receive adjacency. If the packet needs to be punted up to
+	 * ip_input path for further analysis or because it is destined to
+	 * one of our own addresses, run it through the receive-path
+	 * firewall. To actually use this, the user must set up a firewall
+	 * rule using pf(4), ipfw(8), etc. that checks on the lo0 interface
+	 * under INBOUND direction (e.g. `<action> in quick on lo0` in pf)
+	 *
+	 * Cisco calls this Receive Path ACL, Juniper calls this Loopback
+	 * Filter. The fact that this is FreeBSD makes us behave like
+	 * Juniper (filtering on lo0) instead of Cisco (filtering via
+	 * "ip receive <acl number>" command).  --james 2004/10/23
+	 */
+
+	/*
+	 * Set coordinates to loopback interface, inbound direction,
+	 * then call in the pfil_hooks.
+	 */
+
+	if (ro.ro_rt)
+	  RTFREE(ro.ro_rt);
+
+	if (inet_pfil_hook.ph_busy_count == -1)
+		goto punt;
+
+	if (pfil_run_hooks(&inet_pfil_hook, &m, loif, PFIL_IN, NULL) ||
+	    m == NULL)
+		return 1;
+
+	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
+	dest.s_addr = ip->ip_dst.s_addr;
+
+	/* We do not support policy routing inside the receive path.
+	 * If the user requests it, drop the packet. Ensure that this
+	 * is documented in the user manual.
+	 */
+	if (odest.s_addr != dest.s_addr) 
+		goto drop;
+
+punt:
+	/* 
+	 * Packet has been pre-processed by ip_fastforward for 
+	 * control plane evaluations.
+	 */
+	m->m_flags |= M_FASTFWD_PREPROC;
+
+	ipstat.ips_transit_re++;
+	return 0;
 drop:
 	if (m)
 		m_freem(m);
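
For readers following the diff: the reworked Step 4 classifies the route
right after the lookup and before the TTL is touched. The ordering can be
sketched in userland roughly as below. This is only an illustration under
stated assumptions: the DEMO_RTF_* bits are stand-ins (the kernel's flags
live in <net/route.h>), and the enum and function names are hypothetical,
not kernel symbols. The IPFIREWALL_FORWARD case is left out.

```c
#include <assert.h>

/* Stand-in flag bits for this sketch only. */
#define DEMO_RTF_REJECT		0x0008UL
#define DEMO_RTF_BLACKHOLE	0x1000UL
#define DEMO_RTF_LOCAL		0x200000UL

enum disposition {		/* hypothetical names */
	DISP_FORWARD,		/* go on to the TTL check */
	DISP_LOCAL_OURS,	/* forwardlocal: tag as ours, then rcvpath */
	DISP_RCVPATH,		/* rcvpath: receive-path filter, then punt */
	DISP_DROP,		/* blackhole route: free silently */
	DISP_UNREACH_NET	/* reject route: ICMP net unreachable */
};

/* Mirror the order of the checks in the patched Step 4. */
static enum disposition
classify_route(unsigned long rt_flags, unsigned long rmx_expire,
    int dest_changed)
{
	if (dest_changed && (rt_flags & DEMO_RTF_LOCAL))
		return DISP_LOCAL_OURS;
	if (rt_flags & DEMO_RTF_LOCAL)
		return DISP_RCVPATH;
	if (rt_flags & DEMO_RTF_BLACKHOLE)
		return DISP_DROP;
	if ((rt_flags & DEMO_RTF_REJECT) && rmx_expire == 0)
		return DISP_UNREACH_NET;
	return DISP_FORWARD;
}

/* A few spot checks of the ordering. */
static void
classify_selfcheck(void)
{
	assert(classify_route(DEMO_RTF_LOCAL, 0, 1) == DISP_LOCAL_OURS);
	assert(classify_route(DEMO_RTF_LOCAL, 0, 0) == DISP_RCVPATH);
	assert(classify_route(DEMO_RTF_BLACKHOLE, 0, 0) == DISP_DROP);
	assert(classify_route(DEMO_RTF_REJECT, 0, 0) == DISP_UNREACH_NET);
	assert(classify_route(DEMO_RTF_REJECT, 30, 0) == DISP_FORWARD);
}
```

Note the last case: with the old dampening check removed from Step 6, a
reject route that still carries an expiry time is no longer answered with
an ICMP error here and simply continues down the forwarding path.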

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_input.c"

/*
 * Copyright (c) 1982, 1986, 1988, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)ip_input.c	8.2 (Berkeley) 1/4/94
 * $FreeBSD: src/sys/netinet/ip_input.c,v 1.283.2.7 2004/10/03 17:04:40 mlaier Exp $
 */

#include "opt_bootp.h"
#include "opt_ipfw.h"
#include "opt_ipstealth.h"
#include "opt_ipsec.h"
#include "opt_mac.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mac.h>
#include <sys/mbuf.h>
#include <sys/malloc.h>
#include <sys/domain.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/kernel.h>
#include <sys/syslog.h>
#include <sys/sysctl.h>

#include <net/pfil.h>
#include <net/if.h>
#include <net/if_types.h>
#include <net/if_var.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <net/netisr.h>

#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/in_var.h>
#include <netinet/ip.h>
#include <netinet/in_pcb.h>
#include <netinet/ip_var.h>
#include <netinet/ip_icmp.h>
#include <machine/in_cksum.h>

#include <sys/socketvar.h>

/* XXX: Temporary until ipfw_ether and ipfw_bridge are converted. */
#include <netinet/ip_fw.h>
#include <netinet/ip_dummynet.h>

#ifdef IPSEC
#include <netinet6/ipsec.h>
#include <netkey/key.h>
#endif

#ifdef FAST_IPSEC
#include <netipsec/ipsec.h>
#include <netipsec/key.h>
#endif

int rsvp_on = 0;

int	ipforwarding = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_FORWARDING, forwarding, CTLFLAG_RW,
    &ipforwarding, 0, "Enable IP forwarding between interfaces");

static int	ipsendredirects = 1; /* XXX */
SYSCTL_INT(_net_inet_ip, IPCTL_SENDREDIRECTS, redirect, CTLFLAG_RW,
    &ipsendredirects, 0, "Enable sending IP redirects");

int	ip_defttl = IPDEFTTL;
SYSCTL_INT(_net_inet_ip, IPCTL_DEFTTL, ttl, CTLFLAG_RW,
    &ip_defttl, 0, "Maximum TTL on IP packets");

static int	ip_dosourceroute = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_SOURCEROUTE, sourceroute, CTLFLAG_RW,
    &ip_dosourceroute, 0, "Enable forwarding source routed IP packets");

static int	ip_acceptsourceroute = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_ACCEPTSOURCEROUTE, accept_sourceroute, 
    CTLFLAG_RW, &ip_acceptsourceroute, 0, 
    "Enable accepting source routed IP packets");

int		ip_doopts = 1;	/* 0 = ignore, 1 = process, 2 = reject */
SYSCTL_INT(_net_inet_ip, OID_AUTO, process_options, CTLFLAG_RW,
    &ip_doopts, 0, "Enable IP options processing ([LS]SRR, RR, TS)");

static int	ip_keepfaith = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_KEEPFAITH, keepfaith, CTLFLAG_RW,
	&ip_keepfaith,	0,
	"Enable packet capture for FAITH IPv4->IPv6 translater daemon");

static int    nipq = 0;         /* total # of reass queues */
static int    maxnipq;
SYSCTL_INT(_net_inet_ip, OID_AUTO, maxfragpackets, CTLFLAG_RW,
	&maxnipq, 0,
	"Maximum number of IPv4 fragment reassembly queue entries");

static int    maxfragsperpacket;
SYSCTL_INT(_net_inet_ip, OID_AUTO, maxfragsperpacket, CTLFLAG_RW,
	&maxfragsperpacket, 0,
	"Maximum number of IPv4 fragments allowed per packet");

static int	ip_sendsourcequench = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, sendsourcequench, CTLFLAG_RW,
	&ip_sendsourcequench, 0,
	"Enable the transmission of source quench packets");

int	ip_do_randomid = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, random_id, CTLFLAG_RW,
	&ip_do_randomid, 0,
	"Assign random ip_id values");

/*
 * XXX - Setting ip_checkinterface mostly implements the receive side of
 * the Strong ES model described in RFC 1122, but since the routing table
 * and transmit implementation do not implement the Strong ES model,
 * setting this to 1 results in an odd hybrid.
 *
 * XXX - ip_checkinterface currently must be disabled if you use ipnat
 * to translate the destination address to another local interface.
 *
 * XXX - ip_checkinterface must be disabled if you add IP aliases
 * to the loopback interface instead of the interface where the
 * packets for those addresses are received.
 */
static int	ip_checkinterface = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, check_interface, CTLFLAG_RW,
    &ip_checkinterface, 0, "Verify packet arrives on correct interface");

#ifdef DIAGNOSTIC
static int	ipprintfs = 0;
#endif

struct pfil_head inet_pfil_hook;

static struct	ifqueue ipintrq;
static int	ipqmaxlen = IFQ_MAXLEN;

extern	struct domain inetdomain;
extern	struct protosw inetsw[];
u_char	ip_protox[IPPROTO_MAX];
struct	in_ifaddrhead in_ifaddrhead; 		/* first inet address */
struct	in_ifaddrhashhead *in_ifaddrhashtbl;	/* inet addr hash table  */
u_long 	in_ifaddrhmask;				/* mask for hash table */

SYSCTL_INT(_net_inet_ip, IPCTL_INTRQMAXLEN, intr_queue_maxlen, CTLFLAG_RW,
    &ipintrq.ifq_maxlen, 0, "Maximum size of the IP input queue");
SYSCTL_INT(_net_inet_ip, IPCTL_INTRQDROPS, intr_queue_drops, CTLFLAG_RD,
    &ipintrq.ifq_drops, 0, "Number of packets dropped from the IP input queue");

struct ipstat ipstat;
SYSCTL_STRUCT(_net_inet_ip, IPCTL_STATS, stats, CTLFLAG_RW,
    &ipstat, ipstat, "IP statistics (struct ipstat, netinet/ip_var.h)");

/* Packet reassembly stuff */
#define IPREASS_NHASH_LOG2      6
#define IPREASS_NHASH           (1 << IPREASS_NHASH_LOG2)
#define IPREASS_HMASK           (IPREASS_NHASH - 1)
#define IPREASS_HASH(x,y) \
	(((((x) & 0xF) | ((((x) >> 8) & 0xF) << 4)) ^ (y)) & IPREASS_HMASK)

static TAILQ_HEAD(ipqhead, ipq) ipq[IPREASS_NHASH];
struct mtx ipqlock;

#define	IPQ_LOCK()	mtx_lock(&ipqlock)
#define	IPQ_UNLOCK()	mtx_unlock(&ipqlock)
#define	IPQ_LOCK_INIT()	mtx_init(&ipqlock, "ipqlock", NULL, MTX_DEF)
#define	IPQ_LOCK_ASSERT()	mtx_assert(&ipqlock, MA_OWNED)

#ifdef IPCTL_DEFMTU
SYSCTL_INT(_net_inet_ip, IPCTL_DEFMTU, mtu, CTLFLAG_RW,
    &ip_mtu, 0, "Default MTU");
#endif

#ifdef IPSTEALTH
int	ipstealth = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, stealth, CTLFLAG_RW,
    &ipstealth, 0, "");
#endif

/*
 * ipfw_ether and ipfw_bridge hooks.
 * XXX: Temporary until those are converted to pfil_hooks as well.
 */
ip_fw_chk_t *ip_fw_chk_ptr = NULL;
ip_dn_io_t *ip_dn_io_ptr = NULL;
int fw_enable = 1;
int fw_one_pass = 1;

/*
 * XXX this is ugly.  IP options source routing magic.
 */
struct ipoptrt {
	struct	in_addr dst;			/* final destination */
	char	nop;				/* one NOP to align */
	char	srcopt[IPOPT_OFFSET + 1];	/* OPTVAL, OLEN and OFFSET */
	struct	in_addr route[MAX_IPOPTLEN/sizeof(struct in_addr)];
};

struct ipopt_tag {
	struct	m_tag tag;
	int	ip_nhops;
	struct	ipoptrt ip_srcrt;
};

static void	save_rte(struct mbuf *, u_char *, struct in_addr);
static int	ip_dooptions(struct mbuf *m, int);
static void	ip_forward(struct mbuf *m, int srcrt);
static void	ip_freef(struct ipqhead *, struct ipq *);

/*
 * IP initialization: fill in IP protocol switch table.
 * All protocols not implemented in kernel go to raw IP protocol handler.
 */
void
ip_init()
{
	register struct protosw *pr;
	register int i;

	TAILQ_INIT(&in_ifaddrhead);
	in_ifaddrhashtbl = hashinit(INADDR_NHASH, M_IFADDR, &in_ifaddrhmask);
	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
	if (pr == 0)
		panic("ip_init: PF_INET not found");

	/* Initialize the entire ip_protox[] array to IPPROTO_RAW. */
	for (i = 0; i < IPPROTO_MAX; i++)
		ip_protox[i] = pr - inetsw;
	/*
	 * Cycle through IP protocols and put them into the appropriate place
	 * in ip_protox[].
	 */
	for (pr = inetdomain.dom_protosw;
	    pr < inetdomain.dom_protoswNPROTOSW; pr++)
		if (pr->pr_domain->dom_family == PF_INET &&
		    pr->pr_protocol && pr->pr_protocol != IPPROTO_RAW) {
			/* Be careful to only index valid IP protocols. */
			if (pr->pr_protocol && pr->pr_protocol < IPPROTO_MAX)
				ip_protox[pr->pr_protocol] = pr - inetsw;
		}

	/* Initialize packet filter hooks. */
	inet_pfil_hook.ph_type = PFIL_TYPE_AF;
	inet_pfil_hook.ph_af = AF_INET;
	if ((i = pfil_head_register(&inet_pfil_hook)) != 0)
		printf("%s: WARNING: unable to register pfil hook, "
			"error %d\n", __func__, i);

	/* Initialize IP reassembly queue. */
	IPQ_LOCK_INIT();
	for (i = 0; i < IPREASS_NHASH; i++)
	    TAILQ_INIT(&ipq[i]);
	maxnipq = nmbclusters / 32;
	maxfragsperpacket = 16;

	/* Initialize various other remaining things. */
	ip_id = time_second & 0xffff;
	ipintrq.ifq_maxlen = ipqmaxlen;
	mtx_init(&ipintrq.ifq_mtx, "ip_inq", NULL, MTX_DEF);
	netisr_register(NETISR_IP, ip_input, &ipintrq, NETISR_MPSAFE);
}

/*
 * Ip input routine.  Checksum and byte swap header.  If fragmented
 * try to reassemble.  Process options.  Pass to next level.
 */
void
ip_input(struct mbuf *m)
{
	struct ip *ip = NULL;
	struct in_ifaddr *ia = NULL;
	struct ifaddr *ifa;
	int    checkif, hlen = 0;
	u_short sum;
	int dchg = 0;				/* dest changed after fw */
	struct in_addr odst;			/* original dst address */
#ifdef FAST_IPSEC
	struct m_tag *mtag;
	struct tdb_ident *tdbi;
	struct secpolicy *sp;
	int s, error;
#endif /* FAST_IPSEC */

	M_ASSERTPKTHDR(m);

	if (m->m_flags & M_FASTFWD_OURS) {
		/*
		 * ip_fastforward firewall changed dest to local.
		 * We expect ip_len and ip_off in host byte order.
		 */
		m->m_flags &= ~M_FASTFWD_OURS;	/* for reflected mbufs */
		/* Set up some basic stuff */
		ip = mtod(m, struct ip *);
		hlen = ip->ip_hl << 2;
		goto ours;
	}

	if (m->m_flags & M_FASTFWD_PREPROC){
		/*
		 * Packets that ip_fastforward determined require further
		 * analysis or are destined for one of our own addresses.
		 * We expect ip_len and ip_off in host byte order.
		 */
		m->m_flags &= ~M_FASTFWD_PREPROC; /* for reflected mbufs */
		/* Set up some basic stuff */
		ip = mtod(m, struct ip *);
		hlen = ip->ip_hl << 2;
		goto preprocessed;
	}

	ipstat.ips_total++;

	if (m->m_pkthdr.len < sizeof(struct ip))
		goto tooshort;

	if (m->m_len < sizeof (struct ip) &&
	    (m = m_pullup(m, sizeof (struct ip))) == NULL) {
		ipstat.ips_toosmall++;
		return;
	}
	ip = mtod(m, struct ip *);

	if (ip->ip_v != IPVERSION) {
		ipstat.ips_badvers++;
		goto bad;
	}

	hlen = ip->ip_hl << 2;
	if (hlen < sizeof(struct ip)) {	/* minimum header length */
		ipstat.ips_badhlen++;
		goto bad;
	}
	if (hlen > m->m_len) {
		if ((m = m_pullup(m, hlen)) == NULL) {
			ipstat.ips_badhlen++;
			return;
		}
		ip = mtod(m, struct ip *);
	}

	/* 127/8 must not appear on wire - RFC1122 */
	if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET ||
	    (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) {
		if ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) == 0) {
			ipstat.ips_badaddr++;
			goto bad;
		}
	}

	if (m->m_pkthdr.csum_flags & CSUM_IP_CHECKED) {
		sum = !(m->m_pkthdr.csum_flags & CSUM_IP_VALID);
	} else {
		if (hlen == sizeof(struct ip)) {
			sum = in_cksum_hdr(ip);
		} else {
			sum = in_cksum(m, hlen);
		}
	}
	if (sum) {
		ipstat.ips_badsum++;
		goto bad;
	}

#ifdef ALTQ
	if (altq_input != NULL && (*altq_input)(m, AF_INET) == 0)
		/* packet is dropped by traffic conditioner */
		return;
#endif

	/*
	 * Convert fields to host representation.
	 */
	ip->ip_len = ntohs(ip->ip_len);
	if (ip->ip_len < hlen) {
		ipstat.ips_badlen++;
		goto bad;
	}
	ip->ip_off = ntohs(ip->ip_off);

	/*
	 * Check that the amount of data in the buffers
	 * is at least as much as the IP header would have us expect.
	 * Trim mbufs if longer than we expect.
	 * Drop packet if shorter than we expect.
	 */
	if (m->m_pkthdr.len < ip->ip_len) {
tooshort:
		ipstat.ips_tooshort++;
		goto bad;
	}
	if (m->m_pkthdr.len > ip->ip_len) {
		if (m->m_len == m->m_pkthdr.len) {
			m->m_len = ip->ip_len;
			m->m_pkthdr.len = ip->ip_len;
		} else
			m_adj(m, ip->ip_len - m->m_pkthdr.len);
	}

preprocessed:

#if defined(IPSEC) && !defined(IPSEC_FILTERGIF)
	/*
	 * Bypass packet filtering for packets from a tunnel (gif).
	 */
	if (ipsec_getnhist(m))
		goto passin;
#endif
#if defined(FAST_IPSEC) && !defined(IPSEC_FILTERGIF)
	/*
	 * Bypass packet filtering for packets from a tunnel (gif).
	 */
	if (m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL) != NULL)
		goto passin;
#endif

	/*
	 * Run through list of hooks for input packets.
	 *
	 * NB: Beware of the destination address changing (e.g.
	 *     by NAT rewriting).  When this happens, tell
	 *     ip_forward to do the right thing.
	 */

	/* Jump over all PFIL processing if hooks are not active. */
	if (inet_pfil_hook.ph_busy_count == -1)
		goto passin;

	odst = ip->ip_dst;
	if (pfil_run_hooks(&inet_pfil_hook, &m, m->m_pkthdr.rcvif,
	    PFIL_IN, NULL) != 0)
		return;
	if (m == NULL)			/* consumed by filter */
		return;

	ip = mtod(m, struct ip *);
	dchg = (odst.s_addr != ip->ip_dst.s_addr);

#ifdef IPFIREWALL_FORWARD
	if (m->m_flags & M_FASTFWD_OURS) {
		m->m_flags &= ~M_FASTFWD_OURS;
		goto ours;
	}
	dchg = (m_tag_find(m, PACKET_TAG_IPFORWARD, NULL) != NULL);
#endif /* IPFIREWALL_FORWARD */

passin:
	/*
	 * Process options and, if not destined for us,
	 * ship it on.  ip_dooptions returns 1 when an
	 * error was detected (causing an icmp message
	 * to be sent and the original packet to be freed).
	 */
	if (hlen > sizeof (struct ip) && ip_dooptions(m, 0))
		return;

	/*
	 * Greedy RSVP: snatch any PATH packet of the RSVP protocol,
	 * whether it is destined for another node or is a multicast
	 * one.  RSVP wants it, and this prevents it from being
	 * forwarded anywhere else.  Only grab the packet if the rsvp
	 * daemon is running (rsvp_on).
	 */
	if (rsvp_on && ip->ip_p == IPPROTO_RSVP)
		goto ours;

	/*
	 * Check our list of addresses, to see if the packet is for us.
	 * If we don't have any addresses, assume any unicast packet
	 * we receive might be for us (and let the upper layers deal
	 * with it).
	 */
	if (TAILQ_EMPTY(&in_ifaddrhead) &&
	    (m->m_flags & (M_MCAST|M_BCAST)) == 0)
		goto ours;

	/*
	 * Enable a consistency check between the destination address
	 * and the arrival interface for a unicast packet (the RFC 1122
	 * strong ES model) if IP forwarding is disabled and the packet
	 * is not locally generated and the packet is not subject to
	 * 'ipfw fwd'.
	 *
	 * XXX - Checking also should be disabled if the destination
	 * address is ipnat'ed to a different interface.
	 *
	 * XXX - Checking is incompatible with IP aliases added
	 * to the loopback interface instead of the interface where
	 * the packets are received.
	 */
	checkif = ip_checkinterface && (ipforwarding == 0) && 
	    m->m_pkthdr.rcvif != NULL &&
	    ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) == 0) &&
	    (dchg == 0);

	/*
	 * Check for exact addresses in the hash bucket.
	 */
	LIST_FOREACH(ia, INADDR_HASH(ip->ip_dst.s_addr), ia_hash) {
		/*
		 * If the address matches, verify that the packet
		 * arrived via the correct interface if checking is
		 * enabled.
		 */
		if (IA_SIN(ia)->sin_addr.s_addr == ip->ip_dst.s_addr && 
		    (!checkif || ia->ia_ifp == m->m_pkthdr.rcvif))
			goto ours;
	}
	/*
	 * Check for broadcast addresses.
	 *
	 * Only accept broadcast packets that arrive via the matching
	 * interface.  Reception of forwarded directed broadcasts would
	 * be handled via ip_forward() and ether_output() with the loopback
	 * into the stack for SIMPLEX interfaces handled by ether_output().
	 */
	if (m->m_pkthdr.rcvif != NULL &&
	    m->m_pkthdr.rcvif->if_flags & IFF_BROADCAST) {
		TAILQ_FOREACH(ifa, &m->m_pkthdr.rcvif->if_addrhead, ifa_link) {
			if (ifa->ifa_addr->sa_family != AF_INET)
				continue;
			ia = ifatoia(ifa);
			if (satosin(&ia->ia_broadaddr)->sin_addr.s_addr ==
			    ip->ip_dst.s_addr)
				goto ours;
			if (ia->ia_netbroadcast.s_addr == ip->ip_dst.s_addr)
				goto ours;
#ifdef BOOTP_COMPAT
			if (IA_SIN(ia)->sin_addr.s_addr == INADDR_ANY)
				goto ours;
#endif
		}
	}
	if (IN_MULTICAST(ntohl(ip->ip_dst.s_addr))) {
		struct in_multi *inm;
		if (ip_mrouter) {
			/*
			 * If we are acting as a multicast router, all
			 * incoming multicast packets are passed to the
			 * kernel-level multicast forwarding function.
			 * The packet is returned (relatively) intact; if
			 * ip_mforward() returns a non-zero value, the packet
			 * must be discarded, else it may be accepted below.
			 */
			if (ip_mforward &&
			    ip_mforward(ip, m->m_pkthdr.rcvif, m, 0) != 0) {
				ipstat.ips_cantforward++;
				m_freem(m);
				return;
			}

			/*
			 * The process-level routing daemon needs to receive
			 * all multicast IGMP packets, whether or not this
			 * host belongs to their destination groups.
			 */
			if (ip->ip_p == IPPROTO_IGMP)
				goto ours;
			ipstat.ips_forward++;
		}
		/*
		 * See if we belong to the destination multicast group on the
		 * arrival interface.
		 */
		IN_LOOKUP_MULTI(ip->ip_dst, m->m_pkthdr.rcvif, inm);
		if (inm == NULL) {
			ipstat.ips_notmember++;
			m_freem(m);
			return;
		}
		goto ours;
	}
	if (ip->ip_dst.s_addr == (u_long)INADDR_BROADCAST)
		goto ours;
	if (ip->ip_dst.s_addr == INADDR_ANY)
		goto ours;

	/*
	 * FAITH(Firewall Aided Internet Translator)
	 */
	if (m->m_pkthdr.rcvif && m->m_pkthdr.rcvif->if_type == IFT_FAITH) {
		if (ip_keepfaith) {
			if (ip->ip_p == IPPROTO_TCP || ip->ip_p == IPPROTO_ICMP) 
				goto ours;
		}
		m_freem(m);
		return;
	}

	/*
	 * Not for us; forward if possible and desirable.
	 */
	if (ipforwarding == 0) {
		ipstat.ips_cantforward++;
		m_freem(m);
	} else {
#ifdef IPSEC
		/*
		 * Enforce inbound IPsec SPD.
		 */
		if (ipsec4_in_reject(m, NULL)) {
			ipsecstat.in_polvio++;
			goto bad;
		}
#endif /* IPSEC */
#ifdef FAST_IPSEC
		mtag = m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL);
		s = splnet();
		if (mtag != NULL) {
			tdbi = (struct tdb_ident *)(mtag + 1);
			sp = ipsec_getpolicy(tdbi, IPSEC_DIR_INBOUND);
		} else {
			sp = ipsec_getpolicybyaddr(m, IPSEC_DIR_INBOUND,
						   IP_FORWARDING, &error);   
		}
		if (sp == NULL) {	/* NB: can happen if error */
			splx(s);
			/*XXX error stat???*/
			DPRINTF(("ip_input: no SP for forwarding\n"));	/*XXX*/
			goto bad;
		}

		/*
		 * Check security policy against packet attributes.
		 */
		error = ipsec_in_reject(sp, m);
		KEY_FREESP(&sp);
		splx(s);
		if (error) {
			ipstat.ips_cantforward++;
			goto bad;
		}
#endif /* FAST_IPSEC */
		ip_forward(m, dchg);
	}
	return;

ours:
#ifdef IPSTEALTH
	/*
	 * IPSTEALTH: Process non-routing options only
	 * if the packet is destined for us.
	 */
	if (ipstealth && hlen > sizeof (struct ip) &&
	    ip_dooptions(m, 1))
		return;
#endif /* IPSTEALTH */

	/* Count the packet in the ip address stats */
	if (ia != NULL) {
		ia->ia_ifa.if_ipackets++;
		ia->ia_ifa.if_ibytes += m->m_pkthdr.len;
	}

	/*
	 * Attempt reassembly; if it succeeds, proceed.
	 * ip_reass() will return a different mbuf.
	 */
	if (ip->ip_off & (IP_MF | IP_OFFMASK)) {
		m = ip_reass(m);
		if (m == NULL)
			return;
		ip = mtod(m, struct ip *);
		/* Get the header length of the reassembled packet */
		hlen = ip->ip_hl << 2;
	}

	/*
	 * Further protocols expect the packet length to be w/o the
	 * IP header.
	 */
	ip->ip_len -= hlen;

#ifdef IPSEC
	/*
	 * enforce IPsec policy checking if we are seeing last header.
	 * note that we do not visit this with protocols with pcb layer
	 * code - like udp/tcp/raw ip.
	 */
	if ((inetsw[ip_protox[ip->ip_p]].pr_flags & PR_LASTHDR) != 0 &&
	    ipsec4_in_reject(m, NULL)) {
		ipsecstat.in_polvio++;
		goto bad;
	}
#endif
#ifdef FAST_IPSEC
	/*
	 * enforce IPsec policy checking if we are seeing last header.
	 * note that we do not visit this with protocols with pcb layer
	 * code - like udp/tcp/raw ip.
	 */
	if ((inetsw[ip_protox[ip->ip_p]].pr_flags & PR_LASTHDR) != 0) {
		/*
		 * Check if the packet has already had IPsec processing
		 * done.  If so, then just pass it along.  This tag gets
		 * set during AH, ESP, etc. input handling, before the
		 * packet is returned to the ip input queue for delivery.
		 */ 
		mtag = m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL);
		s = splnet();
		if (mtag != NULL) {
			tdbi = (struct tdb_ident *)(mtag + 1);
			sp = ipsec_getpolicy(tdbi, IPSEC_DIR_INBOUND);
		} else {
			sp = ipsec_getpolicybyaddr(m, IPSEC_DIR_INBOUND,
						   IP_FORWARDING, &error);   
		}
		if (sp != NULL) {
			/*
			 * Check security policy against packet attributes.
			 */
			error = ipsec_in_reject(sp, m);
			KEY_FREESP(&sp);
		} else {
			/* XXX error stat??? */
			error = EINVAL;
			DPRINTF(("ip_input: no SP, packet discarded\n"));/*XXX*/
			/* Fall through so splx(s) runs before the drop. */
		}
		splx(s);
		if (error)
			goto bad;
	}
#endif /* FAST_IPSEC */

	/*
	 * Switch out to protocol's input routine.
	 */
	ipstat.ips_delivered++;

	(*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen);
	return;
bad:
	m_freem(m);
}

/*
 * Take incoming datagram fragment and try to reassemble it into
 * whole datagram.  If the fragment does not complete the datagram,
 * the function stores the mbuf in the fragment chain and returns
 * NULL.  If the fragment completes the datagram, the packet is
 * reassembled and the pointer to the new mbuf is returned for
 * further processing.  Only m_tags attached to the first
 * packet/fragment are preserved.
 * The IP header is *NOT* adjusted out of iplen.
 */

struct mbuf *
ip_reass(struct mbuf *m)
{
	struct ip *ip;
	struct mbuf *p, *q, *nq, *t;
	struct ipq *fp = NULL;
	struct ipqhead *head;
	int i, hlen, next;
	u_int8_t ecn, ecn0;
	u_short hash;

	/* If maxnipq is 0, never accept fragments. */
	if (maxnipq == 0) {
		ipstat.ips_fragments++;
		ipstat.ips_fragdropped++;
		m_freem(m);
		return (NULL);
	}

	ip = mtod(m, struct ip *);
	hlen = ip->ip_hl << 2;

	hash = IPREASS_HASH(ip->ip_src.s_addr, ip->ip_id);
	head = &ipq[hash];
	IPQ_LOCK();

	/*
	 * Look for queue of fragments
	 * of this datagram.
	 */
	TAILQ_FOREACH(fp, head, ipq_list)
		if (ip->ip_id == fp->ipq_id &&
		    ip->ip_src.s_addr == fp->ipq_src.s_addr &&
		    ip->ip_dst.s_addr == fp->ipq_dst.s_addr &&
#ifdef MAC
		    mac_fragment_match(m, fp) &&
#endif
		    ip->ip_p == fp->ipq_p)
			goto found;

	fp = NULL;

	/*
	 * Enforce upper bound on number of fragmented packets
	 * for which we attempt reassembly;
	 * If maxnipq is -1, accept all fragments without limitation.
	 */
	if ((nipq > maxnipq) && (maxnipq > 0)) {
		/*
		 * drop something from the tail of the current queue
		 * before proceeding further
		 */
		struct ipq *q = TAILQ_LAST(head, ipqhead);
		if (q == NULL) {   /* gak */
			for (i = 0; i < IPREASS_NHASH; i++) {
				struct ipq *r = TAILQ_LAST(&ipq[i], ipqhead);
				if (r) {
					ipstat.ips_fragtimeout += r->ipq_nfrags;
					ip_freef(&ipq[i], r);
					break;
				}
			}
		} else {
			ipstat.ips_fragtimeout += q->ipq_nfrags;
			ip_freef(head, q);
		}
	}

found:
	/*
	 * Adjust ip_len to not reflect header,
	 * convert offset of this to bytes.
	 */
	ip->ip_len -= hlen;
	if (ip->ip_off & IP_MF) {
		/*
		 * Make sure that fragments have a data length
		 * that's a non-zero multiple of 8 bytes.
		 */
		if (ip->ip_len == 0 || (ip->ip_len & 0x7) != 0) {
			ipstat.ips_toosmall++; /* XXX */
			goto dropfrag;
		}
		m->m_flags |= M_FRAG;
	} else
		m->m_flags &= ~M_FRAG;
	ip->ip_off <<= 3;


	/*
	 * Count the fragment and record where its IP header lives,
	 * since the header is hidden from the mbuf data just below.
	 */
	ipstat.ips_fragments++;
	m->m_pkthdr.header = ip;

	/* Previous ip_reass() started here. */
	/*
	 * Presence of header sizes in mbufs
	 * would confuse code below.
	 */
	m->m_data += hlen;
	m->m_len -= hlen;

	/*
	 * If first fragment to arrive, create a reassembly queue.
	 */
	if (fp == NULL) {
		if ((t = m_get(M_DONTWAIT, MT_FTABLE)) == NULL)
			goto dropfrag;
		fp = mtod(t, struct ipq *);
#ifdef MAC
		if (mac_init_ipq(fp, M_NOWAIT) != 0) {
			m_free(t);
			fp = NULL;	/* dropfrag must not touch the freed ipq */
			goto dropfrag;
		}
		mac_create_ipq(m, fp);
#endif
		TAILQ_INSERT_HEAD(head, fp, ipq_list);
		nipq++;
		fp->ipq_nfrags = 1;
		fp->ipq_ttl = IPFRAGTTL;
		fp->ipq_p = ip->ip_p;
		fp->ipq_id = ip->ip_id;
		fp->ipq_src = ip->ip_src;
		fp->ipq_dst = ip->ip_dst;
		fp->ipq_frags = m;
		m->m_nextpkt = NULL;
		goto inserted;
	} else {
		fp->ipq_nfrags++;
#ifdef MAC
		mac_update_ipq(m, fp);
#endif
	}

#define GETIP(m)	((struct ip*)((m)->m_pkthdr.header))

	/*
	 * Handle ECN by comparing this segment with the first one;
	 * if CE is set, do not lose CE.
	 * drop if CE and not-ECT are mixed for the same packet.
	 */
	ecn = ip->ip_tos & IPTOS_ECN_MASK;
	ecn0 = GETIP(fp->ipq_frags)->ip_tos & IPTOS_ECN_MASK;
	if (ecn == IPTOS_ECN_CE) {
		if (ecn0 == IPTOS_ECN_NOTECT)
			goto dropfrag;
		if (ecn0 != IPTOS_ECN_CE)
			GETIP(fp->ipq_frags)->ip_tos |= IPTOS_ECN_CE;
	}
	if (ecn == IPTOS_ECN_NOTECT && ecn0 != IPTOS_ECN_NOTECT)
		goto dropfrag;

	/*
	 * Find a segment which begins after this one does.
	 */
	for (p = NULL, q = fp->ipq_frags; q; p = q, q = q->m_nextpkt)
		if (GETIP(q)->ip_off > ip->ip_off)
			break;

	/*
	 * If there is a preceding segment, it may provide some of
	 * our data already.  If so, drop the data from the incoming
	 * segment.  If it provides all of our data, drop us, otherwise
	 * stick new segment in the proper place.
	 *
	 * If some of the data is dropped from the incoming
	 * segment, then its checksum is invalidated.
	 */
	if (p) {
		i = GETIP(p)->ip_off + GETIP(p)->ip_len - ip->ip_off;
		if (i > 0) {
			if (i >= ip->ip_len)
				goto dropfrag;
			m_adj(m, i);
			m->m_pkthdr.csum_flags = 0;
			ip->ip_off += i;
			ip->ip_len -= i;
		}
		m->m_nextpkt = p->m_nextpkt;
		p->m_nextpkt = m;
	} else {
		m->m_nextpkt = fp->ipq_frags;
		fp->ipq_frags = m;
	}

	/*
	 * While we overlap succeeding segments trim them or,
	 * if they are completely covered, dequeue them.
	 */
	for (; q != NULL && ip->ip_off + ip->ip_len > GETIP(q)->ip_off;
	     q = nq) {
		i = (ip->ip_off + ip->ip_len) - GETIP(q)->ip_off;
		if (i < GETIP(q)->ip_len) {
			GETIP(q)->ip_len -= i;
			GETIP(q)->ip_off += i;
			m_adj(q, i);
			q->m_pkthdr.csum_flags = 0;
			break;
		}
		nq = q->m_nextpkt;
		m->m_nextpkt = nq;
		ipstat.ips_fragdropped++;
		fp->ipq_nfrags--;
		m_freem(q);
	}

inserted:

	/*
	 * Check for complete reassembly and perform frag per packet
	 * limiting.
	 *
	 * Frag limiting is performed here so that the nth frag has
	 * a chance to complete the packet before we drop the packet.
	 * As a result, n+1 frags are actually allowed per packet, but
	 * only n will ever be stored. (n = maxfragsperpacket.)
	 *
	 */
	next = 0;
	for (p = NULL, q = fp->ipq_frags; q; p = q, q = q->m_nextpkt) {
		if (GETIP(q)->ip_off != next) {
			if (fp->ipq_nfrags > maxfragsperpacket) {
				ipstat.ips_fragdropped += fp->ipq_nfrags;
				ip_freef(head, fp);
			}
			goto done;
		}
		next += GETIP(q)->ip_len;
	}
	/* Make sure the last packet didn't have the IP_MF flag */
	if (p->m_flags & M_FRAG) {
		if (fp->ipq_nfrags > maxfragsperpacket) {
			ipstat.ips_fragdropped += fp->ipq_nfrags;
			ip_freef(head, fp);
		}
		goto done;
	}

	/*
	 * Reassembly is complete.  Make sure the packet is a sane size.
	 */
	q = fp->ipq_frags;
	ip = GETIP(q);
	if (next + (ip->ip_hl << 2) > IP_MAXPACKET) {
		ipstat.ips_toolong++;
		ipstat.ips_fragdropped += fp->ipq_nfrags;
		ip_freef(head, fp);
		goto done;
	}

	/*
	 * Concatenate fragments.
	 */
	m = q;
	t = m->m_next;
	m->m_next = NULL;
	m_cat(m, t);
	nq = q->m_nextpkt;
	q->m_nextpkt = NULL;
	for (q = nq; q != NULL; q = nq) {
		nq = q->m_nextpkt;
		q->m_nextpkt = NULL;
		m->m_pkthdr.csum_flags &= q->m_pkthdr.csum_flags;
		m->m_pkthdr.csum_data += q->m_pkthdr.csum_data;
		m_cat(m, q);
	}
#ifdef MAC
	mac_create_datagram_from_ipq(fp, m);
	mac_destroy_ipq(fp);
#endif

	/*
	 * Create header for new ip packet by modifying header of first
	 * packet;  dequeue and discard fragment reassembly header.
	 * Make header visible.
	 */
	ip->ip_len = (ip->ip_hl << 2) + next;
	ip->ip_src = fp->ipq_src;
	ip->ip_dst = fp->ipq_dst;
	TAILQ_REMOVE(head, fp, ipq_list);
	nipq--;
	(void) m_free(dtom(fp));
	m->m_len += (ip->ip_hl << 2);
	m->m_data -= (ip->ip_hl << 2);
	/* some debugging cruft by sklower, below, will go away soon */
	if (m->m_flags & M_PKTHDR)	/* XXX this should be done elsewhere */
		m_fixhdr(m);
	ipstat.ips_reassembled++;
	IPQ_UNLOCK();
	return (m);

dropfrag:
	ipstat.ips_fragdropped++;
	if (fp != NULL)
		fp->ipq_nfrags--;
	m_freem(m);
done:
	IPQ_UNLOCK();
	return (NULL);

#undef GETIP
}

/*
 * Free a fragment reassembly header and all
 * associated datagrams.
 */
static void
ip_freef(struct ipqhead *fhp, struct ipq *fp)
{
	struct mbuf *q;

	IPQ_LOCK_ASSERT();

	while (fp->ipq_frags) {
		q = fp->ipq_frags;
		fp->ipq_frags = q->m_nextpkt;
		m_freem(q);
	}
	TAILQ_REMOVE(fhp, fp, ipq_list);
	(void) m_free(dtom(fp));
	nipq--;
}

/*
 * IP timer processing;
 * if a timer expires on a reassembly
 * queue, discard it.
 */
void
ip_slowtimo()
{
	register struct ipq *fp;
	int s = splnet();
	int i;

	IPQ_LOCK();
	for (i = 0; i < IPREASS_NHASH; i++) {
		for (fp = TAILQ_FIRST(&ipq[i]); fp;) {
			struct ipq *fpp;

			fpp = fp;
			fp = TAILQ_NEXT(fp, ipq_list);
			if (--fpp->ipq_ttl == 0) {
				ipstat.ips_fragtimeout += fpp->ipq_nfrags;
				ip_freef(&ipq[i], fpp);
			}
		}
	}
	/*
	 * If we are over the maximum number of reassembly queues
	 * (due to the limit being lowered), drain off
	 * enough to get down to the new limit.
	 */
	if (maxnipq >= 0 && nipq > maxnipq) {
		for (i = 0; i < IPREASS_NHASH; i++) {
			while (nipq > maxnipq && !TAILQ_EMPTY(&ipq[i])) {
				ipstat.ips_fragdropped +=
				    TAILQ_FIRST(&ipq[i])->ipq_nfrags;
				ip_freef(&ipq[i], TAILQ_FIRST(&ipq[i]));
			}
		}
	}
	IPQ_UNLOCK();
	splx(s);
}

/*
 * Drain off all datagram fragments.
 */
void
ip_drain()
{
	int     i;

	IPQ_LOCK();
	for (i = 0; i < IPREASS_NHASH; i++) {
		while (!TAILQ_EMPTY(&ipq[i])) {
			ipstat.ips_fragdropped +=
			    TAILQ_FIRST(&ipq[i])->ipq_nfrags;
			ip_freef(&ipq[i], TAILQ_FIRST(&ipq[i]));
		}
	}
	IPQ_UNLOCK();
	in_rtqdrain();
}

/*
 * Do option processing on a datagram,
 * possibly discarding it if bad options are encountered,
 * or forwarding it if source-routed.
 * The pass argument is used when operating in the IPSTEALTH
 * mode to tell what options to process:
 * [LS]SRR (pass 0) or the others (pass 1).
 * The reason for as many as two passes is that when doing IPSTEALTH,
 * non-routing options should be processed only if the packet is for us.
 * Returns 1 if packet has been forwarded/freed,
 * 0 if the packet should be processed further.
 */
static int
ip_dooptions(struct mbuf *m, int pass)
{
	struct ip *ip = mtod(m, struct ip *);
	u_char *cp;
	struct in_ifaddr *ia;
	int opt, optlen, cnt, off, code, type = ICMP_PARAMPROB, forward = 0;
	struct in_addr *sin, dst;
	n_time ntime;
	struct	sockaddr_in ipaddr = { sizeof(ipaddr), AF_INET };

	/* ignore or reject packets with IP options */
	if (ip_doopts == 0)
		return 0;
	else if (ip_doopts == 2) {
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_FILTER_PROHIB;
		goto bad;
	}

	dst = ip->ip_dst;
	cp = (u_char *)(ip + 1);
	cnt = (ip->ip_hl << 2) - sizeof (struct ip);
	for (; cnt > 0; cnt -= optlen, cp += optlen) {
		opt = cp[IPOPT_OPTVAL];
		if (opt == IPOPT_EOL)
			break;
		if (opt == IPOPT_NOP)
			optlen = 1;
		else {
			if (cnt < IPOPT_OLEN + sizeof(*cp)) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
			optlen = cp[IPOPT_OLEN];
			if (optlen < IPOPT_OLEN + sizeof(*cp) || optlen > cnt) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
		}
		switch (opt) {

		default:
			break;

		/*
		 * Source routing with record.
		 * Find interface with current destination address.
		 * If none on this machine then drop if strictly routed,
		 * or do nothing if loosely routed.
		 * Record interface address and bring up next address
		 * component.  If strictly routed make sure next
		 * address is on directly accessible net.
		 */
		case IPOPT_LSRR:
		case IPOPT_SSRR:
#ifdef IPSTEALTH
			if (ipstealth && pass > 0)
				break;
#endif
			if (optlen < IPOPT_OFFSET + sizeof(*cp)) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
			if ((off = cp[IPOPT_OFFSET]) < IPOPT_MINOFF) {
				code = &cp[IPOPT_OFFSET] - (u_char *)ip;
				goto bad;
			}
			ipaddr.sin_addr = ip->ip_dst;
			ia = (struct in_ifaddr *)
				ifa_ifwithaddr((struct sockaddr *)&ipaddr);
			if (ia == NULL) {
				if (opt == IPOPT_SSRR) {
					type = ICMP_UNREACH;
					code = ICMP_UNREACH_SRCFAIL;
					goto bad;
				}
				if (!ip_dosourceroute)
					goto nosourcerouting;
				/*
				 * Loose routing, and not at next destination
				 * yet; nothing to do except forward.
				 */
				break;
			}
			off--;			/* 0 origin */
			if (off > optlen - (int)sizeof(struct in_addr)) {
				/*
				 * End of source route.  Should be for us.
				 */
				if (!ip_acceptsourceroute)
					goto nosourcerouting;
				save_rte(m, cp, ip->ip_src);
				break;
			}
#ifdef IPSTEALTH
			if (ipstealth)
				goto dropit;
#endif
			if (!ip_dosourceroute) {
				if (ipforwarding) {
					char buf[16]; /* aaa.bbb.ccc.ddd\0 */
					/*
					 * Acting as a router, so generate ICMP
					 */
nosourcerouting:
					strcpy(buf, inet_ntoa(ip->ip_dst));
					log(LOG_WARNING, 
					    "attempted source route from %s to %s\n",
					    inet_ntoa(ip->ip_src), buf);
					type = ICMP_UNREACH;
					code = ICMP_UNREACH_SRCFAIL;
					goto bad;
				} else {
					/*
					 * Not acting as a router, so silently drop.
					 */
#ifdef IPSTEALTH
dropit:
#endif
					ipstat.ips_cantforward++;
					m_freem(m);
					return (1);
				}
			}

			/*
			 * locate outgoing interface
			 */
			(void)memcpy(&ipaddr.sin_addr, cp + off,
			    sizeof(ipaddr.sin_addr));

			if (opt == IPOPT_SSRR) {
#define	INA	struct in_ifaddr *
#define	SA	struct sockaddr *
			    if ((ia = (INA)ifa_ifwithdstaddr((SA)&ipaddr)) == NULL)
				ia = (INA)ifa_ifwithnet((SA)&ipaddr);
			} else
				ia = ip_rtaddr(ipaddr.sin_addr);
			if (ia == NULL) {
				type = ICMP_UNREACH;
				code = ICMP_UNREACH_SRCFAIL;
				goto bad;
			}
			ip->ip_dst = ipaddr.sin_addr;
			(void)memcpy(cp + off, &(IA_SIN(ia)->sin_addr),
			    sizeof(struct in_addr));
			cp[IPOPT_OFFSET] += sizeof(struct in_addr);
			/*
			 * Let ip_input's mcast routing check handle mcast pkts
			 */
			forward = !IN_MULTICAST(ntohl(ip->ip_dst.s_addr));
			break;

		case IPOPT_RR:
#ifdef IPSTEALTH
			if (ipstealth && pass == 0)
				break;
#endif
			if (optlen < IPOPT_OFFSET + sizeof(*cp)) {
				code = &cp[IPOPT_OFFSET] - (u_char *)ip;
				goto bad;
			}
			if ((off = cp[IPOPT_OFFSET]) < IPOPT_MINOFF) {
				code = &cp[IPOPT_OFFSET] - (u_char *)ip;
				goto bad;
			}
			/*
			 * If no space remains, ignore.
			 */
			off--;			/* 0 origin */
			if (off > optlen - (int)sizeof(struct in_addr))
				break;
			(void)memcpy(&ipaddr.sin_addr, &ip->ip_dst,
			    sizeof(ipaddr.sin_addr));
			/*
			 * locate outgoing interface; if we're the destination,
			 * use the incoming interface (should be same).
			 */
			if ((ia = (INA)ifa_ifwithaddr((SA)&ipaddr)) == NULL &&
			    (ia = ip_rtaddr(ipaddr.sin_addr)) == NULL) {
				type = ICMP_UNREACH;
				code = ICMP_UNREACH_HOST;
				goto bad;
			}
			(void)memcpy(cp + off, &(IA_SIN(ia)->sin_addr),
			    sizeof(struct in_addr));
			cp[IPOPT_OFFSET] += sizeof(struct in_addr);
			break;

		case IPOPT_TS:
#ifdef IPSTEALTH
			if (ipstealth && pass == 0)
				break;
#endif
			code = cp - (u_char *)ip;
			if (optlen < 4 || optlen > 40) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
			if ((off = cp[IPOPT_OFFSET]) < 5) {
				code = &cp[IPOPT_OLEN] - (u_char *)ip;
				goto bad;
			}
			if (off > optlen - (int)sizeof(int32_t)) {
				cp[IPOPT_OFFSET + 1] += (1 << 4);
				if ((cp[IPOPT_OFFSET + 1] & 0xf0) == 0) {
					code = &cp[IPOPT_OFFSET] - (u_char *)ip;
					goto bad;
				}
				break;
			}
			off--;				/* 0 origin */
			sin = (struct in_addr *)(cp + off);
			switch (cp[IPOPT_OFFSET + 1] & 0x0f) {

			case IPOPT_TS_TSONLY:
				break;

			case IPOPT_TS_TSANDADDR:
				if (off + sizeof(n_time) +
				    sizeof(struct in_addr) > optlen) {
					code = &cp[IPOPT_OFFSET] - (u_char *)ip;
					goto bad;
				}
				ipaddr.sin_addr = dst;
				ia = (INA)ifaof_ifpforaddr((SA)&ipaddr,
							    m->m_pkthdr.rcvif);
				if (ia == NULL)
					continue;
				(void)memcpy(sin, &IA_SIN(ia)->sin_addr,
				    sizeof(struct in_addr));
				cp[IPOPT_OFFSET] += sizeof(struct in_addr);
				off += sizeof(struct in_addr);
				break;

			case IPOPT_TS_PRESPEC:
				if (off + sizeof(n_time) +
				    sizeof(struct in_addr) > optlen) {
					code = &cp[IPOPT_OFFSET] - (u_char *)ip;
					goto bad;
				}
				(void)memcpy(&ipaddr.sin_addr, sin,
				    sizeof(struct in_addr));
				if (ifa_ifwithaddr((SA)&ipaddr) == NULL)
					continue;
				cp[IPOPT_OFFSET] += sizeof(struct in_addr);
				off += sizeof(struct in_addr);
				break;

			default:
				code = &cp[IPOPT_OFFSET + 1] - (u_char *)ip;
				goto bad;
			}
			ntime = iptime();
			(void)memcpy(cp + off, &ntime, sizeof(n_time));
			cp[IPOPT_OFFSET] += sizeof(n_time);
		}
	}
	if (forward && ipforwarding) {
		ip_forward(m, 1);
		return (1);
	}
	return (0);
bad:
	icmp_error(m, type, code, 0, 0);
	ipstat.ips_badoptions++;
	return (1);
}

/*
 * Given address of next destination (final or next hop),
 * return internet address info of interface to be used to get there.
 */
struct in_ifaddr *
ip_rtaddr(struct in_addr dst)
{
	struct route sro;
	struct sockaddr_in *sin;
	struct in_ifaddr *ifa;

	bzero(&sro, sizeof(sro));
	sin = (struct sockaddr_in *)&sro.ro_dst;
	sin->sin_family = AF_INET;
	sin->sin_len = sizeof(*sin);
	sin->sin_addr = dst;
	rtalloc_ign(&sro, RTF_CLONING);

	if (sro.ro_rt == NULL)
		return (NULL);

	ifa = ifatoia(sro.ro_rt->rt_ifa);
	RTFREE(sro.ro_rt);
	return (ifa);
}

/*
 * Save incoming source route for use in replies,
 * to be picked up later by ip_srcroute if the receiver is interested.
 */
static void
save_rte(struct mbuf *m, u_char *option, struct in_addr dst)
{
	unsigned olen;
	struct ipopt_tag *opts;

	opts = (struct ipopt_tag *)m_tag_get(PACKET_TAG_IPOPTIONS,
					sizeof(struct ipopt_tag), M_NOWAIT);
	if (opts == NULL)
		return;

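	olen = option[IPOPT_OLEN];
#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf("save_rte: olen %u\n", olen);
#endif
	if (olen > sizeof(opts->ip_srcrt) - (1 + sizeof(dst))) {
		/* Option too long to save: release the tag, don't leak it. */
		m_tag_free((struct m_tag *)opts);
		return;
	}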
	bcopy(option, opts->ip_srcrt.srcopt, olen);
	opts->ip_nhops = (olen - IPOPT_OFFSET - 1) / sizeof(struct in_addr);
	opts->ip_srcrt.dst = dst;
	m_tag_prepend(m, (struct m_tag *)opts);
}

/*
 * Retrieve incoming source route for use in replies,
 * in the same form used by setsockopt.
 * The first hop is placed before the options, will be removed later.
 */
struct mbuf *
ip_srcroute(struct mbuf *m0)
{
	struct in_addr *p, *q;
	struct mbuf *m;
	struct ipopt_tag *opts;

	opts = (struct ipopt_tag *)m_tag_find(m0, PACKET_TAG_IPOPTIONS, NULL);
	if (opts == NULL)
		return ((struct mbuf *)0);

	if (opts->ip_nhops == 0)
		return ((struct mbuf *)0);
	m = m_get(M_DONTWAIT, MT_HEADER);
	if (m == NULL)
		return ((struct mbuf *)0);

#define OPTSIZ	(sizeof(opts->ip_srcrt.nop) + sizeof(opts->ip_srcrt.srcopt))

	/* length is (nhops+1)*sizeof(addr) + sizeof(nop + srcrt header) */
	m->m_len = opts->ip_nhops * sizeof(struct in_addr) +
	    sizeof(struct in_addr) + OPTSIZ;
#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf("ip_srcroute: nhops %d mlen %d", opts->ip_nhops, m->m_len);
#endif

	/*
	 * First save first hop for return route
	 */
	p = &(opts->ip_srcrt.route[opts->ip_nhops - 1]);
	*(mtod(m, struct in_addr *)) = *p--;
#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf(" hops %lx", (u_long)ntohl(mtod(m, struct in_addr *)->s_addr));
#endif

	/*
	 * Copy option fields and padding (nop) to mbuf.
	 */
	opts->ip_srcrt.nop = IPOPT_NOP;
	opts->ip_srcrt.srcopt[IPOPT_OFFSET] = IPOPT_MINOFF;
	(void)memcpy(mtod(m, caddr_t) + sizeof(struct in_addr),
	    &(opts->ip_srcrt.nop), OPTSIZ);
	q = (struct in_addr *)(mtod(m, caddr_t) +
	    sizeof(struct in_addr) + OPTSIZ);
#undef OPTSIZ
	/*
	 * Record return path as an IP source route,
	 * reversing the path (pointers are now aligned).
	 */
	while (p >= opts->ip_srcrt.route) {
#ifdef DIAGNOSTIC
		if (ipprintfs)
			printf(" %lx", (u_long)ntohl(p->s_addr));
#endif
		*q++ = *p--;
	}
	/*
	 * Last hop goes to final destination.
	 */
	*q = opts->ip_srcrt.dst;
#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf(" %lx\n", (u_long)ntohl(q->s_addr));
#endif
	m_tag_delete(m0, (struct m_tag *)opts);
	return (m);
}

/*
 * Strip out IP options before the packet is passed to a
 * higher-level protocol in the kernel.
 * The second argument was intended as a buffer to receive the
 * options, but it is currently ignored; nothing is returned.
 * XXX the second argument should be deleted.
 */
void
ip_stripoptions(struct mbuf *m, struct mbuf *mopt)
{
	int i;
	struct ip *ip = mtod(m, struct ip *);
	caddr_t opts;
	int olen;

	olen = (ip->ip_hl << 2) - sizeof (struct ip);
	opts = (caddr_t)(ip + 1);
	i = m->m_len - (sizeof (struct ip) + olen);
	bcopy(opts + olen, opts, (unsigned)i);
	m->m_len -= olen;
	if (m->m_flags & M_PKTHDR)
		m->m_pkthdr.len -= olen;
	ip->ip_v = IPVERSION;
	ip->ip_hl = sizeof(struct ip) >> 2;
}

u_char inetctlerrmap[PRC_NCMDS] = {
	0,		0,		0,		0,
	0,		EMSGSIZE,	EHOSTDOWN,	EHOSTUNREACH,
	EHOSTUNREACH,	EHOSTUNREACH,	ECONNREFUSED,	ECONNREFUSED,
	EMSGSIZE,	EHOSTUNREACH,	0,		0,
	0,		0,		EHOSTUNREACH,	0,
	ENOPROTOOPT,	ECONNREFUSED
};

/*
 * Forward a packet.  If some error occurs return the sender
 * an icmp packet.  Note we can't always generate a meaningful
 * icmp message because icmp doesn't have a large enough repertoire
 * of codes and types.
 *
 * If not forwarding, just drop the packet.  This could be confusing
 * if ipforwarding was zero but some routing protocol was advancing
 * us as a gateway to somewhere.  However, we must let the routing
 * protocol deal with that.
 *
 * The srcrt parameter indicates whether the packet is being forwarded
 * via a source route.
 */
void
ip_forward(struct mbuf *m, int srcrt)
{
	struct ip *ip = mtod(m, struct ip *);
	struct in_ifaddr *ia = NULL;
	int error, type = 0, code = 0;
	struct mbuf *mcopy;
	struct in_addr dest;
	struct ifnet *destifp, dummyifp;

#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf("forward: src %lx dst %lx ttl %x\n",
		    (u_long)ip->ip_src.s_addr, (u_long)ip->ip_dst.s_addr,
		    ip->ip_ttl);
#endif


	if (m->m_flags & (M_BCAST|M_MCAST) || in_canforward(ip->ip_dst) == 0) {
		ipstat.ips_cantforward++;
		m_freem(m);
		return;
	}
#ifdef IPSTEALTH
	if (!ipstealth) {
#endif
		if (ip->ip_ttl <= IPTTLDEC) {
			icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS,
			    0, 0);
			return;
		}
#ifdef IPSTEALTH
	}
#endif

	if (!srcrt && (ia = ip_rtaddr(ip->ip_dst)) == NULL) {
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, 0);
		return;
	}

	/*
	 * Save the IP header and at most 8 bytes of the payload,
	 * in case we need to generate an ICMP message to the src.
	 *
	 * XXX this can be optimized a lot by saving the data in a local
	 * buffer on the stack (72 bytes at most), and only allocating the
	 * mbuf if really necessary. The vast majority of the packets
	 * are forwarded without having to send an ICMP back (either
	 * because unnecessary, or because rate limited), so we are
	 * really wasting a lot of work here.
	 *
	 * We don't use m_copy() because it might return a reference
	 * to a shared cluster. Both this function and ip_output()
	 * assume exclusive access to the IP header in `m', so any
	 * data in a cluster may change before we reach icmp_error().
	 */
	MGET(mcopy, M_DONTWAIT, m->m_type);
	if (mcopy != NULL && !m_dup_pkthdr(mcopy, m, M_DONTWAIT)) {
		/*
		 * It's probably ok if the pkthdr dup fails (because
		 * the deep copy of the tag chain failed), but for now
		 * be conservative and just discard the copy since
		 * code below may some day want the tags.
		 */
		m_free(mcopy);
		mcopy = NULL;
	}
	if (mcopy != NULL) {
		mcopy->m_len = imin((ip->ip_hl << 2) + 8,
		    (int)ip->ip_len);
		mcopy->m_pkthdr.len = mcopy->m_len;
		m_copydata(m, 0, mcopy->m_len, mtod(mcopy, caddr_t));
	}

#ifdef IPSTEALTH
	if (!ipstealth) {
#endif
		ip->ip_ttl -= IPTTLDEC;
#ifdef IPSTEALTH
	}
#endif

	/*
	 * If forwarding packet using same interface that it came in on,
	 * perhaps should send a redirect to sender to shortcut a hop.
	 * Only send redirect if source is sending directly to us,
	 * and if packet was not source routed (or has any options).
	 * Also, don't send redirect if forwarding using a default route
	 * or a route modified by a redirect.
	 */
	dest.s_addr = 0;
	if (!srcrt && ipsendredirects && ia->ia_ifp == m->m_pkthdr.rcvif) {
		struct sockaddr_in *sin;
		struct route ro;
		struct rtentry *rt;

		bzero(&ro, sizeof(ro));
		sin = (struct sockaddr_in *)&ro.ro_dst;
		sin->sin_family = AF_INET;
		sin->sin_len = sizeof(*sin);
		sin->sin_addr = ip->ip_dst;
		rtalloc_ign(&ro, RTF_CLONING);

		rt = ro.ro_rt;

		if (rt && (rt->rt_flags & (RTF_DYNAMIC|RTF_MODIFIED)) == 0 &&
		    satosin(rt_key(rt))->sin_addr.s_addr != 0) {
#define	RTA(rt)	((struct in_ifaddr *)(rt->rt_ifa))
			u_long src = ntohl(ip->ip_src.s_addr);

			if (RTA(rt) &&
			    (src & RTA(rt)->ia_subnetmask) == RTA(rt)->ia_subnet) {
				if (rt->rt_flags & RTF_GATEWAY)
					dest.s_addr = satosin(rt->rt_gateway)->sin_addr.s_addr;
				else
					dest.s_addr = ip->ip_dst.s_addr;
				/* Router requirements say to send only host redirects */
				type = ICMP_REDIRECT;
				code = ICMP_REDIRECT_HOST;
#ifdef DIAGNOSTIC
				if (ipprintfs)
					printf("redirect (%d) to %lx\n", code, (u_long)dest.s_addr);
#endif
			}
		}
		if (rt)
			RTFREE(rt);
	}

	error = ip_output(m, (struct mbuf *)0, NULL, IP_FORWARDING, 0, NULL);
	if (error)
		ipstat.ips_cantforward++;
	else {
		ipstat.ips_forward++;
		if (type)
			ipstat.ips_redirectsent++;
		else {
			if (mcopy)
				m_freem(mcopy);
			return;
		}
	}
	if (mcopy == NULL)
		return;
	destifp = NULL;

	switch (error) {

	case 0:				/* forwarded, but need redirect */
		/* type, code set above */
		break;

	case ENETUNREACH:		/* shouldn't happen, checked above */
	case EHOSTUNREACH:
	case ENETDOWN:
	case EHOSTDOWN:
	default:
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_HOST;
		break;

	case EMSGSIZE:
		type = ICMP_UNREACH;
		code = ICMP_UNREACH_NEEDFRAG;
#if defined(IPSEC) || defined(FAST_IPSEC)
		/*
		 * If the packet is routed over IPsec tunnel, tell the
		 * originator the tunnel MTU.
		 *	tunnel MTU = if MTU - sizeof(IP) - ESP/AH hdrsiz
		 * XXX quickhack!!!
		 */
		{
			struct secpolicy *sp = NULL;
			int ipsecerror;
			int ipsechdr;
			struct route *ro;

#ifdef IPSEC
			sp = ipsec4_getpolicybyaddr(mcopy,
						    IPSEC_DIR_OUTBOUND,
						    IP_FORWARDING,
						    &ipsecerror);
#else /* FAST_IPSEC */
			sp = ipsec_getpolicybyaddr(mcopy,
						   IPSEC_DIR_OUTBOUND,
						   IP_FORWARDING,
						   &ipsecerror);
#endif
			if (sp != NULL) {
				/* count IPsec header size */
				ipsechdr = ipsec4_hdrsiz(mcopy,
							 IPSEC_DIR_OUTBOUND,
							 NULL);

				/*
				 * find the correct route for outer IPv4
				 * header, compute tunnel MTU.
				 *
				 * XXX BUG ALERT
				 * The "dummyifp" code relies upon the fact
				 * that icmp_error() touches only ifp->if_mtu.
				 */
				/*XXX*/
				destifp = NULL;
				if (sp->req != NULL
				 && sp->req->sav != NULL
				 && sp->req->sav->sah != NULL) {
					ro = &sp->req->sav->sah->sa_route;
					if (ro->ro_rt && ro->ro_rt->rt_ifp) {
						dummyifp.if_mtu =
						    ro->ro_rt->rt_rmx.rmx_mtu ?
						    ro->ro_rt->rt_rmx.rmx_mtu :
						    ro->ro_rt->rt_ifp->if_mtu;
						dummyifp.if_mtu -= ipsechdr;
						destifp = &dummyifp;
					}
				}

#ifdef IPSEC
				key_freesp(sp);
#else /* FAST_IPSEC */
				KEY_FREESP(&sp);
#endif
				ipstat.ips_cantfrag++;
				break;
			} else 
#endif /*IPSEC || FAST_IPSEC*/
		/*
		 * When doing source routing 'ia' can be NULL.  Fall back
		 * to the minimum guaranteed routable packet size and use
		 * the same hack as IPSEC to set up a dummyifp for icmp.
		 */
		if (ia == NULL) {
			dummyifp.if_mtu = IP_MSS;
			destifp = &dummyifp;
		} else
			destifp = ia->ia_ifp;
#if defined(IPSEC) || defined(FAST_IPSEC)
		}
#endif /*IPSEC || FAST_IPSEC*/
		ipstat.ips_cantfrag++;
		break;

	case ENOBUFS:
		/*
		 * Per RFC 1812 (Requirements for IP Version 4 Routers),
		 * a router should not generate ICMP_SOURCEQUENCH.
		 * Source quench could be a big problem under DoS attacks,
		 * or if the underlying interface is rate-limited.
		 * Those who need source quench packets may re-enable them
		 * via the net.inet.ip.sendsourcequench sysctl.
		 */
		if (ip_sendsourcequench == 0) {
			m_freem(mcopy);
			return;
		} else {
			type = ICMP_SOURCEQUENCH;
			code = 0;
		}
		break;

	case EACCES:			/* ipfw denied packet */
		m_freem(mcopy);
		return;
	}
	icmp_error(mcopy, type, code, dest.s_addr, destifp);
}

void
ip_savecontrol(inp, mp, ip, m)
	register struct inpcb *inp;
	register struct mbuf **mp;
	register struct ip *ip;
	register struct mbuf *m;
{
	if (inp->inp_socket->so_options & (SO_BINTIME | SO_TIMESTAMP)) {
		struct bintime bt;

		bintime(&bt);
		if (inp->inp_socket->so_options & SO_BINTIME) {
			*mp = sbcreatecontrol((caddr_t) &bt, sizeof(bt),
			    SCM_BINTIME, SOL_SOCKET);
			if (*mp)
				mp = &(*mp)->m_next;
		}
		if (inp->inp_socket->so_options & SO_TIMESTAMP) {
			struct timeval tv;

			bintime2timeval(&bt, &tv);
			*mp = sbcreatecontrol((caddr_t) &tv, sizeof(tv),
				SCM_TIMESTAMP, SOL_SOCKET);
			if (*mp)
				mp = &(*mp)->m_next;
		}
	}
	if (inp->inp_flags & INP_RECVDSTADDR) {
		*mp = sbcreatecontrol((caddr_t) &ip->ip_dst,
		    sizeof(struct in_addr), IP_RECVDSTADDR, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
	if (inp->inp_flags & INP_RECVTTL) {
		*mp = sbcreatecontrol((caddr_t) &ip->ip_ttl,
		    sizeof(u_char), IP_RECVTTL, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
#ifdef notyet
	/* XXX
	 * Moving these out of udp_input() made them even more broken
	 * than they already were.
	 */
	/* options were tossed already */
	if (inp->inp_flags & INP_RECVOPTS) {
		*mp = sbcreatecontrol((caddr_t) opts_deleted_above,
		    sizeof(struct in_addr), IP_RECVOPTS, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
	/* ip_srcroute doesn't do what we want here, need to fix */
	if (inp->inp_flags & INP_RECVRETOPTS) {
		*mp = sbcreatecontrol((caddr_t) ip_srcroute(m),
		    sizeof(struct in_addr), IP_RECVRETOPTS, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
#endif
	if (inp->inp_flags & INP_RECVIF) {
		struct ifnet *ifp;
		struct sdlbuf {
			struct sockaddr_dl sdl;
			u_char	pad[32];
		} sdlbuf;
		struct sockaddr_dl *sdp;
		struct sockaddr_dl *sdl2 = &sdlbuf.sdl;

		if (((ifp = m->m_pkthdr.rcvif)) 
		&& ( ifp->if_index && (ifp->if_index <= if_index))) {
			sdp = (struct sockaddr_dl *)
			    (ifaddr_byindex(ifp->if_index)->ifa_addr);
			/*
			 * Change our mind and don't try to copy.
			 */
			if ((sdp->sdl_family != AF_LINK)
			|| (sdp->sdl_len > sizeof(sdlbuf))) {
				goto makedummy;
			}
			bcopy(sdp, sdl2, sdp->sdl_len);
		} else {
makedummy:	
			sdl2->sdl_len
				= offsetof(struct sockaddr_dl, sdl_data[0]);
			sdl2->sdl_family = AF_LINK;
			sdl2->sdl_index = 0;
			sdl2->sdl_nlen = sdl2->sdl_alen = sdl2->sdl_slen = 0;
		}
		*mp = sbcreatecontrol((caddr_t) sdl2, sdl2->sdl_len,
			IP_RECVIF, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
}

/*
 * XXX these routines are called from the upper part of the kernel.
 * They need to be locked when we remove Giant.
 *
 * They could also be moved to ip_mroute.c, since all the RSVP
 *  handling is done there already.
 */
static int ip_rsvp_on;
struct socket *ip_rsvpd;
int
ip_rsvp_init(struct socket *so)
{
	if (so->so_type != SOCK_RAW ||
	    so->so_proto->pr_protocol != IPPROTO_RSVP)
		return EOPNOTSUPP;

	if (ip_rsvpd != NULL)
		return EADDRINUSE;

	ip_rsvpd = so;
	/*
	 * This may seem silly, but we need to be sure we don't over-increment
	 * the RSVP counter, in case something slips up.
	 */
	if (!ip_rsvp_on) {
		ip_rsvp_on = 1;
		rsvp_on++;
	}

	return 0;
}

int
ip_rsvp_done(void)
{
	ip_rsvpd = NULL;
	/*
	 * This may seem silly, but we need to be sure we don't over-decrement
	 * the RSVP counter, in case something slips up.
	 */
	if (ip_rsvp_on) {
		ip_rsvp_on = 0;
		rsvp_on--;
	}
	return 0;
}

void
rsvp_input(struct mbuf *m, int off)	/* XXX must fixup manually */
{
	if (rsvp_input_p) { /* call the real one if loaded */
		rsvp_input_p(m, off);
		return;
	}

	/* Can still get packets with rsvp_on = 0 if there is a local member
	 * of the group to which the RSVP packet is addressed.  But in this
	 * case we want to throw the packet away.
	 */
	
	if (!rsvp_on) {
		m_freem(m);
		return;
	}

	if (ip_rsvpd != NULL) { 
		rip_input(m, off);
		return;
	}
	/* Drop the packet */
	m_freem(m);
}

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_input.c.diff"

--- ip_input.org.c	Mon Dec 27 01:53:29 2004
+++ ip_input.c	Mon Dec 27 01:51:55 2004
@@ -27,7 +27,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)ip_input.c	8.2 (Berkeley) 1/4/94
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_input.c,v 1.292 2004/10/19 15:45:57 andre Exp $
+ * $FreeBSD: src/sys/netinet/ip_input.c,v 1.283.2.7 2004/10/03 17:04:40 mlaier Exp $
  */
 
 #include "opt_bootp.h"
@@ -156,7 +156,7 @@
 static int	ipprintfs = 0;
 #endif
 
-struct pfil_head inet_pfil_hook;	/* Packet filter hooks */
+struct pfil_head inet_pfil_hook;
 
 static struct	ifqueue ipintrq;
 static int	ipqmaxlen = IFQ_MAXLEN;
@@ -261,7 +261,7 @@
 		if (pr->pr_domain->dom_family == PF_INET &&
 		    pr->pr_protocol && pr->pr_protocol != IPPROTO_RAW) {
 			/* Be careful to only index valid IP protocols. */
-			if (pr->pr_protocol <= IPPROTO_MAX)
+			if (pr->pr_protocol && pr->pr_protocol < IPPROTO_MAX)
 				ip_protox[pr->pr_protocol] = pr - inetsw;
 		}
 
@@ -311,16 +311,29 @@
   	
 	if (m->m_flags & M_FASTFWD_OURS) {
 		/*
-		 * Firewall or NAT changed destination to local.
-		 * We expect ip_len and ip_off to be in host byte order.
+		 * ip_fastforward firewall changed dest to local.
+		 * We expect ip_len and ip_off in host byte order.
 		 */
-		m->m_flags &= ~M_FASTFWD_OURS;
-		/* Set up some basics that will be used later. */
+		m->m_flags &= ~M_FASTFWD_OURS;	/* for reflected mbufs */
+		/* Set up some basic stuff */
 		ip = mtod(m, struct ip *);
 		hlen = ip->ip_hl << 2;
   		goto ours;
   	}
 
+	if (m->m_flags & M_FASTFWD_PREPROC){
+		/*
+		 * Packets that require further analysis or destined
+		 * to our own addresses in ip_fastforward.
+		 * We expect ip_len and ip_off in host byte order.
+		 */
+		m->m_flags &= ~M_FASTFWD_PREPROC; /* for reflected mbufs */
+		/* Setup some basic stuff */
+		ip = mtod(m, struct ip *);
+		hlen = ip->ip_hl << 2;
+		goto preprocessed;
+	}
+
 	ipstat.ips_total++;
 
 	if (m->m_pkthdr.len < sizeof(struct ip))
@@ -408,6 +421,9 @@
 		} else
 			m_adj(m, ip->ip_len - m->m_pkthdr.len);
 	}
+
+preprocessed:
+
 #if defined(IPSEC) && !defined(IPSEC_FILTERGIF)
 	/*
 	 * Bypass packet filtering for packets from a tunnel (gif).
@@ -1143,67 +1159,6 @@
 	IPQ_UNLOCK();
 	in_rtqdrain();
 }
-
-/*
- * The protocol to be inserted into ip_protox[] must be already registered
- * in inetsw[], either statically or through pf_proto_register().
- */
-int
-ipproto_register(u_char ipproto)
-{
-	struct protosw *pr;
-
-	/* Sanity checks. */
-	if (ipproto == 0)
-		return (EPROTONOSUPPORT);
-
-	/*
-	 * The protocol slot must not be occupied by another protocol
-	 * already.  An index pointing to IPPROTO_RAW is unused.
-	 */
-	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
-	if (pr == NULL)
-		return (EPFNOSUPPORT);
-	if (ip_protox[ipproto] != pr - inetsw)	/* IPPROTO_RAW */
-		return (EEXIST);
-
-	/* Find the protocol position in inetsw[] and set the index. */
-	for (pr = inetdomain.dom_protosw;
-	     pr < inetdomain.dom_protoswNPROTOSW; pr++) {
-		if (pr->pr_domain->dom_family == PF_INET &&
-		    pr->pr_protocol && pr->pr_protocol == ipproto) {
-			/* Be careful to only index valid IP protocols. */
-			if (pr->pr_protocol <= IPPROTO_MAX) {
-				ip_protox[pr->pr_protocol] = pr - inetsw;
-				return (0);
-			} else
-				return (EINVAL);
-		}
-	}
-	return (EPROTONOSUPPORT);
-}
-
-int
-ipproto_unregister(u_char ipproto)
-{
-	struct protosw *pr;
-
-	/* Sanity checks. */
-	if (ipproto == 0)
-		return (EPROTONOSUPPORT);
-
-	/* Check if the protocol was indeed registered. */
-	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
-	if (pr == NULL)
-		return (EPFNOSUPPORT);
-	if (ip_protox[ipproto] == pr - inetsw)  /* IPPROTO_RAW */
-		return (ENOENT);
-
-	/* Reset the protocol slot to IPPROTO_RAW. */
-	ip_protox[ipproto] = pr - inetsw;
-	return (0);
-}
-
 
 /*
  * Do option processing on a datagram,

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_var.h"

/*
 * Copyright (c) 1982, 1986, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)ip_var.h	8.2 (Berkeley) 1/9/95
 * $FreeBSD: src/sys/netinet/ip_var.h,v 1.89.2.2 2004/09/23 16:38:53 andre Exp $
 */

#ifndef _NETINET_IP_VAR_H_
#define	_NETINET_IP_VAR_H_

#include <sys/queue.h>

/*
 * Overlay for ip header used by other protocols (tcp, udp).
 */
struct ipovly {
	u_char	ih_x1[9];		/* (unused) */
	u_char	ih_pr;			/* protocol */
	u_short	ih_len;			/* protocol length */
	struct	in_addr ih_src;		/* source internet address */
	struct	in_addr ih_dst;		/* destination internet address */
};

#ifdef _KERNEL
/*
 * Ip reassembly queue structure.  Each fragment
 * being reassembled is attached to one of these structures.
 * They are timed out after ipq_ttl drops to 0, and may also
 * be reclaimed if memory becomes tight.
 */
struct ipq {
	TAILQ_ENTRY(ipq) ipq_list;	/* to other reass headers */
	u_char	ipq_ttl;		/* time for reass q to live */
	u_char	ipq_p;			/* protocol of this fragment */
	u_short	ipq_id;			/* sequence id for reassembly */
	struct mbuf *ipq_frags;		/* to ip headers of fragments */
	struct	in_addr ipq_src,ipq_dst;
	u_char	ipq_nfrags;		/* # frags in this packet */
	struct label *ipq_label;		/* MAC label */
};
#endif /* _KERNEL */

/*
 * Structure stored in mbuf in inpcb.ip_options
 * and passed to ip_output when ip options are in use.
 * The actual length of the options (including ipopt_dst)
 * is in m_len.
 */
#define MAX_IPOPTLEN	40

struct ipoption {
	struct	in_addr ipopt_dst;	/* first-hop dst if source routed */
	char	ipopt_list[MAX_IPOPTLEN];	/* options proper */
};

/*
 * Structure attached to inpcb.ip_moptions and
 * passed to ip_output when IP multicast options are in use.
 */
struct ip_moptions {
	struct	ifnet *imo_multicast_ifp; /* ifp for outgoing multicasts */
	struct in_addr imo_multicast_addr; /* ifindex/addr on MULTICAST_IF */
	u_char	imo_multicast_ttl;	/* TTL for outgoing multicasts */
	u_char	imo_multicast_loop;	/* 1 => hear sends if a member */
	u_short	imo_num_memberships;	/* no. memberships this socket */
	struct	in_multi *imo_membership[IP_MAX_MEMBERSHIPS];
	u_long	imo_multicast_vif;	/* vif num outgoing multicasts */
};

struct	ipstat {
	u_long	ips_total;		/* total packets received */
	u_long	ips_badsum;		/* checksum bad */
	u_long	ips_tooshort;		/* packet too short */
	u_long	ips_toosmall;		/* not enough data */
	u_long	ips_badhlen;		/* ip header length < data size */
	u_long	ips_badlen;		/* ip length < ip header length */
	u_long	ips_fragments;		/* fragments received */
	u_long	ips_fragdropped;	/* frags dropped (dups, out of space) */
	u_long	ips_fragtimeout;	/* fragments timed out */
	u_long	ips_forward;		/* packets forwarded */
	u_long	ips_fastforward;	/* packets fast forwarded */
	u_long	ips_transit_re;		/* packets sent to receive path from fastfwd */
	u_long	ips_cantforward;	/* packets rcvd for unreachable dest */
	u_long	ips_redirectsent;	/* packets forwarded on same net */
	u_long	ips_noproto;		/* unknown or unsupported protocol */
	u_long	ips_delivered;		/* datagrams delivered to upper level*/
	u_long	ips_localout;		/* total ip packets generated here */
	u_long	ips_odropped;		/* lost packets due to nobufs, etc. */
	u_long	ips_reassembled;	/* total packets reassembled ok */
	u_long	ips_fragmented;		/* datagrams successfully fragmented */
	u_long	ips_ofragments;		/* output fragments created */
	u_long	ips_cantfrag;		/* don't fragment flag was set, etc. */
	u_long	ips_badoptions;		/* error in option processing */
	u_long	ips_noroute;		/* packets discarded due to no route */
	u_long	ips_badvers;		/* ip version != 4 */
	u_long	ips_rawout;		/* total raw ip packets generated */
	u_long	ips_toolong;		/* ip length > max ip packet size */
	u_long	ips_notmember;		/* multicasts for unregistered grps */
	u_long	ips_nogif;		/* no match gif found */
	u_long	ips_badaddr;		/* invalid address on header */
};

#ifdef _KERNEL

/* flags passed to ip_output as last parameter */
#define	IP_FORWARDING		0x1		/* most of ip header exists */
#define	IP_RAWOUTPUT		0x2		/* raw ip header exists */
#define	IP_SENDONES		0x4		/* send all-ones broadcast */
#define	IP_ROUTETOIF		SO_DONTROUTE	/* bypass routing tables */
#define	IP_ALLOWBROADCAST	SO_BROADCAST	/* can send broadcast packets */

/* mbuf flag used by ip_fastfwd */
#define	M_FASTFWD_OURS		M_PROTO1	/* changed dst to local */
#define	M_FASTFWD_PREPROC	M_PROTO2	/* bypass pre processing */

struct ip;
struct inpcb;
struct route;
struct sockopt;

extern struct	ipstat	ipstat;
extern u_short	ip_id;				/* ip packet ctr, for ids */
extern int	ip_defttl;			/* default IP ttl */
extern int	ipforwarding;			/* ip forwarding */
extern int	ip_doopts;			/* process or ignore IP options */
#ifdef IPSTEALTH
extern int	ipstealth;			/* stealth forwarding */
#endif

extern u_char	ip_protox[];
extern struct socket *ip_rsvpd;	/* reservation protocol daemon */
extern struct socket *ip_mrouter; /* multicast routing daemon */
extern int	(*legal_vif_num)(int);
extern u_long	(*ip_mcast_src)(int);
extern int rsvp_on;
extern struct	pr_usrreqs rip_usrreqs;

int	 ip_ctloutput(struct socket *, struct sockopt *sopt);
void	 ip_drain(void);
int	 ip_fragment(struct ip *ip, struct mbuf **m_frag, int mtu,
	    u_long if_hwassist_flags, int sw_csum);
void	 ip_freemoptions(struct ip_moptions *);
void	 ip_init(void);
extern int	 (*ip_mforward)(struct ip *, struct ifnet *, struct mbuf *,
			  struct ip_moptions *);
int	 ip_output(struct mbuf *,
	    struct mbuf *, struct route *, int, struct ip_moptions *,
	    struct inpcb *);
struct mbuf *
	 ip_reass(struct mbuf *);
struct in_ifaddr *
	 ip_rtaddr(struct in_addr);
void	 ip_savecontrol(struct inpcb *, struct mbuf **, struct ip *,
		struct mbuf *);
void	 ip_slowtimo(void);
struct mbuf *
	 ip_srcroute(struct mbuf *);
void	 ip_stripoptions(struct mbuf *, struct mbuf *);
u_int16_t	ip_randomid(void);
int	rip_ctloutput(struct socket *, struct sockopt *);
void	rip_ctlinput(int, struct sockaddr *, void *);
void	rip_init(void);
void	rip_input(struct mbuf *, int);
int	rip_output(struct mbuf *, struct socket *, u_long);
void	ipip_input(struct mbuf *, int);
void	rsvp_input(struct mbuf *, int);
int	ip_rsvp_init(struct socket *);
int	ip_rsvp_done(void);
extern int	(*ip_rsvp_vif)(struct socket *, struct sockopt *);
extern void	(*ip_rsvp_force_done)(struct socket *);
extern void	(*rsvp_input_p)(struct mbuf *m, int off);

extern	struct pfil_head inet_pfil_hook;	/* packet filter hooks */

void	in_delayed_cksum(struct mbuf *m);

static __inline uint16_t ip_newid(void);
extern int ip_do_randomid;

static __inline uint16_t
ip_newid(void)
{
	if (ip_do_randomid)
		return ip_randomid();

	return htons(ip_id++);
}

#endif /* _KERNEL */

#endif /* !_NETINET_IP_VAR_H_ */

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_var.h.diff"

--- ip_var.org.h	Mon Dec 27 01:48:09 2004
+++ ip_var.h	Sun Dec 26 22:32:58 2004
@@ -27,7 +27,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)ip_var.h	8.2 (Berkeley) 1/9/95
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_var.h,v 1.92 2004/10/19 15:45:57 andre Exp $
+ * $FreeBSD: src/sys/netinet/ip_var.h,v 1.89.2.2 2004/09/23 16:38:53 andre Exp $
  */
 
 #ifndef _NETINET_IP_VAR_H_
@@ -104,6 +104,7 @@
 	u_long	ips_fragtimeout;	/* fragments timed out */
 	u_long	ips_forward;		/* packets forwarded */
 	u_long	ips_fastforward;	/* packets fast forwarded */
+	u_long	ips_transit_re;		/* packets sent to receive path from fastfwd */
 	u_long	ips_cantforward;	/* packets rcvd for unreachable dest */
 	u_long	ips_redirectsent;	/* packets forwarded on same net */
 	u_long	ips_noproto;		/* unknown or unsupported protocol */
@@ -135,6 +136,7 @@
 
 /* mbuf flag used by ip_fastfwd */
 #define	M_FASTFWD_OURS		M_PROTO1	/* changed dst to local */
+#define	M_FASTFWD_PREPROC	M_PROTO2	/* bypass pre processing */
 
 struct ip;
 struct inpcb;
@@ -149,6 +151,7 @@
 #ifdef IPSTEALTH
 extern int	ipstealth;			/* stealth forwarding */
 #endif
+
 extern u_char	ip_protox[];
 extern struct socket *ip_rsvpd;	/* reservation protocol daemon */
 extern struct socket *ip_mrouter; /* multicast routing daemon */
@@ -168,8 +171,6 @@
 int	 ip_output(struct mbuf *,
 	    struct mbuf *, struct route *, int, struct ip_moptions *,
 	    struct inpcb *);
-int	 ipproto_register(u_char);
-int	 ipproto_unregister(u_char);
 struct mbuf *
 	 ip_reass(struct mbuf *);
 struct in_ifaddr *

--FCuugMFkClbJLl1L--

From owner-freebsd-net@FreeBSD.ORG  Mon Dec 27 11:02:15 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 071CA16A4DF
	for <freebsd-net@freebsd.org>; Mon, 27 Dec 2004 11:02:15 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D2C5D43D49
	for <freebsd-net@freebsd.org>; Mon, 27 Dec 2004 11:02:14 +0000 (GMT)
	(envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.1/8.13.1) with ESMTP id iBRB2Eco030216
	for <freebsd-net@freebsd.org>; Mon, 27 Dec 2004 11:02:14 GMT
	(envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost)
	by freefall.freebsd.org (8.13.1/8.13.1/Submit) id iBRB2Eqd030210
	for freebsd-net@freebsd.org; Mon, 27 Dec 2004 11:02:14 GMT
	(envelope-from owner-bugmaster@freebsd.org)
Date: Mon, 27 Dec 2004 11:02:14 GMT
Message-Id: <200412271102.iBRB2Eqd030210@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to
	owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: freebsd-net@FreeBSD.org
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Dec 2004 11:02:15 -0000

Current FreeBSD problem reports
Critical problems
Serious problems

S  Submitted   Tracker     Resp.       Description
-------------------------------------------------------------------------------
o [2002/07/26] kern/41007  net         overfull traffic on third and fourth adap
o [2003/10/14] kern/57985  net         [patch] Missing splx in ether_output_fram

2 problems total.

Non-critical problems

S  Submitted   Tracker     Resp.       Description
-------------------------------------------------------------------------------
o [2003/07/11] kern/54383  net         [nfs] [patch] NFS root configurations wit

1 problem total.

From owner-freebsd-net@FreeBSD.ORG  Wed Dec 29 07:38:34 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 781D716A4CE
	for <freebsd-net@freebsd.org>; Wed, 29 Dec 2004 07:38:34 +0000 (GMT)
Received: from sdf.lonestar.org (mx.freeshell.org [192.94.73.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 937B043D49
	for <freebsd-net@freebsd.org>; Wed, 29 Dec 2004 07:38:33 +0000 (GMT)
	(envelope-from wang@sdf.lonestar.org)
Received: from sdf.lonestar.org (IDENT:wang@sdf.lonestar.org [192.94.73.1])
	by sdf.lonestar.org (8.12.10/8.12.10) with ESMTP id iBT7brBq017313
	for <freebsd-net@freebsd.org>; Wed, 29 Dec 2004 07:37:53 GMT
Received: (from wang@localhost)
	by sdf.lonestar.org (8.12.10/8.12.8/Submit) id iBT7brMt002820;
	Wed, 29 Dec 2004 07:37:53 GMT
Date: Wed, 29 Dec 2004 07:37:53 +0000 (UTC)
From: Wang <wang@sdf.lonestar.org>
To: freebsd-net@freebsd.org
Message-ID: <Pine.NEB.4.61.0412290736340.719@sdf.lonestar.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Subject: Intel Pro/1000 Nic - no communication with network
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Dec 2004 07:38:34 -0000


Hi all,


I am trying to get FreeBSD installed and running on a new rack server 
(asus ap140r-e1). The rack has an Intel Pro/1000 NIC (Intel(R) 82541GI 
Gigabit Controller).


I have tried both FreeBSD 4.10 and 5.3. After installation, both 
detect the network card and ifconfig shows it. I set the IP manually 
with these rc.conf lines:


     defaultrouter="10.0.0.2"

     hostname="blah.testbox.com"

     ifconfig_em0="inet 10.0.0.9 netmask 255.0.0.0"


These lines are definitely correct, because I use them on another FreeBSD 
5.3 box on my network without any problems (DHCP also works on the other 
5.3 box).


Ifconfig shows for em0:


     em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500

     options=3<RXCSUM,TXCSUM>

     inet 10.0.0.9 netmask 0xff000000 broadcast 10.255.255.255

     ether 00:11:2F:0F:80:f4

     media: Ethernet autoselect

     status: no carrier


I have tried both DHCP (my preference here) and hardcoding the IP/gateway 
details, but neither works. No DHCP lease can be obtained, and if I manually 
give ifconfig the IP, gateway, etc., I just do not seem to get any 
connectivity out of the card. I can't ping any other boxes on the 
network; only pinging the box's own 10.0.0.9 IP works.


I went to the Intel web site and downloaded the driver they have there 
for FreeBSD, but it seems the same as what FreeBSD already has built into 
the kernel (I tried the Intel driver regardless, but the result is the 
same: no network communication).


I decided to verify that the card/cable/network itself is OK, so I ran 
the Knoppix Linux live CD on the rack. It detected the NIC perfectly 
and got a DHCP lease immediately! So I know for sure the problem has 
something to do with FreeBSD and its setup.


I really have no clue where to turn with this problem, and I don't want to 
have to ditch BSD in favour of Linux for this rack. Can anyone please 
help? I really need this all complete tomorrow.


Thank you in advance,


daveuk

From owner-freebsd-net@FreeBSD.ORG  Wed Dec 29 08:00:07 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2A8B516A4CE
	for <freebsd-net@freebsd.org>; Wed, 29 Dec 2004 08:00:07 +0000 (GMT)
Received: from lakecmmtao03.coxmail.com (lakecmmtao03.coxmail.com
	[68.99.120.70])	by mx1.FreeBSD.org (Postfix) with ESMTP id 5EF9743D41
	for <freebsd-net@freebsd.org>; Wed, 29 Dec 2004 08:00:06 +0000 (GMT)
	(envelope-from steve@freeslacker.net)
Received: from [192.168.69.75] ([68.98.220.74]) by lakecmmtao03.coxmail.com
	ESMTP
	<20041229080003.QDSZ15913.lakecmmtao03.coxmail.com@[192.168.69.75]>;
	Wed, 29 Dec 2004 03:00:03 -0500
Message-ID: <41D26405.2060500@freeslacker.net>
Date: Wed, 29 Dec 2004 01:00:05 -0700
From: Steven Stremciuc <steve@freeslacker.net>
User-Agent: Mozilla Thunderbird 0.8 (Windows/20040913)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Wang <wang@sdf.lonestar.org>
References: <Pine.NEB.4.61.0412290736340.719@sdf.lonestar.org>
In-Reply-To: <Pine.NEB.4.61.0412290736340.719@sdf.lonestar.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-net@freebsd.org
Subject: Re: Intel Pro/1000 Nic - no communication with network
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Dec 2004 08:00:07 -0000

"status: no carrier" when you do an ifconfig indicates a layer 1 or 
physical problem. Try plugging the cable into each of the other Ethernet 
ports on that server and checking ifconfig to see if the status changes. 
As obvious as this seems (and sorry if you've already tried this), I have 
some Supermicro servers with Intel NICs, and FreeBSD 5.3-R has them 
mixed up. The ports are labeled 1 and 2 on the case, but FreeBSD calls 
Ethernet port 2 em0 and Ethernet port 1 em1. Maybe that is what you are 
seeing here.

steve

>
> Ifconfig shows for em0:
>
>
>     em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>
>     options=3<RXCSUM,TXCSUM>
>
>     inet 10.0.0.9 netmask 0xff000000 broadcast 10.255.255.255
>
>     ether 00:11:2F:0F:80:f4
>
>     media: Ethernet autoselect
>
>     status: no carrier


From owner-freebsd-net@FreeBSD.ORG  Wed Dec 29 09:02:23 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D3A2E16A4CE
	for <net@freebsd.org>; Wed, 29 Dec 2004 09:02:23 +0000 (GMT)
Received: from relay.pair.com (relay00.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 3265B43D1D
	for <net@freebsd.org>; Wed, 29 Dec 2004 09:02:23 +0000 (GMT)
	(envelope-from silby@silby.com)
Received: (qmail 70146 invoked from network); 29 Dec 2004 09:02:21 -0000
Received: from unknown (HELO localhost) (unknown)
  by unknown with SMTP; 29 Dec 2004 09:02:21 -0000
X-pair-Authenticated: 209.68.2.70
Date: Wed, 29 Dec 2004 03:02:20 -0600 (CST)
From: Mike Silbersack <silby@silby.com>
To: net@freebsd.org
In-Reply-To: <20041218033226.L28788@odysseus.silby.com>
Message-ID: <20041229025718.U26249@odysseus.silby.com>
References: <20041218033226.L28788@odysseus.silby.com>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-1414219215-1104310690=:26249"
Content-ID: <20041229025813.O26249@odysseus.silby.com>
Subject: Update: Alternate port randomization approaches
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Dec 2004 09:02:24 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--0-1414219215-1104310690=:26249
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; format=flowed
Content-ID: <20041229025813.B26249@odysseus.silby.com>


On Sat, 18 Dec 2004, Mike Silbersack wrote:

> There have been a few reports by users of front end web proxies and other 
> systems under FreeBSD that port randomization causes them problems under 
> load.  This seems to be due to a combination of port randomization and rapid 
> connections to the same host causing ports to be recycled before the ISN has 
> advanced past the end of the previous connection, thereby causing the 
> TIME_WAIT socket on the receiving end to ignore the new SYN.

Based on testing done by Igor Sysoev, I've found that my original patch is 
insufficient; even as little as one randomization per second can cause 
problems for some users.  As a result, I've created the attached patch 
(versions for both 6.x and 4.x are included).  It implements a relatively 
simple algorithm:  Port randomization is disabled once the 
connection rate goes above 20 connections per second, and it is not 
reenabled until the connection rate falls below 20 cps for 5 seconds 
straight.
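
As a rough illustration of the algorithm just described, here is a 
userland sketch of the once-per-second tick and the per-allocation 
decision.  The variable names follow the attached patch, but the function 
names port_alloc_use_random() and port_tick() are made up for this sketch, 
and the kernel details (callout rescheduling, locking, the console 
messages) are left out.  Treat it as a simplified model, not as the patch 
itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Userland model of the counters the patch adds to in_pcb.c. */
int ipport_randomized = 1;    /* sysctl: is randomization allowed at all? */
int ipport_randomcps = 20;    /* sysctl: allocations-per-second ceiling */
int ipport_stoprandom = 0;    /* > 0 while randomization is suspended */
int ipport_tcpallocs = 0;     /* running count of non-UDP allocations */
int ipport_tcplastcount = 0;  /* ipport_tcpallocs as of the previous tick */

/*
 * Hypothetical helper: called for each ephemeral port allocation.
 * Returns true if the starting port should be chosen at random.
 * UDP keeps randomizing whenever the user allows it; everything else
 * also needs the tick logic to agree.
 */
bool
port_alloc_use_random(bool is_udp)
{
	bool dorandom = ipport_randomized && (!ipport_stoprandom || is_udp);

	if (!is_udp)
		ipport_tcpallocs++;	/* UDP is not counted toward the rate */
	return dorandom;
}

/*
 * Hypothetical helper: called once per second.  If more than
 * ipport_randomcps allocations happened since the last tick, suspend
 * randomization for 5 ticks; otherwise count the suspension down.
 */
void
port_tick(void)
{
	assert(ipport_randomcps >= 0);
	if (ipport_tcpallocs > ipport_tcplastcount + ipport_randomcps)
		ipport_stoprandom = 5;
	else if (ipport_stoprandom > 0)
		ipport_stoprandom--;
	ipport_tcplastcount = ipport_tcpallocs;
}
```

Note that the comparison is strict: exactly 20 allocations in one second 
do not suspend randomization, 21 do.  Any busy second during the countdown 
resets it to 5, which is what gives the "5 seconds straight" behavior.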

This appears to work for Igor, and it seems safe enough to commit before 
4.11-RC2.  But, if possible, I'd like a few more sets of eyes to 
doublecheck the concept and code; please take a look at it if you have a 
chance.

Thanks,

Mike "Silby" Silbersack
--0-1414219215-1104310690=:26249
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="portrandom-gen4-4x.patch"
Content-Transfer-Encoding: BASE64
Content-ID: <20041229025810.L26249@odysseus.silby.com>
Content-Description: 
Content-Disposition: ATTACHMENT; FILENAME="portrandom-gen4-4x.patch"

ZGlmZiAtdSAtciAvdXNyL3NyYy9zeXMub2xkL25ldGluZXQvaW5fcGNiLmMg
L3Vzci9zcmMvc3lzL25ldGluZXQvaW5fcGNiLmMNCi0tLSAvdXNyL3NyYy9z
eXMub2xkL25ldGluZXQvaW5fcGNiLmMJVGh1IERlYyAxNiAwMzoyNjoxMSAy
MDA0DQorKysgL3Vzci9zcmMvc3lzL25ldGluZXQvaW5fcGNiLmMJU2F0IERl
YyAyNSAxNzowNzo1NiAyMDA0DQpAQCAtNjIsNiArNjIsOCBAQA0KICNpbmNs
dWRlIDxuZXRpbmV0L2luX3BjYi5oPg0KICNpbmNsdWRlIDxuZXRpbmV0L2lu
X3Zhci5oPg0KICNpbmNsdWRlIDxuZXRpbmV0L2lwX3Zhci5oPg0KKyNpbmNs
dWRlIDxuZXRpbmV0L3VkcC5oPg0KKyNpbmNsdWRlIDxuZXRpbmV0L3VkcF92
YXIuaD4NCiAjaWZkZWYgSU5FVDYNCiAjaW5jbHVkZSA8bmV0aW5ldC9pcDYu
aD4NCiAjaW5jbHVkZSA8bmV0aW5ldDYvaXA2X3Zhci5oPg0KQEAgLTk1LDgg
Kzk3LDEyIEBADQogaW50CWlwcG9ydF9oaWZpcnN0YXV0byA9IElQUE9SVF9I
SUZJUlNUQVVUTzsJLyogNDkxNTIgKi8NCiBpbnQJaXBwb3J0X2hpbGFzdGF1
dG8gID0gSVBQT1JUX0hJTEFTVEFVVE87CQkvKiA2NTUzNSAqLw0KIA0KLS8q
IFNoYWxsIHdlIGFsbG9jYXRlIGVwaGVtZXJhbCBwb3J0cyBpbiByYW5kb20g
b3JkZXI/ICovDQotaW50CWlwcG9ydF9yYW5kb21pemVkID0gMDsNCisvKiBW
YXJpYWJsZXMgZGVhbGluZyB3aXRoIHJhbmRvbSBlcGhlbWVyYWwgcG9ydCBh
bGxvY2F0aW9uLiAqLw0KK2ludAlpcHBvcnRfcmFuZG9taXplZCA9IDE7CS8q
IHVzZXIgY29udHJvbGxlZCB2aWEgc3lzY3RsICovDQoraW50CWlwcG9ydF9y
YW5kb21jcHMgPSAyMDsJLyogdXNlciBjb250cm9sbGVkIHZpYSBzeXNjdGwg
Ki8NCitpbnQJaXBwb3J0X3N0b3ByYW5kb20gPSAwOwkvKiB0b2dnbGVkIGJ5
IGlwcG9ydF90aWNrICovDQoraW50CWlwcG9ydF90Y3BhbGxvY3M7DQoraW50
CWlwcG9ydF90Y3BsYXN0Y291bnQ7DQogDQogI2RlZmluZSBSQU5HRUNISyh2
YXIsIG1pbiwgbWF4KSBcDQogCWlmICgodmFyKSA8IChtaW4pKSB7ICh2YXIp
ID0gKG1pbik7IH0gXA0KQEAgLTEzNiw2ICsxNDIsOCBAQA0KIAkgICAmaXBw
b3J0X2hpbGFzdGF1dG8sIDAsICZzeXNjdGxfbmV0X2lwcG9ydF9jaGVjaywg
IkkiLCAiIik7DQogU1lTQ1RMX0lOVChfbmV0X2luZXRfaXBfcG9ydHJhbmdl
LCBPSURfQVVUTywgcmFuZG9taXplZCwgQ1RMRkxBR19SVywNCiAJICAgJmlw
cG9ydF9yYW5kb21pemVkLCAwLCAiIik7DQorU1lTQ1RMX0lOVChfbmV0X2lu
ZXRfaXBfcG9ydHJhbmdlLCBPSURfQVVUTywgcmFuZG9tY3BzLA0KKyAgICAg
ICAgICBDVExGTEFHX1JXLCAmaXBwb3J0X3JhbmRvbWNwcywgMCwgIiIpOw0K
IA0KIC8qDQogICogaW5fcGNiLmM6IG1hbmFnZSB0aGUgUHJvdG9jb2wgQ29u
dHJvbCBCbG9ja3MuDQpAQCAtMjAwLDYgKzIwOCw3IEBADQogCXVfc2hvcnQg
bHBvcnQgPSAwOw0KIAlpbnQgd2lsZCA9IDAsIHJldXNlcG9ydCA9IChzby0+
c29fb3B0aW9ucyAmIFNPX1JFVVNFUE9SVCk7DQogCWludCBlcnJvciwgcHJp
c29uID0gMDsNCisJaW50IGRvcmFuZG9tOw0KIA0KIAlpZiAoVEFJTFFfRU1Q
VFkoJmluX2lmYWRkcmhlYWQpKSAvKiBYWFggYnJva2VuISAqLw0KIAkJcmV0
dXJuIChFQUREUk5PVEFWQUlMKTsNCkBAIC0zMTMsNiArMzIyLDIwIEBADQog
CQkJbGFzdHBvcnQgPSAmcGNiaW5mby0+bGFzdHBvcnQ7DQogCQl9DQogCQkv
Kg0KKwkJKiBGb3IgVURQLCB1c2UgcmFuZG9tIHBvcnQgYWxsb2NhdGlvbiBh
cyBsb25nIGFzIHRoZSB1c2VyDQorCQkqIGFsbG93cyBpdC4gIEZvciBUQ1Ag
KGFuZCBhcyBvZiB5ZXQgdW5rbm93bikgY29ubmVjdGlvbnMsDQorCQkqIHVz
ZSByYW5kb20gcG9ydCBhbGxvY2F0aW9uIG9ubHkgaWYgdGhlIHVzZXIgYWxs
b3dzIGl0IEFORA0KKwkJKiBpcHBvcnRfdGljayBhbGxvd3MgaXQuDQorCQkq
Lw0KKwkJaWYgKGlwcG9ydF9yYW5kb21pemVkICYmDQorCQkJKCFpcHBvcnRf
c3RvcHJhbmRvbSB8fCBwY2JpbmZvID09ICZ1ZGJpbmZvKSkNCisJCQlkb3Jh
bmRvbSA9IDE7DQorCQllbHNlDQorCQkJZG9yYW5kb20gPSAwOw0KKwkJLyog
TWFrZSBzdXJlIHRvIG5vdCBpbmNsdWRlIFVEUCBwYWNrZXRzIGluIHRoZSBj
b3VudC4gKi8NCisJCWlmIChwY2JpbmZvICE9ICZ1ZGJpbmZvKQ0KKwkJCWlw
cG9ydF90Y3BhbGxvY3MrKzsNCisJCS8qDQogCQkgKiBTaW1wbGUgY2hlY2sg
dG8gZW5zdXJlIGFsbCBwb3J0cyBhcmUgbm90IHVzZWQgdXAgY2F1c2luZw0K
IAkJICogYSBkZWFkbG9jayBoZXJlLg0KIAkJICoNCkBAIC0zMjMsNyArMzQ2
LDcgQEANCiAJCQkvKg0KIAkJCSAqIGNvdW50aW5nIGRvd24NCiAJCQkgKi8N
Ci0JCQlpZiAoaXBwb3J0X3JhbmRvbWl6ZWQpDQorCQkJaWYgKGRvcmFuZG9t
KQ0KIAkJCQkqbGFzdHBvcnQgPSBmaXJzdCAtDQogCQkJCQkgICAgKGFyYzRy
YW5kb20oKSAlIChmaXJzdCAtIGxhc3QpKTsNCiAJCQljb3VudCA9IGZpcnN0
IC0gbGFzdDsNCkBAIC0zNDMsNyArMzY2LDcgQEANCiAJCQkvKg0KIAkJCSAq
IGNvdW50aW5nIHVwDQogCQkJICovDQotCQkJaWYgKGlwcG9ydF9yYW5kb21p
emVkKQ0KKwkJCWlmIChkb3JhbmRvbSkNCiAJCQkJKmxhc3Rwb3J0ID0gZmly
c3QgKw0KIAkJCQkJICAgIChhcmM0cmFuZG9tKCkgJSAobGFzdCAtIGZpcnN0
KSk7DQogCQkJY291bnQgPSBsYXN0IC0gZmlyc3Q7DQpAQCAtMTA0Niw0ICsx
MDY5LDMwIEBADQogCWlmIChudG9obChpbnAtPmlucF9sYWRkci5zX2FkZHIp
ID09IHAtPnBfcHJpc29uLT5wcl9pcCkNCiAJCXJldHVybiAoMCk7DQogCXJl
dHVybiAoMSk7DQorfQ0KKw0KKy8qDQorICogaXBwb3J0X3RpY2sgcnVucyBv
bmNlIHBlciBzZWNvbmQsIGRldGVybWluaW5nIGlmIHJhbmRvbSBwb3J0DQor
ICogYWxsb2NhdGlvbiBzaG91bGQgYmUgY29udGludWVkLiAgSWYgbW9yZSB0
aGFuIGlwcG9ydF9yYW5kb21jcHMNCisgKiBwb3J0cyBoYXZlIGJlZW4gYWxs
b2NhdGVkIGluIHRoZSBsYXN0IHNlY29uZCwgdGhlbiB3ZSByZXR1cm4gdG8N
CisgKiBzZXF1ZW50aWFsIHBvcnQgYWxsb2NhdGlvbi4gV2UgcmV0dXJuIHRv
IHJhbmRvbSBhbGxvY2F0aW9uIG9ubHkNCisgKiBvbmNlIHdlIGRyb3AgYmVs
b3cgaXBwb3J0X3JhbmRvbWNwcyBmb3IgYXQgbGVhc3QgNSBzZWNvbmRzLg0K
KyAqLw0KKw0KK3ZvaWQNCitpcHBvcnRfdGljayh4dHApDQorCXZvaWQgKnh0
cDsNCit7DQorCWlmIChpcHBvcnRfdGNwYWxsb2NzID4gaXBwb3J0X3RjcGxh
c3Rjb3VudCArIGlwcG9ydF9yYW5kb21jcHMpIHsNCisJCWlmIChpcHBvcnRf
c3RvcHJhbmRvbSA9PSAwKQ0KKwkJCXByaW50ZigiU3RvcHBpbmcgcmFuZG9t
IGFsbG9jYXRpb25cbiIpOw0KKwkJaXBwb3J0X3N0b3ByYW5kb20gPSA1Ow0K
Kwl9IGVsc2Ugew0KKwkJaWYgKGlwcG9ydF9zdG9wcmFuZG9tID09IDEpDQor
CQkJcHJpbnRmKCJHb2luZyBiYWNrIHRvIHJhbmRvbSBhbGxvY2F0aW9uXG4i
KTsNCisJCWlmIChpcHBvcnRfc3RvcHJhbmRvbSA+IDApDQorCQkJaXBwb3J0
X3N0b3ByYW5kb20tLTsNCisJfQ0KKwlpcHBvcnRfdGNwbGFzdGNvdW50ID0g
aXBwb3J0X3RjcGFsbG9jczsNCisJY2FsbG91dF9yZXNldCgmaXBwb3J0X3Rp
Y2tfY2FsbG91dCwgaHosIGlwcG9ydF90aWNrLCBOVUxMKTsNCiB9DQpkaWZm
IC11IC1yIC91c3Ivc3JjL3N5cy5vbGQvbmV0aW5ldC9pbl9wY2IuaCAvdXNy
L3NyYy9zeXMvbmV0aW5ldC9pbl9wY2IuaA0KLS0tIC91c3Ivc3JjL3N5cy5v
bGQvbmV0aW5ldC9pbl9wY2IuaAlUaHUgRGVjIDE2IDAzOjI2OjExIDIwMDQN
CisrKyAvdXNyL3NyYy9zeXMvbmV0aW5ldC9pbl9wY2IuaAlTYXQgRGVjIDI1
IDE3OjA5OjAxIDIwMDQNCkBAIC0zMTAsNiArMzEwLDcgQEANCiBleHRlcm4g
aW50CWlwcG9ydF9sYXN0YXV0bzsNCiBleHRlcm4gaW50CWlwcG9ydF9oaWZp
cnN0YXV0bzsNCiBleHRlcm4gaW50CWlwcG9ydF9oaWxhc3RhdXRvOw0KK2V4
dGVybiBzdHJ1Y3QgY2FsbG91dCBpcHBvcnRfdGlja19jYWxsb3V0Ow0KIA0K
IHZvaWQJaW5fcGNicHVyZ2VpZjAgX19QKChzdHJ1Y3QgaW5wY2IgKiwgc3Ry
dWN0IGlmbmV0ICopKTsNCiB2b2lkCWluX2xvc2luZyBfX1AoKHN0cnVjdCBp
bnBjYiAqKSk7DQpAQCAtMzM1LDYgKzMzNiw3IEBADQogaW50CWluX3NldHBl
ZXJhZGRyIF9fUCgoc3RydWN0IHNvY2tldCAqc28sIHN0cnVjdCBzb2NrYWRk
ciAqKm5hbSkpOw0KIGludAlpbl9zZXRzb2NrYWRkciBfX1AoKHN0cnVjdCBz
b2NrZXQgKnNvLCBzdHJ1Y3Qgc29ja2FkZHIgKipuYW0pKTsNCiB2b2lkCWlu
X3BjYnJlbWxpc3RzIF9fUCgoc3RydWN0IGlucGNiICppbnApKTsNCit2b2lk
CWlwcG9ydF90aWNrKHZvaWQgKnh0cCk7DQogaW50CXByaXNvbl94aW5wY2Ig
X19QKChzdHJ1Y3QgcHJvYyAqcCwgc3RydWN0IGlucGNiICppbnApKTsNCiAj
ZW5kaWYgLyogX0tFUk5FTCAqLw0KIA0KZGlmZiAtdSAtciAvdXNyL3NyYy9z
eXMub2xkL25ldGluZXQvaXBfaW5wdXQuYyAvdXNyL3NyYy9zeXMvbmV0aW5l
dC9pcF9pbnB1dC5jDQotLS0gL3Vzci9zcmMvc3lzLm9sZC9uZXRpbmV0L2lw
X2lucHV0LmMJVGh1IERlYyAxNiAwMzoyNjoxMiAyMDA0DQorKysgL3Vzci9z
cmMvc3lzL25ldGluZXQvaXBfaW5wdXQuYwlTYXQgRGVjIDI1IDE3OjE2OjA4
IDIwMDQNCkBAIC00Nyw2ICs0Nyw4IEBADQogDQogI2luY2x1ZGUgPHN5cy9w
YXJhbS5oPg0KICNpbmNsdWRlIDxzeXMvc3lzdG0uaD4NCisjaW5jbHVkZSA8
c3lzL2NhbGxvdXQuaD4NCisjaW5jbHVkZSA8c3lzL2V2ZW50aGFuZGxlci5o
Pg0KICNpbmNsdWRlIDxzeXMvbWJ1Zi5oPg0KICNpbmNsdWRlIDxzeXMvbWFs
bG9jLmg+DQogI2luY2x1ZGUgPHN5cy9kb21haW4uaD4NCkBAIC0xODMsNiAr
MTg1LDcgQEANCiAJKCgoKCh4KSAmIDB4RikgfCAoKCgoeCkgPj4gOCkgJiAw
eEYpIDw8IDQpKSBeICh5KSkgJiBJUFJFQVNTX0hNQVNLKQ0KIA0KIHN0YXRp
YyBzdHJ1Y3QgaXBxIGlwcVtJUFJFQVNTX05IQVNIXTsNCitzdHJ1Y3QgY2Fs
bG91dCBpcHBvcnRfdGlja19jYWxsb3V0Ow0KIGNvbnN0ICBpbnQgICAgaXBp
bnRycV9wcmVzZW50ID0gMTsNCiANCiAjaWZkZWYgSVBDVExfREVGTVRVDQpA
QCAtMjY3LDYgKzI3MCwxMiBAQA0KIAltYXhuaXBxID0gbm1iY2x1c3RlcnMg
LyAzMjsNCiAJbWF4ZnJhZ3NwZXJwYWNrZXQgPSAxNjsNCiANCisJLyogU3Rh
cnQgaXBwb3J0X3RpY2suICovDQorCWNhbGxvdXRfaW5pdCgmaXBwb3J0X3Rp
Y2tfY2FsbG91dCk7DQorCWlwcG9ydF90aWNrKE5VTEwpOw0KKwlFVkVOVEhB
TkRMRVJfUkVHSVNURVIoc2h1dGRvd25fcHJlX3N5bmMsIGlwX2ZpbmksIE5V
TEwsDQorCQlTSFVURE9XTl9QUklfREVGQVVMVCk7DQorDQogI2lmbmRlZiBS
QU5ET01fSVBfSUQNCiAJaXBfaWQgPSB0aW1lX3NlY29uZCAmIDB4ZmZmZjsN
CiAjZW5kaWYNCkBAIC0yNzQsNiArMjgzLDEzIEBADQogDQogCXJlZ2lzdGVy
X25ldGlzcihORVRJU1JfSVAsIGlwaW50cik7DQogfQ0KKw0KK3ZvaWQgaXBf
ZmluaSh4dHApDQorCXZvaWQgKnh0cDsNCit7DQorCWNhbGxvdXRfc3RvcCgm
aXBwb3J0X3RpY2tfY2FsbG91dCk7DQorfQ0KKw0KIA0KIC8qDQogICogWFhY
IHdhdGNoIG91dCB0aGlzIG9uZS4gSXQgaXMgcGVyaGFwcyB1c2VkIGFzIGEg
Y2FjaGUgZm9yDQpkaWZmIC11IC1yIC91c3Ivc3JjL3N5cy5vbGQvbmV0aW5l
dC9pcF92YXIuaCAvdXNyL3NyYy9zeXMvbmV0aW5ldC9pcF92YXIuaA0KLS0t
IC91c3Ivc3JjL3N5cy5vbGQvbmV0aW5ldC9pcF92YXIuaAlUaHUgRGVjIDE2
IDAzOjI2OjEyIDIwMDQNCisrKyAvdXNyL3NyYy9zeXMvbmV0aW5ldC9pcF92
YXIuaAlTYXQgRGVjIDI1IDE3OjEyOjEyIDIwMDQNCkBAIC0xNjAsNiArMTYw
LDcgQEANCiANCiBpbnQJIGlwX2N0bG91dHB1dChzdHJ1Y3Qgc29ja2V0ICos
IHN0cnVjdCBzb2Nrb3B0ICpzb3B0KTsNCiB2b2lkCSBpcF9kcmFpbih2b2lk
KTsNCit2b2lkCSBpcF9maW5pKHZvaWQgKnh0cCk7DQogaW50CSBpcF9mcmFn
bWVudChzdHJ1Y3QgaXAgKmlwLCBzdHJ1Y3QgbWJ1ZiAqKm1fZnJhZywgaW50
IG10dSwNCiAJICAgIHVfbG9uZyBpZl9od2Fzc2lzdF9mbGFncywgaW50IHN3
X2NzdW0pOw0KIHZvaWQJIGlwX2ZyZWVtb3B0aW9ucyhzdHJ1Y3QgaXBfbW9w
dGlvbnMgKik7DQo=

--0-1414219215-1104310690=:26249
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="portrandom-gen4.patch"
Content-Transfer-Encoding: BASE64
Content-ID: <20041229025810.S26249@odysseus.silby.com>
Content-Description: 
Content-Disposition: ATTACHMENT; FILENAME="portrandom-gen4.patch"

ZGlmZiAtdSAtciAvdXNyL3NyYy9zeXMub2xkL25ldGluZXQvaW5fcGNiLmMg
L3Vzci9zcmMvc3lzL25ldGluZXQvaW5fcGNiLmMNCi0tLSAvdXNyL3NyYy9z
eXMub2xkL25ldGluZXQvaW5fcGNiLmMJRnJpIERlYyAyNCAxOTo0NToxNSAy
MDA0DQorKysgL3Vzci9zcmMvc3lzL25ldGluZXQvaW5fcGNiLmMJU2F0IERl
YyAyNSAxMzo1MToyNCAyMDA0DQpAQCAtNTksNiArNTksOCBAQA0KICNpbmNs
dWRlIDxuZXRpbmV0L2luX3Zhci5oPg0KICNpbmNsdWRlIDxuZXRpbmV0L2lw
X3Zhci5oPg0KICNpbmNsdWRlIDxuZXRpbmV0L3RjcF92YXIuaD4NCisjaW5j
bHVkZSA8bmV0aW5ldC91ZHAuaD4NCisjaW5jbHVkZSA8bmV0aW5ldC91ZHBf
dmFyLmg+DQogI2lmZGVmIElORVQ2DQogI2luY2x1ZGUgPG5ldGluZXQvaXA2
Lmg+DQogI2luY2x1ZGUgPG5ldGluZXQ2L2lwNl92YXIuaD4NCkBAIC05Nyw4
ICs5OSwxMiBAQA0KIGludAlpcHBvcnRfcmVzZXJ2ZWRoaWdoID0gSVBQT1JU
X1JFU0VSVkVEIC0gMTsJLyogMTAyMyAqLw0KIGludAlpcHBvcnRfcmVzZXJ2
ZWRsb3cgPSAwOw0KIA0KLS8qIFNoYWxsIHdlIGFsbG9jYXRlIGVwaGVtZXJh
bCBwb3J0cyBpbiByYW5kb20gb3JkZXI/ICovDQotaW50CWlwcG9ydF9yYW5k
b21pemVkID0gMTsNCisvKiBWYXJpYWJsZXMgZGVhbGluZyB3aXRoIHJhbmRv
bSBlcGhlbWVyYWwgcG9ydCBhbGxvY2F0aW9uLiAqLw0KK2ludAlpcHBvcnRf
cmFuZG9taXplZCA9IDE7CS8qIHVzZXIgY29udHJvbGxlZCB2aWEgc3lzY3Rs
ICovDQoraW50CWlwcG9ydF9yYW5kb21jcHMgPSAyMDsJLyogdXNlciBjb250
cm9sbGVkIHZpYSBzeXNjdGwgKi8NCitpbnQJaXBwb3J0X3N0b3ByYW5kb20g
PSAwOwkvKiB0b2dnbGVkIGJ5IGlwcG9ydF90aWNrICovDQoraW50CWlwcG9y
dF90Y3BhbGxvY3M7DQoraW50CWlwcG9ydF90Y3BsYXN0Y291bnQ7DQogDQog
I2RlZmluZSBSQU5HRUNISyh2YXIsIG1pbiwgbWF4KSBcDQogCWlmICgodmFy
KSA8IChtaW4pKSB7ICh2YXIpID0gKG1pbik7IH0gXA0KQEAgLTE0Myw2ICsx
NDksOCBAQA0KIAkgICBDVExGTEFHX1JXfENUTEZMQUdfU0VDVVJFLCAmaXBw
b3J0X3Jlc2VydmVkbG93LCAwLCAiIik7DQogU1lTQ1RMX0lOVChfbmV0X2lu
ZXRfaXBfcG9ydHJhbmdlLCBPSURfQVVUTywgcmFuZG9taXplZCwNCiAJICAg
Q1RMRkxBR19SVywgJmlwcG9ydF9yYW5kb21pemVkLCAwLCAiIik7DQorU1lT
Q1RMX0lOVChfbmV0X2luZXRfaXBfcG9ydHJhbmdlLCBPSURfQVVUTywgcmFu
ZG9tY3BzLA0KKwkgICBDVExGTEFHX1JXLCAmaXBwb3J0X3JhbmRvbWNwcywg
MCwgIiIpOw0KIA0KIC8qDQogICogaW5fcGNiLmM6IG1hbmFnZSB0aGUgUHJv
dG9jb2wgQ29udHJvbCBCbG9ja3MuDQpAQCAtMjY2LDYgKzI3NCw3IEBADQog
CXVfc2hvcnQgbHBvcnQgPSAwOw0KIAlpbnQgd2lsZCA9IDAsIHJldXNlcG9y
dCA9IChzby0+c29fb3B0aW9ucyAmIFNPX1JFVVNFUE9SVCk7DQogCWludCBl
cnJvciwgcHJpc29uID0gMDsNCisJaW50IGRvcmFuZG9tOw0KIA0KIAlJTlBf
SU5GT19XTE9DS19BU1NFUlQocGNiaW5mbyk7DQogCUlOUF9MT0NLX0FTU0VS
VChpbnApOw0KQEAgLTM5NCw2ICs0MDMsMjAgQEANCiAJCQlsYXN0cG9ydCA9
ICZwY2JpbmZvLT5sYXN0cG9ydDsNCiAJCX0NCiAJCS8qDQorCQkgKiBGb3Ig
VURQLCB1c2UgcmFuZG9tIHBvcnQgYWxsb2NhdGlvbiBhcyBsb25nIGFzIHRo
ZSB1c2VyDQorCQkgKiBhbGxvd3MgaXQuICBGb3IgVENQIChhbmQgYXMgb2Yg
eWV0IHVua25vd24pIGNvbm5lY3Rpb25zLA0KKwkJICogdXNlIHJhbmRvbSBw
b3J0IGFsbG9jYXRpb24gb25seSBpZiB0aGUgdXNlciBhbGxvd3MgaXQgQU5E
DQorCQkgKiBpcHBvcnRfdGljayBhbGxvd3MgaXQuDQorCQkgKi8NCisJCWlm
IChpcHBvcnRfcmFuZG9taXplZCAmJg0KKwkJCSghaXBwb3J0X3N0b3ByYW5k
b20gfHwgcGNiaW5mbyA9PSAmdWRiaW5mbykpDQorCQkJZG9yYW5kb20gPSAx
Ow0KKwkJZWxzZQ0KKwkJCWRvcmFuZG9tID0gMDsNCisJCS8qIE1ha2Ugc3Vy
ZSB0byBub3QgaW5jbHVkZSBVRFAgcGFja2V0cyBpbiB0aGUgY291bnQuICov
DQorCQlpZiAocGNiaW5mbyAhPSAmdWRiaW5mbykNCisJCQlpcHBvcnRfdGNw
YWxsb2NzKys7DQorCQkvKg0KIAkJICogU2ltcGxlIGNoZWNrIHRvIGVuc3Vy
ZSBhbGwgcG9ydHMgYXJlIG5vdCB1c2VkIHVwIGNhdXNpbmcNCiAJCSAqIGEg
ZGVhZGxvY2sgaGVyZS4NCiAJCSAqDQpAQCAtNDA0LDcgKzQyNyw3IEBADQog
CQkJLyoNCiAJCQkgKiBjb3VudGluZyBkb3duDQogCQkJICovDQotCQkJaWYg
KGlwcG9ydF9yYW5kb21pemVkKQ0KKwkJCWlmIChkb3JhbmRvbSkNCiAJCQkJ
Kmxhc3Rwb3J0ID0gZmlyc3QgLQ0KIAkJCQkJICAgIChhcmM0cmFuZG9tKCkg
JSAoZmlyc3QgLSBsYXN0KSk7DQogCQkJY291bnQgPSBmaXJzdCAtIGxhc3Q7
DQpAQCAtNDIyLDcgKzQ0NSw3IEBADQogCQkJLyoNCiAJCQkgKiBjb3VudGlu
ZyB1cA0KIAkJCSAqLw0KLQkJCWlmIChpcHBvcnRfcmFuZG9taXplZCkNCisJ
CQlpZiAoZG9yYW5kb20pDQogCQkJCSpsYXN0cG9ydCA9IGZpcnN0ICsNCiAJ
CQkJCSAgICAoYXJjNHJhbmRvbSgpICUgKGxhc3QgLSBmaXJzdCkpOw0KIAkJ
CWNvdW50ID0gbGFzdCAtIGZpcnN0Ow0KQEAgLTExODAsNCArMTIwMywzMCBA
QA0KIAlTT0NLX1VOTE9DSyhzbyk7DQogCUlOUF9VTkxPQ0soaW5wKTsNCiAj
ZW5kaWYNCit9DQorDQorLyoNCisgKiBpcHBvcnRfdGljayBydW5zIG9uY2Ug
cGVyIHNlY29uZCwgZGV0ZXJtaW5pbmcgaWYgcmFuZG9tIHBvcnQNCisgKiBh
bGxvY2F0aW9uIHNob3VsZCBiZSBjb250aW51ZWQuICBJZiBtb3JlIHRoYW4g
aXBwb3J0X3JhbmRvbWNwcw0KKyAqIHBvcnRzIGhhdmUgYmVlbiBhbGxvY2F0
ZWQgaW4gdGhlIGxhc3Qgc2Vjb25kLCB0aGVuIHdlIHJldHVybiB0bw0KKyAq
IHNlcXVlbnRpYWwgcG9ydCBhbGxvY2F0aW9uLiBXZSByZXR1cm4gdG8gcmFu
ZG9tIGFsbG9jYXRpb24gb25seQ0KKyAqIG9uY2Ugd2UgZHJvcCBiZWxvdyBp
cHBvcnRfcmFuZG9tY3BzIGZvciBhdCBsZWFzdCA1IHNlY29uZHMuDQorICov
DQorDQordm9pZA0KK2lwcG9ydF90aWNrKHh0cCkNCisJdm9pZCAqeHRwOw0K
K3sNCisJaWYgKGlwcG9ydF90Y3BhbGxvY3MgPiBpcHBvcnRfdGNwbGFzdGNv
dW50ICsgaXBwb3J0X3JhbmRvbWNwcykgew0KKwkJaWYgKGlwcG9ydF9zdG9w
cmFuZG9tID09IDApDQorCQkJcHJpbnRmKCJTdG9wcGluZyByYW5kb20gYWxs
b2NhdGlvblxuIik7DQorCQlpcHBvcnRfc3RvcHJhbmRvbSA9IDU7DQorCX0g
ZWxzZSB7DQorCQlpZiAoaXBwb3J0X3N0b3ByYW5kb20gPT0gMSkNCisJCQlw
cmludGYoIkdvaW5nIGJhY2sgdG8gcmFuZG9tIGFsbG9jYXRpb25cbiIpOw0K
KwkJaWYgKGlwcG9ydF9zdG9wcmFuZG9tID4gMCkNCisJCQlpcHBvcnRfc3Rv
cHJhbmRvbS0tOw0KKwl9DQorCWlwcG9ydF90Y3BsYXN0Y291bnQgPSBpcHBv
cnRfdGNwYWxsb2NzOw0KKwljYWxsb3V0X3Jlc2V0KCZpcHBvcnRfdGlja19j
YWxsb3V0LCBoeiwgaXBwb3J0X3RpY2ssIE5VTEwpOw0KIH0NCmRpZmYgLXUg
LXIgL3Vzci9zcmMvc3lzLm9sZC9uZXRpbmV0L2luX3BjYi5oIC91c3Ivc3Jj
L3N5cy9uZXRpbmV0L2luX3BjYi5oDQotLS0gL3Vzci9zcmMvc3lzLm9sZC9u
ZXRpbmV0L2luX3BjYi5oCUZyaSBEZWMgMjQgMTk6NDU6MTUgMjAwNA0KKysr
IC91c3Ivc3JjL3N5cy9uZXRpbmV0L2luX3BjYi5oCUZyaSBEZWMgMjQgMjA6
MDI6MTQgMjAwNA0KQEAgLTMzMyw2ICszMzMsNyBAQA0KIGV4dGVybiBpbnQJ
aXBwb3J0X2xhc3RhdXRvOw0KIGV4dGVybiBpbnQJaXBwb3J0X2hpZmlyc3Rh
dXRvOw0KIGV4dGVybiBpbnQJaXBwb3J0X2hpbGFzdGF1dG87DQorZXh0ZXJu
IHN0cnVjdCBjYWxsb3V0IGlwcG9ydF90aWNrX2NhbGxvdXQ7DQogDQogdm9p
ZAlpbl9wY2JwdXJnZWlmMChzdHJ1Y3QgaW5wY2JpbmZvICosIHN0cnVjdCBp
Zm5ldCAqKTsNCiBpbnQJaW5fcGNiYWxsb2Moc3RydWN0IHNvY2tldCAqLCBz
dHJ1Y3QgaW5wY2JpbmZvICosIGNvbnN0IGNoYXIgKik7DQpAQCAtMzYyLDYg
KzM2Myw3IEBADQogCWluX3NvY2thZGRyKGluX3BvcnRfdCBwb3J0LCBzdHJ1
Y3QgaW5fYWRkciAqYWRkcik7DQogdm9pZAlpbl9wY2Jzb3NldGxhYmVsKHN0
cnVjdCBzb2NrZXQgKnNvKTsNCiB2b2lkCWluX3BjYnJlbWxpc3RzKHN0cnVj
dCBpbnBjYiAqaW5wKTsNCit2b2lkCWlwcG9ydF90aWNrKHZvaWQgKnh0cCk7
DQogI2VuZGlmIC8qIF9LRVJORUwgKi8NCiANCiAjZW5kaWYgLyogIV9ORVRJ
TkVUX0lOX1BDQl9IXyAqLw0KZGlmZiAtdSAtciAvdXNyL3NyYy9zeXMub2xk
L25ldGluZXQvaXBfaW5wdXQuYyAvdXNyL3NyYy9zeXMvbmV0aW5ldC9pcF9p
bnB1dC5jDQotLS0gL3Vzci9zcmMvc3lzLm9sZC9uZXRpbmV0L2lwX2lucHV0
LmMJRnJpIERlYyAyNCAxOTo0NToxNSAyMDA0DQorKysgL3Vzci9zcmMvc3lz
L25ldGluZXQvaXBfaW5wdXQuYwlTYXQgRGVjIDI1IDEzOjM3OjUxIDIwMDQN
CkBAIC0zOCw2ICszOCw3IEBADQogDQogI2luY2x1ZGUgPHN5cy9wYXJhbS5o
Pg0KICNpbmNsdWRlIDxzeXMvc3lzdG0uaD4NCisjaW5jbHVkZSA8c3lzL2Nh
bGxvdXQuaD4NCiAjaW5jbHVkZSA8c3lzL21hYy5oPg0KICNpbmNsdWRlIDxz
eXMvbWJ1Zi5oPg0KICNpbmNsdWRlIDxzeXMvbWFsbG9jLmg+DQpAQCAtMTg2
LDYgKzE4Nyw3IEBADQogDQogc3RhdGljIFRBSUxRX0hFQUQoaXBxaGVhZCwg
aXBxKSBpcHFbSVBSRUFTU19OSEFTSF07DQogc3RydWN0IG10eCBpcHFsb2Nr
Ow0KK3N0cnVjdCBjYWxsb3V0IGlwcG9ydF90aWNrX2NhbGxvdXQ7DQogDQog
I2RlZmluZQlJUFFfTE9DSygpCW10eF9sb2NrKCZpcHFsb2NrKQ0KICNkZWZp
bmUJSVBRX1VOTE9DSygpCW10eF91bmxvY2soJmlwcWxvY2spDQpAQCAtMjc5
LDExICsyODEsMjMgQEANCiAJbWF4bmlwcSA9IG5tYmNsdXN0ZXJzIC8gMzI7
DQogCW1heGZyYWdzcGVycGFja2V0ID0gMTY7DQogDQorCS8qIFN0YXJ0IGlw
cG9ydF90aWNrLiAqLw0KKwljYWxsb3V0X2luaXQoJmlwcG9ydF90aWNrX2Nh
bGxvdXQsIENBTExPVVRfTVBTQUZFKTsNCisJaXBwb3J0X3RpY2soTlVMTCk7
DQorCUVWRU5USEFORExFUl9SRUdJU1RFUihzaHV0ZG93bl9wcmVfc3luYywg
aXBfZmluaSwgTlVMTCwNCisJCVNIVVRET1dOX1BSSV9ERUZBVUxUKTsNCisN
CiAJLyogSW5pdGlhbGl6ZSB2YXJpb3VzIG90aGVyIHJlbWFpbmluZyB0aGlu
Z3MuICovDQogCWlwX2lkID0gdGltZV9zZWNvbmQgJiAweGZmZmY7DQogCWlw
aW50cnEuaWZxX21heGxlbiA9IGlwcW1heGxlbjsNCiAJbXR4X2luaXQoJmlw
aW50cnEuaWZxX210eCwgImlwX2lucSIsIE5VTEwsIE1UWF9ERUYpOw0KIAlu
ZXRpc3JfcmVnaXN0ZXIoTkVUSVNSX0lQLCBpcF9pbnB1dCwgJmlwaW50cnEs
IE5FVElTUl9NUFNBRkUpOw0KK30NCisNCit2b2lkIGlwX2ZpbmkoeHRwKQ0K
Kwl2b2lkICp4dHA7DQorew0KKwljYWxsb3V0X3N0b3AoJmlwcG9ydF90aWNr
X2NhbGxvdXQpOw0KIH0NCiANCiAvKg0KT25seSBpbiAvdXNyL3NyYy9zeXMv
bmV0aW5ldDogaXBfaW5wdXQuYy5vcmlnDQpkaWZmIC11IC1yIC91c3Ivc3Jj
L3N5cy5vbGQvbmV0aW5ldC9pcF92YXIuaCAvdXNyL3NyYy9zeXMvbmV0aW5l
dC9pcF92YXIuaA0KLS0tIC91c3Ivc3JjL3N5cy5vbGQvbmV0aW5ldC9pcF92
YXIuaAlGcmkgRGVjIDI0IDE5OjQ1OjE1IDIwMDQNCisrKyAvdXNyL3NyYy9z
eXMvbmV0aW5ldC9pcF92YXIuaAlTYXQgRGVjIDI1IDEzOjI5OjU0IDIwMDQN
CkBAIC0xNTksNiArMTU5LDcgQEANCiANCiBpbnQJIGlwX2N0bG91dHB1dChz
dHJ1Y3Qgc29ja2V0ICosIHN0cnVjdCBzb2Nrb3B0ICpzb3B0KTsNCiB2b2lk
CSBpcF9kcmFpbih2b2lkKTsNCit2b2lkCSBpcF9maW5pKHZvaWQgKnh0cCk7
DQogaW50CSBpcF9mcmFnbWVudChzdHJ1Y3QgaXAgKmlwLCBzdHJ1Y3QgbWJ1
ZiAqKm1fZnJhZywgaW50IG10dSwNCiAJICAgIHVfbG9uZyBpZl9od2Fzc2lz
dF9mbGFncywgaW50IHN3X2NzdW0pOw0KIHZvaWQJIGlwX2ZyZWVtb3B0aW9u
cyhzdHJ1Y3QgaXBfbW9wdGlvbnMgKik7DQo=

--0-1414219215-1104310690=:26249--

From owner-freebsd-net@FreeBSD.ORG  Wed Dec 29 13:09:53 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B52D416A4CE
	for <net@FreeBSD.org>; Wed, 29 Dec 2004 13:09:53 +0000 (GMT)
Received: from mp2.macomnet.net (mp2.macomnet.net [195.128.64.6])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E78AB43D31
	for <net@FreeBSD.org>; Wed, 29 Dec 2004 13:09:52 +0000 (GMT)
	(envelope-from maxim@FreeBSD.org)
Received-SPF: pass (mp2.macomnet.net: domain of maxim@FreeBSD.org designates
	127.0.0.1 as permitted sender) receiver=mp2.macomnet.net; client_ip=127.0.0.1;
	envelope-from=maxim@FreeBSD.org;
Received: from localhost (localhost [127.0.0.1])
	by mp2.macomnet.net (8.12.11/8.12.11) with ESMTP id iBTD9oGW075360;
	Wed, 29 Dec 2004 16:09:50 +0300 (MSK)
	(envelope-from maxim@FreeBSD.org)
Date: Wed, 29 Dec 2004 16:09:50 +0300 (MSK)
From: Maxim Konovalov <maxim@FreeBSD.org>
To: Mike Silbersack <silby@silby.com>
In-Reply-To: <20041229025718.U26249@odysseus.silby.com>
Message-ID: <20041229155419.I74642@mp2.macomnet.net>
References: <20041218033226.L28788@odysseus.silby.com>
 <20041229025718.U26249@odysseus.silby.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SpamTest-Info: Profile: Formal (188/041227)
X-SpamTest-Info: Profile: Detect Hard (4/030526)
X-SpamTest-Info: Profile: SysLog
X-SpamTest-Info: Profile: Marking - Keywords (2/030321)
X-SpamTest-Status: Not detected
X-SpamTest-Version: SMTP-Filter Version 2.0.0 [0124], SpamtestISP/Release
cc: net@FreeBSD.org
Subject: Re: Update: Alternate port randomization approaches
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Dec 2004 13:09:53 -0000

On Wed, 29 Dec 2004, 03:02-0600, Mike Silbersack wrote:

> On Sat, 18 Dec 2004, Mike Silbersack wrote:
>
> > There have been a few reports by users of front end web proxies and other
> > systems under FreeBSD that port randomization causes them problems under
> > load.  This seems to be due to a combination of port randomization and
> > rapid connections to the same host causing ports to be recycled before
> > the ISN has advanced past the end of the previous connection, thereby
> > causing the TIME_WAIT socket on the receiving end to ignore the new SYN.
>
> Based on testing done by Igor Sysoev, I've found that my original patch is
> insufficient; even as little as one randomization per second can cause problems
> for some users.  As a result, I've created the attached patch (versions for
> both 6.x and 4.x are included).  It implements a relatively simple algorithm:
> Port randomization is disabled once the connection rate goes above 20
> connections per second, and it is not reenabled until the connection rate
> falls below 20 cps for 5 seconds straight.
>
> This appears to work for Igor, and it seems safe enough to commit before
> 4.11-RC2.  But, if possible, I'd like a few more sets of eyes to doublecheck
> the concept and code; please take a look at it if you have a chance.

Again, it's not clear to me why we don't follow our usual
development cycle here: commit & test in HEAD and then MFC to STABLE?

-- 
Maxim Konovalov

From owner-freebsd-net@FreeBSD.ORG  Wed Dec 29 18:34:18 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 1017)
	id EB74216A4DB; Wed, 29 Dec 2004 18:34:18 +0000 (GMT)
Date: Wed, 29 Dec 2004 18:34:18 +0000
From: Tony Ackerman <tackerman@hub.freebsd.org>
To: freebsd-net@freebsd.org
Message-ID: <20041229183418.GA53016@hub.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i
Subject: Intel Pro/1000 Nic - no communication with network
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Dec 2004 18:34:19 -0000

If you have multiple adapters in the system, there could be some confusion caused by
the way that the adapters were enumerated.  Try "pciconf -l |grep 20000" to view all
of the Ethernet adapters in the system and which drivers are attached to them.  What
is your output from this command?  Are any of the link indicator LEDs lit?

From owner-freebsd-net@FreeBSD.ORG  Thu Dec 30 10:42:18 2004
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8508C16A4CE
	for <net@FreeBSD.org>; Thu, 30 Dec 2004 10:42:18 +0000 (GMT)
Received: from relay01.pair.com (relay01.pair.com [209.68.5.15])
	by mx1.FreeBSD.org (Postfix) with SMTP id 067CF43D4C
	for <net@FreeBSD.org>; Thu, 30 Dec 2004 10:42:18 +0000 (GMT)
	(envelope-from silby@silby.com)
Received: (qmail 23465 invoked from network); 30 Dec 2004 10:42:16 -0000
Received: from unknown (HELO localhost) (unknown)
  by unknown with SMTP; 30 Dec 2004 10:42:16 -0000
X-pair-Authenticated: 209.68.2.70
Date: Thu, 30 Dec 2004 04:42:15 -0600 (CST)
From: Mike Silbersack <silby@silby.com>
To: Maxim Konovalov <maxim@FreeBSD.org>
In-Reply-To: <20041229155419.I74642@mp2.macomnet.net>
Message-ID: <20041230042939.L35911@odysseus.silby.com>
References: <20041218033226.L28788@odysseus.silby.com>
	<20041229155419.I74642@mp2.macomnet.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
cc: net@FreeBSD.org
Subject: Re: Update: Alternate port randomization approaches
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Dec 2004 10:42:18 -0000


On Wed, 29 Dec 2004, Maxim Konovalov wrote:

> On Wed, 29 Dec 2004, 03:02-0600, Mike Silbersack wrote:
>> This appears to work for Igor, and it seems safe enough to commit before
>> 4.11-RC2.  But, if possible, I'd like a few more sets of eyes to doublecheck
>> the concept and code; please take a look at it if you have a chance.
>
> Again, it's not clear to me why we don't follow our usual
> development cycle here: commit & test in HEAD and then MFC to STABLE?
>
> -- 
> Maxim Konovalov

The problems random port allocation exposes only occur in situations where 
machine A is making repeated connections to machine B, so it's limited to 
situations like front-end web proxies, connections to database servers, 
and a few other things.  General web servers, ftp servers, SMTP servers, 
etc, aren't affected.  So, committing to -current won't cause us to learn 
anything; specific testers are needed.

I should have worked on this issue months ago, but I didn't, so I'm trying 
to come up with something safe as quickly as possible.  This is 
necessary because 4.11 is going to be the last release in the 4.x series, so 
this can't be pushed off until after 4.11 is published - there'd be little 
point in bothering at that time.

Igor has been generous enough to test the various iterations of this patch 
as I've developed them, trying them on a production system to see if they 
work for him.  Based on his results, I think we're pretty close to an 
acceptable compromise between security (full randomization) and proper 
operation (no randomization).  We're now looking at settings more along 
the lines of a 10 connections per second ceiling and a 45 second threshold 
before randomization is reenabled, FWIW.
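
To give a feel for what those numbers mean in practice, here is a small 
userland sketch that runs a trace of per-second TCP allocation counts 
through the limiter and reports how many seconds are spent with 
randomization suspended.  The struct and function names are hypothetical, 
and it assumes the "45 second threshold" is a simple countdown that 
restarts on every busy second, as with the 5-second version described 
earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables; the names are illustrative only. */
struct rand_limits {
	int ceiling_cps;	/* suspend above this many allocations/sec */
	int holddown_secs;	/* quiet seconds needed before re-enabling */
};

/*
 * per_sec[i] is the number of TCP port allocations during second i.
 * Returns the number of seconds in the trace during which randomization
 * was suspended.  The state for second i is decided by the checks made
 * at the end of the preceding seconds.
 */
int
seconds_suspended(const struct rand_limits *lim, const int *per_sec, size_t n)
{
	int stoprandom = 0;	/* remaining hold-down, in seconds */
	int suspended = 0;

	assert(lim != NULL && lim->ceiling_cps >= 0 && lim->holddown_secs >= 0);
	for (size_t i = 0; i < n; i++) {
		if (stoprandom > 0)
			suspended++;		/* second i ran sequentially */
		if (per_sec[i] > lim->ceiling_cps)
			stoprandom = lim->holddown_secs; /* (re)start hold-down */
		else if (stoprandom > 0)
			stoprandom--;
	}
	return suspended;
}
```

Under these assumptions, a single one-second burst above the ceiling costs 
holddown_secs seconds of sequential allocation, so a lower ceiling 
combined with a longer hold-down keeps a busy proxy on sequential ports 
for most of the time while leaving lightly loaded machines fully 
randomized.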

I'm not too concerned about general testing because these patches are 
quite simple; they're modifications of the previous behavior, so they 
shouldn't create any new problems.  As far as bugs in the implementation go, 
well, anyone is welcome to do a quick review. :)

Mike "Silby" Silbersack