From owner-freebsd-net@FreeBSD.ORG Sun Dec 26 02:19:42 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 569EA16A4CE; Sun, 26 Dec 2004 02:19:42 +0000 (GMT) Received: from smtp.uol.com.br (smtpout1.uol.com.br [200.221.4.192]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2C1CE43D1F; Sun, 26 Dec 2004 02:19:41 +0000 (GMT) (envelope-from jonny@jonny.eng.br) Received: from [200.164.27.103] (200164027103.user.veloxzone.com.br [200.164.27.103]) by scorpion1.uol.com.br (Postfix) with ESMTP id 79E8D774E; Sun, 26 Dec 2004 00:19:32 -0200 (BRST) Message-ID: <41CE1FB5.4080401@jonny.eng.br> Date: Sun, 26 Dec 2004 00:19:33 -0200 From: =?ISO-8859-1?Q?Jo=E3o_Carlos_Mendes_Lu=EDs?= User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Robert Watson References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit cc: Jeff Behl cc: freebsd-performance@freebsd.org cc: freebsd-net@freebsd.org Subject: Re: %cpu in system - squid performance in FreeBSD 5.3 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Dec 2004 02:19:42 -0000 Robert Watson wrote: > On Thu, 23 Dec 2004, Jeff Behl wrote: > >>As a follow up to the below (original message at the very bottom), I >>installed a load balancer in front of the machines which terminates the >>tcp connections from clients and opens up a few, persistent connections >>to each server over which requests are pipelined. 
In this scenario
>>everything is copasetic:
>
> I'm not very familiar with Squid's architecture, but I would anticipate
> that what you're seeing is that the cost of additional connections served
> in parallel is pretty high due to the use of processes. Specifically: if
> each TCP connection being served gets its own process, and there are a lot
> of TCP connections, you'll be doing a lot of process forking, context
> switching, exceeding cache sizes, etc. With just a couple of connections,
> even if they're doing the same "work", the overhead is much lower.
> Depending on how much time you're willing to invest in this, we can
> probably do quite a bit to diagnose where the cost is coming from and look
> for any specific problems or areas we could optimize.

It must not be this. Squid is mostly a single-process system, with scheduling based on descriptors and select/poll. Recent versions added some parallelism in other processes, but just for file reading/writing (diskd) and regular expression processing for ACLs. Even DNS, which previously ran on blocking I/O in secondary processes, now runs internally in the select/poll scheduler.

I also have some experience with older versions of squid: on the same machine, running the same version of squid, switching from Linux to FreeBSD raised the maximum simultaneous connection limit.

> I might start by turning on kernel profiling and doing a profile dump
> under load. Be aware that turning on profiling uses up a lot of CPU
> itself, so will reduce the capacity of the system. There's probably
> documentation elsewhere, but the process I use to set up profiling is
> here:

I did not make any tests on this, but I would expect profiling to fail, since every step of the scheduler is very small, and deals with the smallest I/O available at that time.
Indeed, based on the original report I would look for some optimization of descriptor searching in poll or select, whichever squid has chosen to use on FreeBSD (probably select, looking at the top output). This is one of the crucial points for squid performance. The other one is disk access, for sure, but the experiment described would not change disk access patterns, would it?

> http://www.watson.org/~robert/freebsd/netperf/profile/
>
> Note that it warns that some results may be incorrect on SMP. I think it
> would be useful to give it a try anyway just to see if we get something
> useful.

As I said before, being a single-process scheduler, squid does not gain much from SMP. The secondary processes would benefit from the extra CPU, though. Maybe interrupt processing also, if the giant lock does not interfere in any part of the processing path.

> As a final question: other than CPU consumption, do you have a reliable
> way to measure how efficiently the system is operating -- in particular,
> how fast it is able to serve data? Having some sort of metric for
> performance can be quite useful in optimizing, as it can tell us whether

One thing I fail to measure in FreeBSD is the reason for delays in disk access times. How can I prove that the delay is on disk, and determine how to optimize it? systat -v is very useful, but does not give me all the answers.
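
To make the descriptor-searching point concrete, here is a minimal sketch (an illustration only, not Squid's code) of one pass of a poll(2)-driven single-process loop. The helper names count_readable and demo_ready_count, and the use of pipes as stand-ins for client connections, are assumptions made for this example. The thing to notice is that the scan after poll() walks every watched descriptor, however few of them are actually ready.

```c
#include <assert.h>	/* only for callers that check results with assert() */
#include <poll.h>
#include <unistd.h>

#define MAX_PIPES 64	/* arbitrary cap for this sketch */

/*
 * Walk the whole pollfd array and count the readable descriptors.
 * The cost is linear in the number of descriptors watched, not in
 * the number that turned out to be ready.
 */
static int
count_readable(const struct pollfd *fds, int nfds)
{
	int i, ready = 0;

	for (i = 0; i < nfds; i++)
		if (fds[i].revents & POLLIN)
			ready++;
	return (ready);
}

/*
 * Hypothetical helper: open npipes pipes, make the first nactive of
 * them readable, run one non-blocking poll() pass and report how many
 * descriptors were found ready.  Returns -1 on error.
 */
int
demo_ready_count(int npipes, int nactive)
{
	struct pollfd fds[MAX_PIPES];
	int wr[MAX_PIPES];
	int i, opened = 0, result = -1;

	if (npipes < 0 || npipes > MAX_PIPES ||
	    nactive < 0 || nactive > npipes)
		return (-1);

	for (i = 0; i < npipes; i++) {
		int p[2];

		if (pipe(p) != 0)
			goto out;
		fds[i].fd = p[0];		/* read end is what we watch */
		fds[i].events = POLLIN;
		fds[i].revents = 0;
		wr[i] = p[1];
		opened++;
	}

	/* One byte written makes the matching read end readable. */
	for (i = 0; i < nactive; i++)
		if (write(wr[i], "x", 1) != 1)
			goto out;

	/* Timeout 0: return immediately with the current state. */
	if (poll(fds, (nfds_t)npipes, 0) >= 0)
		result = count_readable(fds, npipes);
out:
	for (i = 0; i < opened; i++) {
		close(fds[i].fd);
		close(wr[i]);
	}
	return (result);
}
```

With 8 pipes of which 3 have data, demo_ready_count(8, 3) reports 3, yet count_readable still inspected all 8 entries. With thousands of mostly idle connections that per-pass scan (plus the kernel's own walk over the set) is the kind of overhead discussed above.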
>>last pid: 3377; load averages: 0.12, 0.09, 0.08 >>up 0+17:24:53 10:02:13 >>31 processes: 1 running, 30 sleeping >>CPU states: 5.1% user, 0.0% nice, 1.8% system, 1.2% interrupt, 92.0% >>idle >>Mem: 75M Active, 187M Inact, 168M Wired, 40K Cache, 214M Buf, 1482M Free >>Swap: 4069M Total, 4069M Free >> >> PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU >>COMMAND >> 474 squid 96 0 68276K 62480K select 0 53:38 16.80% 16.80% >>squid >> 311 bind 20 0 10628K 6016K kserel 0 12:28 0.00% 0.00% >>named Jonny -- João Carlos Mendes Luís - Networking Engineer - jonny@jonny.eng.br From owner-freebsd-net@FreeBSD.ORG Sun Dec 26 07:14:16 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A480216A4CE; Sun, 26 Dec 2004 07:14:16 +0000 (GMT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3FEBD43D2D; Sun, 26 Dec 2004 07:14:16 +0000 (GMT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (localhost [127.0.0.1]) by fledge.watson.org (8.13.1/8.13.1) with ESMTP id iBQ7AxKU055168; Sun, 26 Dec 2004 02:10:59 -0500 (EST) (envelope-from robert@fledge.watson.org) Received: from localhost (robert@localhost)iBQ7Axnk055164; Sun, 26 Dec 2004 07:10:59 GMT (envelope-from robert@fledge.watson.org) Date: Sun, 26 Dec 2004 07:10:59 +0000 (GMT) From: Robert Watson X-Sender: robert@fledge.watson.org To: =?ISO-8859-1?Q?Jo=E3o_Carlos_Mendes_Lu=EDs?= In-Reply-To: <41CE1FB5.4080401@jonny.eng.br> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE cc: Jeff Behl cc: freebsd-performance@freebsd.org cc: freebsd-net@freebsd.org Subject: Re: %cpu in system - squid performance in FreeBSD 5.3 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Dec 2004 07:14:16 -0000

On Sun, 26 Dec 2004, João Carlos Mendes Luís wrote:

> It must not be this. Squid is mostly a single process system, with
> scheduling based on descriptors and select/poll. Recent versions added
> some parallelism in other processes, but just for file reading/writing
> (diskd) and regular expression processing for ACLs. Even DNS, which
> previously ran on blocking I/O in secondary processes, now runs internally
> in the select/poll scheduler.

Thanks for this information.

> > I might start by turning on kernel profiling and doing a profile dump
> > under load. Be aware that turning on profiling uses up a lot of CPU
> > itself, so will reduce the capacity of the system. There's probably
> > documentation elsewhere, but the process I use to set up profiling is
> > here:
>
> I did not make any tests on this, but I would expect profiling to
> fail, since every step of the scheduler is very small, and deals with
> the smallest I/O available at that time.

This is kernel profiling, not application profiling, and would hopefully give us information on where the kernel was spending most of its time, since in the environment in question system time appears to be dominant. If SMP in theory makes little difference to Squid performance, then switching to a UP kernel may well make kernel profiling more reliable and hence more useful in tracking system time.

> Indeed, based on the original report I would search for some
> optimization on descriptor searching in poll or select, whichever squid
> has chosen to use on FreeBSD (probably select, looking at the top
> output). This is one of the crucial points on squid performance.
The
> other one is disk access, for sure, but the experiment described would
> not change disk access patterns, would it?

The reporter described a very high percentage of system time -- time spent blocked on disk I/O isn't billed to system time; if a single process were spending lots of time waiting on disk I/O, you'd see idle time rather than system time predominating, I believe.

> > As a final question: other than CPU consumption, do you have a reliable
> > way to measure how efficiently the system is operating -- in particular,
> > how fast it is able to serve data? Having some sort of metric for
> > performance can be quite useful in optimizing, as it can tell us whether
>
> One thing I fail to measure in FreeBSD is the reason for delays in
> disk access times. How can I prove that the delay is on disk, and
> determine how to optimize it? systat -v is very useful, but does not
> give me all answers.

I'm not sure there are useful summary tools at a system-wide level for this, but it is possible to use KTR(9) to trace the associated scheduler and disk events. In particular, I recently added high level tracing of g_down and g_up GEOM events to KTR. Jeff Roberson is about to commit a scheduler visualization tool that interprets KTR events relating to the scheduler that may also be useful.

It would certainly be extremely useful to have a tool for normal system operation that could be pointed at a process to say "show me the percent of time spent on various wait channels for pid 50". ktrace(1) has the ability to track context switches, but currently appears not to provide enough information to figure out why the context switch took place. I'll investigate this in the next couple of days -- the trick is to gather this sort of statistic without too much additional overhead. If that's not easily possible, then simply post-processing KTR may be the right approach.
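
As a rough illustration of the post-processing idea (a sketch under stated assumptions, not an existing tool), the "percent of time per wait channel" report comes down to summing sleep intervals by channel name. The record layout struct sleep_event and the functions sum_by_wchan and wchan_percent below are hypothetical; real KTR output has a different format and would first have to be parsed into something like this.

```c
#include <assert.h>	/* only for callers that check results with assert() */
#include <string.h>

/*
 * Hypothetical trace record: one completed sleep of the traced
 * process.  This is NOT the KTR record format, just the minimum
 * needed for the aggregation.
 */
struct sleep_event {
	const char *wchan;	/* wait channel name, e.g. "select" */
	long start_us;		/* time the process went to sleep */
	long end_us;		/* time it was woken up */
};

struct wchan_total {
	const char *wchan;
	long total_us;		/* accumulated sleep time on this channel */
};

/*
 * Accumulate sleep time per wait channel.  Returns the number of
 * distinct channels written to out[], or -1 if max_out is too small.
 */
int
sum_by_wchan(const struct sleep_event *ev, int nev,
    struct wchan_total *out, int max_out)
{
	int i, j, n = 0;

	for (i = 0; i < nev; i++) {
		/* Linear search is fine for a handful of channels. */
		for (j = 0; j < n; j++)
			if (strcmp(out[j].wchan, ev[i].wchan) == 0)
				break;
		if (j == n) {
			if (n == max_out)
				return (-1);
			out[n].wchan = ev[i].wchan;
			out[n].total_us = 0;
			n++;
		}
		out[j].total_us += ev[i].end_us - ev[i].start_us;
	}
	return (n);
}

/*
 * Integer percentage of the observation window spent on one channel.
 * Returns 0 for a channel that never appeared, -1 for a bad window.
 */
long
wchan_percent(const struct wchan_total *tot, int n, const char *wchan,
    long window_us)
{
	int j;

	if (window_us <= 0)
		return (-1);
	for (j = 0; j < n; j++)
		if (strcmp(tot[j].wchan, wchan) == 0)
			return (tot[j].total_us * 100 / window_us);
	return (0);
}
```

For a 1000 us window with sleeps of 600 us and 200 us on "select" and 100 us on "biord", this reports 80 and 10 percent respectively. The aggregation itself is cheap; the overhead concern raised above is in collecting the events, not in summing them afterwards.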
Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research

From owner-freebsd-net@FreeBSD.ORG Sun Dec 26 18:10:48 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B421216A4CE for ; Sun, 26 Dec 2004 18:10:48 +0000 (GMT) Received: from borgtech.ca (borgtech.ca [216.187.106.216]) by mx1.FreeBSD.org (Postfix) with ESMTP id 735DE43D39 for ; Sun, 26 Dec 2004 18:10:48 +0000 (GMT) (envelope-from asegu@borgtech.ca) Received: from asegulaptop (ao3-m223.net.t-com.hr [195.29.34.223]) by borgtech.ca (Postfix) with ESMTP id 3C55654A5; Sun, 26 Dec 2004 18:12:40 +0000 (GMT) From: "Andrew Seguin" To: Date: Sun, 26 Dec 2004 19:10:29 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.5510 Thread-Index: AcTmwms23PKinut7T5aclTXOWMtivgEsaavA In-Reply-To: <8510784015.20041220213227@star-sw.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 Message-Id: <20041226181240.3C55654A5@borgtech.ca> cc: "'Nickolay A. Kritsky'" Subject: RE: FW: Curiosity in IPFW/Freebsd bridge. [more] 802.1q VLAN at fault? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Dec 2004 18:10:48 -0000

My apologies for not replying sooner. However, a few days before Christmas, I got the time to make the test and the news is... it works.

One small curiosity, however, is that I had a problem with the 'promisc' flag being turned off. I ended up creating a small startup script to set the sysctl and configure the netcards manually.

I thank all who helped me get this working!

Andrew

-----Original Message-----
From: Nickolay A.
Kritsky [mailto:nkritsky@star-sw.com]
Sent: Monday, December 20, 2004 7:32 PM
To: asegu@borgtech.ca
Cc: freebsd-net@freebsd.org
Subject: RE: FW: Curiosity in IPFW/Freebsd bridge. [more] 802.1q VLAN at fault?

Hello asegu,

This one should work OK. But do not forget to put parent interfaces in up and promisc mode in your rc.conf, otherwise you will not see any vlan-bridging.

Sunday, December 19, 2004, 11:33:57 PM, asegu@borgtech.ca wrote:

abc> Ok, the whole discussion to date led to how VLAN traffic wasn't being
abc> registered by IPFW in my system. I think that it'll probably be too late
abc> for a code change to fix my problem, so I'm going to go the route of
abc> changing the network configuration.

abc> I've rebuilt to 4.10 and.. And I had no luck there (IPFW _really_ doesn't
abc> see the traffic now!). On the other hand, I've read about vlan pseudo-dev
abc> and gotten myself access to the switch's configuration.

abc> So tomorrow evening I plan on changing the vlan id used to 3, and then in
abc> freebsd, use the following configuration(and I post this to the list to
abc> see if anybody knows that this is going to fail)

fxp1 -->> router (uses ID 2)
fxp0 -->> switch (uses ID 2, will switch to ID 3)

abc> ifconfig vlan1 vlan 3 vlandev fxp0
abc> ifconfig vlan0 vlan 2 vlandev fxp1
abc> sysctl net.link.ether.bridge_cfg=vlan1,vlan0
abc> sysctl net.link.ether.bridge_ipfw=1

abc> Does anybody think this will allow IPFW to see the packets? or that this
abc> will outright fail?

abc> Thank you everybody,
abc> Andrew

--
Best regards,
; Nickolay A. Kritsky
; SysAdmin STAR Software LLC
; mailto:nkritsky@star-sw.com

--
No virus found in this incoming message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.0 - Release Date: 12/17/2004

--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.6.4 - Release Date: 12/22/2004 From owner-freebsd-net@FreeBSD.ORG Mon Dec 27 07:05:16 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E3E1516A4CE for ; Mon, 27 Dec 2004 07:05:16 +0000 (GMT) Received: from mx01.bos.ma.towardex.com (mx01.bos.ma.towardex.com [65.124.16.9]) by mx1.FreeBSD.org (Postfix) with ESMTP id C2E0A43D31 for ; Mon, 27 Dec 2004 07:05:14 +0000 (GMT) (envelope-from haesu@mx01.bos.ma.towardex.com) Received: by mx01.bos.ma.towardex.com (TowardEX ESMTP 3.0p11_DAKN, from userid 1001) id 59CD82F946; Mon, 27 Dec 2004 02:05:14 -0500 (EST) Date: Mon, 27 Dec 2004 02:05:14 -0500 From: James To: freebsd-net@freebsd.org Message-ID: <20041227070514.GA68890@scylla.towardex.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="FCuugMFkClbJLl1L" Content-Disposition: inline User-Agent: Mutt/1.4.1i Subject: Receive path for ip_fastforward X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Dec 2004 07:05:17 -0000 --FCuugMFkClbJLl1L Content-Type: text/plain; charset=us-ascii Content-Disposition: inline As requested, here you go. What is included in the email attachments: 1. Modified files in raw format (for easier reads) - ip_fastfwd.c (sys/netinet) - ip_input.c (sys/netinet) - in.c (sys/netinet) - ip_var.h (sys/netinet) - inet.c (usr.bin/netstat) 2. Unified diff files for each above in .diff format so you can see the changes better with developer's eyes. Notes: - ip_fastfwd.c: Production code, proven to work very well; currently in use on actual production routers pushing 300Mb/s traffic on the network. 
- ip_input.c: No changes other than mbuf tagging for packets preprocessed by ip_fastfwd in Steps 1 and 2 (the basic sanity/fallback checks).
- ip_var.h: Adds one additional variable (ipstat.ips_transit_re) to track packets forwarded to receive path by ip_fastforward.
- netstat/inet.c: Adds tracking information for ipstat.ips_transit_re in netstat(1) program.
- in.c: Quickie hack (the code we are using on production routers is vastly different, so had to be quickly hacked up for this patch) to add receive path routes to routing table during SIOCSIFADDR. Been tested so far without any problems -- network address, broadcast address, our_own_addresses are installed with lo0/127.0.0.1 as next-hop during SIOCSIFADDR and are properly deleted during SIOCDIFADDR. But please make changes on this as necessary as this is a hack that may present some broken issues.

What it is and what it does:

- For more information about what Receive Path ACL is all about:
- The receive path installs IP addresses that should be forwarded to router's own control plane stack (ip_input and upwards) as /32 host routes to the routing table. During ip_fastforward stage, if the route to destination is a local/receive-path route (RTF_LOCAL), or if the packet needs to be punted to slow ip_input processing path because a further analysis is required, that packet is subject to firewall rules that filter on the lo0 interface under INBOUND direction, before being released to ip_input.

  The receive path work does _NOT_ actually forward the packet to lo0 driver. Doing so will actually break a number of protocols including OSPF and add further processing overhead for packets that need to be punted to ip_input. Instead, packets are simply subject to loopback filtering firewall rules before exiting ip_fastforward.

User's Guide:

--> Caveat before you start: The receive path uses pfil_hooks firewall API to subject control plane bound packets to loopback filtering rules. At this time, IPFW2 is *NOT* supported.
pf(4) is fully supported and is proven to work fine for this application. IPFW does not work since it captures ifnet variable out of mbuf header instead of the ifnet provided by pfil_hooks.

Step 1:

    sysctl -w net.inet.ip.fastforwarding=1

Note: Fast forwarding MUST BE ENABLED in order for receive path to operate.

Step 2: Set up pf(4) firewall rules to filter on lo0 in the inbound direction. Be sure to allow packets sourced from 127.0.0.0/8, as many routing protocol software packages (including Zebra and Quagga) use the loopback interface for their inter-process communications. Also be sure to allow any OSPF or routing protocols your router is running.

Example of a loopback filtering ruleset from a production router. The example below assumes your router is an edge router with just BGP running:

    cr1.walt# pfctl -sr
    pass quick on ge-0/0/0.2 all
    pass quick on ge-0/1/0.12 all
    pass quick on ge-0/1/0.203 all
    pass in quick on lo0 proto tcp from any to any port = ssh keep state
    pass in quick on lo0 proto tcp from any to any port = bgp keep state
    pass in quick on lo0 proto tcp from any port = ftp-data to any
    pass in quick on lo0 proto tcp from any port = ftp to any
    pass in quick on lo0 proto tcp from any port = http to any
    pass in quick on lo0 proto udp from any to any port 33434:33534
    pass in quick on lo0 proto udp from any port = domain to any
    pass in quick on lo0 proto icmp all
    pass in quick on lo0 inet from 127.0.0.0/24 to any
    block drop in quick on lo0 all
    pass quick all
    cr1.walt#

Step 3: Packets successfully punted to ip_input, either because they are too complex to be dealt with inside the fast forwarding path or because they are destined to the router's own addresses, can be tracked by using the netstat(1) utility (after you patch it, of course).
Example: cr1.walt# netstat -sn -f inet | grep forward 55647205 packets forwarded (52951423 packets fast forwarded) 4927978 packets not forwardable 345712 packets forwarded to receive path cr1.walt# As referenced by the BSD License, I am not liable for any damages arising from your use of this feature submission. Questions: let me know. -J -- James Jun TowardEX Technologies, Inc. Technical Lead Boston IPv4/IPv6 Web Hosting, Colocation and james@towardex.com Network design/consulting & configuration services cell: 1(978)-394-2867 web: http://www.towardex.com , noc: www.twdx.net --FCuugMFkClbJLl1L Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="in.c" /* * Copyright (c) 1982, 1986, 1991, 1993 * The Regents of the University of California. All rights reserved. * Copyright (C) 2001 WIDE Project. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)in.c 8.4 (Berkeley) 1/9/95 * $FreeBSD: src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $ */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include static MALLOC_DEFINE(M_IPMADDR, "in_multi", "internet multicast address"); static int in_mask2len(struct in_addr *); static void in_len2mask(struct in_addr *, int); static int in_lifaddr_ioctl(struct socket *, u_long, caddr_t, struct ifnet *, struct thread *); static int in_addprefix(struct in_ifaddr *, int); static int in_scrubprefix(struct in_ifaddr *); static void in_socktrim(struct sockaddr_in *); static int in_ifinit(struct ifnet *, struct in_ifaddr *, struct sockaddr_in *, int); static int subnetsarelocal = 0; SYSCTL_INT(_net_inet_ip, OID_AUTO, subnets_are_local, CTLFLAG_RW, &subnetsarelocal, 0, "Treat all subnets as directly connected"); struct in_multihead in_multihead; /* XXX BSS initialization */ extern struct inpcbinfo ripcbinfo; extern struct inpcbinfo udbinfo; /* * Return 1 if an internet address is for a ``local'' host * (one to which we have a connection). If subnetsarelocal * is true, this includes other subnets of the local net. * Otherwise, it includes only the directly-connected (sub)nets. 
*/ int in_localaddr(in) struct in_addr in; { register u_long i = ntohl(in.s_addr); register struct in_ifaddr *ia; if (subnetsarelocal) { TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link) if ((i & ia->ia_netmask) == ia->ia_net) return (1); } else { TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link) if ((i & ia->ia_subnetmask) == ia->ia_subnet) return (1); } return (0); } /* * Return 1 if an internet address is for the local host and configured * on one of its interfaces. */ int in_localip(in) struct in_addr in; { struct in_ifaddr *ia; LIST_FOREACH(ia, INADDR_HASH(in.s_addr), ia_hash) { if (IA_SIN(ia)->sin_addr.s_addr == in.s_addr) return 1; } return 0; } /* * Determine whether an IP address is in a reserved set of addresses * that may not be forwarded, or whether datagrams to that destination * may be forwarded. */ int in_canforward(in) struct in_addr in; { register u_long i = ntohl(in.s_addr); register u_long net; if (IN_EXPERIMENTAL(i) || IN_MULTICAST(i)) return (0); if (IN_CLASSA(i)) { net = i & IN_CLASSA_NET; if (net == 0 || net == (IN_LOOPBACKNET << IN_CLASSA_NSHIFT)) return (0); } return (1); } /* * Sub-routine for in_ifaddrecv() and in_ifremrecv(). * --james@towardex.com 12/17/2004 */ static void in_ifrecv_request(int call, int cmd, struct in_ifaddr *ia) { struct sockaddr_in all1_sa; struct rtentry *nrt = NULL; struct ifaddr *ifa; int e = 0; struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET }; struct sockaddr_in loopback = { sizeof(struct sockaddr_in), AF_INET }; ifa = &ia->ia_ifa; bzero(&all1_sa, sizeof(all1_sa)); all1_sa.sin_family = AF_INET; all1_sa.sin_len = sizeof(struct sockaddr_in); all1_sa.sin_addr.s_addr = (u_int32_t)0xffffffff; /* We need to manually specify loopback for network and broadcast * addresses because we can't just let L2 rtrequest handlers to * deal with ifa->if_addr set as gateway address. 
*/ loopback.sin_family = AF_INET; loopback.sin_addr.s_addr = ntohl(INADDR_LOOPBACK); /* * Set the rtflags to RTF_LLINFO so existing apps are happy * with our changes. */ switch (call) { case 0: /* own address request */ rtrequest(cmd, ifa->ifa_addr, sintosa(&loopback), (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt); break; case 1: /* network address request */ rtrequest(cmd, sintosa(&ia->ia_dstaddr), sintosa(&loopback), (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt); break; case 2: /* broadcast address request */ subnet.sin_addr.s_addr = htonl(ia->ia_subnet); subnet.sin_family = AF_INET; rtrequest(cmd, sintosa(&subnet), sintosa(&loopback), (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt); break; default: break; } if (nrt) { RT_LOCK(nrt); /* * Make sure rt_ifa be equal to IFA, the second argument of * the function. We need this because when we refer to * rt_ifa->ia_flags, we assume that the rt_ifa points to * the address, not the loopback. */ if (cmd == RTM_ADD && ifa != nrt->rt_ifa) { IFAFREE(nrt->rt_ifa); IFAREF(ifa); nrt->rt_ifa = ifa; } /* * Report to routing socket. */ rt_newaddrmsg(cmd, ifa, e, nrt); if (cmd == RTM_DELETE) { rtfree(nrt); } else { /* the cmd must be RTM_ADD here */ RT_REMREF(nrt); RT_UNLOCK(nrt); } } } /* * Add own address as loopback rtentry (receive path). We previously add * the route only if necessary (such as point to point circuit), or when * triggered by route cloning. However, a proper RIB and FIB implementation * must contain own-addrs as receive paths, allowing software to manage * its own addresses separately from prefixes. 
This is required for receive * adjacency/path in ip_fastforward() --james@towardex.com 2004/12/17 */ static void in_ifaddrecv(struct in_ifaddr *ia) { struct rtentry *rt; int need_loop, need_netdst, need_bcast; struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET }; /* If there is no loopback entry, allocate one */ rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0); need_loop = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 || (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0); /* If there is no network entry, allocate one */ if(rt) rtfree(rt); rt = rtalloc1(sintosa(&ia->ia_dstaddr), 0, 0); need_netdst = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 || (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0); /* If there is no broadcast entry, allocate one */ subnet.sin_addr.s_addr = htonl(ia->ia_subnet); subnet.sin_family = AF_INET; if(rt) rtfree(rt); rt = rtalloc1(sintosa(&subnet), 0, 0); need_bcast = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 || (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0); if(rt) rtfree(rt); if(need_loop) in_ifrecv_request(0, RTM_ADD, ia); if(need_netdst) in_ifrecv_request(1, RTM_ADD, ia); if(need_bcast) in_ifrecv_request(2, RTM_ADD, ia); } /* * Remove loopback rtentry's of receive path generated by in_ifaddrecv() * if they exist. -- james 12/17/2004 */ static void in_ifremrecv(struct in_ifaddr *ia) { struct rtentry *rt; /* * Delete the route for ownaddr if it really exists. */ rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0); if (rt != NULL && (rt->rt_flags & RTF_HOST) != 0 && (rt->rt_ifp->if_flags & IFF_LOOPBACK) != 0) { rtfree(rt); in_ifrecv_request(0, RTM_DELETE, ia); } /* XXX * Broadcast and network addresses are removed by * by regular interface detach handlers, but we * need to verify the design aspect of this more * later. 
*/ } /* * Trim a mask in a sockaddr */ static void in_socktrim(ap) struct sockaddr_in *ap; { register char *cplim = (char *) &ap->sin_addr; register char *cp = (char *) (&ap->sin_addr + 1); ap->sin_len = 0; while (--cp >= cplim) if (*cp) { (ap)->sin_len = cp - (char *) (ap) + 1; break; } } static int in_mask2len(mask) struct in_addr *mask; { int x, y; u_char *p; p = (u_char *)mask; for (x = 0; x < sizeof(*mask); x++) { if (p[x] != 0xff) break; } y = 0; if (x < sizeof(*mask)) { for (y = 0; y < 8; y++) { if ((p[x] & (0x80 >> y)) == 0) break; } } return x * 8 + y; } static void in_len2mask(mask, len) struct in_addr *mask; int len; { int i; u_char *p; p = (u_char *)mask; bzero(mask, sizeof(*mask)); for (i = 0; i < len / 8; i++) p[i] = 0xff; if (len % 8) p[i] = (0xff00 >> (len % 8)) & 0xff; } /* * Generic internet control operations (ioctl's). * Ifp is 0 if not an interface-specific ioctl. */ /* ARGSUSED */ int in_control(so, cmd, data, ifp, td) struct socket *so; u_long cmd; caddr_t data; register struct ifnet *ifp; struct thread *td; { register struct ifreq *ifr = (struct ifreq *)data; register struct in_ifaddr *ia = 0, *iap; register struct ifaddr *ifa; struct in_addr dst; struct in_ifaddr *oia; struct in_aliasreq *ifra = (struct in_aliasreq *)data; struct sockaddr_in oldaddr; int error, hostIsNew, iaIsNew, maskIsNew, s; iaIsNew = 0; switch (cmd) { case SIOCALIFADDR: case SIOCDLIFADDR: if (td && (error = suser(td)) != 0) return error; /*fall through*/ case SIOCGLIFADDR: if (!ifp) return EINVAL; return in_lifaddr_ioctl(so, cmd, data, ifp, td); } /* * Find address for this interface, if it exists. * * If an alias address was specified, find that one instead of * the first one on the interface, if possible. 
*/ if (ifp) { dst = ((struct sockaddr_in *)&ifr->ifr_addr)->sin_addr; LIST_FOREACH(iap, INADDR_HASH(dst.s_addr), ia_hash) if (iap->ia_ifp == ifp && iap->ia_addr.sin_addr.s_addr == dst.s_addr) { ia = iap; break; } if (ia == NULL) TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { iap = ifatoia(ifa); if (iap->ia_addr.sin_family == AF_INET) { ia = iap; break; } } } switch (cmd) { case SIOCAIFADDR: case SIOCDIFADDR: if (ifp == 0) return (EADDRNOTAVAIL); if (ifra->ifra_addr.sin_family == AF_INET) { for (oia = ia; ia; ia = TAILQ_NEXT(ia, ia_link)) { if (ia->ia_ifp == ifp && ia->ia_addr.sin_addr.s_addr == ifra->ifra_addr.sin_addr.s_addr) break; } if ((ifp->if_flags & IFF_POINTOPOINT) && (cmd == SIOCAIFADDR) && (ifra->ifra_dstaddr.sin_addr.s_addr == INADDR_ANY)) { return EDESTADDRREQ; } } if (cmd == SIOCDIFADDR && ia == 0) return (EADDRNOTAVAIL); /* FALLTHROUGH */ case SIOCSIFADDR: case SIOCSIFNETMASK: case SIOCSIFDSTADDR: if (td && (error = suser(td)) != 0) return error; if (ifp == 0) return (EADDRNOTAVAIL); if (ia == (struct in_ifaddr *)0) { ia = (struct in_ifaddr *) malloc(sizeof *ia, M_IFADDR, M_WAITOK | M_ZERO); if (ia == (struct in_ifaddr *)NULL) return (ENOBUFS); /* * Protect from ipintr() traversing address list * while we're modifying it. 
*/ s = splnet(); TAILQ_INSERT_TAIL(&in_ifaddrhead, ia, ia_link); ifa = &ia->ia_ifa; IFA_LOCK_INIT(ifa); ifa->ifa_addr = (struct sockaddr *)&ia->ia_addr; ifa->ifa_dstaddr = (struct sockaddr *)&ia->ia_dstaddr; ifa->ifa_netmask = (struct sockaddr *)&ia->ia_sockmask; ifa->ifa_refcnt = 1; TAILQ_INSERT_TAIL(&ifp->if_addrhead, ifa, ifa_link); ia->ia_sockmask.sin_len = 8; ia->ia_sockmask.sin_family = AF_INET; if (ifp->if_flags & IFF_BROADCAST) { ia->ia_broadaddr.sin_len = sizeof(ia->ia_addr); ia->ia_broadaddr.sin_family = AF_INET; } ia->ia_ifp = ifp; splx(s); iaIsNew = 1; } break; case SIOCSIFBRDADDR: if (td && (error = suser(td)) != 0) return error; /* FALLTHROUGH */ case SIOCGIFADDR: case SIOCGIFNETMASK: case SIOCGIFDSTADDR: case SIOCGIFBRDADDR: if (ia == (struct in_ifaddr *)0) return (EADDRNOTAVAIL); break; } switch (cmd) { case SIOCGIFADDR: *((struct sockaddr_in *)&ifr->ifr_addr) = ia->ia_addr; return (0); case SIOCGIFBRDADDR: if ((ifp->if_flags & IFF_BROADCAST) == 0) return (EINVAL); *((struct sockaddr_in *)&ifr->ifr_dstaddr) = ia->ia_broadaddr; return (0); case SIOCGIFDSTADDR: if ((ifp->if_flags & IFF_POINTOPOINT) == 0) return (EINVAL); *((struct sockaddr_in *)&ifr->ifr_dstaddr) = ia->ia_dstaddr; return (0); case SIOCGIFNETMASK: *((struct sockaddr_in *)&ifr->ifr_addr) = ia->ia_sockmask; return (0); case SIOCSIFDSTADDR: if ((ifp->if_flags & IFF_POINTOPOINT) == 0) return (EINVAL); oldaddr = ia->ia_dstaddr; ia->ia_dstaddr = *(struct sockaddr_in *)&ifr->ifr_dstaddr; if (ifp->if_ioctl && (error = (*ifp->if_ioctl) (ifp, SIOCSIFDSTADDR, (caddr_t)ia))) { ia->ia_dstaddr = oldaddr; return (error); } if (ia->ia_flags & IFA_ROUTE) { ia->ia_ifa.ifa_dstaddr = (struct sockaddr *)&oldaddr; rtinit(&(ia->ia_ifa), (int)RTM_DELETE, RTF_HOST); ia->ia_ifa.ifa_dstaddr = (struct sockaddr *)&ia->ia_dstaddr; rtinit(&(ia->ia_ifa), (int)RTM_ADD, RTF_HOST|RTF_UP); } return (0); case SIOCSIFBRDADDR: if ((ifp->if_flags & IFF_BROADCAST) == 0) return (EINVAL); ia->ia_broadaddr = *(struct sockaddr_in 
*)&ifr->ifr_broadaddr; return (0); case SIOCSIFADDR: error = in_ifinit(ifp, ia, (struct sockaddr_in *) &ifr->ifr_addr, 1); if (error != 0 && iaIsNew) break; if (error == 0) EVENTHANDLER_INVOKE(ifaddr_event, ifp); return (0); case SIOCSIFNETMASK: ia->ia_sockmask.sin_addr = ifra->ifra_addr.sin_addr; ia->ia_subnetmask = ntohl(ia->ia_sockmask.sin_addr.s_addr); return (0); case SIOCAIFADDR: maskIsNew = 0; hostIsNew = 1; error = 0; if (ia->ia_addr.sin_family == AF_INET) { if (ifra->ifra_addr.sin_len == 0) { ifra->ifra_addr = ia->ia_addr; hostIsNew = 0; } else if (ifra->ifra_addr.sin_addr.s_addr == ia->ia_addr.sin_addr.s_addr) hostIsNew = 0; } if (ifra->ifra_mask.sin_len) { in_ifscrub(ifp, ia); ia->ia_sockmask = ifra->ifra_mask; ia->ia_sockmask.sin_family = AF_INET; ia->ia_subnetmask = ntohl(ia->ia_sockmask.sin_addr.s_addr); maskIsNew = 1; } if ((ifp->if_flags & IFF_POINTOPOINT) && (ifra->ifra_dstaddr.sin_family == AF_INET)) { in_ifscrub(ifp, ia); ia->ia_dstaddr = ifra->ifra_dstaddr; maskIsNew = 1; /* We lie; but the effect's the same */ } if (ifra->ifra_addr.sin_family == AF_INET && (hostIsNew || maskIsNew)) error = in_ifinit(ifp, ia, &ifra->ifra_addr, 0); if (error != 0 && iaIsNew) break; if ((ifp->if_flags & IFF_BROADCAST) && (ifra->ifra_broadaddr.sin_family == AF_INET)) ia->ia_broadaddr = ifra->ifra_broadaddr; if (error == 0) EVENTHANDLER_INVOKE(ifaddr_event, ifp); return (error); case SIOCDIFADDR: /* * in_ifscrub kills the interface route. */ in_ifscrub(ifp, ia); /* * in_ifadown gets rid of all the rest of * the routes. This is not quite the right * thing to do, but at least if we are running * a routing process they will come back. 
*/ in_ifadown(&ia->ia_ifa, 1); /* * XXX horrible hack to detect that we are being called * from if_detach() */ if (ifaddr_byindex(ifp->if_index) == NULL) { in_pcbpurgeif0(&ripcbinfo, ifp); in_pcbpurgeif0(&udbinfo, ifp); } EVENTHANDLER_INVOKE(ifaddr_event, ifp); error = 0; break; default: if (ifp == 0 || ifp->if_ioctl == 0) return (EOPNOTSUPP); return ((*ifp->if_ioctl)(ifp, cmd, data)); } /* * Protect from ipintr() traversing address list while we're modifying * it. */ s = splnet(); TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link); TAILQ_REMOVE(&in_ifaddrhead, ia, ia_link); LIST_REMOVE(ia, ia_hash); IFAFREE(&ia->ia_ifa); splx(s); return (error); } /* * SIOC[GAD]LIFADDR. * SIOCGLIFADDR: get first address. (?!?) * SIOCGLIFADDR with IFLR_PREFIX: * get first address that matches the specified prefix. * SIOCALIFADDR: add the specified address. * SIOCALIFADDR with IFLR_PREFIX: * EINVAL since we can't deduce hostid part of the address. * SIOCDLIFADDR: delete the specified address. * SIOCDLIFADDR with IFLR_PREFIX: * delete the first address that matches the specified prefix. 
* return values: * EINVAL on invalid parameters * EADDRNOTAVAIL on prefix match failed/specified address not found * other values may be returned from in_ioctl() */ static int in_lifaddr_ioctl(so, cmd, data, ifp, td) struct socket *so; u_long cmd; caddr_t data; struct ifnet *ifp; struct thread *td; { struct if_laddrreq *iflr = (struct if_laddrreq *)data; struct ifaddr *ifa; /* sanity checks */ if (!data || !ifp) { panic("invalid argument to in_lifaddr_ioctl"); /*NOTREACHED*/ } switch (cmd) { case SIOCGLIFADDR: /* address must be specified on GET with IFLR_PREFIX */ if ((iflr->flags & IFLR_PREFIX) == 0) break; /*FALLTHROUGH*/ case SIOCALIFADDR: case SIOCDLIFADDR: /* address must be specified on ADD and DELETE */ if (iflr->addr.ss_family != AF_INET) return EINVAL; if (iflr->addr.ss_len != sizeof(struct sockaddr_in)) return EINVAL; /* XXX need improvement */ if (iflr->dstaddr.ss_family && iflr->dstaddr.ss_family != AF_INET) return EINVAL; if (iflr->dstaddr.ss_family && iflr->dstaddr.ss_len != sizeof(struct sockaddr_in)) return EINVAL; break; default: /*shouldn't happen*/ return EOPNOTSUPP; } if (sizeof(struct in_addr) * 8 < iflr->prefixlen) return EINVAL; switch (cmd) { case SIOCALIFADDR: { struct in_aliasreq ifra; if (iflr->flags & IFLR_PREFIX) return EINVAL; /* copy args to in_aliasreq, perform ioctl(SIOCAIFADDR).
*/ bzero(&ifra, sizeof(ifra)); bcopy(iflr->iflr_name, ifra.ifra_name, sizeof(ifra.ifra_name)); bcopy(&iflr->addr, &ifra.ifra_addr, iflr->addr.ss_len); if (iflr->dstaddr.ss_family) { /*XXX*/ bcopy(&iflr->dstaddr, &ifra.ifra_dstaddr, iflr->dstaddr.ss_len); } ifra.ifra_mask.sin_family = AF_INET; ifra.ifra_mask.sin_len = sizeof(struct sockaddr_in); in_len2mask(&ifra.ifra_mask.sin_addr, iflr->prefixlen); return in_control(so, SIOCAIFADDR, (caddr_t)&ifra, ifp, td); } case SIOCGLIFADDR: case SIOCDLIFADDR: { struct in_ifaddr *ia; struct in_addr mask, candidate, match; struct sockaddr_in *sin; int cmp; bzero(&mask, sizeof(mask)); if (iflr->flags & IFLR_PREFIX) { /* lookup a prefix rather than address. */ in_len2mask(&mask, iflr->prefixlen); sin = (struct sockaddr_in *)&iflr->addr; match.s_addr = sin->sin_addr.s_addr; match.s_addr &= mask.s_addr; /* if you set extra bits, that's wrong */ if (match.s_addr != sin->sin_addr.s_addr) return EINVAL; cmp = 1; } else { if (cmd == SIOCGLIFADDR) { /* on getting an address, take the 1st match */ cmp = 0; /*XXX*/ } else { /* on deleting an address, do exact match */ in_len2mask(&mask, 32); sin = (struct sockaddr_in *)&iflr->addr; match.s_addr = sin->sin_addr.s_addr; cmp = 1; } } TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { /* skip non-IPv4 addresses; ifa_addr is already a pointer */ if (ifa->ifa_addr->sa_family != AF_INET) continue; if (!cmp) break; candidate.s_addr = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr.s_addr; candidate.s_addr &= mask.s_addr; if (candidate.s_addr == match.s_addr) break; } if (!ifa) return EADDRNOTAVAIL; ia = (struct in_ifaddr *)ifa; if (cmd == SIOCGLIFADDR) { /* fill in the if_laddrreq structure */ bcopy(&ia->ia_addr, &iflr->addr, ia->ia_addr.sin_len); if ((ifp->if_flags & IFF_POINTOPOINT) != 0) { bcopy(&ia->ia_dstaddr, &iflr->dstaddr, ia->ia_dstaddr.sin_len); } else bzero(&iflr->dstaddr, sizeof(iflr->dstaddr)); iflr->prefixlen = in_mask2len(&ia->ia_sockmask.sin_addr); iflr->flags = 0; /*XXX*/ return 0; } else { struct in_aliasreq ifra; /* fill in_aliasreq and 
do ioctl(SIOCDIFADDR) */ bzero(&ifra, sizeof(ifra)); bcopy(iflr->iflr_name, ifra.ifra_name, sizeof(ifra.ifra_name)); bcopy(&ia->ia_addr, &ifra.ifra_addr, ia->ia_addr.sin_len); if ((ifp->if_flags & IFF_POINTOPOINT) != 0) { bcopy(&ia->ia_dstaddr, &ifra.ifra_dstaddr, ia->ia_dstaddr.sin_len); } /* the netmask belongs in ifra_mask, not ifra_dstaddr */ bcopy(&ia->ia_sockmask, &ifra.ifra_mask, ia->ia_sockmask.sin_len); return in_control(so, SIOCDIFADDR, (caddr_t)&ifra, ifp, td); } } } return EOPNOTSUPP; /*just for safety*/ } /* * Delete any existing route for an interface. */ void in_ifscrub(ifp, ia) register struct ifnet *ifp; register struct in_ifaddr *ia; { in_scrubprefix(ia); /* * delete receive path rtentry's if they exist. */ in_ifremrecv(ia); } /* * Initialize an interface's internet address * and routing table entry. */ static int in_ifinit(ifp, ia, sin, scrub) register struct ifnet *ifp; register struct in_ifaddr *ia; struct sockaddr_in *sin; int scrub; { register u_long i = ntohl(sin->sin_addr.s_addr); struct sockaddr_in oldaddr; int s = splimp(), flags = RTF_UP, error = 0; oldaddr = ia->ia_addr; if (oldaddr.sin_family == AF_INET) LIST_REMOVE(ia, ia_hash); ia->ia_addr = *sin; if (ia->ia_addr.sin_family == AF_INET) LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr), ia, ia_hash); /* * Give the interface a chance to initialize * if this is its first address, * and to validate the address if necessary. 
*/ if (ifp->if_ioctl && (error = (*ifp->if_ioctl)(ifp, SIOCSIFADDR, (caddr_t)ia))) { splx(s); /* LIST_REMOVE(ia, ia_hash) is done in in_control */ ia->ia_addr = oldaddr; if (ia->ia_addr.sin_family == AF_INET) LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr), ia, ia_hash); return (error); } splx(s); if (scrub) { ia->ia_ifa.ifa_addr = (struct sockaddr *)&oldaddr; in_ifscrub(ifp, ia); ia->ia_ifa.ifa_addr = (struct sockaddr *)&ia->ia_addr; } if (IN_CLASSA(i)) ia->ia_netmask = IN_CLASSA_NET; else if (IN_CLASSB(i)) ia->ia_netmask = IN_CLASSB_NET; else ia->ia_netmask = IN_CLASSC_NET; /* * The subnet mask usually includes at least the standard network part, * but may be smaller in the case of supernetting. * If it is set, we believe it. */ if (ia->ia_subnetmask == 0) { ia->ia_subnetmask = ia->ia_netmask; ia->ia_sockmask.sin_addr.s_addr = htonl(ia->ia_subnetmask); } else ia->ia_netmask &= ia->ia_subnetmask; ia->ia_net = i & ia->ia_netmask; ia->ia_subnet = i & ia->ia_subnetmask; in_socktrim(&ia->ia_sockmask); /* * Add route for the network. */ ia->ia_ifa.ifa_metric = ifp->if_metric; if (ifp->if_flags & IFF_BROADCAST) { ia->ia_broadaddr.sin_addr.s_addr = htonl(ia->ia_subnet | ~ia->ia_subnetmask); ia->ia_netbroadcast.s_addr = htonl(ia->ia_net | ~ ia->ia_netmask); } else if (ifp->if_flags & IFF_LOOPBACK) { ia->ia_dstaddr = ia->ia_addr; flags |= RTF_HOST; } else if (ifp->if_flags & IFF_POINTOPOINT) { if (ia->ia_dstaddr.sin_family != AF_INET) return (0); flags |= RTF_HOST; } if ((error = in_addprefix(ia, flags)) != 0) return (error); /* * If the interface supports multicast, join the "all hosts" * multicast group on that interface. */ if (ifp->if_flags & IFF_MULTICAST) { struct in_addr addr; addr.s_addr = htonl(INADDR_ALLHOSTS_GROUP); in_addmulti(&addr, ifp); } /* * Bring online receive adjacency routes. * -james 2004/12/17 * * Deleted old 2004-09-09 kludge code; this is a cleaner * approach, derived from KAME implementation for INET6. 
*/ in_ifaddrecv(ia); return (error); } #define rtinitflags(x) \ ((((x)->ia_ifp->if_flags & (IFF_LOOPBACK | IFF_POINTOPOINT)) != 0) \ ? RTF_HOST : 0) /* * Check if we have a route for the given prefix already or add one * accordingly. */ static int in_addprefix(target, flags) struct in_ifaddr *target; int flags; { struct in_ifaddr *ia; struct in_addr prefix, mask, p; int error; if ((flags & RTF_HOST) != 0) prefix = target->ia_dstaddr.sin_addr; else { prefix = target->ia_addr.sin_addr; mask = target->ia_sockmask.sin_addr; prefix.s_addr &= mask.s_addr; } TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link) { if (rtinitflags(ia)) p = ia->ia_dstaddr.sin_addr; else { p = ia->ia_addr.sin_addr; p.s_addr &= ia->ia_sockmask.sin_addr.s_addr; } if (prefix.s_addr != p.s_addr) continue; /* * If we got a matching prefix route inserted by other * interface address, we are done here. */ if (ia->ia_flags & IFA_ROUTE) return 0; } /* * No one seems to have this prefix route, so we try to insert it. */ error = rtinit(&target->ia_ifa, (int)RTM_ADD, flags); if (!error) target->ia_flags |= IFA_ROUTE; return error; } /* * If there is no other address in the system that can serve a route to the * same prefix, remove the route. Hand over the route to the new address * otherwise. */ static int in_scrubprefix(target) struct in_ifaddr *target; { struct in_ifaddr *ia; struct in_addr prefix, mask, p; int error; if ((target->ia_flags & IFA_ROUTE) == 0) return 0; if (rtinitflags(target)) prefix = target->ia_dstaddr.sin_addr; else { prefix = target->ia_addr.sin_addr; mask = target->ia_sockmask.sin_addr; prefix.s_addr &= mask.s_addr; } TAILQ_FOREACH(ia, &in_ifaddrhead, ia_link) { if (rtinitflags(ia)) p = ia->ia_dstaddr.sin_addr; else { p = ia->ia_addr.sin_addr; p.s_addr &= ia->ia_sockmask.sin_addr.s_addr; } if (prefix.s_addr != p.s_addr) continue; /* * If we got a matching prefix address, move IFA_ROUTE and * the route itself to it. Make sure that routing daemons * get a heads-up. 
*/ if ((ia->ia_flags & IFA_ROUTE) == 0) { rtinit(&(target->ia_ifa), (int)RTM_DELETE, rtinitflags(target)); target->ia_flags &= ~IFA_ROUTE; error = rtinit(&ia->ia_ifa, (int)RTM_ADD, rtinitflags(ia) | RTF_UP); if (error == 0) ia->ia_flags |= IFA_ROUTE; return error; } } /* * As no-one seem to have this prefix, we can remove the route. */ rtinit(&(target->ia_ifa), (int)RTM_DELETE, rtinitflags(target)); target->ia_flags &= ~IFA_ROUTE; return 0; } #undef rtinitflags /* * Return 1 if the address might be a local broadcast address. */ int in_broadcast(in, ifp) struct in_addr in; struct ifnet *ifp; { register struct ifaddr *ifa; u_long t; if (in.s_addr == INADDR_BROADCAST || in.s_addr == INADDR_ANY) return 1; if ((ifp->if_flags & IFF_BROADCAST) == 0) return 0; t = ntohl(in.s_addr); /* * Look through the list of addresses for a match * with a broadcast address. */ #define ia ((struct in_ifaddr *)ifa) TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) if (ifa->ifa_addr->sa_family == AF_INET && (in.s_addr == ia->ia_broadaddr.sin_addr.s_addr || in.s_addr == ia->ia_netbroadcast.s_addr || /* * Check for old-style (host 0) broadcast. */ t == ia->ia_subnet || t == ia->ia_net) && /* * Check for an all one subnetmask. These * only exist when an interface gets a secondary * address. */ ia->ia_subnetmask != (u_long)0xffffffff) return 1; return (0); #undef ia } /* * Add an address to the list of IP multicast addresses for a given interface. */ struct in_multi * in_addmulti(ap, ifp) register struct in_addr *ap; register struct ifnet *ifp; { register struct in_multi *inm; int error; struct sockaddr_in sin; struct ifmultiaddr *ifma; int s = splnet(); /* * Call generic routine to add membership or increment * refcount. It wants addresses in the form of a sockaddr, * so we build one here (being careful to zero the unused bytes). 
*/ bzero(&sin, sizeof sin); sin.sin_family = AF_INET; sin.sin_len = sizeof sin; sin.sin_addr = *ap; error = if_addmulti(ifp, (struct sockaddr *)&sin, &ifma); if (error) { splx(s); return 0; } /* * If ifma->ifma_protospec is null, then if_addmulti() created * a new record. Otherwise, we are done. */ if (ifma->ifma_protospec != 0) { splx(s); return ifma->ifma_protospec; } /* XXX - if_addmulti uses M_WAITOK. Can this really be called at interrupt time? If so, need to fix if_addmulti. XXX */ inm = (struct in_multi *)malloc(sizeof(*inm), M_IPMADDR, M_NOWAIT | M_ZERO); if (inm == NULL) { splx(s); return (NULL); } inm->inm_addr = *ap; inm->inm_ifp = ifp; inm->inm_ifma = ifma; ifma->ifma_protospec = inm; LIST_INSERT_HEAD(&in_multihead, inm, inm_link); /* * Let IGMP know that we have joined a new IP multicast group. */ igmp_joingroup(inm); splx(s); return (inm); } /* * Delete a multicast address record. */ void in_delmulti(inm) register struct in_multi *inm; { struct ifmultiaddr *ifma = inm->inm_ifma; struct in_multi my_inm; int s = splnet(); my_inm.inm_ifp = NULL ; /* don't send the leave msg */ if (ifma->ifma_refcount == 1) { /* * No remaining claims to this record; let IGMP know that * we are leaving the multicast group. * But do it after the if_delmulti() which might reset * the interface and nuke the packet. */ my_inm = *inm ; ifma->ifma_protospec = 0; LIST_REMOVE(inm, inm_link); free(inm, M_IPMADDR); } /* XXX - should be separate API for when we have an ifma? */ if_delmulti(ifma->ifma_ifp, ifma->ifma_addr); if (my_inm.inm_ifp != NULL) igmp_leavegroup(&my_inm); splx(s); } --FCuugMFkClbJLl1L Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="in.c.diff" --- in.org.c Mon Dec 27 01:43:19 2004 +++ in.c Mon Dec 27 01:42:40 2004 @@ -28,7 +28,7 @@ * SUCH DAMAGE. 
* * @(#)in.c 8.4 (Berkeley) 1/9/95 - * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $ + * $FreeBSD: src/sys/netinet/in.c,v 1.77.2.1 2004/12/12 19:12:35 mlaier Exp $ */ #include @@ -136,6 +136,159 @@ } /* + * Sub-routine for in_ifaddrecv() and in_ifremrecv(). + * --james@towardex.com 12/17/2004 + */ +static void +in_ifrecv_request(int call, int cmd, struct in_ifaddr *ia) +{ + struct sockaddr_in all1_sa; + struct rtentry *nrt = NULL; + struct ifaddr *ifa; + int e = 0; + struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET }; + struct sockaddr_in loopback = { sizeof(struct sockaddr_in), AF_INET }; + + ifa = &ia->ia_ifa; + + bzero(&all1_sa, sizeof(all1_sa)); + all1_sa.sin_family = AF_INET; + all1_sa.sin_len = sizeof(struct sockaddr_in); + all1_sa.sin_addr.s_addr = (u_int32_t)0xffffffff; + + /* We need to manually specify loopback for network and broadcast + * addresses because we can't just let L2 rtrequest handlers + * deal with ifa->ifa_addr set as gateway address. + */ + loopback.sin_family = AF_INET; + loopback.sin_addr.s_addr = htonl(INADDR_LOOPBACK); + + /* + * Set the rtflags to RTF_LLINFO so existing apps are happy + * with our changes. 
+ */ + switch (call) { + case 0: /* own address request */ + rtrequest(cmd, ifa->ifa_addr, sintosa(&loopback), + (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt); + break; + case 1: /* network address request */ + rtrequest(cmd, sintosa(&ia->ia_dstaddr), sintosa(&loopback), + (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt); + break; + case 2: /* broadcast address request */ + subnet.sin_addr.s_addr = htonl(ia->ia_subnet); + subnet.sin_family = AF_INET; + + rtrequest(cmd, sintosa(&subnet), sintosa(&loopback), + (struct sockaddr *)&all1_sa, RTF_UP|RTF_HOST|RTF_LLINFO|RTF_LOCAL, &nrt); + break; + default: + break; + } + + if (nrt) { + RT_LOCK(nrt); + /* + * Make sure rt_ifa be equal to IFA, the second argument of + * the function. We need this because when we refer to + * rt_ifa->ia_flags, we assume that the rt_ifa points to + * the address, not the loopback. + */ + if (cmd == RTM_ADD && ifa != nrt->rt_ifa) { + IFAFREE(nrt->rt_ifa); + IFAREF(ifa); + nrt->rt_ifa = ifa; + } + /* + * Report to routing socket. + */ + rt_newaddrmsg(cmd, ifa, e, nrt); + if (cmd == RTM_DELETE) { + rtfree(nrt); + } else { + /* the cmd must be RTM_ADD here */ + RT_REMREF(nrt); + RT_UNLOCK(nrt); + } + } +} + + +/* + * Add own address as loopback rtentry (receive path). We previously add + * the route only if necessary (such as point to point circuit), or when + * triggered by route cloning. However, a proper RIB and FIB implementation + * must contain own-addrs as receive paths, allowing software to manage + * its own addresses separately from prefixes. 
This is required for receive + * adjacency/path in ip_fastforward() --james@towardex.com 2004/12/17 + */ +static void +in_ifaddrecv(struct in_ifaddr *ia) +{ + struct rtentry *rt; + int need_loop, need_netdst, need_bcast; + struct sockaddr_in subnet = { sizeof(struct sockaddr_in), AF_INET }; + + /* If there is no loopback entry, allocate one */ + rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0); + need_loop = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 || + (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0); + + /* If there is no network entry, allocate one */ + if(rt) rtfree(rt); + rt = rtalloc1(sintosa(&ia->ia_dstaddr), 0, 0); + need_netdst = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 || + (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0); + + /* If there is no broadcast entry, allocate one */ + subnet.sin_addr.s_addr = htonl(ia->ia_subnet); + subnet.sin_family = AF_INET; + if(rt) rtfree(rt); + rt = rtalloc1(sintosa(&subnet), 0, 0); + need_bcast = (rt == NULL || (rt->rt_flags & RTF_HOST) == 0 || + (rt->rt_ifp->if_flags & IFF_LOOPBACK) == 0); + + if(rt) + rtfree(rt); + + if(need_loop) + in_ifrecv_request(0, RTM_ADD, ia); + if(need_netdst) + in_ifrecv_request(1, RTM_ADD, ia); + if(need_bcast) + in_ifrecv_request(2, RTM_ADD, ia); +} + + +/* + * Remove loopback rtentry's of receive path generated by in_ifaddrecv() + * if they exist. -- james 12/17/2004 + */ +static void +in_ifremrecv(struct in_ifaddr *ia) +{ + struct rtentry *rt; + + /* + * Delete the route for ownaddr if it really exists. + */ + rt = rtalloc1(ia->ia_ifa.ifa_addr, 0, 0); + if (rt != NULL && (rt->rt_flags & RTF_HOST) != 0 && + (rt->rt_ifp->if_flags & IFF_LOOPBACK) != 0) { + rtfree(rt); + in_ifrecv_request(0, RTM_DELETE, ia); + } + + /* XXX + * Broadcast and network addresses are removed by + * by regular interface detach handlers, but we + * need to verify the design aspect of this more + * later. 
+ */ +} + +/* * Trim a mask in a sockaddr */ static void @@ -658,6 +811,11 @@ register struct in_ifaddr *ia; { in_scrubprefix(ia); + + /* + * delete receive path rtentry's if they exist. + */ + in_ifremrecv(ia); } /* @@ -752,6 +910,16 @@ addr.s_addr = htonl(INADDR_ALLHOSTS_GROUP); in_addmulti(&addr, ifp); } + + /* + * Bring online receive adjacency routes. + * -james 2004/12/17 + * + * Deleted old 2004-09-09 kludge code; this is a cleaner + * approach, derived from KAME implementation for INET6. + */ + in_ifaddrecv(ia); + return (error); } @@ -806,6 +974,8 @@ target->ia_flags |= IFA_ROUTE; return error; } + + /* * If there is no other address in the system that can serve a route to the --FCuugMFkClbJLl1L Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="inet.c" /* * Copyright (c) 1983, 1988, 1993, 1995 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #if 0 #ifndef lint static char sccsid[] = "@(#)inet.c 8.5 (Berkeley) 5/24/95"; #endif /* not lint */ #endif #include __FBSDID("$FreeBSD: src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $"); #include #include #include #include #include #include #include #include #include #include #ifdef INET6 #include #endif /* INET6 */ #include #include #include #include #include #include #include #include #include #define TCPSTATES #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "netstat.h" char *inetname (struct in_addr *); void inetprint (struct in_addr *, int, const char *, int); #ifdef INET6 static int udp_done, tcp_done; #endif /* INET6 */ /* * Print a summary of connections related to an Internet * protocol. For TCP, also give state of connection. * Listening processes (aflag) are suppressed unless the * -a (all) flag is specified. 
*/ void protopr(u_long proto, /* for sysctl version we pass proto # */ const char *name, int af1) { int istcp; static int first = 1; char *buf; const char *mibvar, *vchar; struct tcpcb *tp = NULL; struct inpcb *inp; struct xinpgen *xig, *oxig; struct xsocket *so; size_t len; istcp = 0; switch (proto) { case IPPROTO_TCP: #ifdef INET6 if (tcp_done != 0) return; else tcp_done = 1; #endif istcp = 1; mibvar = "net.inet.tcp.pcblist"; break; case IPPROTO_UDP: #ifdef INET6 if (udp_done != 0) return; else udp_done = 1; #endif mibvar = "net.inet.udp.pcblist"; break; case IPPROTO_DIVERT: mibvar = "net.inet.divert.pcblist"; break; default: mibvar = "net.inet.raw.pcblist"; break; } len = 0; if (sysctlbyname(mibvar, 0, &len, 0, 0) < 0) { if (errno != ENOENT) warn("sysctl: %s", mibvar); return; } if ((buf = malloc(len)) == 0) { warnx("malloc %lu bytes", (u_long)len); return; } if (sysctlbyname(mibvar, buf, &len, 0, 0) < 0) { warn("sysctl: %s", mibvar); free(buf); return; } oxig = xig = (struct xinpgen *)buf; for (xig = (struct xinpgen *)((char *)xig + xig->xig_len); xig->xig_len > sizeof(struct xinpgen); xig = (struct xinpgen *)((char *)xig + xig->xig_len)) { if (istcp) { tp = &((struct xtcpcb *)xig)->xt_tp; inp = &((struct xtcpcb *)xig)->xt_inp; so = &((struct xtcpcb *)xig)->xt_socket; } else { inp = &((struct xinpcb *)xig)->xi_inp; so = &((struct xinpcb *)xig)->xi_socket; } /* Ignore sockets for protocols other than the desired one. */ if (so->xso_protocol != (int)proto) continue; /* Ignore PCBs which were freed during copyout. 
*/ if (inp->inp_gencnt > oxig->xig_gen) continue; if ((af1 == AF_INET && (inp->inp_vflag & INP_IPV4) == 0) #ifdef INET6 || (af1 == AF_INET6 && (inp->inp_vflag & INP_IPV6) == 0) #endif /* INET6 */ || (af1 == AF_UNSPEC && ((inp->inp_vflag & INP_IPV4) == 0 #ifdef INET6 && (inp->inp_vflag & INP_IPV6) == 0 #endif /* INET6 */ )) ) continue; if (!aflag && ( (istcp && tp->t_state == TCPS_LISTEN) || (af1 == AF_INET && inet_lnaof(inp->inp_laddr) == INADDR_ANY) #ifdef INET6 || (af1 == AF_INET6 && IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr)) #endif /* INET6 */ || (af1 == AF_UNSPEC && (((inp->inp_vflag & INP_IPV4) != 0 && inet_lnaof(inp->inp_laddr) == INADDR_ANY) #ifdef INET6 || ((inp->inp_vflag & INP_IPV6) != 0 && IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr)) #endif )) )) continue; if (first) { if (!Lflag) { printf("Active Internet connections"); if (aflag) printf(" (including servers)"); } else printf( "Current listen queue sizes (qlen/incqlen/maxqlen)"); putchar('\n'); if (Aflag) printf("%-8.8s ", "Socket"); if (Lflag) printf("%-5.5s %-14.14s %-22.22s\n", "Proto", "Listen", "Local Address"); else printf((Aflag && !Wflag) ? "%-5.5s %-6.6s %-6.6s %-18.18s %-18.18s %s\n" : "%-5.5s %-6.6s %-6.6s %-22.22s %-22.22s %s\n", "Proto", "Recv-Q", "Send-Q", "Local Address", "Foreign Address", "(state)"); first = 0; } if (Lflag && so->so_qlimit == 0) continue; if (Aflag) { if (istcp) printf("%8lx ", (u_long)inp->inp_ppcb); else printf("%8lx ", (u_long)so->so_pcb); } #ifdef INET6 if ((inp->inp_vflag & INP_IPV6) != 0) vchar = ((inp->inp_vflag & INP_IPV4) != 0) ? "46" : "6 "; else #endif vchar = ((inp->inp_vflag & INP_IPV4) != 0) ? 
"4 " : " "; printf("%-3.3s%-2.2s ", name, vchar); if (Lflag) { char buf1[15]; snprintf(buf1, 15, "%d/%d/%d", so->so_qlen, so->so_incqlen, so->so_qlimit); printf("%-14.14s ", buf1); } else { printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc); } if (numeric_port) { if (inp->inp_vflag & INP_IPV4) { inetprint(&inp->inp_laddr, (int)inp->inp_lport, name, 1); if (!Lflag) inetprint(&inp->inp_faddr, (int)inp->inp_fport, name, 1); } #ifdef INET6 else if (inp->inp_vflag & INP_IPV6) { inet6print(&inp->in6p_laddr, (int)inp->inp_lport, name, 1); if (!Lflag) inet6print(&inp->in6p_faddr, (int)inp->inp_fport, name, 1); } /* else nothing printed now */ #endif /* INET6 */ } else if (inp->inp_flags & INP_ANONPORT) { if (inp->inp_vflag & INP_IPV4) { inetprint(&inp->inp_laddr, (int)inp->inp_lport, name, 1); if (!Lflag) inetprint(&inp->inp_faddr, (int)inp->inp_fport, name, 0); } #ifdef INET6 else if (inp->inp_vflag & INP_IPV6) { inet6print(&inp->in6p_laddr, (int)inp->inp_lport, name, 1); if (!Lflag) inet6print(&inp->in6p_faddr, (int)inp->inp_fport, name, 0); } /* else nothing printed now */ #endif /* INET6 */ } else { if (inp->inp_vflag & INP_IPV4) { inetprint(&inp->inp_laddr, (int)inp->inp_lport, name, 0); if (!Lflag) inetprint(&inp->inp_faddr, (int)inp->inp_fport, name, inp->inp_lport != inp->inp_fport); } #ifdef INET6 else if (inp->inp_vflag & INP_IPV6) { inet6print(&inp->in6p_laddr, (int)inp->inp_lport, name, 0); if (!Lflag) inet6print(&inp->in6p_faddr, (int)inp->inp_fport, name, inp->inp_lport != inp->inp_fport); } /* else nothing printed now */ #endif /* INET6 */ } if (istcp && !Lflag) { if (tp->t_state < 0 || tp->t_state >= TCP_NSTATES) printf("%d", tp->t_state); else { printf("%s", tcpstates[tp->t_state]); #if defined(TF_NEEDSYN) && defined(TF_NEEDFIN) /* Show T/TCP `hidden state' */ if (tp->t_flags & (TF_NEEDSYN|TF_NEEDFIN)) putchar('*'); #endif /* defined(TF_NEEDSYN) && defined(TF_NEEDFIN) */ } } putchar('\n'); } if (xig != oxig && xig->xig_gen != oxig->xig_gen) { if 
(oxig->xig_count > xig->xig_count) { printf("Some %s sockets may have been deleted.\n", name); } else if (oxig->xig_count < xig->xig_count) { printf("Some %s sockets may have been created.\n", name); } else { printf("Some %s sockets may have been created or deleted.\n", name); } } free(buf); } /* * Dump TCP statistics structure. */ void tcp_stats(u_long off __unused, const char *name, int af1 __unused) { struct tcpstat tcpstat, zerostat; size_t len = sizeof tcpstat; if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.tcp.stats", &tcpstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { warn("sysctl: net.inet.tcp.stats"); return; } #ifdef INET6 if (tcp_done != 0) return; else tcp_done = 1; #endif printf ("%s:\n", name); #define p(f, m) if (tcpstat.f || sflag <= 1) \ printf(m, tcpstat.f, plural(tcpstat.f)) #define p1a(f, m) if (tcpstat.f || sflag <= 1) \ printf(m, tcpstat.f) #define p2(f1, f2, m) if (tcpstat.f1 || tcpstat.f2 || sflag <= 1) \ printf(m, tcpstat.f1, plural(tcpstat.f1), tcpstat.f2, plural(tcpstat.f2)) #define p2a(f1, f2, m) if (tcpstat.f1 || tcpstat.f2 || sflag <= 1) \ printf(m, tcpstat.f1, plural(tcpstat.f1), tcpstat.f2) #define p3(f, m) if (tcpstat.f || sflag <= 1) \ printf(m, tcpstat.f, plurales(tcpstat.f)) p(tcps_sndtotal, "\t%lu packet%s sent\n"); p2(tcps_sndpack,tcps_sndbyte, "\t\t%lu data packet%s (%lu byte%s)\n"); p2(tcps_sndrexmitpack, tcps_sndrexmitbyte, "\t\t%lu data packet%s (%lu byte%s) retransmitted\n"); p(tcps_sndrexmitbad, "\t\t%lu data packet%s unnecessarily retransmitted\n"); p(tcps_mturesent, "\t\t%lu resend%s initiated by MTU discovery\n"); p2a(tcps_sndacks, tcps_delack, "\t\t%lu ack-only packet%s (%lu delayed)\n"); p(tcps_sndurg, "\t\t%lu URG only packet%s\n"); p(tcps_sndprobe, "\t\t%lu window probe packet%s\n"); p(tcps_sndwinup, "\t\t%lu window update packet%s\n"); p(tcps_sndctrl, "\t\t%lu control packet%s\n"); p(tcps_rcvtotal, "\t%lu packet%s received\n"); p2(tcps_rcvackpack, tcps_rcvackbyte, "\t\t%lu ack%s 
(for %lu byte%s)\n"); p(tcps_rcvdupack, "\t\t%lu duplicate ack%s\n"); p(tcps_rcvacktoomuch, "\t\t%lu ack%s for unsent data\n"); p2(tcps_rcvpack, tcps_rcvbyte, "\t\t%lu packet%s (%lu byte%s) received in-sequence\n"); p2(tcps_rcvduppack, tcps_rcvdupbyte, "\t\t%lu completely duplicate packet%s (%lu byte%s)\n"); p(tcps_pawsdrop, "\t\t%lu old duplicate packet%s\n"); p2(tcps_rcvpartduppack, tcps_rcvpartdupbyte, "\t\t%lu packet%s with some dup. data (%lu byte%s duped)\n"); p2(tcps_rcvoopack, tcps_rcvoobyte, "\t\t%lu out-of-order packet%s (%lu byte%s)\n"); p2(tcps_rcvpackafterwin, tcps_rcvbyteafterwin, "\t\t%lu packet%s (%lu byte%s) of data after window\n"); p(tcps_rcvwinprobe, "\t\t%lu window probe%s\n"); p(tcps_rcvwinupd, "\t\t%lu window update packet%s\n"); p(tcps_rcvafterclose, "\t\t%lu packet%s received after close\n"); p(tcps_rcvbadsum, "\t\t%lu discarded for bad checksum%s\n"); p(tcps_rcvbadoff, "\t\t%lu discarded for bad header offset field%s\n"); p1a(tcps_rcvshort, "\t\t%lu discarded because packet too short\n"); p(tcps_connattempt, "\t%lu connection request%s\n"); p(tcps_accepts, "\t%lu connection accept%s\n"); p(tcps_badsyn, "\t%lu bad connection attempt%s\n"); p(tcps_listendrop, "\t%lu listen queue overflow%s\n"); p(tcps_badrst, "\t%lu ignored RSTs in the window%s\n"); p(tcps_connects, "\t%lu connection%s established (including accepts)\n"); p2(tcps_closed, tcps_drops, "\t%lu connection%s closed (including %lu drop%s)\n"); p(tcps_cachedrtt, "\t\t%lu connection%s updated cached RTT on close\n"); p(tcps_cachedrttvar, "\t\t%lu connection%s updated cached RTT variance on close\n"); p(tcps_cachedssthresh, "\t\t%lu connection%s updated cached ssthresh on close\n"); p(tcps_conndrops, "\t%lu embryonic connection%s dropped\n"); p2(tcps_rttupdated, tcps_segstimed, "\t%lu segment%s updated rtt (of %lu attempt%s)\n"); p(tcps_rexmttimeo, "\t%lu retransmit timeout%s\n"); p(tcps_timeoutdrop, "\t\t%lu connection%s dropped by rexmit timeout\n"); p(tcps_persisttimeo, "\t%lu 
persist timeout%s\n"); p(tcps_persistdrop, "\t\t%lu connection%s dropped by persist timeout\n"); p(tcps_keeptimeo, "\t%lu keepalive timeout%s\n"); p(tcps_keepprobe, "\t\t%lu keepalive probe%s sent\n"); p(tcps_keepdrops, "\t\t%lu connection%s dropped by keepalive\n"); p(tcps_predack, "\t%lu correct ACK header prediction%s\n"); p(tcps_preddat, "\t%lu correct data packet header prediction%s\n"); p(tcps_sc_added, "\t%lu syncache entrie%s added\n"); p1a(tcps_sc_retransmitted, "\t\t%lu retransmitted\n"); p1a(tcps_sc_dupsyn, "\t\t%lu dupsyn\n"); p1a(tcps_sc_dropped, "\t\t%lu dropped\n"); p1a(tcps_sc_completed, "\t\t%lu completed\n"); p1a(tcps_sc_bucketoverflow, "\t\t%lu bucket overflow\n"); p1a(tcps_sc_cacheoverflow, "\t\t%lu cache overflow\n"); p1a(tcps_sc_reset, "\t\t%lu reset\n"); p1a(tcps_sc_stale, "\t\t%lu stale\n"); p1a(tcps_sc_aborted, "\t\t%lu aborted\n"); p1a(tcps_sc_badack, "\t\t%lu badack\n"); p1a(tcps_sc_unreach, "\t\t%lu unreach\n"); p(tcps_sc_zonefail, "\t\t%lu zone failure%s\n"); p(tcps_sc_sendcookie, "\t%lu cookie%s sent\n"); p(tcps_sc_recvcookie, "\t%lu cookie%s received\n"); p(tcps_sack_recovery_episode, "\t%lu SACK recovery episode%s\n"); p(tcps_sack_rexmits, "\t%lu segment rexmit%s in SACK recovery episodes\n"); p(tcps_sack_rexmit_bytes, "\t%lu byte rexmit%s in SACK recovery episodes\n"); p(tcps_sack_rcv_blocks, "\t%lu SACK option%s (SACK blocks) received\n"); p(tcps_sack_send_blocks, "\t%lu SACK option%s (SACK blocks) sent\n"); #undef p #undef p1a #undef p2 #undef p2a #undef p3 } /* * Dump UDP statistics structure. */ void udp_stats(u_long off __unused, const char *name, int af1 __unused) { struct udpstat udpstat, zerostat; size_t len = sizeof udpstat; u_long delivered; if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.udp.stats", &udpstat, &len, zflag ? &zerostat : NULL, zflag ? 
len : 0) < 0) { warn("sysctl: net.inet.udp.stats"); return; } #ifdef INET6 if (udp_done != 0) return; else udp_done = 1; #endif printf("%s:\n", name); #define p(f, m) if (udpstat.f || sflag <= 1) \ printf(m, udpstat.f, plural(udpstat.f)) #define p1a(f, m) if (udpstat.f || sflag <= 1) \ printf(m, udpstat.f) p(udps_ipackets, "\t%lu datagram%s received\n"); p1a(udps_hdrops, "\t%lu with incomplete header\n"); p1a(udps_badlen, "\t%lu with bad data length field\n"); p1a(udps_badsum, "\t%lu with bad checksum\n"); p1a(udps_nosum, "\t%lu with no checksum\n"); p1a(udps_noport, "\t%lu dropped due to no socket\n"); p(udps_noportbcast, "\t%lu broadcast/multicast datagram%s dropped due to no socket\n"); p1a(udps_fullsock, "\t%lu dropped due to full socket buffers\n"); p1a(udpps_pcbhashmiss, "\t%lu not for hashed pcb\n"); delivered = udpstat.udps_ipackets - udpstat.udps_hdrops - udpstat.udps_badlen - udpstat.udps_badsum - udpstat.udps_noport - udpstat.udps_noportbcast - udpstat.udps_fullsock; if (delivered || sflag <= 1) printf("\t%lu delivered\n", delivered); p(udps_opackets, "\t%lu datagram%s output\n"); #undef p #undef p1a } /* * Dump IP statistics structure. */ void ip_stats(u_long off __unused, const char *name, int af1 __unused) { struct ipstat ipstat, zerostat; size_t len = sizeof ipstat; if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.ip.stats", &ipstat, &len, zflag ? &zerostat : NULL, zflag ? 
len : 0) < 0) { warn("sysctl: net.inet.ip.stats"); return; } printf("%s:\n", name); #define p(f, m) if (ipstat.f || sflag <= 1) \ printf(m, ipstat.f, plural(ipstat.f)) #define p1a(f, m) if (ipstat.f || sflag <= 1) \ printf(m, ipstat.f) p(ips_total, "\t%lu total packet%s received\n"); p(ips_badsum, "\t%lu bad header checksum%s\n"); p1a(ips_toosmall, "\t%lu with size smaller than minimum\n"); p1a(ips_tooshort, "\t%lu with data size < data length\n"); p1a(ips_toolong, "\t%lu with ip length > max ip packet size\n"); p1a(ips_badhlen, "\t%lu with header length < data size\n"); p1a(ips_badlen, "\t%lu with data length < header length\n"); p1a(ips_badoptions, "\t%lu with bad options\n"); p1a(ips_badvers, "\t%lu with incorrect version number\n"); p(ips_fragments, "\t%lu fragment%s received\n"); p(ips_fragdropped, "\t%lu fragment%s dropped (dup or out of space)\n"); p(ips_fragtimeout, "\t%lu fragment%s dropped after timeout\n"); p(ips_reassembled, "\t%lu packet%s reassembled ok\n"); p(ips_delivered, "\t%lu packet%s for this host\n"); p(ips_noproto, "\t%lu packet%s for unknown/unsupported protocol\n"); p(ips_forward, "\t%lu packet%s forwarded"); p(ips_fastforward, " (%lu packet%s fast forwarded)"); if (ipstat.ips_forward || sflag <= 1) putchar('\n'); p(ips_cantforward, "\t%lu packet%s not forwardable\n"); p(ips_transit_re, "\t%lu packet%s forwarded to receive path\n"); p(ips_notmember, "\t%lu packet%s received for unknown multicast group\n"); p(ips_redirectsent, "\t%lu redirect%s sent\n"); p(ips_localout, "\t%lu packet%s sent from this host\n"); p(ips_rawout, "\t%lu packet%s sent with fabricated ip header\n"); p(ips_odropped, "\t%lu output packet%s dropped due to no bufs, etc.\n"); p(ips_noroute, "\t%lu output packet%s discarded due to no route\n"); p(ips_fragmented, "\t%lu output datagram%s fragmented\n"); p(ips_ofragments, "\t%lu fragment%s created\n"); p(ips_cantfrag, "\t%lu datagram%s that can't be fragmented\n"); p(ips_nogif, "\t%lu tunneling packet%s that can't find 
gif\n"); p(ips_badaddr, "\t%lu datagram%s with bad address in header\n"); #undef p #undef p1a } static const char *icmpnames[] = { "echo reply", "#1", "#2", "destination unreachable", "source quench", "routing redirect", "#6", "#7", "echo", "router advertisement", "router solicitation", "time exceeded", "parameter problem", "time stamp", "time stamp reply", "information request", "information request reply", "address mask request", "address mask reply", }; /* * Dump ICMP statistics. */ void icmp_stats(u_long off __unused, const char *name, int af1 __unused) { struct icmpstat icmpstat, zerostat; int i, first; int mib[4]; /* CTL_NET + PF_INET + IPPROTO_ICMP + req */ size_t len; mib[0] = CTL_NET; mib[1] = PF_INET; mib[2] = IPPROTO_ICMP; mib[3] = ICMPCTL_STATS; len = sizeof icmpstat; if (zflag) memset(&zerostat, 0, len); if (sysctl(mib, 4, &icmpstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { warn("sysctl: net.inet.icmp.stats"); return; } printf("%s:\n", name); #define p(f, m) if (icmpstat.f || sflag <= 1) \ printf(m, icmpstat.f, plural(icmpstat.f)) #define p1a(f, m) if (icmpstat.f || sflag <= 1) \ printf(m, icmpstat.f) #define p2(f, m) if (icmpstat.f || sflag <= 1) \ printf(m, icmpstat.f, plurales(icmpstat.f)) p(icps_error, "\t%lu call%s to icmp_error\n"); p(icps_oldicmp, "\t%lu error%s not generated in response to an icmp message\n"); for (first = 1, i = 0; i < ICMP_MAXTYPE + 1; i++) if (icmpstat.icps_outhist[i] != 0) { if (first) { printf("\tOutput histogram:\n"); first = 0; } printf("\t\t%s: %lu\n", icmpnames[i], icmpstat.icps_outhist[i]); } p(icps_badcode, "\t%lu message%s with bad code fields\n"); p(icps_tooshort, "\t%lu message%s < minimum length\n"); p(icps_checksum, "\t%lu bad checksum%s\n"); p(icps_badlen, "\t%lu message%s with bad length\n"); p1a(icps_bmcastecho, "\t%lu multicast echo requests ignored\n"); p1a(icps_bmcasttstamp, "\t%lu multicast timestamp requests ignored\n"); for (first = 1, i = 0; i < ICMP_MAXTYPE + 1; i++) if 
(icmpstat.icps_inhist[i] != 0) { if (first) { printf("\tInput histogram:\n"); first = 0; } printf("\t\t%s: %lu\n", icmpnames[i], icmpstat.icps_inhist[i]); } p(icps_reflect, "\t%lu message response%s generated\n"); p2(icps_badaddr, "\t%lu invalid return address%s\n"); p(icps_noroute, "\t%lu no return route%s\n"); #undef p #undef p1a #undef p2 mib[3] = ICMPCTL_MASKREPL; len = sizeof i; if (sysctl(mib, 4, &i, &len, (void *)0, 0) < 0) return; printf("\tICMP address mask responses are %sabled\n", i ? "en" : "dis"); } /* * Dump IGMP statistics structure. */ void igmp_stats(u_long off __unused, const char *name, int af1 __unused) { struct igmpstat igmpstat, zerostat; size_t len = sizeof igmpstat; if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.igmp.stats", &igmpstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { warn("sysctl: net.inet.igmp.stats"); return; } printf("%s:\n", name); #define p(f, m) if (igmpstat.f || sflag <= 1) \ printf(m, igmpstat.f, plural(igmpstat.f)) #define py(f, m) if (igmpstat.f || sflag <= 1) \ printf(m, igmpstat.f, igmpstat.f != 1 ? "ies" : "y") p(igps_rcv_total, "\t%u message%s received\n"); p(igps_rcv_tooshort, "\t%u message%s received with too few bytes\n"); p(igps_rcv_badsum, "\t%u message%s received with bad checksum\n"); py(igps_rcv_queries, "\t%u membership quer%s received\n"); py(igps_rcv_badqueries, "\t%u membership quer%s received with invalid field(s)\n"); p(igps_rcv_reports, "\t%u membership report%s received\n"); p(igps_rcv_badreports, "\t%u membership report%s received with invalid field(s)\n"); p(igps_rcv_ourreports, "\t%u membership report%s received for groups to which we belong\n"); p(igps_snd_reports, "\t%u membership report%s sent\n"); #undef p #undef py } /* * Dump PIM statistics structure. 
*/ void pim_stats(u_long off __unused, const char *name, int af1 __unused) { struct pimstat pimstat, zerostat; size_t len = sizeof pimstat; if (zflag) memset(&zerostat, 0, len); if (sysctlbyname("net.inet.pim.stats", &pimstat, &len, zflag ? &zerostat : NULL, zflag ? len : 0) < 0) { if (errno != ENOENT) warn("sysctl: net.inet.pim.stats"); return; } printf("%s:\n", name); #define p(f, m) if (pimstat.f || sflag <= 1) \ printf(m, pimstat.f, plural(pimstat.f)) #define py(f, m) if (pimstat.f || sflag <= 1) \ printf(m, pimstat.f, pimstat.f != 1 ? "ies" : "y") p(pims_rcv_total_msgs, "\t%llu message%s received\n"); p(pims_rcv_total_bytes, "\t%llu byte%s received\n"); p(pims_rcv_tooshort, "\t%llu message%s received with too few bytes\n"); p(pims_rcv_badsum, "\t%llu message%s received with bad checksum\n"); p(pims_rcv_badversion, "\t%llu message%s received with bad version\n"); p(pims_rcv_registers_msgs, "\t%llu data register message%s received\n"); p(pims_rcv_registers_bytes, "\t%llu data register byte%s received\n"); p(pims_rcv_registers_wrongiif, "\t%llu data register message%s received on wrong iif\n"); p(pims_rcv_badregisters, "\t%llu bad register%s received\n"); p(pims_snd_registers_msgs, "\t%llu data register message%s sent\n"); p(pims_snd_registers_bytes, "\t%llu data register byte%s sent\n"); #undef p #undef py } /* * Pretty print an Internet address (net address + port). */ void inetprint(struct in_addr *in, int port, const char *proto, int num_port) { struct servent *sp = 0; char line[80], *cp; int width; if (Wflag) sprintf(line, "%s.", inetname(in)); else sprintf(line, "%.*s.", (Aflag && !num_port) ? 12 : 16, inetname(in)); cp = index(line, '\0'); if (!num_port && port) sp = getservbyport((int)port, proto); if (sp || port == 0) sprintf(cp, "%.15s ", sp ? sp->s_name : "*"); else sprintf(cp, "%d ", ntohs((u_short)port)); width = (Aflag && !Wflag) ? 
18 : 22; if (Wflag) printf("%-*s ", width, line); else printf("%-*.*s ", width, width, line); } /* * Construct an Internet address representation. * If numeric_addr has been supplied, give * numeric value, otherwise try for symbolic name. */ char * inetname(struct in_addr *inp) { char *cp; static char line[MAXHOSTNAMELEN]; struct hostent *hp; struct netent *np; cp = 0; if (!numeric_addr && inp->s_addr != INADDR_ANY) { int net = inet_netof(*inp); int lna = inet_lnaof(*inp); if (lna == INADDR_ANY) { np = getnetbyaddr(net, AF_INET); if (np) cp = np->n_name; } if (cp == 0) { hp = gethostbyaddr((char *)inp, sizeof (*inp), AF_INET); if (hp) { cp = hp->h_name; trimdomain(cp, strlen(cp)); } } } if (inp->s_addr == INADDR_ANY) strcpy(line, "*"); else if (cp) { strncpy(line, cp, sizeof(line) - 1); line[sizeof(line) - 1] = '\0'; } else { inp->s_addr = ntohl(inp->s_addr); #define C(x) ((u_int)((x) & 0xff)) sprintf(line, "%u.%u.%u.%u", C(inp->s_addr >> 24), C(inp->s_addr >> 16), C(inp->s_addr >> 8), C(inp->s_addr)); } return (line); } --FCuugMFkClbJLl1L Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="inet.c.diff" --- inet.org.c Mon Dec 27 01:48:58 2004 +++ inet.c Sun Dec 26 22:33:20 2004 @@ -38,7 +38,7 @@ #endif #include -__FBSDID("$FreeBSD: /repoman/r/ncvs/src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $"); +__FBSDID("$FreeBSD: src/usr.bin/netstat/inet.c,v 1.67 2004/07/26 20:18:11 charnier Exp $"); #include #include @@ -569,6 +569,7 @@ if (ipstat.ips_forward || sflag <= 1) putchar('\n'); p(ips_cantforward, "\t%lu packet%s not forwardable\n"); + p(ips_transit_re, "\t%lu packet%s forwarded to receive path\n"); p(ips_notmember, "\t%lu packet%s received for unknown multicast group\n"); p(ips_redirectsent, "\t%lu redirect%s sent\n"); --FCuugMFkClbJLl1L Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="ip_fastfwd.c" /* * Copyright (c) 2003 Andre Oppermann, Internet Business Solutions AG * 
All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote * products derived from this software without specific prior written * permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD: src/sys/netinet/ip_fastfwd.c,v 1.17.2.3 2004/10/03 17:04:40 mlaier Exp $ * $Wolfowitz: snap5d/src/sys/netinet/apc_ip_fastfwd.c,v 1.35.2 2004/12/04 15:32:21 jenkins Exp $ * $Wolfowitz: freebsd5/src/sys/netinet/ip_fastfwd.c,v 1.18.0.3 2004/12/15 17:04:40 blahdy Exp $ */ /* * ip_fastforward gets its speed from processing the forwarded packet to * completion (if_output on the other side) without any queues or netisr's. 
 * The receiving interface DMAs the packet into memory, the upper half of the * driver calls ip_fastforward, we do our routing table lookup and directly * send it off to the outgoing interface which DMAs the packet to the * network card. The only part of the packet we touch with the CPU is the * IP header (unless there are complex firewall rules touching other parts * of the packet, but that is up to you). We are essentially limited by bus * bandwidth and how fast the network card/driver can set up receives and * transmits. * * We handle basic errors, ip header errors, checksum errors, * destination unreachable, fragmentation and fragmentation needed and * report them via icmp to the sender. * * Otherwise, if something is not pure IPv4 unicast forwarding, we fall back to * the normal ip_input processing path. We should only be called from * interfaces connected to the outside world. * * Firewalling is fully supported including divert, ipfw fwd and ipfilter * ipnat and address rewrite. * * IPSEC is not supported if this host is a tunnel broker. IPSEC is * supported for connections to/from local host. * * We try to do the least expensive (in CPU ops) checks and operations * first to catch junk with as little overhead as possible. * * We take full advantage of hardware support for ip checksum and * fragmentation offloading. * * We don't do ICMP redirect in the fast forwarding path. I have had my own * cases where two core routers with Zebra routing suite would send millions of * ICMP redirects to connected hosts if the router to dest was not the default * gateway. In one case it was filling the routing table of a host with close to * 300'000 cloned redirect entries until it ran out of kernel memory. However * the networking code proved very robust and it didn't crash or otherwise * misbehave. */ /* * Many thanks to Matt Thomas of NetBSD for the basic structure of ip_flow.c which * is being followed here.
*/ #include "opt_ipfw.h" #include "opt_ipstealth.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* include */ #include #include #include #include #include #include #include static int ipfastforward_active = 0; SYSCTL_INT(_net_inet_ip, OID_AUTO, fastforwarding, CTLFLAG_RW, &ipfastforward_active, 0, "Enable fast IP forwarding"); static struct sockaddr_in * ip_findroute(struct route *ro, struct in_addr dest, struct mbuf *m) { struct sockaddr_in *dst; struct rtentry *rt; /* struct mtrie *mt; */ /* * Find route to destination. */ bzero(ro, sizeof(*ro)); dst = (struct sockaddr_in *)&ro->ro_dst; dst->sin_family = AF_INET; dst->sin_len = sizeof(*dst); dst->sin_addr.s_addr = dest.s_addr; rtalloc_ign(ro, RTF_CLONING); /* fiballoc(pfx, mt); */ /* * Prefix there and valid adjacency? */ rt = ro->ro_rt; if (rt && (rt->rt_flags & RTF_UP) && (rt->rt_ifp->if_flags & IFF_UP) && (rt->rt_ifp->if_flags & IFF_RUNNING)) { if (rt->rt_flags & RTF_GATEWAY) dst = (struct sockaddr_in *)rt->rt_gateway; } else { ipstat.ips_noroute++; ipstat.ips_cantforward++; if (rt) RTFREE(rt); /* * The old ip_fastforward() violated RFC1812 by responding * with !H instead of !N when there is no destination * route found. Behaviors observed from both Cisco Cat6509/Sup720 * and Juniper M20 result in !N (correctly complying to * RFC1812) when there is no route available. --james 2004/09/17 */ icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL); return NULL; } return dst; } /* * Try to forward a packet based on the destination address. * This is a fast path optimized for the plain forwarding case. * If the packet is handled (and consumed) here then we return 1; * otherwise 0 is returned and the packet should be delivered * to ip_input for full processing. 
*/ int ip_fastforward(struct mbuf *m) { struct ip *ip; struct mbuf *m0 = NULL; struct route ro; /* struct fentry *pfx = NULL; */ struct sockaddr_in *dst = NULL; struct ifnet *ifp; struct in_addr odest, dest; u_short sum, ip_len; int error = 0; int hlen, mtu; #ifdef IPFIREWALL_FORWARD struct m_tag *fwd_tag; #endif /* * Are we active and forwarding packets? */ if (!ipfastforward_active || !ipforwarding) return 0; M_ASSERTVALID(m); M_ASSERTPKTHDR(m); ro.ro_rt = NULL; /* * Step 1: check for packet drop conditions (and sanity checks) */ ipstat.ips_total++; /* * Is entire packet big enough? */ if (m->m_pkthdr.len < sizeof(struct ip)) { ipstat.ips_tooshort++; goto drop; } /* * Is first mbuf large enough for ip header and is header present? */ if (m->m_len < sizeof (struct ip) && (m = m_pullup(m, sizeof (struct ip))) == 0) { ipstat.ips_toosmall++; return 1; } ip = mtod(m, struct ip *); /* * Is it IPv4? */ if (ip->ip_v != IPVERSION) { ipstat.ips_badvers++; goto drop; } /* * Is IP header length correct and is it in first mbuf? */ hlen = ip->ip_hl << 2; if (hlen < sizeof(struct ip)) { /* minimum header length */ ipstat.ips_badlen++; goto drop; } if (hlen > m->m_len) { if ((m = m_pullup(m, hlen)) == 0) { ipstat.ips_badhlen++; return 1; } ip = mtod(m, struct ip *); } /* * Checksum correct? */ if (m->m_pkthdr.csum_flags & CSUM_IP_CHECKED) sum = !(m->m_pkthdr.csum_flags & CSUM_IP_VALID); else { if (hlen == sizeof(struct ip)) sum = in_cksum_hdr(ip); else sum = in_cksum(m, hlen); } if (sum) { ipstat.ips_badsum++; goto drop; } m->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID); ip_len = ntohs(ip->ip_len); /* * Is IP length longer than packet we have got? */ if (m->m_pkthdr.len < ip_len) { ipstat.ips_tooshort++; goto drop; } /* * Is packet longer than IP header tells us? If yes, truncate packet. 
*/ if (m->m_pkthdr.len > ip_len) { if (m->m_len == m->m_pkthdr.len) { m->m_len = ip_len; m->m_pkthdr.len = ip_len; } else m_adj(m, ip_len - m->m_pkthdr.len); } /* * Is packet from or to 127/8? */ if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET || (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) { ipstat.ips_badaddr++; goto drop; } #ifdef ALTQ /* * Is packet dropped by traffic conditioner? */ if (altq_input != NULL && (*altq_input)(m, AF_INET) == 0) return 1; #endif /* * Step 2: fallback conditions to normal ip_input path processing */ /* * Only IP packets without options */ if (ip->ip_hl != (sizeof(struct ip) >> 2)) { if (ip_doopts == 1){ goto prercvpath; } else if (ip_doopts == 2) { icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_FILTER_PROHIB, 0, NULL); return 1; } /* else ignore IP options and continue */ } /* * Only unicast IP, not from loopback, no L2 or IP broadcast, * no multicast, no INADDR_ANY * * XXX: Probably some of these checks could be direct drop * conditions. However it is not clear whether there are some * hacks or obscure behaviours which make it neccessary to * let ip_input handle it. We play safe here and let ip_input * deal with it until it is proven that we can directly drop it. * * If packet originated from loopback interface, don't even * bother with receive path. Receive acl must only validate * "From-Wire -> To-ControlPlane" destined traffic, not the * packets we created on our own. 
*/ if (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) return 0; if (ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST || ntohl(ip->ip_dst.s_addr) == (u_long)INADDR_BROADCAST || IN_MULTICAST(ntohl(ip->ip_src.s_addr)) || IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) || ip->ip_dst.s_addr == INADDR_ANY ) goto prercvpath; /* * Step 3: incoming packet firewall processing */ /* * Convert to host representation */ ip->ip_len = ntohs(ip->ip_len); ip->ip_off = ntohs(ip->ip_off); odest.s_addr = dest.s_addr = ip->ip_dst.s_addr; /* * Run through list of ipfilter hooks for input packets */ if (inet_pfil_hook.ph_busy_count == -1) goto passin; if (pfil_run_hooks(&inet_pfil_hook, &m, m->m_pkthdr.rcvif, PFIL_IN, NULL) || m == NULL) return 1; M_ASSERTVALID(m); M_ASSERTPKTHDR(m); ip = mtod(m, struct ip *); /* m may have changed by pfil hook */ dest.s_addr = ip->ip_dst.s_addr; passin: /* * Step 4: Look up and analyze route then decrement TTL. */ /* * Find route to destination. * Note: If firewall call above changed destination to another * address, lookup of kernel RIB will be acted upon the new * destination address -- hence saving us a hash lookup here. */ if ((dst = ip_findroute(&ro, dest, m)) == NULL) return 1; /* icmp unreach already sent */ ifp = ro.ro_rt->rt_ifp; /* * Destination address changed by firewall? (policy routing) */ if (odest.s_addr != dest.s_addr) { /* * Is the new destination for a local address on this host? */ if (ro.ro_rt->rt_flags & RTF_LOCAL) goto forwardlocal; /* * Go on with new destination address */ } #ifdef IPFIREWALL_FORWARD if (m->m_flags & M_FASTFWD_OURS) { /* * ipfw changed it for a local address on this host. */ goto forwardlocal; } #endif /* IPFIREWALL_FORWARD */ /* * Is packet destined to us or broadcast address(es)? * SIOCSIFADDR installs /32 lo0 routes so let's check if * this is a route that is bound to loopback. */ if (ro.ro_rt->rt_flags & RTF_LOCAL) goto rcvpath; /* * Drop blackhole and reject routes while we are in the * fast forwarding path. 
*/ if (ro.ro_rt->rt_flags & RTF_BLACKHOLE) goto drop; /* * XXX Need L2 info off the kernel routing table.. This is a * makeshift kludge, so please use 2nd consideration before * committing the line below into main cvs tree. * * Administratively installed reject routes should have * rmx_expire unset. */ if ((ro.ro_rt->rt_flags & RTF_REJECT) && ro.ro_rt->rt_rmx.rmx_expire == 0){ icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL); goto consumed; } /* * Check TTL */ #ifdef IPSTEALTH if (!ipstealth) { #endif if (ip->ip_ttl <= IPTTLDEC) { icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0, NULL); goto consumed; } /* * Decrement the TTL and incrementally change the checksum. * Don't bother doing this with hw checksum offloading. */ ip->ip_ttl -= IPTTLDEC; if (ip->ip_sum >= (u_int16_t) ~htons(IPTTLDEC << 8)) ip->ip_sum -= ~htons(IPTTLDEC << 8); else ip->ip_sum += htons(IPTTLDEC << 8); #ifdef IPSTEALTH } #endif /* * Step 5: outgoing firewall packet processing */ /* * Run through list of hooks for output packets. */ if (inet_pfil_hook.ph_busy_count == -1) goto passout; if (pfil_run_hooks(&inet_pfil_hook, &m, ifp, PFIL_OUT, NULL) || m == NULL) { goto consumed; } M_ASSERTVALID(m); M_ASSERTPKTHDR(m); ip = mtod(m, struct ip *); dest.s_addr = ip->ip_dst.s_addr; /* * Destination address changed? */ #ifndef IPFIREWALL_FORWARD if (odest.s_addr != dest.s_addr) { #else fwd_tag = m_tag_find(m, PACKET_TAG_IPFORWARD, NULL); if (odest.s_addr != dest.s_addr || fwd_tag != NULL) { #endif /* IPFIREWALL_FORWARD */ /* * Is it now for a local address on this host? * * We'll simply rely on in_localip() to determine whether * address is destined to us this time around -- because * I really don't think running radix lookup two more * times in the outbound sections will outperform hash * lookup of system interface addrs. 
 * * In the above ingress checks, we were able to get rid * of a hash lookup (in_localip() call that is) because * we are doing a radix lookup after the initial firewall * operation. */ #ifndef IPFIREWALL_FORWARD if (in_localip(dest)) { #else if (in_localip(dest) || m->m_flags & M_FASTFWD_OURS) { #endif /* IPFIREWALL_FORWARD */ forwardlocal: /* * Return packet for processing by ip_input(). * Keep host byte order as expected at ip_input's * "ours"-label. */ m->m_flags |= M_FASTFWD_OURS; goto rcvpath; } /* * Redo route lookup with new destination address */ #ifdef IPFIREWALL_FORWARD if (fwd_tag) { if (!in_localip(ip->ip_src) && !in_localaddr(ip->ip_dst)) dest.s_addr = ((struct sockaddr_in *)(fwd_tag+1))->sin_addr.s_addr; m_tag_delete(m, fwd_tag); } #endif /* IPFIREWALL_FORWARD */ RTFREE(ro.ro_rt); if ((dst = ip_findroute(&ro, dest, m)) == NULL) return 1; /* icmp unreach already sent */ ifp = ro.ro_rt->rt_ifp; } passout: /* * Step 6: send off the packet */ #ifndef ALTQ /* * Check if there is enough space in the interface queue */ if ((ifp->if_snd.ifq_len + ip->ip_len / ifp->if_mtu + 1) >= ifp->if_snd.ifq_maxlen) { ipstat.ips_odropped++; /* would send source quench here but that is deprecated */ goto drop; } #endif /* * Check if media link state of interface is not down */ if (ifp->if_link_state == LINK_STATE_DOWN) { icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL); goto consumed; } /* * Check if packet fits MTU or if hardware will fragment for us */ if (ro.ro_rt->rt_rmx.rmx_mtu) mtu = min(ro.ro_rt->rt_rmx.rmx_mtu, ifp->if_mtu); else mtu = ifp->if_mtu; if (ip->ip_len <= mtu || (ifp->if_hwassist & CSUM_FRAGMENT && (ip->ip_off & IP_DF) == 0)) { /* * Restore packet header fields to original values */ ip->ip_len = htons(ip->ip_len); ip->ip_off = htons(ip->ip_off); /* * Send off the packet via outgoing interface */ error = (*ifp->if_output)(ifp, m, (struct sockaddr *)dst, ro.ro_rt); } else { /* * Handle EMSGSIZE with icmp reply needfrag for TCP MTU discovery */ if
(ip->ip_off & IP_DF) { ipstat.ips_cantfrag++; icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG, 0, ifp); goto consumed; } else { /* * We have to fragment the packet */ m->m_pkthdr.csum_flags |= CSUM_IP; /* * ip_fragment expects ip_len and ip_off in host byte * order but returns all packets in network byte order */ if (ip_fragment(ip, &m, mtu, ifp->if_hwassist, (~ifp->if_hwassist & CSUM_DELAY_IP))) { goto drop; } KASSERT(m != NULL, ("null mbuf and no error")); /* * Send off the fragments via outgoing interface */ error = 0; do { m0 = m->m_nextpkt; m->m_nextpkt = NULL; error = (*ifp->if_output)(ifp, m, (struct sockaddr *)dst, ro.ro_rt); if (error) break; } while ((m = m0) != NULL); if (error) { /* Reclaim remaining fragments */ for (; m; m = m0) { m0 = m->m_nextpkt; m->m_nextpkt = NULL; m_freem(m); } } else ipstat.ips_fragmented++; } } if (error != 0) ipstat.ips_odropped++; else { ipstat.ips_forward++; ipstat.ips_fastforward++; } consumed: RTFREE(ro.ro_rt); return 1; prercvpath: /* * Convert to host representation */ ip->ip_len = ntohs(ip->ip_len); ip->ip_off = ntohs(ip->ip_off); odest.s_addr = dest.s_addr = ip->ip_dst.s_addr; rcvpath: /* * Receive adjacency. If the packet needs to be punted up to the * ip_input path for further analysis or because it is destined to * one of our own addresses, run it through the receive-path * firewall. To actually use this, the user must set up a firewall * rule using pf(4), ipfw(2), etc. that checks on the lo0 interface * in the INBOUND direction (e.g. ` in quick on lo0` in pf) * * Cisco calls this Receive Path ACL, Juniper calls this Loopback * Filter. The fact that this is FreeBSD makes us behave like * Juniper (filtering on lo0) instead of Cisco (filtering via * "ip receive " command). --james 2004/10/23 */ /* * Set coordinates to loopback interface, inbound direction, * then call in the pfil_hooks.
*/ if (ro.ro_rt) RTFREE(ro.ro_rt); if (inet_pfil_hook.ph_busy_count == -1) goto punt; if (pfil_run_hooks(&inet_pfil_hook, &m, loif, PFIL_IN, NULL) || m == NULL) return 1; ip = mtod(m, struct ip *); /* m may have changed by pfil hook */ dest.s_addr = ip->ip_dst.s_addr; /* We do not support policy routing inside the receive path. * If the user requests it, drop the packet. Ensure that this * is documented in the user manual. */ if (odest.s_addr != dest.s_addr) goto drop; punt: /* * Packet has been pre-processed by ip_fastforward for * control plane evaluations. */ m->m_flags |= M_FASTFWD_PREPROC; ipstat.ips_transit_re++; return 0; drop: if (m) m_freem(m); if (ro.ro_rt) RTFREE(ro.ro_rt); return 1; } --FCuugMFkClbJLl1L Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="ip_fastfwd.c.diff" --- ip_fastfwd.org.c Mon Dec 27 01:42:27 2004 +++ ip_fastfwd.c Sun Dec 26 22:33:15 2004 @@ -26,7 +26,9 @@ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * - * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_fastfwd.c,v 1.25 2004/11/09 09:40:32 andre Exp $ + * $FreeBSD: src/sys/netinet/ip_fastfwd.c,v 1.17.2.3 2004/10/03 17:04:40 mlaier Exp $ + * $Wolfowitz: snap5d/src/sys/netinet/apc_ip_fastfwd.c,v 1.35.2 2004/12/04 15:32:21 jenkins Exp $ + * $Wolfowitz: freebsd5/src/sys/netinet/ip_fastfwd.c,v 1.18.0.3 2004/12/15 17:04:40 blahdy Exp $ */ /* @@ -93,6 +95,7 @@ #include #include #include +/* include */ #include #include @@ -112,6 +115,7 @@ { struct sockaddr_in *dst; struct rtentry *rt; + /* struct mtrie *mt; */ /* * Find route to destination. @@ -122,9 +126,10 @@ dst->sin_len = sizeof(*dst); dst->sin_addr.s_addr = dest.s_addr; rtalloc_ign(ro, RTF_CLONING); + /* fiballoc(pfx, mt); */ /* - * Route there and interface still up? + * Prefix there and valid adjacency? 
*/ rt = ro->ro_rt; if (rt && (rt->rt_flags & RTF_UP) && @@ -137,7 +142,15 @@ ipstat.ips_cantforward++; if (rt) RTFREE(rt); - icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL); + + /* + * The old ip_fastforward() violated RFC1812 by responding + * with !H instead of !N when there is no destination + * route found. Behaviors observed from both Cisco Cat6509/Sup720 + * and Juniper M20 result in !N (correctly complying to + * RFC1812) when there is no route available. --james 2004/09/17 + */ + icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL); return NULL; } return dst; @@ -156,9 +169,8 @@ struct ip *ip; struct mbuf *m0 = NULL; struct route ro; + /* struct fentry *pfx = NULL; */ struct sockaddr_in *dst = NULL; - struct in_ifaddr *ia = NULL; - struct ifaddr *ifa = NULL; struct ifnet *ifp; struct in_addr odest, dest; u_short sum, ip_len; @@ -183,6 +195,8 @@ * Step 1: check for packet drop conditions (and sanity checks) */ + ipstat.ips_total++; + /* * Is entire packet big enough? */ @@ -195,9 +209,9 @@ * Is first mbuf large enough for ip header and is header present? */ if (m->m_len < sizeof (struct ip) && - (m = m_pullup(m, sizeof (struct ip))) == NULL) { + (m = m_pullup(m, sizeof (struct ip))) == 0) { ipstat.ips_toosmall++; - return 1; /* mbuf already free'd */ + return 1; } ip = mtod(m, struct ip *); @@ -241,10 +255,6 @@ ipstat.ips_badsum++; goto drop; } - - /* - * Remeber that we have checked the IP header and found it valid. - */ m->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID); ip_len = ntohs(ip->ip_len); @@ -293,9 +303,9 @@ * Only IP packets without options */ if (ip->ip_hl != (sizeof(struct ip) >> 2)) { - if (ip_doopts == 1) - return 0; - else if (ip_doopts == 2) { + if (ip_doopts == 1){ + goto prercvpath; + } else if (ip_doopts == 2) { icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_FILTER_PROHIB, 0, NULL); return 1; @@ -312,38 +322,22 @@ * hacks or obscure behaviours which make it neccessary to * let ip_input handle it. 
 	 * We play safe here and let ip_input
 	 * deal with it until it is proven that we can directly drop it.
+	 *
+	 * If packet originated from loopback interface, don't even
+	 * bother with receive path. Receive acl must only validate
+	 * "From-Wire -> To-ControlPlane" destined traffic, not the
+	 * packets we created on our own.
 	 */
-	if ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) ||
-	    ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
+	if (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK)
+		return 0;
+
+	if (ntohl(ip->ip_src.s_addr) == (u_long)INADDR_BROADCAST ||
 	    ntohl(ip->ip_dst.s_addr) == (u_long)INADDR_BROADCAST ||
 	    IN_MULTICAST(ntohl(ip->ip_src.s_addr)) ||
 	    IN_MULTICAST(ntohl(ip->ip_dst.s_addr)) ||
 	    ip->ip_dst.s_addr == INADDR_ANY )
-		return 0;
-
-	/*
-	 * Is it for a local address on this host?
-	 */
-	if (in_localip(ip->ip_dst))
-		return 0;
+		goto prercvpath;
 
-	/*
-	 * Or is it for a local IP broadcast address on this host?
-	 */
-	if ((m->m_flags & M_BCAST) &&
-	    (m->m_pkthdr.rcvif->if_flags & IFF_BROADCAST)) {
-		TAILQ_FOREACH(ifa, &m->m_pkthdr.rcvif->if_addrhead, ifa_link) {
-			if (ifa->ifa_addr->sa_family != AF_INET)
-				continue;
-			ia = ifatoia(ifa);
-			if (ia->ia_netbroadcast.s_addr == ip->ip_dst.s_addr)
-				return 0;
-			if (satosin(&ia->ia_broadaddr)->sin_addr.s_addr ==
-			    ip->ip_dst.s_addr)
-				return 0;
-		}
-	}
-	ipstat.ips_total++;
 
 	/*
 	 * Step 3: incoming packet firewall processing
@@ -373,14 +367,29 @@
 	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
 	dest.s_addr = ip->ip_dst.s_addr;
 
+passin:
 	/*
-	 * Destination address changed?
+	 * Step 4: Look up and analyze route then decrement TTL.
+	 */
+
+	/*
+	 * Find route to destination.
+	 * Note: If firewall call above changed destination to another
+	 * address, lookup of kernel RIB will be acted upon the new
+	 * destination address -- hence saving us a hash lookup here.
+	 */
+	if ((dst = ip_findroute(&ro, dest, m)) == NULL)
+		return 1;	/* icmp unreach already sent */
+	ifp = ro.ro_rt->rt_ifp;
+
+	/*
+	 * Destination address changed by firewall? (policy routing)
 	 */
 	if (odest.s_addr != dest.s_addr) {
 		/*
-		 * Is it now for a local address on this host?
+		 * Is the new destination for a local address on this host?
 		 */
-		if (in_localip(dest))
+		if (ro.ro_rt->rt_flags & RTF_LOCAL)
 			goto forwardlocal;
 		/*
 		 * Go on with new destination address
@@ -395,10 +404,34 @@
 	}
 #endif /* IPFIREWALL_FORWARD */
 
-passin:
 	/*
-	 * Step 4: decrement TTL and look up route
+	 * Is packet destined to us or broadcast address(es)?
+	 * SIOCSIFADDR installs /32 lo0 routes so let's check if
+	 * this is a route that is bound to loopback.
 	 */
+	if (ro.ro_rt->rt_flags & RTF_LOCAL)
+		goto rcvpath;
+
+	/*
+	 * Drop blackhole and reject routes while we are in the
+	 * fast forwarding path.
+	 */
+	if (ro.ro_rt->rt_flags & RTF_BLACKHOLE)
+		goto drop;
+
+	/*
+	 * XXX Need L2 info off the kernel routing table.. This is a
+	 * makeshift kludge, so please use 2nd consideration before
+	 * committing the line below into main cvs tree.
+	 *
+	 * Administratively installed reject routes should have
+	 * rmx_expire unset.
+	 */
+	if ((ro.ro_rt->rt_flags & RTF_REJECT) &&
+	    ro.ro_rt->rt_rmx.rmx_expire == 0){
+		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_NET, 0, NULL);
+		goto consumed;
+	}
 
 	/*
 	 * Check TTL
@@ -408,13 +441,12 @@
 #endif
 	if (ip->ip_ttl <= IPTTLDEC) {
 		icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0, NULL);
-		return 1;
+		goto consumed;
 	}
 
 	/*
-	 * Decrement the TTL and incrementally change the IP header checksum.
-	 * Don't bother doing this with hw checksum offloading, it's faster
-	 * doing it right here.
+	 * Decrement the TTL and incrementally change the checksum.
+	 * Don't bother doing this with hw checksum offloading.
 	 */
 	ip->ip_ttl -= IPTTLDEC;
 	if (ip->ip_sum >= (u_int16_t) ~htons(IPTTLDEC << 8))
@@ -426,19 +458,6 @@
 #endif
 
 	/*
-	 * Find route to destination.
-	 */
-	if ((dst = ip_findroute(&ro, dest, m)) == NULL)
-		return 1;	/* icmp unreach already sent */
-	ifp = ro.ro_rt->rt_ifp;
-
-	/*
-	 * Immediately drop blackholed traffic.
-	 */
-	if (ro.ro_rt->rt_flags & RTF_BLACKHOLE)
-		goto drop;
-
-	/*
 	 * Step 5: outgoing firewall packet processing
 	 */
 
@@ -469,11 +488,22 @@
 #endif /* IPFIREWALL_FORWARD */
 		/*
 		 * Is it now for a local address on this host?
+		 *
+		 * We'll simply rely on in_localip() to determine whether
+		 * address is destined to us this time around -- because
+		 * I really don't think running radix lookup two more
+		 * times in the outbound sections will outperform hash
+		 * lookup of system interface addrs.
+		 *
+		 * In the above ingress checks, we were able to get rid
+		 * of a hash lookup (in_localip() call that is) because
+		 * we are doing a radix lookup after the initial firewall
+		 * operation.
 		 */
 #ifndef IPFIREWALL_FORWARD
 		if (in_localip(dest)) {
 #else
-		if (m->m_flags & M_FASTFWD_OURS || in_localip(dest)) {
+		if (in_localip(dest) || m->m_flags & M_FASTFWD_OURS) {
 #endif /* IPFIREWALL_FORWARD */
 forwardlocal:
 			/*
@@ -482,9 +512,7 @@
 			 * "ours"-label.
 			 */
 			m->m_flags |= M_FASTFWD_OURS;
-			if (ro.ro_rt)
-				RTFREE(ro.ro_rt);
-			return 0;
+			goto rcvpath;
 		}
 		/*
 		 * Redo route lookup with new destination address
@@ -507,15 +535,6 @@
 	 * Step 6: send off the packet
 	 */
 
-	/*
-	 * Check if route is dampned (when ARP is unable to resolve)
-	 */
-	if ((ro.ro_rt->rt_flags & RTF_REJECT) &&
-	    ro.ro_rt->rt_rmx.rmx_expire >= time_second) {
-		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, NULL);
-		goto consumed;
-	}
-
 #ifndef ALTQ
 	/*
 	 * Check if there is enough space in the interface queue
@@ -607,13 +626,69 @@
 	if (error != 0)
 		ipstat.ips_odropped++;
 	else {
-		ro.ro_rt->rt_rmx.rmx_pksent++;
 		ipstat.ips_forward++;
 		ipstat.ips_fastforward++;
 	}
 consumed:
 	RTFREE(ro.ro_rt);
 	return 1;
+prercvpath:
+	/*
+	 * Convert to host representation
+	 */
+	ip->ip_len = ntohs(ip->ip_len);
+	ip->ip_off = ntohs(ip->ip_off);
+
+	odest.s_addr = dest.s_addr = ip->ip_dst.s_addr;
+rcvpath:
+	/*
+	 * Receive adjacency. If the packet needs to be punted up to
+	 * ip_input path for further analysis or because it is destined to
+	 * one of our own addresses, run it through the receive-path
+	 * firewall. To actually use this, the user must set up a firewall
+	 * rule using pf(4), ipfw(2), etc that checks on lo0 interface
+	 * under INBOUND direction (e.g. ` in quick on lo0` in pf)
+	 *
+	 * Cisco calls this Receive Path ACL, Juniper calls this Loopback
+	 * Filter. The fact that this is FreeBSD makes us behave like
+	 * Juniper (filtering on lo0) instead of Cisco (filtering via
+	 * "ip receive " command). --james 2004/10/23
+	 */
+
+	/*
+	 * Set coordinates to loopback interface, inbound direction,
+	 * then call in the pfil_hooks.
+	 */
+
+	if (ro.ro_rt)
+		RTFREE(ro.ro_rt);
+
+	if (inet_pfil_hook.ph_busy_count == -1)
+		goto punt;
+
+	if (pfil_run_hooks(&inet_pfil_hook, &m, loif, PFIL_IN, NULL) ||
+	    m == NULL)
+		return 1;
+
+	ip = mtod(m, struct ip *);	/* m may have changed by pfil hook */
+	dest.s_addr = ip->ip_dst.s_addr;
+
+	/* We do not support policy routing inside the receive path.
+	 * If the user requests it, drop the packet. Ensure that this
+	 * is documented in the user manual.
+	 */
+	if (odest.s_addr != dest.s_addr)
+		goto drop;
+
+punt:
+	/*
+	 * Packet has been pre-processed by ip_fastforward for
+	 * control plane evaluations.
+	 */
+	m->m_flags |= M_FASTFWD_PREPROC;
+
+	ipstat.ips_transit_re++;
+	return 0;
 drop:
 	if (m)
 		m_freem(m);

--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_input.c"

/*
 * Copyright (c) 1982, 1986, 1988, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.
 * IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)ip_input.c	8.2 (Berkeley) 1/4/94
 * $FreeBSD: src/sys/netinet/ip_input.c,v 1.283.2.7 2004/10/03 17:04:40 mlaier Exp $
 */

#include "opt_bootp.h"
#include "opt_ipfw.h"
#include "opt_ipstealth.h"
#include "opt_ipsec.h"
#include "opt_mac.h"

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

/* XXX: Temporary until ipfw_ether and ipfw_bridge are converted.
 */
#include
#include

#ifdef IPSEC
#include
#include
#endif

#ifdef FAST_IPSEC
#include
#include
#endif

int rsvp_on = 0;

int	ipforwarding = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_FORWARDING, forwarding, CTLFLAG_RW,
    &ipforwarding, 0, "Enable IP forwarding between interfaces");

static int	ipsendredirects = 1; /* XXX */
SYSCTL_INT(_net_inet_ip, IPCTL_SENDREDIRECTS, redirect, CTLFLAG_RW,
    &ipsendredirects, 0, "Enable sending IP redirects");

int	ip_defttl = IPDEFTTL;
SYSCTL_INT(_net_inet_ip, IPCTL_DEFTTL, ttl, CTLFLAG_RW,
    &ip_defttl, 0, "Maximum TTL on IP packets");

static int	ip_dosourceroute = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_SOURCEROUTE, sourceroute, CTLFLAG_RW,
    &ip_dosourceroute, 0, "Enable forwarding source routed IP packets");

static int	ip_acceptsourceroute = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_ACCEPTSOURCEROUTE, accept_sourceroute,
    CTLFLAG_RW, &ip_acceptsourceroute, 0,
    "Enable accepting source routed IP packets");

int		ip_doopts = 1;	/* 0 = ignore, 1 = process, 2 = reject */
SYSCTL_INT(_net_inet_ip, OID_AUTO, process_options, CTLFLAG_RW,
    &ip_doopts, 0, "Enable IP options processing ([LS]SRR, RR, TS)");

static int	ip_keepfaith = 0;
SYSCTL_INT(_net_inet_ip, IPCTL_KEEPFAITH, keepfaith, CTLFLAG_RW,
    &ip_keepfaith, 0,
    "Enable packet capture for FAITH IPv4->IPv6 translater daemon");

static int	nipq = 0;	/* total # of reass queues */
static int	maxnipq;
SYSCTL_INT(_net_inet_ip, OID_AUTO, maxfragpackets, CTLFLAG_RW,
    &maxnipq, 0,
    "Maximum number of IPv4 fragment reassembly queue entries");

static int	maxfragsperpacket;
SYSCTL_INT(_net_inet_ip, OID_AUTO, maxfragsperpacket, CTLFLAG_RW,
    &maxfragsperpacket, 0,
    "Maximum number of IPv4 fragments allowed per packet");

static int	ip_sendsourcequench = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, sendsourcequench, CTLFLAG_RW,
    &ip_sendsourcequench, 0,
    "Enable the transmission of source quench packets");

int	ip_do_randomid = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, random_id, CTLFLAG_RW,
    &ip_do_randomid, 0,
    "Assign random ip_id values");

/*
 * XXX -
 * Setting ip_checkinterface mostly implements the receive side of
 * the Strong ES model described in RFC 1122, but since the routing table
 * and transmit implementation do not implement the Strong ES model,
 * setting this to 1 results in an odd hybrid.
 *
 * XXX - ip_checkinterface currently must be disabled if you use ipnat
 * to translate the destination address to another local interface.
 *
 * XXX - ip_checkinterface must be disabled if you add IP aliases
 * to the loopback interface instead of the interface where the
 * packets for those addresses are received.
 */
static int	ip_checkinterface = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, check_interface, CTLFLAG_RW,
    &ip_checkinterface, 0, "Verify packet arrives on correct interface");

#ifdef DIAGNOSTIC
static int	ipprintfs = 0;
#endif

struct pfil_head inet_pfil_hook;

static struct	ifqueue ipintrq;
static int	ipqmaxlen = IFQ_MAXLEN;

extern	struct domain inetdomain;
extern	struct protosw inetsw[];
u_char	ip_protox[IPPROTO_MAX];
struct	in_ifaddrhead in_ifaddrhead;		/* first inet address */
struct	in_ifaddrhashhead *in_ifaddrhashtbl;	/* inet addr hash table */
u_long	in_ifaddrhmask;				/* mask for hash table */

SYSCTL_INT(_net_inet_ip, IPCTL_INTRQMAXLEN, intr_queue_maxlen, CTLFLAG_RW,
    &ipintrq.ifq_maxlen, 0, "Maximum size of the IP input queue");
SYSCTL_INT(_net_inet_ip, IPCTL_INTRQDROPS, intr_queue_drops, CTLFLAG_RD,
    &ipintrq.ifq_drops, 0,
    "Number of packets dropped from the IP input queue");

struct ipstat ipstat;
SYSCTL_STRUCT(_net_inet_ip, IPCTL_STATS, stats, CTLFLAG_RW,
    &ipstat, ipstat, "IP statistics (struct ipstat, netinet/ip_var.h)");

/* Packet reassembly stuff */
#define IPREASS_NHASH_LOG2	6
#define IPREASS_NHASH		(1 << IPREASS_NHASH_LOG2)
#define IPREASS_HMASK		(IPREASS_NHASH - 1)
#define IPREASS_HASH(x,y) \
	(((((x) & 0xF) | ((((x) >> 8) & 0xF) << 4)) ^ (y)) & IPREASS_HMASK)

static TAILQ_HEAD(ipqhead, ipq) ipq[IPREASS_NHASH];
struct mtx ipqlock;

#define	IPQ_LOCK()	mtx_lock(&ipqlock)
#define	IPQ_UNLOCK()	mtx_unlock(&ipqlock)
#define	IPQ_LOCK_INIT()	mtx_init(&ipqlock, "ipqlock", NULL, MTX_DEF)
#define	IPQ_LOCK_ASSERT()	mtx_assert(&ipqlock, MA_OWNED)

#ifdef IPCTL_DEFMTU
SYSCTL_INT(_net_inet_ip, IPCTL_DEFMTU, mtu, CTLFLAG_RW,
    &ip_mtu, 0, "Default MTU");
#endif

#ifdef IPSTEALTH
int	ipstealth = 0;
SYSCTL_INT(_net_inet_ip, OID_AUTO, stealth, CTLFLAG_RW,
    &ipstealth, 0, "");
#endif

/*
 * ipfw_ether and ipfw_bridge hooks.
 * XXX: Temporary until those are converted to pfil_hooks as well.
 */
ip_fw_chk_t *ip_fw_chk_ptr = NULL;
ip_dn_io_t *ip_dn_io_ptr = NULL;
int fw_enable = 1;
int fw_one_pass = 1;

/*
 * XXX this is ugly.  IP options source routing magic.
 */
struct ipoptrt {
	struct	in_addr dst;			/* final destination */
	char	nop;				/* one NOP to align */
	char	srcopt[IPOPT_OFFSET + 1];	/* OPTVAL, OLEN and OFFSET */
	struct	in_addr route[MAX_IPOPTLEN/sizeof(struct in_addr)];
};

struct ipopt_tag {
	struct	m_tag tag;
	int	ip_nhops;
	struct	ipoptrt ip_srcrt;
};

static void	save_rte(struct mbuf *, u_char *, struct in_addr);
static int	ip_dooptions(struct mbuf *m, int);
static void	ip_forward(struct mbuf *m, int srcrt);
static void	ip_freef(struct ipqhead *, struct ipq *);

/*
 * IP initialization: fill in IP protocol switch table.
 * All protocols not implemented in kernel go to raw IP protocol handler.
 */
void
ip_init()
{
	register struct protosw *pr;
	register int i;

	TAILQ_INIT(&in_ifaddrhead);
	in_ifaddrhashtbl = hashinit(INADDR_NHASH, M_IFADDR, &in_ifaddrhmask);
	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
	if (pr == 0)
		panic("ip_init: PF_INET not found");

	/* Initialize the entire ip_protox[] array to IPPROTO_RAW. */
	for (i = 0; i < IPPROTO_MAX; i++)
		ip_protox[i] = pr - inetsw;
	/*
	 * Cycle through IP protocols and put them into the appropriate place
	 * in ip_protox[].
	 */
	for (pr = inetdomain.dom_protosw;
	    pr < inetdomain.dom_protoswNPROTOSW; pr++)
		if (pr->pr_domain->dom_family == PF_INET &&
		    pr->pr_protocol && pr->pr_protocol != IPPROTO_RAW) {
			/* Be careful to only index valid IP protocols.
			 */
			if (pr->pr_protocol && pr->pr_protocol < IPPROTO_MAX)
				ip_protox[pr->pr_protocol] = pr - inetsw;
		}

	/* Initialize packet filter hooks. */
	inet_pfil_hook.ph_type = PFIL_TYPE_AF;
	inet_pfil_hook.ph_af = AF_INET;
	if ((i = pfil_head_register(&inet_pfil_hook)) != 0)
		printf("%s: WARNING: unable to register pfil hook, "
			"error %d\n", __func__, i);

	/* Initialize IP reassembly queue. */
	IPQ_LOCK_INIT();
	for (i = 0; i < IPREASS_NHASH; i++)
		TAILQ_INIT(&ipq[i]);
	maxnipq = nmbclusters / 32;
	maxfragsperpacket = 16;

	/* Initialize various other remaining things. */
	ip_id = time_second & 0xffff;
	ipintrq.ifq_maxlen = ipqmaxlen;
	mtx_init(&ipintrq.ifq_mtx, "ip_inq", NULL, MTX_DEF);
	netisr_register(NETISR_IP, ip_input, &ipintrq, NETISR_MPSAFE);
}

/*
 * Ip input routine.  Checksum and byte swap header.  If fragmented
 * try to reassemble.  Process options.  Pass to next level.
 */
void
ip_input(struct mbuf *m)
{
	struct ip *ip = NULL;
	struct in_ifaddr *ia = NULL;
	struct ifaddr *ifa;
	int    checkif, hlen = 0;
	u_short sum;
	int dchg = 0;				/* dest changed after fw */
	struct in_addr odst;			/* original dst address */
#ifdef FAST_IPSEC
	struct m_tag *mtag;
	struct tdb_ident *tdbi;
	struct secpolicy *sp;
	int s, error;
#endif /* FAST_IPSEC */

	M_ASSERTPKTHDR(m);

	if (m->m_flags & M_FASTFWD_OURS) {
		/*
		 * ip_fastforward firewall changed dest to local.
		 * We expect ip_len and ip_off in host byte order.
		 */
		m->m_flags &= ~M_FASTFWD_OURS;	/* for reflected mbufs */
		/* Set up some basic stuff */
		ip = mtod(m, struct ip *);
		hlen = ip->ip_hl << 2;
		goto ours;
	}

	if (m->m_flags & M_FASTFWD_PREPROC){
		/*
		 * Packets that require further analysis or destined
		 * to our own addresses in ip_fastforward.
		 * We expect ip_len and ip_off in host byte order.
		 */
		m->m_flags &= ~M_FASTFWD_PREPROC;	/* for reflected mbufs */
		/* Setup some basic stuff */
		ip = mtod(m, struct ip *);
		hlen = ip->ip_hl << 2;
		goto preprocessed;
	}

	ipstat.ips_total++;

	if (m->m_pkthdr.len < sizeof(struct ip))
		goto tooshort;

	if (m->m_len < sizeof (struct ip) &&
	    (m = m_pullup(m, sizeof (struct ip))) == NULL) {
		ipstat.ips_toosmall++;
		return;
	}
	ip = mtod(m, struct ip *);

	if (ip->ip_v != IPVERSION) {
		ipstat.ips_badvers++;
		goto bad;
	}

	hlen = ip->ip_hl << 2;
	if (hlen < sizeof(struct ip)) {	/* minimum header length */
		ipstat.ips_badhlen++;
		goto bad;
	}
	if (hlen > m->m_len) {
		if ((m = m_pullup(m, hlen)) == NULL) {
			ipstat.ips_badhlen++;
			return;
		}
		ip = mtod(m, struct ip *);
	}

	/* 127/8 must not appear on wire - RFC1122 */
	if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET ||
	    (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) {
		if ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) == 0) {
			ipstat.ips_badaddr++;
			goto bad;
		}
	}

	if (m->m_pkthdr.csum_flags & CSUM_IP_CHECKED) {
		sum = !(m->m_pkthdr.csum_flags & CSUM_IP_VALID);
	} else {
		if (hlen == sizeof(struct ip)) {
			sum = in_cksum_hdr(ip);
		} else {
			sum = in_cksum(m, hlen);
		}
	}
	if (sum) {
		ipstat.ips_badsum++;
		goto bad;
	}

#ifdef ALTQ
	if (altq_input != NULL && (*altq_input)(m, AF_INET) == 0)
		/* packet is dropped by traffic conditioner */
		return;
#endif

	/*
	 * Convert fields to host representation.
	 */
	ip->ip_len = ntohs(ip->ip_len);
	if (ip->ip_len < hlen) {
		ipstat.ips_badlen++;
		goto bad;
	}
	ip->ip_off = ntohs(ip->ip_off);

	/*
	 * Check that the amount of data in the buffers
	 * is as at least much as the IP header would have us expect.
	 * Trim mbufs if longer than we expect.
	 * Drop packet if shorter than we expect.
	 */
	if (m->m_pkthdr.len < ip->ip_len) {
tooshort:
		ipstat.ips_tooshort++;
		goto bad;
	}
	if (m->m_pkthdr.len > ip->ip_len) {
		if (m->m_len == m->m_pkthdr.len) {
			m->m_len = ip->ip_len;
			m->m_pkthdr.len = ip->ip_len;
		} else
			m_adj(m, ip->ip_len - m->m_pkthdr.len);
	}

preprocessed:
#if defined(IPSEC) && !defined(IPSEC_FILTERGIF)
	/*
	 * Bypass packet filtering for packets from a tunnel (gif).
	 */
	if (ipsec_getnhist(m))
		goto passin;
#endif
#if defined(FAST_IPSEC) && !defined(IPSEC_FILTERGIF)
	/*
	 * Bypass packet filtering for packets from a tunnel (gif).
	 */
	if (m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL) != NULL)
		goto passin;
#endif

	/*
	 * Run through list of hooks for input packets.
	 *
	 * NB: Beware of the destination address changing (e.g.
	 *     by NAT rewriting).  When this happens, tell
	 *     ip_forward to do the right thing.
	 */

	/* Jump over all PFIL processing if hooks are not active. */
	if (inet_pfil_hook.ph_busy_count == -1)
		goto passin;

	odst = ip->ip_dst;
	if (pfil_run_hooks(&inet_pfil_hook, &m, m->m_pkthdr.rcvif,
	    PFIL_IN, NULL) != 0)
		return;
	if (m == NULL)			/* consumed by filter */
		return;

	ip = mtod(m, struct ip *);
	dchg = (odst.s_addr != ip->ip_dst.s_addr);

#ifdef IPFIREWALL_FORWARD
	if (m->m_flags & M_FASTFWD_OURS) {
		m->m_flags &= ~M_FASTFWD_OURS;
		goto ours;
	}
	dchg = (m_tag_find(m, PACKET_TAG_IPFORWARD, NULL) != NULL);
#endif /* IPFIREWALL_FORWARD */

passin:
	/*
	 * Process options and, if not destined for us,
	 * ship it on.  ip_dooptions returns 1 when an
	 * error was detected (causing an icmp message
	 * to be sent and the original packet to be freed).
	 */
	if (hlen > sizeof (struct ip) && ip_dooptions(m, 0))
		return;

	/* greedy RSVP, snatches any PATH packet of the RSVP protocol and no
	 * matter if it is destined to another node, or whether it is
	 * a multicast one, RSVP wants it! and prevents it from being forwarded
	 * anywhere else. Also checks if the rsvp daemon is running before
	 * grabbing the packet.
	 */
	if (rsvp_on && ip->ip_p==IPPROTO_RSVP)
		goto ours;

	/*
	 * Check our list of addresses, to see if the packet is for us.
	 * If we don't have any addresses, assume any unicast packet
	 * we receive might be for us (and let the upper layers deal
	 * with it).
	 */
	if (TAILQ_EMPTY(&in_ifaddrhead) &&
	    (m->m_flags & (M_MCAST|M_BCAST)) == 0)
		goto ours;

	/*
	 * Enable a consistency check between the destination address
	 * and the arrival interface for a unicast packet (the RFC 1122
	 * strong ES model) if IP forwarding is disabled and the packet
	 * is not locally generated and the packet is not subject to
	 * 'ipfw fwd'.
	 *
	 * XXX - Checking also should be disabled if the destination
	 * address is ipnat'ed to a different interface.
	 *
	 * XXX - Checking is incompatible with IP aliases added
	 * to the loopback interface instead of the interface where
	 * the packets are received.
	 */
	checkif = ip_checkinterface && (ipforwarding == 0) &&
	    m->m_pkthdr.rcvif != NULL &&
	    ((m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) == 0) &&
	    (dchg == 0);

	/*
	 * Check for exact addresses in the hash bucket.
	 */
	LIST_FOREACH(ia, INADDR_HASH(ip->ip_dst.s_addr), ia_hash) {
		/*
		 * If the address matches, verify that the packet
		 * arrived via the correct interface if checking is
		 * enabled.
		 */
		if (IA_SIN(ia)->sin_addr.s_addr == ip->ip_dst.s_addr &&
		    (!checkif || ia->ia_ifp == m->m_pkthdr.rcvif))
			goto ours;
	}
	/*
	 * Check for broadcast addresses.
	 *
	 * Only accept broadcast packets that arrive via the matching
	 * interface.  Reception of forwarded directed broadcasts would
	 * be handled via ip_forward() and ether_output() with the loopback
	 * into the stack for SIMPLEX interfaces handled by ether_output().
	 */
	if (m->m_pkthdr.rcvif != NULL &&
	    m->m_pkthdr.rcvif->if_flags & IFF_BROADCAST) {
		TAILQ_FOREACH(ifa, &m->m_pkthdr.rcvif->if_addrhead, ifa_link) {
			if (ifa->ifa_addr->sa_family != AF_INET)
				continue;
			ia = ifatoia(ifa);
			if (satosin(&ia->ia_broadaddr)->sin_addr.s_addr ==
			    ip->ip_dst.s_addr)
				goto ours;
			if (ia->ia_netbroadcast.s_addr == ip->ip_dst.s_addr)
				goto ours;
#ifdef BOOTP_COMPAT
			if (IA_SIN(ia)->sin_addr.s_addr == INADDR_ANY)
				goto ours;
#endif
		}
	}
	if (IN_MULTICAST(ntohl(ip->ip_dst.s_addr))) {
		struct in_multi *inm;
		if (ip_mrouter) {
			/*
			 * If we are acting as a multicast router, all
			 * incoming multicast packets are passed to the
			 * kernel-level multicast forwarding function.
			 * The packet is returned (relatively) intact; if
			 * ip_mforward() returns a non-zero value, the packet
			 * must be discarded, else it may be accepted below.
			 */
			if (ip_mforward &&
			    ip_mforward(ip, m->m_pkthdr.rcvif, m, 0) != 0) {
				ipstat.ips_cantforward++;
				m_freem(m);
				return;
			}

			/*
			 * The process-level routing daemon needs to receive
			 * all multicast IGMP packets, whether or not this
			 * host belongs to their destination groups.
			 */
			if (ip->ip_p == IPPROTO_IGMP)
				goto ours;
			ipstat.ips_forward++;
		}
		/*
		 * See if we belong to the destination multicast group on the
		 * arrival interface.
		 */
		IN_LOOKUP_MULTI(ip->ip_dst, m->m_pkthdr.rcvif, inm);
		if (inm == NULL) {
			ipstat.ips_notmember++;
			m_freem(m);
			return;
		}
		goto ours;
	}
	if (ip->ip_dst.s_addr == (u_long)INADDR_BROADCAST)
		goto ours;
	if (ip->ip_dst.s_addr == INADDR_ANY)
		goto ours;

	/*
	 * FAITH(Firewall Aided Internet Translator)
	 */
	if (m->m_pkthdr.rcvif && m->m_pkthdr.rcvif->if_type == IFT_FAITH) {
		if (ip_keepfaith) {
			if (ip->ip_p == IPPROTO_TCP || ip->ip_p == IPPROTO_ICMP)
				goto ours;
		}
		m_freem(m);
		return;
	}

	/*
	 * Not for us; forward if possible and desirable.
	 */
	if (ipforwarding == 0) {
		ipstat.ips_cantforward++;
		m_freem(m);
	} else {
#ifdef IPSEC
		/*
		 * Enforce inbound IPsec SPD.
		 */
		if (ipsec4_in_reject(m, NULL)) {
			ipsecstat.in_polvio++;
			goto bad;
		}
#endif /* IPSEC */
#ifdef FAST_IPSEC
		mtag = m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL);
		s = splnet();
		if (mtag != NULL) {
			tdbi = (struct tdb_ident *)(mtag + 1);
			sp = ipsec_getpolicy(tdbi, IPSEC_DIR_INBOUND);
		} else {
			sp = ipsec_getpolicybyaddr(m, IPSEC_DIR_INBOUND,
						   IP_FORWARDING, &error);
		}
		if (sp == NULL) {	/* NB: can happen if error */
			splx(s);
			/*XXX error stat???*/
			DPRINTF(("ip_input: no SP for forwarding\n"));	/*XXX*/
			goto bad;
		}

		/*
		 * Check security policy against packet attributes.
		 */
		error = ipsec_in_reject(sp, m);
		KEY_FREESP(&sp);
		splx(s);
		if (error) {
			ipstat.ips_cantforward++;
			goto bad;
		}
#endif /* FAST_IPSEC */
		ip_forward(m, dchg);
	}
	return;

ours:
#ifdef IPSTEALTH
	/*
	 * IPSTEALTH: Process non-routing options only
	 * if the packet is destined for us.
	 */
	if (ipstealth && hlen > sizeof (struct ip) &&
	    ip_dooptions(m, 1))
		return;
#endif /* IPSTEALTH */

	/* Count the packet in the ip address stats */
	if (ia != NULL) {
		ia->ia_ifa.if_ipackets++;
		ia->ia_ifa.if_ibytes += m->m_pkthdr.len;
	}

	/*
	 * Attempt reassembly; if it succeeds, proceed.
	 * ip_reass() will return a different mbuf.
	 */
	if (ip->ip_off & (IP_MF | IP_OFFMASK)) {
		m = ip_reass(m);
		if (m == NULL)
			return;
		ip = mtod(m, struct ip *);
		/* Get the header length of the reassembled packet */
		hlen = ip->ip_hl << 2;
	}

	/*
	 * Further protocols expect the packet length to be w/o the
	 * IP header.
	 */
	ip->ip_len -= hlen;

#ifdef IPSEC
	/*
	 * enforce IPsec policy checking if we are seeing last header.
	 * note that we do not visit this with protocols with pcb layer
	 * code - like udp/tcp/raw ip.
	 */
	if ((inetsw[ip_protox[ip->ip_p]].pr_flags & PR_LASTHDR) != 0 &&
	    ipsec4_in_reject(m, NULL)) {
		ipsecstat.in_polvio++;
		goto bad;
	}
#endif
#if FAST_IPSEC
	/*
	 * enforce IPsec policy checking if we are seeing last header.
	 * note that we do not visit this with protocols with pcb layer
	 * code - like udp/tcp/raw ip.
	 */
	if ((inetsw[ip_protox[ip->ip_p]].pr_flags & PR_LASTHDR) != 0) {
		/*
		 * Check if the packet has already had IPsec processing
		 * done.  If so, then just pass it along.  This tag gets
		 * set during AH, ESP, etc. input handling, before the
		 * packet is returned to the ip input queue for delivery.
		 */
		mtag = m_tag_find(m, PACKET_TAG_IPSEC_IN_DONE, NULL);
		s = splnet();
		if (mtag != NULL) {
			tdbi = (struct tdb_ident *)(mtag + 1);
			sp = ipsec_getpolicy(tdbi, IPSEC_DIR_INBOUND);
		} else {
			sp = ipsec_getpolicybyaddr(m, IPSEC_DIR_INBOUND,
						   IP_FORWARDING, &error);
		}
		if (sp != NULL) {
			/*
			 * Check security policy against packet attributes.
			 */
			error = ipsec_in_reject(sp, m);
			KEY_FREESP(&sp);
		} else {
			/* XXX error stat??? */
			error = EINVAL;
			DPRINTF(("ip_input: no SP, packet discarded\n"));/*XXX*/
			goto bad;
		}
		splx(s);
		if (error)
			goto bad;
	}
#endif /* FAST_IPSEC */

	/*
	 * Switch out to protocol's input routine.
	 */
	ipstat.ips_delivered++;

	(*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen);
	return;
bad:
	m_freem(m);
}

/*
 * Take incoming datagram fragment and try to reassemble it into
 * whole datagram.  If the argument is the first fragment or one
 * in between the function will return NULL and store the mbuf
 * in the fragment chain.  If the argument is the last fragment
 * the packet will be reassembled and the pointer to the new
 * mbuf returned for further processing.  Only m_tags attached
 * to the first packet/fragment are preserved.
 * The IP header is *NOT* adjusted out of iplen.
 */

struct mbuf *
ip_reass(struct mbuf *m)
{
	struct ip *ip;
	struct mbuf *p, *q, *nq, *t;
	struct ipq *fp = NULL;
	struct ipqhead *head;
	int i, hlen, next;
	u_int8_t ecn, ecn0;
	u_short hash;

	/* If maxnipq is 0, never accept fragments. */
	if (maxnipq == 0) {
		ipstat.ips_fragments++;
		ipstat.ips_fragdropped++;
		m_freem(m);
		return (NULL);
	}

	ip = mtod(m, struct ip *);
	hlen = ip->ip_hl << 2;

	hash = IPREASS_HASH(ip->ip_src.s_addr, ip->ip_id);
	head = &ipq[hash];
	IPQ_LOCK();

	/*
	 * Look for queue of fragments
	 * of this datagram.
	 */
	TAILQ_FOREACH(fp, head, ipq_list)
		if (ip->ip_id == fp->ipq_id &&
		    ip->ip_src.s_addr == fp->ipq_src.s_addr &&
		    ip->ip_dst.s_addr == fp->ipq_dst.s_addr &&
#ifdef MAC
		    mac_fragment_match(m, fp) &&
#endif
		    ip->ip_p == fp->ipq_p)
			goto found;

	fp = NULL;

	/*
	 * Enforce upper bound on number of fragmented packets
	 * for which we attempt reassembly;
	 * If maxnipq is -1, accept all fragments without limitation.
	 */
	if ((nipq > maxnipq) && (maxnipq > 0)) {
		/*
		 * drop something from the tail of the current queue
		 * before proceeding further
		 */
		struct ipq *q = TAILQ_LAST(head, ipqhead);
		if (q == NULL) {   /* gak */
			for (i = 0; i < IPREASS_NHASH; i++) {
				struct ipq *r = TAILQ_LAST(&ipq[i], ipqhead);
				if (r) {
					ipstat.ips_fragtimeout += r->ipq_nfrags;
					ip_freef(&ipq[i], r);
					break;
				}
			}
		} else {
			ipstat.ips_fragtimeout += q->ipq_nfrags;
			ip_freef(head, q);
		}
	}

found:
	/*
	 * Adjust ip_len to not reflect header,
	 * convert offset of this to bytes.
	 */
	ip->ip_len -= hlen;
	if (ip->ip_off & IP_MF) {
		/*
		 * Make sure that fragments have a data length
		 * that's a non-zero multiple of 8 bytes.
		 */
		if (ip->ip_len == 0 || (ip->ip_len & 0x7) != 0) {
			ipstat.ips_toosmall++; /* XXX */
			goto dropfrag;
		}
		m->m_flags |= M_FRAG;
	} else
		m->m_flags &= ~M_FRAG;
	ip->ip_off <<= 3;

	/*
	 * Attempt reassembly; if it succeeds, proceed.
	 * ip_reass() will return a different mbuf.
	 */
	ipstat.ips_fragments++;
	m->m_pkthdr.header = ip;

	/* Previous ip_reass() started here. */
	/*
	 * Presence of header sizes in mbufs
	 * would confuse code below.
	 */
	m->m_data += hlen;
	m->m_len -= hlen;

	/*
	 * If first fragment to arrive, create a reassembly queue.
	 */
	if (fp == NULL) {
		if ((t = m_get(M_DONTWAIT, MT_FTABLE)) == NULL)
			goto dropfrag;
		fp = mtod(t, struct ipq *);
#ifdef MAC
		if (mac_init_ipq(fp, M_NOWAIT) != 0) {
			m_free(t);
			goto dropfrag;
		}
		mac_create_ipq(m, fp);
#endif
		TAILQ_INSERT_HEAD(head, fp, ipq_list);
		nipq++;
		fp->ipq_nfrags = 1;
		fp->ipq_ttl = IPFRAGTTL;
		fp->ipq_p = ip->ip_p;
		fp->ipq_id = ip->ip_id;
		fp->ipq_src = ip->ip_src;
		fp->ipq_dst = ip->ip_dst;
		fp->ipq_frags = m;
		m->m_nextpkt = NULL;
		goto inserted;
	} else {
		fp->ipq_nfrags++;
#ifdef MAC
		mac_update_ipq(m, fp);
#endif
	}

#define GETIP(m)	((struct ip*)((m)->m_pkthdr.header))

	/*
	 * Handle ECN by comparing this segment with the first one;
	 * if CE is set, do not lose CE.
	 * drop if CE and not-ECT are mixed for the same packet.
	 */
	ecn = ip->ip_tos & IPTOS_ECN_MASK;
	ecn0 = GETIP(fp->ipq_frags)->ip_tos & IPTOS_ECN_MASK;
	if (ecn == IPTOS_ECN_CE) {
		if (ecn0 == IPTOS_ECN_NOTECT)
			goto dropfrag;
		if (ecn0 != IPTOS_ECN_CE)
			GETIP(fp->ipq_frags)->ip_tos |= IPTOS_ECN_CE;
	}
	if (ecn == IPTOS_ECN_NOTECT && ecn0 != IPTOS_ECN_NOTECT)
		goto dropfrag;

	/*
	 * Find a segment which begins after this one does.
	 */
	for (p = NULL, q = fp->ipq_frags; q; p = q, q = q->m_nextpkt)
		if (GETIP(q)->ip_off > ip->ip_off)
			break;

	/*
	 * If there is a preceding segment, it may provide some of
	 * our data already.  If so, drop the data from the incoming
	 * segment.  If it provides all of our data, drop us, otherwise
	 * stick new segment in the proper place.
	 *
	 * If some of the data is dropped from the the preceding
	 * segment, then it's checksum is invalidated.
	 */
	if (p) {
		i = GETIP(p)->ip_off + GETIP(p)->ip_len - ip->ip_off;
		if (i > 0) {
			if (i >= ip->ip_len)
				goto dropfrag;
			m_adj(m, i);
			m->m_pkthdr.csum_flags = 0;
			ip->ip_off += i;
			ip->ip_len -= i;
		}
		m->m_nextpkt = p->m_nextpkt;
		p->m_nextpkt = m;
	} else {
		m->m_nextpkt = fp->ipq_frags;
		fp->ipq_frags = m;
	}

	/*
	 * While we overlap succeeding segments trim them or,
	 * if they are completely covered, dequeue them.
	 */
	for (; q != NULL && ip->ip_off + ip->ip_len > GETIP(q)->ip_off;
	     q = nq) {
		i = (ip->ip_off + ip->ip_len) - GETIP(q)->ip_off;
		if (i < GETIP(q)->ip_len) {
			GETIP(q)->ip_len -= i;
			GETIP(q)->ip_off += i;
			m_adj(q, i);
			q->m_pkthdr.csum_flags = 0;
			break;
		}
		nq = q->m_nextpkt;
		m->m_nextpkt = nq;
		ipstat.ips_fragdropped++;
		fp->ipq_nfrags--;
		m_freem(q);
	}

inserted:

	/*
	 * Check for complete reassembly and perform frag per packet
	 * limiting.
	 *
	 * Frag limiting is performed here so that the nth frag has
	 * a chance to complete the packet before we drop the packet.
	 * As a result, n+1 frags are actually allowed per packet, but
	 * only n will ever be stored. (n = maxfragsperpacket.)
	 *
	 */
	next = 0;
	for (p = NULL, q = fp->ipq_frags; q; p = q, q = q->m_nextpkt) {
		if (GETIP(q)->ip_off != next) {
			if (fp->ipq_nfrags > maxfragsperpacket) {
				ipstat.ips_fragdropped += fp->ipq_nfrags;
				ip_freef(head, fp);
			}
			goto done;
		}
		next += GETIP(q)->ip_len;
	}
	/* Make sure the last packet didn't have the IP_MF flag */
	if (p->m_flags & M_FRAG) {
		if (fp->ipq_nfrags > maxfragsperpacket) {
			ipstat.ips_fragdropped += fp->ipq_nfrags;
			ip_freef(head, fp);
		}
		goto done;
	}

	/*
	 * Reassembly is complete.  Make sure the packet is a sane size.
	 */
	q = fp->ipq_frags;
	ip = GETIP(q);
	if (next + (ip->ip_hl << 2) > IP_MAXPACKET) {
		ipstat.ips_toolong++;
		ipstat.ips_fragdropped += fp->ipq_nfrags;
		ip_freef(head, fp);
		goto done;
	}

	/*
	 * Concatenate fragments.
	 */
	m = q;
	t = m->m_next;
	m->m_next = 0;
	m_cat(m, t);
	nq = q->m_nextpkt;
	q->m_nextpkt = 0;
	for (q = nq; q != NULL; q = nq) {
		nq = q->m_nextpkt;
		q->m_nextpkt = NULL;
		m->m_pkthdr.csum_flags &= q->m_pkthdr.csum_flags;
		m->m_pkthdr.csum_data += q->m_pkthdr.csum_data;
		m_cat(m, q);
	}
#ifdef MAC
	mac_create_datagram_from_ipq(fp, m);
	mac_destroy_ipq(fp);
#endif

	/*
	 * Create header for new ip packet by modifying header of first
	 * packet;  dequeue and discard fragment reassembly header.
	 * Make header visible.
*/ ip->ip_len = (ip->ip_hl << 2) + next; ip->ip_src = fp->ipq_src; ip->ip_dst = fp->ipq_dst; TAILQ_REMOVE(head, fp, ipq_list); nipq--; (void) m_free(dtom(fp)); m->m_len += (ip->ip_hl << 2); m->m_data -= (ip->ip_hl << 2); /* some debugging cruft by sklower, below, will go away soon */ if (m->m_flags & M_PKTHDR) /* XXX this should be done elsewhere */ m_fixhdr(m); ipstat.ips_reassembled++; IPQ_UNLOCK(); return (m); dropfrag: ipstat.ips_fragdropped++; if (fp != NULL) fp->ipq_nfrags--; m_freem(m); done: IPQ_UNLOCK(); return (NULL); #undef GETIP } /* * Free a fragment reassembly header and all * associated datagrams. */ static void ip_freef(fhp, fp) struct ipqhead *fhp; struct ipq *fp; { register struct mbuf *q; IPQ_LOCK_ASSERT(); while (fp->ipq_frags) { q = fp->ipq_frags; fp->ipq_frags = q->m_nextpkt; m_freem(q); } TAILQ_REMOVE(fhp, fp, ipq_list); (void) m_free(dtom(fp)); nipq--; } /* * IP timer processing; * if a timer expires on a reassembly * queue, discard it. */ void ip_slowtimo() { register struct ipq *fp; int s = splnet(); int i; IPQ_LOCK(); for (i = 0; i < IPREASS_NHASH; i++) { for(fp = TAILQ_FIRST(&ipq[i]); fp;) { struct ipq *fpp; fpp = fp; fp = TAILQ_NEXT(fp, ipq_list); if(--fpp->ipq_ttl == 0) { ipstat.ips_fragtimeout += fpp->ipq_nfrags; ip_freef(&ipq[i], fpp); } } } /* * If we are over the maximum number of fragments * (due to the limit being lowered), drain off * enough to get down to the new limit. */ if (maxnipq >= 0 && nipq > maxnipq) { for (i = 0; i < IPREASS_NHASH; i++) { while (nipq > maxnipq && !TAILQ_EMPTY(&ipq[i])) { ipstat.ips_fragdropped += TAILQ_FIRST(&ipq[i])->ipq_nfrags; ip_freef(&ipq[i], TAILQ_FIRST(&ipq[i])); } } } IPQ_UNLOCK(); splx(s); } /* * Drain off all datagram fragments. 
*/ void ip_drain() { int i; IPQ_LOCK(); for (i = 0; i < IPREASS_NHASH; i++) { while(!TAILQ_EMPTY(&ipq[i])) { ipstat.ips_fragdropped += TAILQ_FIRST(&ipq[i])->ipq_nfrags; ip_freef(&ipq[i], TAILQ_FIRST(&ipq[i])); } } IPQ_UNLOCK(); in_rtqdrain(); } /* * Do option processing on a datagram, * possibly discarding it if bad options are encountered, * or forwarding it if source-routed. * The pass argument is used when operating in the IPSTEALTH * mode to tell what options to process: * [LS]SRR (pass 0) or the others (pass 1). * The reason for as many as two passes is that when doing IPSTEALTH, * non-routing options should be processed only if the packet is for us. * Returns 1 if packet has been forwarded/freed, * 0 if the packet should be processed further. */ static int ip_dooptions(struct mbuf *m, int pass) { struct ip *ip = mtod(m, struct ip *); u_char *cp; struct in_ifaddr *ia; int opt, optlen, cnt, off, code, type = ICMP_PARAMPROB, forward = 0; struct in_addr *sin, dst; n_time ntime; struct sockaddr_in ipaddr = { sizeof(ipaddr), AF_INET }; /* ignore or reject packets with IP options */ if (ip_doopts == 0) return 0; else if (ip_doopts == 2) { type = ICMP_UNREACH; code = ICMP_UNREACH_FILTER_PROHIB; goto bad; } dst = ip->ip_dst; cp = (u_char *)(ip + 1); cnt = (ip->ip_hl << 2) - sizeof (struct ip); for (; cnt > 0; cnt -= optlen, cp += optlen) { opt = cp[IPOPT_OPTVAL]; if (opt == IPOPT_EOL) break; if (opt == IPOPT_NOP) optlen = 1; else { if (cnt < IPOPT_OLEN + sizeof(*cp)) { code = &cp[IPOPT_OLEN] - (u_char *)ip; goto bad; } optlen = cp[IPOPT_OLEN]; if (optlen < IPOPT_OLEN + sizeof(*cp) || optlen > cnt) { code = &cp[IPOPT_OLEN] - (u_char *)ip; goto bad; } } switch (opt) { default: break; /* * Source routing with record. * Find interface with current destination address. * If none on this machine then drop if strictly routed, * or do nothing if loosely routed. * Record interface address and bring up next address * component. 
If strictly routed make sure next * address is on directly accessible net. */ case IPOPT_LSRR: case IPOPT_SSRR: #ifdef IPSTEALTH if (ipstealth && pass > 0) break; #endif if (optlen < IPOPT_OFFSET + sizeof(*cp)) { code = &cp[IPOPT_OLEN] - (u_char *)ip; goto bad; } if ((off = cp[IPOPT_OFFSET]) < IPOPT_MINOFF) { code = &cp[IPOPT_OFFSET] - (u_char *)ip; goto bad; } ipaddr.sin_addr = ip->ip_dst; ia = (struct in_ifaddr *) ifa_ifwithaddr((struct sockaddr *)&ipaddr); if (ia == NULL) { if (opt == IPOPT_SSRR) { type = ICMP_UNREACH; code = ICMP_UNREACH_SRCFAIL; goto bad; } if (!ip_dosourceroute) goto nosourcerouting; /* * Loose routing, and not at next destination * yet; nothing to do except forward. */ break; } off--; /* 0 origin */ if (off > optlen - (int)sizeof(struct in_addr)) { /* * End of source route. Should be for us. */ if (!ip_acceptsourceroute) goto nosourcerouting; save_rte(m, cp, ip->ip_src); break; } #ifdef IPSTEALTH if (ipstealth) goto dropit; #endif if (!ip_dosourceroute) { if (ipforwarding) { char buf[16]; /* aaa.bbb.ccc.ddd\0 */ /* * Acting as a router, so generate ICMP */ nosourcerouting: strcpy(buf, inet_ntoa(ip->ip_dst)); log(LOG_WARNING, "attempted source route from %s to %s\n", inet_ntoa(ip->ip_src), buf); type = ICMP_UNREACH; code = ICMP_UNREACH_SRCFAIL; goto bad; } else { /* * Not acting as a router, so silently drop. 
*/ #ifdef IPSTEALTH dropit: #endif ipstat.ips_cantforward++; m_freem(m); return (1); } } /* * locate outgoing interface */ (void)memcpy(&ipaddr.sin_addr, cp + off, sizeof(ipaddr.sin_addr)); if (opt == IPOPT_SSRR) { #define INA struct in_ifaddr * #define SA struct sockaddr * if ((ia = (INA)ifa_ifwithdstaddr((SA)&ipaddr)) == NULL) ia = (INA)ifa_ifwithnet((SA)&ipaddr); } else ia = ip_rtaddr(ipaddr.sin_addr); if (ia == NULL) { type = ICMP_UNREACH; code = ICMP_UNREACH_SRCFAIL; goto bad; } ip->ip_dst = ipaddr.sin_addr; (void)memcpy(cp + off, &(IA_SIN(ia)->sin_addr), sizeof(struct in_addr)); cp[IPOPT_OFFSET] += sizeof(struct in_addr); /* * Let ip_intr's mcast routing check handle mcast pkts */ forward = !IN_MULTICAST(ntohl(ip->ip_dst.s_addr)); break; case IPOPT_RR: #ifdef IPSTEALTH if (ipstealth && pass == 0) break; #endif if (optlen < IPOPT_OFFSET + sizeof(*cp)) { code = &cp[IPOPT_OFFSET] - (u_char *)ip; goto bad; } if ((off = cp[IPOPT_OFFSET]) < IPOPT_MINOFF) { code = &cp[IPOPT_OFFSET] - (u_char *)ip; goto bad; } /* * If no space remains, ignore. */ off--; /* 0 origin */ if (off > optlen - (int)sizeof(struct in_addr)) break; (void)memcpy(&ipaddr.sin_addr, &ip->ip_dst, sizeof(ipaddr.sin_addr)); /* * locate outgoing interface; if we're the destination, * use the incoming interface (should be same). 
*/ if ((ia = (INA)ifa_ifwithaddr((SA)&ipaddr)) == NULL && (ia = ip_rtaddr(ipaddr.sin_addr)) == NULL) { type = ICMP_UNREACH; code = ICMP_UNREACH_HOST; goto bad; } (void)memcpy(cp + off, &(IA_SIN(ia)->sin_addr), sizeof(struct in_addr)); cp[IPOPT_OFFSET] += sizeof(struct in_addr); break; case IPOPT_TS: #ifdef IPSTEALTH if (ipstealth && pass == 0) break; #endif code = cp - (u_char *)ip; if (optlen < 4 || optlen > 40) { code = &cp[IPOPT_OLEN] - (u_char *)ip; goto bad; } if ((off = cp[IPOPT_OFFSET]) < 5) { code = &cp[IPOPT_OLEN] - (u_char *)ip; goto bad; } if (off > optlen - (int)sizeof(int32_t)) { cp[IPOPT_OFFSET + 1] += (1 << 4); if ((cp[IPOPT_OFFSET + 1] & 0xf0) == 0) { code = &cp[IPOPT_OFFSET] - (u_char *)ip; goto bad; } break; } off--; /* 0 origin */ sin = (struct in_addr *)(cp + off); switch (cp[IPOPT_OFFSET + 1] & 0x0f) { case IPOPT_TS_TSONLY: break; case IPOPT_TS_TSANDADDR: if (off + sizeof(n_time) + sizeof(struct in_addr) > optlen) { code = &cp[IPOPT_OFFSET] - (u_char *)ip; goto bad; } ipaddr.sin_addr = dst; ia = (INA)ifaof_ifpforaddr((SA)&ipaddr, m->m_pkthdr.rcvif); if (ia == NULL) continue; (void)memcpy(sin, &IA_SIN(ia)->sin_addr, sizeof(struct in_addr)); cp[IPOPT_OFFSET] += sizeof(struct in_addr); off += sizeof(struct in_addr); break; case IPOPT_TS_PRESPEC: if (off + sizeof(n_time) + sizeof(struct in_addr) > optlen) { code = &cp[IPOPT_OFFSET] - (u_char *)ip; goto bad; } (void)memcpy(&ipaddr.sin_addr, sin, sizeof(struct in_addr)); if (ifa_ifwithaddr((SA)&ipaddr) == NULL) continue; cp[IPOPT_OFFSET] += sizeof(struct in_addr); off += sizeof(struct in_addr); break; default: code = &cp[IPOPT_OFFSET + 1] - (u_char *)ip; goto bad; } ntime = iptime(); (void)memcpy(cp + off, &ntime, sizeof(n_time)); cp[IPOPT_OFFSET] += sizeof(n_time); } } if (forward && ipforwarding) { ip_forward(m, 1); return (1); } return (0); bad: icmp_error(m, type, code, 0, 0); ipstat.ips_badoptions++; return (1); } /* * Given address of next destination (final or next hop), * return internet 
address info of interface to be used to get there. */ struct in_ifaddr * ip_rtaddr(dst) struct in_addr dst; { struct route sro; struct sockaddr_in *sin; struct in_ifaddr *ifa; bzero(&sro, sizeof(sro)); sin = (struct sockaddr_in *)&sro.ro_dst; sin->sin_family = AF_INET; sin->sin_len = sizeof(*sin); sin->sin_addr = dst; rtalloc_ign(&sro, RTF_CLONING); if (sro.ro_rt == NULL) return ((struct in_ifaddr *)0); ifa = ifatoia(sro.ro_rt->rt_ifa); RTFREE(sro.ro_rt); return ifa; } /* * Save incoming source route for use in replies, * to be picked up later by ip_srcroute if the receiver is interested. */ static void save_rte(m, option, dst) struct mbuf *m; u_char *option; struct in_addr dst; { unsigned olen; struct ipopt_tag *opts; opts = (struct ipopt_tag *)m_tag_get(PACKET_TAG_IPOPTIONS, sizeof(struct ipopt_tag), M_NOWAIT); if (opts == NULL) return; olen = option[IPOPT_OLEN]; #ifdef DIAGNOSTIC if (ipprintfs) printf("save_rte: olen %d\n", olen); #endif if (olen > sizeof(opts->ip_srcrt) - (1 + sizeof(dst))) return; bcopy(option, opts->ip_srcrt.srcopt, olen); opts->ip_nhops = (olen - IPOPT_OFFSET - 1) / sizeof(struct in_addr); opts->ip_srcrt.dst = dst; m_tag_prepend(m, (struct m_tag *)opts); } /* * Retrieve incoming source route for use in replies, * in the same form used by setsockopt. * The first hop is placed before the options, will be removed later. 
*/ struct mbuf * ip_srcroute(m0) struct mbuf *m0; { register struct in_addr *p, *q; register struct mbuf *m; struct ipopt_tag *opts; opts = (struct ipopt_tag *)m_tag_find(m0, PACKET_TAG_IPOPTIONS, NULL); if (opts == NULL) return ((struct mbuf *)0); if (opts->ip_nhops == 0) return ((struct mbuf *)0); m = m_get(M_DONTWAIT, MT_HEADER); if (m == NULL) return ((struct mbuf *)0); #define OPTSIZ (sizeof(opts->ip_srcrt.nop) + sizeof(opts->ip_srcrt.srcopt)) /* length is (nhops+1)*sizeof(addr) + sizeof(nop + srcrt header) */ m->m_len = opts->ip_nhops * sizeof(struct in_addr) + sizeof(struct in_addr) + OPTSIZ; #ifdef DIAGNOSTIC if (ipprintfs) printf("ip_srcroute: nhops %d mlen %d", opts->ip_nhops, m->m_len); #endif /* * First save first hop for return route */ p = &(opts->ip_srcrt.route[opts->ip_nhops - 1]); *(mtod(m, struct in_addr *)) = *p--; #ifdef DIAGNOSTIC if (ipprintfs) printf(" hops %lx", (u_long)ntohl(mtod(m, struct in_addr *)->s_addr)); #endif /* * Copy option fields and padding (nop) to mbuf. */ opts->ip_srcrt.nop = IPOPT_NOP; opts->ip_srcrt.srcopt[IPOPT_OFFSET] = IPOPT_MINOFF; (void)memcpy(mtod(m, caddr_t) + sizeof(struct in_addr), &(opts->ip_srcrt.nop), OPTSIZ); q = (struct in_addr *)(mtod(m, caddr_t) + sizeof(struct in_addr) + OPTSIZ); #undef OPTSIZ /* * Record return path as an IP source route, * reversing the path (pointers are now aligned). */ while (p >= opts->ip_srcrt.route) { #ifdef DIAGNOSTIC if (ipprintfs) printf(" %lx", (u_long)ntohl(q->s_addr)); #endif *q++ = *p--; } /* * Last hop goes to final destination. */ *q = opts->ip_srcrt.dst; #ifdef DIAGNOSTIC if (ipprintfs) printf(" %lx\n", (u_long)ntohl(q->s_addr)); #endif m_tag_delete(m0, (struct m_tag *)opts); return (m); } /* * Strip out IP options, at higher * level protocol in the kernel. * Second argument is buffer to which options * will be moved, and return value is their length. * XXX should be deleted; last arg currently ignored. 
*/ void ip_stripoptions(m, mopt) register struct mbuf *m; struct mbuf *mopt; { register int i; struct ip *ip = mtod(m, struct ip *); register caddr_t opts; int olen; olen = (ip->ip_hl << 2) - sizeof (struct ip); opts = (caddr_t)(ip + 1); i = m->m_len - (sizeof (struct ip) + olen); bcopy(opts + olen, opts, (unsigned)i); m->m_len -= olen; if (m->m_flags & M_PKTHDR) m->m_pkthdr.len -= olen; ip->ip_v = IPVERSION; ip->ip_hl = sizeof(struct ip) >> 2; } u_char inetctlerrmap[PRC_NCMDS] = { 0, 0, 0, 0, 0, EMSGSIZE, EHOSTDOWN, EHOSTUNREACH, EHOSTUNREACH, EHOSTUNREACH, ECONNREFUSED, ECONNREFUSED, EMSGSIZE, EHOSTUNREACH, 0, 0, 0, 0, EHOSTUNREACH, 0, ENOPROTOOPT, ECONNREFUSED }; /* * Forward a packet. If some error occurs return the sender * an icmp packet. Note we can't always generate a meaningful * icmp message because icmp doesn't have a large enough repertoire * of codes and types. * * If not forwarding, just drop the packet. This could be confusing * if ipforwarding was zero but some routing protocol was advancing * us as a gateway to somewhere. However, we must let the routing * protocol deal with that. * * The srcrt parameter indicates whether the packet is being forwarded * via a source route. 
 */
void
ip_forward(struct mbuf *m, int srcrt)
{
	struct ip *ip = mtod(m, struct ip *);
	struct in_ifaddr *ia = NULL;
	int error, type = 0, code = 0;
	struct mbuf *mcopy;
	struct in_addr dest;
	struct ifnet *destifp, dummyifp;

#ifdef DIAGNOSTIC
	if (ipprintfs)
		printf("forward: src %lx dst %lx ttl %x\n",
		    (u_long)ip->ip_src.s_addr, (u_long)ip->ip_dst.s_addr,
		    ip->ip_ttl);
#endif

	if (m->m_flags & (M_BCAST|M_MCAST) || in_canforward(ip->ip_dst) == 0) {
		ipstat.ips_cantforward++;
		m_freem(m);
		return;
	}
#ifdef IPSTEALTH
	if (!ipstealth) {
#endif
		if (ip->ip_ttl <= IPTTLDEC) {
			icmp_error(m, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS,
			    0, 0);
			return;
		}
#ifdef IPSTEALTH
	}
#endif

	if (!srcrt && (ia = ip_rtaddr(ip->ip_dst)) == NULL) {
		icmp_error(m, ICMP_UNREACH, ICMP_UNREACH_HOST, 0, 0);
		return;
	}

	/*
	 * Save the IP header and at most 8 bytes of the payload,
	 * in case we need to generate an ICMP message to the src.
	 *
	 * XXX this can be optimized a lot by saving the data in a local
	 * buffer on the stack (72 bytes at most), and only allocating the
	 * mbuf if really necessary. The vast majority of the packets
	 * are forwarded without having to send an ICMP back (either
	 * because unnecessary, or because rate limited), so we are
	 * really wasting a lot of work here.
	 *
	 * We don't use m_copy() because it might return a reference
	 * to a shared cluster. Both this function and ip_output()
	 * assume exclusive access to the IP header in `m', so any
	 * data in a cluster may change before we reach icmp_error().
	 */
	MGET(mcopy, M_DONTWAIT, m->m_type);
	if (mcopy != NULL && !m_dup_pkthdr(mcopy, m, M_DONTWAIT)) {
		/*
		 * It's probably ok if the pkthdr dup fails (because
		 * the deep copy of the tag chain failed), but for now
		 * be conservative and just discard the copy since
		 * code below may some day want the tags.
*/ m_free(mcopy); mcopy = NULL; } if (mcopy != NULL) { mcopy->m_len = imin((ip->ip_hl << 2) + 8, (int)ip->ip_len); mcopy->m_pkthdr.len = mcopy->m_len; m_copydata(m, 0, mcopy->m_len, mtod(mcopy, caddr_t)); } #ifdef IPSTEALTH if (!ipstealth) { #endif ip->ip_ttl -= IPTTLDEC; #ifdef IPSTEALTH } #endif /* * If forwarding packet using same interface that it came in on, * perhaps should send a redirect to sender to shortcut a hop. * Only send redirect if source is sending directly to us, * and if packet was not source routed (or has any options). * Also, don't send redirect if forwarding using a default route * or a route modified by a redirect. */ dest.s_addr = 0; if (!srcrt && ipsendredirects && ia->ia_ifp == m->m_pkthdr.rcvif) { struct sockaddr_in *sin; struct route ro; struct rtentry *rt; bzero(&ro, sizeof(ro)); sin = (struct sockaddr_in *)&ro.ro_dst; sin->sin_family = AF_INET; sin->sin_len = sizeof(*sin); sin->sin_addr = ip->ip_dst; rtalloc_ign(&ro, RTF_CLONING); rt = ro.ro_rt; if (rt && (rt->rt_flags & (RTF_DYNAMIC|RTF_MODIFIED)) == 0 && satosin(rt_key(rt))->sin_addr.s_addr != 0) { #define RTA(rt) ((struct in_ifaddr *)(rt->rt_ifa)) u_long src = ntohl(ip->ip_src.s_addr); if (RTA(rt) && (src & RTA(rt)->ia_subnetmask) == RTA(rt)->ia_subnet) { if (rt->rt_flags & RTF_GATEWAY) dest.s_addr = satosin(rt->rt_gateway)->sin_addr.s_addr; else dest.s_addr = ip->ip_dst.s_addr; /* Router requirements says to only send host redirects */ type = ICMP_REDIRECT; code = ICMP_REDIRECT_HOST; #ifdef DIAGNOSTIC if (ipprintfs) printf("redirect (%d) to %lx\n", code, (u_long)dest.s_addr); #endif } } if (rt) RTFREE(rt); } error = ip_output(m, (struct mbuf *)0, NULL, IP_FORWARDING, 0, NULL); if (error) ipstat.ips_cantforward++; else { ipstat.ips_forward++; if (type) ipstat.ips_redirectsent++; else { if (mcopy) m_freem(mcopy); return; } } if (mcopy == NULL) return; destifp = NULL; switch (error) { case 0: /* forwarded, but need redirect */ /* type, code set above */ break; case ENETUNREACH: /* 
shouldn't happen, checked above */ case EHOSTUNREACH: case ENETDOWN: case EHOSTDOWN: default: type = ICMP_UNREACH; code = ICMP_UNREACH_HOST; break; case EMSGSIZE: type = ICMP_UNREACH; code = ICMP_UNREACH_NEEDFRAG; #if defined(IPSEC) || defined(FAST_IPSEC) /* * If the packet is routed over IPsec tunnel, tell the * originator the tunnel MTU. * tunnel MTU = if MTU - sizeof(IP) - ESP/AH hdrsiz * XXX quickhack!!! */ { struct secpolicy *sp = NULL; int ipsecerror; int ipsechdr; struct route *ro; #ifdef IPSEC sp = ipsec4_getpolicybyaddr(mcopy, IPSEC_DIR_OUTBOUND, IP_FORWARDING, &ipsecerror); #else /* FAST_IPSEC */ sp = ipsec_getpolicybyaddr(mcopy, IPSEC_DIR_OUTBOUND, IP_FORWARDING, &ipsecerror); #endif if (sp != NULL) { /* count IPsec header size */ ipsechdr = ipsec4_hdrsiz(mcopy, IPSEC_DIR_OUTBOUND, NULL); /* * find the correct route for outer IPv4 * header, compute tunnel MTU. * * XXX BUG ALERT * The "dummyifp" code relies upon the fact * that icmp_error() touches only ifp->if_mtu. */ /*XXX*/ destifp = NULL; if (sp->req != NULL && sp->req->sav != NULL && sp->req->sav->sah != NULL) { ro = &sp->req->sav->sah->sa_route; if (ro->ro_rt && ro->ro_rt->rt_ifp) { dummyifp.if_mtu = ro->ro_rt->rt_rmx.rmx_mtu ? ro->ro_rt->rt_rmx.rmx_mtu : ro->ro_rt->rt_ifp->if_mtu; dummyifp.if_mtu -= ipsechdr; destifp = &dummyifp; } } #ifdef IPSEC key_freesp(sp); #else /* FAST_IPSEC */ KEY_FREESP(&sp); #endif ipstat.ips_cantfrag++; break; } else #endif /*IPSEC || FAST_IPSEC*/ /* * When doing source routing 'ia' can be NULL. Fall back * to the minimum guaranteed routeable packet size and use * the same hack as IPSEC to setup a dummyifp for icmp. */ if (ia == NULL) { dummyifp.if_mtu = IP_MSS; destifp = &dummyifp; } else destifp = ia->ia_ifp; #if defined(IPSEC) || defined(FAST_IPSEC) } #endif /*IPSEC || FAST_IPSEC*/ ipstat.ips_cantfrag++; break; case ENOBUFS: /* * A router should not generate ICMP_SOURCEQUENCH as * required in RFC1812 Requirements for IP Version 4 Routers. 
* Source quench could be a big problem under DoS attacks, * or if the underlying interface is rate-limited. * Those who need source quench packets may re-enable them * via the net.inet.ip.sendsourcequench sysctl. */ if (ip_sendsourcequench == 0) { m_freem(mcopy); return; } else { type = ICMP_SOURCEQUENCH; code = 0; } break; case EACCES: /* ipfw denied packet */ m_freem(mcopy); return; } icmp_error(mcopy, type, code, dest.s_addr, destifp); } void ip_savecontrol(inp, mp, ip, m) register struct inpcb *inp; register struct mbuf **mp; register struct ip *ip; register struct mbuf *m; { if (inp->inp_socket->so_options & (SO_BINTIME | SO_TIMESTAMP)) { struct bintime bt; bintime(&bt); if (inp->inp_socket->so_options & SO_BINTIME) { *mp = sbcreatecontrol((caddr_t) &bt, sizeof(bt), SCM_BINTIME, SOL_SOCKET); if (*mp) mp = &(*mp)->m_next; } if (inp->inp_socket->so_options & SO_TIMESTAMP) { struct timeval tv; bintime2timeval(&bt, &tv); *mp = sbcreatecontrol((caddr_t) &tv, sizeof(tv), SCM_TIMESTAMP, SOL_SOCKET); if (*mp) mp = &(*mp)->m_next; } } if (inp->inp_flags & INP_RECVDSTADDR) { *mp = sbcreatecontrol((caddr_t) &ip->ip_dst, sizeof(struct in_addr), IP_RECVDSTADDR, IPPROTO_IP); if (*mp) mp = &(*mp)->m_next; } if (inp->inp_flags & INP_RECVTTL) { *mp = sbcreatecontrol((caddr_t) &ip->ip_ttl, sizeof(u_char), IP_RECVTTL, IPPROTO_IP); if (*mp) mp = &(*mp)->m_next; } #ifdef notyet /* XXX * Moving these out of udp_input() made them even more broken * than they already were. 
	 */
	/* options were tossed already */
	if (inp->inp_flags & INP_RECVOPTS) {
		*mp = sbcreatecontrol((caddr_t) opts_deleted_above,
		    sizeof(struct in_addr), IP_RECVOPTS, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
	/* ip_srcroute doesn't do what we want here, need to fix */
	if (inp->inp_flags & INP_RECVRETOPTS) {
		*mp = sbcreatecontrol((caddr_t) ip_srcroute(m),
		    sizeof(struct in_addr), IP_RECVRETOPTS, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
#endif
	if (inp->inp_flags & INP_RECVIF) {
		struct ifnet *ifp;
		struct sdlbuf {
			struct sockaddr_dl sdl;
			u_char pad[32];
		} sdlbuf;
		struct sockaddr_dl *sdp;
		struct sockaddr_dl *sdl2 = &sdlbuf.sdl;

		if (((ifp = m->m_pkthdr.rcvif)) &&
		    ( ifp->if_index && (ifp->if_index <= if_index))) {
			sdp = (struct sockaddr_dl *)
			    (ifaddr_byindex(ifp->if_index)->ifa_addr);
			/*
			 * Change our mind and don't try to copy.
			 */
			if ((sdp->sdl_family != AF_LINK) ||
			    (sdp->sdl_len > sizeof(sdlbuf))) {
				goto makedummy;
			}
			bcopy(sdp, sdl2, sdp->sdl_len);
		} else {
makedummy:
			sdl2->sdl_len =
			    offsetof(struct sockaddr_dl, sdl_data[0]);
			sdl2->sdl_family = AF_LINK;
			sdl2->sdl_index = 0;
			sdl2->sdl_nlen = sdl2->sdl_alen = sdl2->sdl_slen = 0;
		}
		*mp = sbcreatecontrol((caddr_t) sdl2, sdl2->sdl_len,
		    IP_RECVIF, IPPROTO_IP);
		if (*mp)
			mp = &(*mp)->m_next;
	}
}

/*
 * XXX these routines are called from the upper part of the kernel.
 * They need to be locked when we remove Giant.
 *
 * They could also be moved to ip_mroute.c, since all the RSVP
 * handling is done there already.
 */
static int ip_rsvp_on;
struct socket *ip_rsvpd;
int
ip_rsvp_init(struct socket *so)
{
	if (so->so_type != SOCK_RAW ||
	    so->so_proto->pr_protocol != IPPROTO_RSVP)
		return EOPNOTSUPP;

	if (ip_rsvpd != NULL)
		return EADDRINUSE;

	ip_rsvpd = so;
	/*
	 * This may seem silly, but we need to be sure we don't over-increment
	 * the RSVP counter, in case something slips up.
	 */
	if (!ip_rsvp_on) {
		ip_rsvp_on = 1;
		rsvp_on++;
	}

	return 0;
}

int
ip_rsvp_done(void)
{
	ip_rsvpd = NULL;
	/*
	 * This may seem silly, but we need to be sure we don't over-decrement
	 * the RSVP counter, in case something slips up.
	 */
	if (ip_rsvp_on) {
		ip_rsvp_on = 0;
		rsvp_on--;
	}
	return 0;
}

void
rsvp_input(struct mbuf *m, int off)	/* XXX must fixup manually */
{
	if (rsvp_input_p) { /* call the real one if loaded */
		rsvp_input_p(m, off);
		return;
	}

	/* Can still get packets with rsvp_on = 0 if there is a local member
	 * of the group to which the RSVP packet is addressed.  But in this
	 * case we want to throw the packet away.
	 */

	if (!rsvp_on) {
		m_freem(m);
		return;
	}

	if (ip_rsvpd != NULL) {
		rip_input(m, off);
		return;
	}
	/* Drop the packet */
	m_freem(m);
}
--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_input.c.diff"

--- ip_input.org.c	Mon Dec 27 01:53:29 2004
+++ ip_input.c	Mon Dec 27 01:51:55 2004
@@ -27,7 +27,7 @@
  * SUCH DAMAGE.
  *
  *	@(#)ip_input.c	8.2 (Berkeley) 1/4/94
- * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_input.c,v 1.292 2004/10/19 15:45:57 andre Exp $
+ * $FreeBSD: src/sys/netinet/ip_input.c,v 1.283.2.7 2004/10/03 17:04:40 mlaier Exp $
  */
 
 #include "opt_bootp.h"
@@ -156,7 +156,7 @@
 static int	ipprintfs = 0;
 #endif
 
-struct pfil_head inet_pfil_hook;	/* Packet filter hooks */
+struct pfil_head inet_pfil_hook;
 
 static struct	ifqueue ipintrq;
 static int	ipqmaxlen = IFQ_MAXLEN;
@@ -261,7 +261,7 @@
 		if (pr->pr_domain->dom_family == PF_INET &&
 		    pr->pr_protocol && pr->pr_protocol != IPPROTO_RAW) {
 			/* Be careful to only index valid IP protocols. */
-			if (pr->pr_protocol <= IPPROTO_MAX)
+			if (pr->pr_protocol && pr->pr_protocol < IPPROTO_MAX)
 				ip_protox[pr->pr_protocol] = pr - inetsw;
 		}
 
@@ -311,16 +311,29 @@
 
 	if (m->m_flags & M_FASTFWD_OURS) {
 		/*
-		 * Firewall or NAT changed destination to local.
-		 * We expect ip_len and ip_off to be in host byte order.
+		 * ip_fastforward firewall changed dest to local.
+		 * We expect ip_len and ip_off in host byte order.
 		 */
-		m->m_flags &= ~M_FASTFWD_OURS;
-		/* Set up some basics that will be used later. */
+		m->m_flags &= ~M_FASTFWD_OURS;	/* for reflected mbufs */
+		/* Set up some basic stuff */
 		ip = mtod(m, struct ip *);
 		hlen = ip->ip_hl << 2;
 		goto ours;
 	}
 
+	if (m->m_flags & M_FASTFWD_PREPROC){
+		/*
+		 * Packets that require further analysis or destined
+		 * to our own addresses in ip_fastforward.
+		 * We expect ip_len and ip_off in host byte order.
+		 */
+		m->m_flags &= ~M_FASTFWD_PREPROC;	/* for reflected mbufs */
+		/* Setup some basic stuff */
+		ip = mtod(m, struct ip *);
+		hlen = ip->ip_hl << 2;
+		goto preprocessed;
+	}
+
 	ipstat.ips_total++;
 
 	if (m->m_pkthdr.len < sizeof(struct ip))
@@ -408,6 +421,9 @@
 		} else
 			m_adj(m, ip->ip_len - m->m_pkthdr.len);
 	}
+
+preprocessed:
+
 #if defined(IPSEC) && !defined(IPSEC_FILTERGIF)
 	/*
 	 * Bypass packet filtering for packets from a tunnel (gif).
@@ -1143,67 +1159,6 @@
 	IPQ_UNLOCK();
 	in_rtqdrain();
 }
-
-/*
- * The protocol to be inserted into ip_protox[] must be already registered
- * in inetsw[], either statically or through pf_proto_register().
- */
-int
-ipproto_register(u_char ipproto)
-{
-	struct protosw *pr;
-
-	/* Sanity checks. */
-	if (ipproto == 0)
-		return (EPROTONOSUPPORT);
-
-	/*
-	 * The protocol slot must not be occupied by another protocol
-	 * already.  An index pointing to IPPROTO_RAW is unused.
-	 */
-	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
-	if (pr == NULL)
-		return (EPFNOSUPPORT);
-	if (ip_protox[ipproto] != pr - inetsw)	/* IPPROTO_RAW */
-		return (EEXIST);
-
-	/* Find the protocol position in inetsw[] and set the index. */
-	for (pr = inetdomain.dom_protosw;
-	     pr < inetdomain.dom_protoswNPROTOSW; pr++) {
-		if (pr->pr_domain->dom_family == PF_INET &&
-		    pr->pr_protocol && pr->pr_protocol == ipproto) {
-			/* Be careful to only index valid IP protocols. */
-			if (pr->pr_protocol <= IPPROTO_MAX) {
-				ip_protox[pr->pr_protocol] = pr - inetsw;
-				return (0);
-			} else
-				return (EINVAL);
-		}
-	}
-	return (EPROTONOSUPPORT);
-}
-
-int
-ipproto_unregister(u_char ipproto)
-{
-	struct protosw *pr;
-
-	/* Sanity checks. */
-	if (ipproto == 0)
-		return (EPROTONOSUPPORT);
-
-	/* Check if the protocol was indeed registered. */
-	pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
-	if (pr == NULL)
-		return (EPFNOSUPPORT);
-	if (ip_protox[ipproto] == pr - inetsw)	/* IPPROTO_RAW */
-		return (ENOENT);
-
-	/* Reset the protocol slot to IPPROTO_RAW. */
-	ip_protox[ipproto] = pr - inetsw;
-	return (0);
-}
-
 
 /*
  * Do option processing on a datagram,
--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_var.h"

/*
 * Copyright (c) 1982, 1986, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)ip_var.h	8.2 (Berkeley) 1/9/95
 * $FreeBSD: src/sys/netinet/ip_var.h,v 1.89.2.2 2004/09/23 16:38:53 andre Exp $
 */

#ifndef _NETINET_IP_VAR_H_
#define	_NETINET_IP_VAR_H_

#include

/*
 * Overlay for ip header used by other protocols (tcp, udp).
 */
struct ipovly {
	u_char	ih_x1[9];		/* (unused) */
	u_char	ih_pr;			/* protocol */
	u_short	ih_len;			/* protocol length */
	struct	in_addr ih_src;		/* source internet address */
	struct	in_addr ih_dst;		/* destination internet address */
};

#ifdef _KERNEL
/*
 * Ip reassembly queue structure.  Each fragment
 * being reassembled is attached to one of these structures.
 * They are timed out after ipq_ttl drops to 0, and may also
 * be reclaimed if memory becomes tight.
 */
struct ipq {
	TAILQ_ENTRY(ipq) ipq_list;	/* to other reass headers */
	u_char	ipq_ttl;		/* time for reass q to live */
	u_char	ipq_p;			/* protocol of this fragment */
	u_short	ipq_id;			/* sequence id for reassembly */
	struct mbuf *ipq_frags;		/* to ip headers of fragments */
	struct	in_addr ipq_src,ipq_dst;
	u_char	ipq_nfrags;		/* # frags in this packet */
	struct label *ipq_label;	/* MAC label */
};
#endif /* _KERNEL */

/*
 * Structure stored in mbuf in inpcb.ip_options
 * and passed to ip_output when ip options are in use.
 * The actual length of the options (including ipopt_dst)
 * is in m_len.
 */
#define MAX_IPOPTLEN	40

struct ipoption {
	struct	in_addr ipopt_dst;	/* first-hop dst if source routed */
	char	ipopt_list[MAX_IPOPTLEN];	/* options proper */
};

/*
 * Structure attached to inpcb.ip_moptions and
 * passed to ip_output when IP multicast options are in use.
 */
struct ip_moptions {
	struct	ifnet *imo_multicast_ifp; /* ifp for outgoing multicasts */
	struct in_addr imo_multicast_addr; /* ifindex/addr on MULTICAST_IF */
	u_char	imo_multicast_ttl;	/* TTL for outgoing multicasts */
	u_char	imo_multicast_loop;	/* 1 => hear sends if a member */
	u_short	imo_num_memberships;	/* no. memberships this socket */
	struct	in_multi *imo_membership[IP_MAX_MEMBERSHIPS];
	u_long	imo_multicast_vif;	/* vif num outgoing multicasts */
};

struct	ipstat {
	u_long	ips_total;		/* total packets received */
	u_long	ips_badsum;		/* checksum bad */
	u_long	ips_tooshort;		/* packet too short */
	u_long	ips_toosmall;		/* not enough data */
	u_long	ips_badhlen;		/* ip header length < data size */
	u_long	ips_badlen;		/* ip length < ip header length */
	u_long	ips_fragments;		/* fragments received */
	u_long	ips_fragdropped;	/* frags dropped (dups, out of space) */
	u_long	ips_fragtimeout;	/* fragments timed out */
	u_long	ips_forward;		/* packets forwarded */
	u_long	ips_fastforward;	/* packets fast forwarded */
	u_long	ips_transit_re;		/* packets sent to receive path from fastfwd */
	u_long	ips_cantforward;	/* packets rcvd for unreachable dest */
	u_long	ips_redirectsent;	/* packets forwarded on same net */
	u_long	ips_noproto;		/* unknown or unsupported protocol */
	u_long	ips_delivered;		/* datagrams delivered to upper level*/
	u_long	ips_localout;		/* total ip packets generated here */
	u_long	ips_odropped;		/* lost packets due to nobufs, etc. */
	u_long	ips_reassembled;	/* total packets reassembled ok */
	u_long	ips_fragmented;		/* datagrams successfully fragmented */
	u_long	ips_ofragments;		/* output fragments created */
	u_long	ips_cantfrag;		/* don't fragment flag was set, etc. */
	u_long	ips_badoptions;		/* error in option processing */
	u_long	ips_noroute;		/* packets discarded due to no route */
	u_long	ips_badvers;		/* ip version != 4 */
	u_long	ips_rawout;		/* total raw ip packets generated */
	u_long	ips_toolong;		/* ip length > max ip packet size */
	u_long	ips_notmember;		/* multicasts for unregistered grps */
	u_long	ips_nogif;		/* no match gif found */
	u_long	ips_badaddr;		/* invalid address on header */
};

#ifdef _KERNEL

/* flags passed to ip_output as last parameter */
#define	IP_FORWARDING		0x1		/* most of ip header exists */
#define	IP_RAWOUTPUT		0x2		/* raw ip header exists */
#define	IP_SENDONES		0x4		/* send all-ones broadcast */
#define	IP_ROUTETOIF		SO_DONTROUTE	/* bypass routing tables */
#define	IP_ALLOWBROADCAST	SO_BROADCAST	/* can send broadcast packets */

/* mbuf flag used by ip_fastfwd */
#define	M_FASTFWD_OURS		M_PROTO1	/* changed dst to local */
#define	M_FASTFWD_PREPROC	M_PROTO2	/* bypass pre processing */

struct ip;
struct inpcb;
struct route;
struct sockopt;

extern struct	ipstat	ipstat;
extern u_short	ip_id;			/* ip packet ctr, for ids */
extern int	ip_defttl;		/* default IP ttl */
extern int	ipforwarding;		/* ip forwarding */
extern int	ip_doopts;		/* process or ignore IP options */
#ifdef IPSTEALTH
extern int	ipstealth;		/* stealth forwarding */
#endif

extern u_char	ip_protox[];
extern struct socket *ip_rsvpd;	/* reservation protocol daemon */
extern struct socket *ip_mrouter; /* multicast routing daemon */
extern int	(*legal_vif_num)(int);
extern u_long	(*ip_mcast_src)(int);
extern int rsvp_on;
extern struct	pr_usrreqs rip_usrreqs;

int	ip_ctloutput(struct socket *, struct sockopt *sopt);
void	ip_drain(void);
int	ip_fragment(struct ip *ip, struct mbuf **m_frag, int mtu,
	    u_long if_hwassist_flags, int sw_csum);
void	ip_freemoptions(struct ip_moptions *);
void	ip_init(void);
extern int	 (*ip_mforward)(struct ip *, struct ifnet *, struct mbuf *,
	    struct ip_moptions *);
int	ip_output(struct mbuf *,
	    struct mbuf *, struct route *, int, struct ip_moptions *,
	    struct inpcb *);
struct mbuf *
	ip_reass(struct mbuf *);
struct in_ifaddr *
	ip_rtaddr(struct in_addr);
void	ip_savecontrol(struct inpcb *, struct mbuf **, struct ip *,
	    struct mbuf *);
void	ip_slowtimo(void);
struct mbuf *
	ip_srcroute(struct mbuf *);
void	ip_stripoptions(struct mbuf *, struct mbuf *);
u_int16_t	ip_randomid(void);
int	rip_ctloutput(struct socket *, struct sockopt *);
void	rip_ctlinput(int, struct sockaddr *, void *);
void	rip_init(void);
void	rip_input(struct mbuf *, int);
int	rip_output(struct mbuf *, struct socket *, u_long);
void	ipip_input(struct mbuf *, int);
void	rsvp_input(struct mbuf *, int);
int	ip_rsvp_init(struct socket *);
int	ip_rsvp_done(void);
extern int	(*ip_rsvp_vif)(struct socket *, struct sockopt *);
extern void	(*ip_rsvp_force_done)(struct socket *);
extern void	(*rsvp_input_p)(struct mbuf *m, int off);

extern	struct pfil_head inet_pfil_hook;	/* packet filter hooks */

void	in_delayed_cksum(struct mbuf *m);

static __inline uint16_t ip_newid(void);
extern int ip_do_randomid;

static __inline uint16_t
ip_newid(void)
{
	if (ip_do_randomid)
		return ip_randomid();

	return htons(ip_id++);
}

#endif /* _KERNEL */

#endif /* !_NETINET_IP_VAR_H_ */
--FCuugMFkClbJLl1L
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ip_var.h.diff"

--- ip_var.org.h	Mon Dec 27 01:48:09 2004
+++ ip_var.h	Sun Dec 26 22:32:58 2004
@@ -27,7 +27,7 @@
  * SUCH DAMAGE.
* * @(#)ip_var.h 8.2 (Berkeley) 1/9/95 - * $FreeBSD: /repoman/r/ncvs/src/sys/netinet/ip_var.h,v 1.92 2004/10/19 15:45:57 andre Exp $ + * $FreeBSD: src/sys/netinet/ip_var.h,v 1.89.2.2 2004/09/23 16:38:53 andre Exp $ */ #ifndef _NETINET_IP_VAR_H_ @@ -104,6 +104,7 @@ u_long ips_fragtimeout; /* fragments timed out */ u_long ips_forward; /* packets forwarded */ u_long ips_fastforward; /* packets fast forwarded */ + u_long ips_transit_re; /* packets sent to receive path from fastfwd */ u_long ips_cantforward; /* packets rcvd for unreachable dest */ u_long ips_redirectsent; /* packets forwarded on same net */ u_long ips_noproto; /* unknown or unsupported protocol */ @@ -135,6 +136,7 @@ /* mbuf flag used by ip_fastfwd */ #define M_FASTFWD_OURS M_PROTO1 /* changed dst to local */ +#define M_FASTFWD_PREPROC M_PROTO2 /* bypass pre processing */ struct ip; struct inpcb; @@ -149,6 +151,7 @@ #ifdef IPSTEALTH extern int ipstealth; /* stealth forwarding */ #endif + extern u_char ip_protox[]; extern struct socket *ip_rsvpd; /* reservation protocol daemon */ extern struct socket *ip_mrouter; /* multicast routing daemon */ @@ -168,8 +171,6 @@ int ip_output(struct mbuf *, struct mbuf *, struct route *, int, struct ip_moptions *, struct inpcb *); -int ipproto_register(u_char); -int ipproto_unregister(u_char); struct mbuf * ip_reass(struct mbuf *); struct in_ifaddr * --FCuugMFkClbJLl1L-- From owner-freebsd-net@FreeBSD.ORG Mon Dec 27 11:02:15 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 071CA16A4DF for ; Mon, 27 Dec 2004 11:02:15 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id D2C5D43D49 for ; Mon, 27 Dec 2004 11:02:14 +0000 (GMT) (envelope-from owner-bugmaster@freebsd.org) Received: from freefall.freebsd.org (peter@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.1/8.13.1) with 
ESMTP id iBRB2Eco030216 for ; Mon, 27 Dec 2004 11:02:14 GMT (envelope-from owner-bugmaster@freebsd.org) Received: (from peter@localhost) by freefall.freebsd.org (8.13.1/8.13.1/Submit) id iBRB2Eqd030210 for freebsd-net@freebsd.org; Mon, 27 Dec 2004 11:02:14 GMT (envelope-from owner-bugmaster@freebsd.org) Date: Mon, 27 Dec 2004 11:02:14 GMT Message-Id: <200412271102.iBRB2Eqd030210@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f From: FreeBSD bugmaster To: freebsd-net@FreeBSD.org Subject: Current problem reports assigned to you X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Dec 2004 11:02:15 -0000 Current FreeBSD problem reports Critical problems Serious problems S Submitted Tracker Resp. Description ------------------------------------------------------------------------------- o [2002/07/26] kern/41007 net overfull traffic on third and fourth adap o [2003/10/14] kern/57985 net [patch] Missing splx in ether_output_fram 2 problems total. Non-critical problems S Submitted Tracker Resp. Description ------------------------------------------------------------------------------- o [2003/07/11] kern/54383 net [nfs] [patch] NFS root configurations wit 1 problem total. 
From owner-freebsd-net@FreeBSD.ORG Wed Dec 29 07:38:34 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 781D716A4CE for ; Wed, 29 Dec 2004 07:38:34 +0000 (GMT) Received: from sdf.lonestar.org (mx.freeshell.org [192.94.73.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 937B043D49 for ; Wed, 29 Dec 2004 07:38:33 +0000 (GMT) (envelope-from wang@sdf.lonestar.org) Received: from sdf.lonestar.org (IDENT:wang@sdf.lonestar.org [192.94.73.1]) by sdf.lonestar.org (8.12.10/8.12.10) with ESMTP id iBT7brBq017313 for ; Wed, 29 Dec 2004 07:37:53 GMT Received: (from wang@localhost) by sdf.lonestar.org (8.12.10/8.12.8/Submit) id iBT7brMt002820; Wed, 29 Dec 2004 07:37:53 GMT Date: Wed, 29 Dec 2004 07:37:53 +0000 (UTC) From: Wang To: freebsd-net@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Subject: Intel Pro/1000 Nic - no communication with network X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Dec 2004 07:38:34 -0000

Hi all,

I am trying to get FreeBSD installed and running on a new rack server (asus ap140r-e1). The rack has an Intel Pro/1000 Nic (Intel(R) 82541GI Gigabit Controller). I have tried both freebsd 4.10 and 5.3, and after installation both detect the network card and ifconfig shows it.

I provided the ip manually via these rc.conf lines:

defaultrouter="10.0.0.2"
hostname="blah.testbox.com"
ifconfig_em0="inet 10.0.0.9 netmask 255.0.0.0"

These lines are definitely correct because I use them on another freebsd 5.3 box on my network without any problems (dhcp also works on the other 5.3 box).

Ifconfig shows for em0:

em0: flags=8843 mtu 1500
        options=3
        inet 10.0.0.9 netmask 0xff000000 broadcast 10.255.255.255
        ether 00:11:2F:0F:80:f4
        media: Ethernet autoselect
        status: no carrier

I have tried both dhcp (my preference here) and hardcoding the ip/gateway details, but neither works. No DHCP lease can be obtained, and if I manually give ifconfig the ip/gateway etc., I just do not seem to get any connectivity out of the card. I can't ping any other boxes on the network; only pinging the box's own 10.0.0.9 ip works.

I went to the intel web site and downloaded the driver they have on there for freebsd, but it seems the same as what freebsd already has built into the kernel (I tried the intel driver regardless, but the result is the same: no network communication).

I decided to verify that the card/cable/network itself is ok, so I ran the Knoppix Linux Live cd on the rack. It detected the nic perfectly and got dhcp immediately! So I know for sure the problem is something to do with freebsd and its setup.

I really have no clue where to turn with this problem, and I don't want to have to ditch bsd in favour of linux for this rack. Please can anyone help?

I really need this all complete tomorrow.

Thank you in advance,
daveuk

From owner-freebsd-net@FreeBSD.ORG Wed Dec 29 08:00:07 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2A8B516A4CE for ; Wed, 29 Dec 2004 08:00:07 +0000 (GMT) Received: from lakecmmtao03.coxmail.com (lakecmmtao03.coxmail.com [68.99.120.70]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5EF9743D41 for ; Wed, 29 Dec 2004 08:00:06 +0000 (GMT) (envelope-from steve@freeslacker.net) Received: from [192.168.69.75] ([68.98.220.74]) by lakecmmtao03.coxmail.com ESMTP <20041229080003.QDSZ15913.lakecmmtao03.coxmail.com@[192.168.69.75]>; Wed, 29 Dec 2004 03:00:03 -0500 Message-ID: <41D26405.2060500@freeslacker.net> Date: Wed, 29 Dec 2004 01:00:05 -0700 From: Steven Stremciuc User-Agent: Mozilla Thunderbird 0.8 (Windows/20040913) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Wang References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-net@freebsd.org Subject: Re: Intel Pro/1000 Nic - no communication with network X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Dec 2004 08:00:07 -0000

"status: no carrier" when you do an ifconfig indicates a layer 1 or physical problem. Try plugging the cable into each of the other ethernet ports on that server and checking ifconfig to see if the status changes.

As obvious as this seems (and sorry if you've already tried this), I have some supermicro servers with Intel NICs and FreeBSD 5.3-R has them mixed up. The ports are labeled 1 and 2 on the case, but FreeBSD calls ethernet port 2 em0 and ethernet port 1 em1. Maybe that is what you are seeing here.

steve > > Ifconfig shows for em0: > > > em0: flags=8843 mtu 1500 > > options=3 > > inet 10.0.0.9 netmask 0xff000000 broadcast 10.255.255.255 > > ether 00:11:2F:0F:80:f4 > > media: Ethernet autoselect > > status: no carrier From owner-freebsd-net@FreeBSD.ORG Wed Dec 29 09:02:23 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D3A2E16A4CE for ; Wed, 29 Dec 2004 09:02:23 +0000 (GMT) Received: from relay.pair.com (relay00.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 3265B43D1D for ; Wed, 29 Dec 2004 09:02:23 +0000 (GMT) (envelope-from silby@silby.com) Received: (qmail 70146 invoked from network); 29 Dec 2004 09:02:21 -0000 Received: from unknown (HELO localhost) (unknown) by unknown with SMTP; 29 Dec 2004 09:02:21 -0000 X-pair-Authenticated: 209.68.2.70 Date: Wed, 29 Dec 2004 03:02:20 -0600 (CST) From: Mike Silbersack To: net@freebsd.org In-Reply-To: <20041218033226.L28788@odysseus.silby.com> Message-ID: <20041229025718.U26249@odysseus.silby.com> References: <20041218033226.L28788@odysseus.silby.com> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="0-1414219215-1104310690=:26249" Content-ID: <20041229025813.O26249@odysseus.silby.com> Subject: Update: Alternate port randomization approaches X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Dec 2004 09:02:24 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. 
--0-1414219215-1104310690=:26249
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; format=flowed
Content-ID: <20041229025813.B26249@odysseus.silby.com>

On Sat, 18 Dec 2004, Mike Silbersack wrote:

> There have been a few reports by users of front end web proxies and other
> systems under FreeBSD that port randomization causes them problems under
> load. This seems to be due to a combination of port randomization and rapid
> connections to the same host causing ports to be recycled before the ISN has
> advanced past the end of the previous connection, thereby causing the
> TIME_WAIT socket on the receiving end to ignore the new SYN.

Based on testing done by Igor Sysoev, I've found that my original patch is insufficient; even as little as one randomization per second can cause problems for some users. As a result, I've created the attached patch (versions for both 6.x and 4.x are included). It implements a relatively simple algorithm: Port randomization is disabled once the connection rate goes above 20 connections per second, and it is not reenabled until the connection rate falls below 20 cps for 5 seconds straight.

This appears to work for Igor, and it seems safe enough to commit before 4.11-RC2. But, if possible, I'd like a few more sets of eyes to double-check the concept and code; please take a look at it if you have a chance.

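For readers who want to reason about the on/off behaviour without decoding the attachments, here is a minimal userland sketch of the rate check described above. It is only a model, not the attached patch: the struct, the function names, and the idea of calling a tick function once per second are assumptions made for illustration, and a second with exactly 20 allocations is treated as "quiet" here, which is one possible reading of the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names come from the kernel. */
struct port_rand_state {
	int cps_limit;      /* allocations per second tolerated (20 above) */
	int holdoff_secs;   /* quiet seconds needed before re-enabling (5 above) */
	int allocs;         /* running count of TCP port allocations */
	int last_allocs;    /* value of allocs at the previous tick */
	int stop_countdown; /* > 0 while randomization is suspended */
};

void
port_rand_init(struct port_rand_state *s, int cps_limit, int holdoff_secs)
{
	assert(s != NULL);
	s->cps_limit = cps_limit;
	s->holdoff_secs = holdoff_secs;
	s->allocs = 0;
	s->last_allocs = 0;
	s->stop_countdown = 0;
}

/* Call once for every ephemeral TCP port that gets allocated. */
void
port_rand_note_alloc(struct port_rand_state *s)
{
	s->allocs++;
}

/*
 * Call once per second.  A busy second (more than cps_limit allocations)
 * re-arms the hold-off; a quiet second counts it down by one.
 */
void
port_rand_tick(struct port_rand_state *s)
{
	if (s->allocs - s->last_allocs > s->cps_limit)
		s->stop_countdown = s->holdoff_secs;
	else if (s->stop_countdown > 0)
		s->stop_countdown--;
	s->last_allocs = s->allocs;
}

/* True when ports may be picked at random, false when sequential. */
bool
port_rand_enabled(const struct port_rand_state *s)
{
	return (s->stop_countdown == 0);
}
```

With a limit of 20 and a hold-off of 5, one second with 25 allocations suspends randomization, and it only comes back after five consecutive quiet ticks. Because every busy second resets the countdown, a rate that hovers around the limit keeps allocation sequential rather than flapping between the two modes.
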
Thanks, Mike "Silby" Silbersack --0-1414219215-1104310690=:26249 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="portrandom-gen4-4x.patch" Content-Transfer-Encoding: BASE64 Content-ID: <20041229025810.L26249@odysseus.silby.com> Content-Description: Content-Disposition: ATTACHMENT; FILENAME="portrandom-gen4-4x.patch" ZGlmZiAtdSAtciAvdXNyL3NyYy9zeXMub2xkL25ldGluZXQvaW5fcGNiLmMg L3Vzci9zcmMvc3lzL25ldGluZXQvaW5fcGNiLmMNCi0tLSAvdXNyL3NyYy9z eXMub2xkL25ldGluZXQvaW5fcGNiLmMJVGh1IERlYyAxNiAwMzoyNjoxMSAy MDA0DQorKysgL3Vzci9zcmMvc3lzL25ldGluZXQvaW5fcGNiLmMJU2F0IERl YyAyNSAxNzowNzo1NiAyMDA0DQpAQCAtNjIsNiArNjIsOCBAQA0KICNpbmNs dWRlIDxuZXRpbmV0L2luX3BjYi5oPg0KICNpbmNsdWRlIDxuZXRpbmV0L2lu X3Zhci5oPg0KICNpbmNsdWRlIDxuZXRpbmV0L2lwX3Zhci5oPg0KKyNpbmNs dWRlIDxuZXRpbmV0L3VkcC5oPg0KKyNpbmNsdWRlIDxuZXRpbmV0L3VkcF92 YXIuaD4NCiAjaWZkZWYgSU5FVDYNCiAjaW5jbHVkZSA8bmV0aW5ldC9pcDYu aD4NCiAjaW5jbHVkZSA8bmV0aW5ldDYvaXA2X3Zhci5oPg0KQEAgLTk1LDgg Kzk3LDEyIEBADQogaW50CWlwcG9ydF9oaWZpcnN0YXV0byA9IElQUE9SVF9I SUZJUlNUQVVUTzsJLyogNDkxNTIgKi8NCiBpbnQJaXBwb3J0X2hpbGFzdGF1 dG8gID0gSVBQT1JUX0hJTEFTVEFVVE87CQkvKiA2NTUzNSAqLw0KIA0KLS8q IFNoYWxsIHdlIGFsbG9jYXRlIGVwaGVtZXJhbCBwb3J0cyBpbiByYW5kb20g b3JkZXI/ICovDQotaW50CWlwcG9ydF9yYW5kb21pemVkID0gMDsNCisvKiBW YXJpYWJsZXMgZGVhbGluZyB3aXRoIHJhbmRvbSBlcGhlbWVyYWwgcG9ydCBh bGxvY2F0aW9uLiAqLw0KK2ludAlpcHBvcnRfcmFuZG9taXplZCA9IDE7CS8q IHVzZXIgY29udHJvbGxlZCB2aWEgc3lzY3RsICovDQoraW50CWlwcG9ydF9y YW5kb21jcHMgPSAyMDsJLyogdXNlciBjb250cm9sbGVkIHZpYSBzeXNjdGwg Ki8NCitpbnQJaXBwb3J0X3N0b3ByYW5kb20gPSAwOwkvKiB0b2dnbGVkIGJ5 IGlwcG9ydF90aWNrICovDQoraW50CWlwcG9ydF90Y3BhbGxvY3M7DQoraW50 CWlwcG9ydF90Y3BsYXN0Y291bnQ7DQogDQogI2RlZmluZSBSQU5HRUNISyh2 YXIsIG1pbiwgbWF4KSBcDQogCWlmICgodmFyKSA8IChtaW4pKSB7ICh2YXIp ID0gKG1pbik7IH0gXA0KQEAgLTEzNiw2ICsxNDIsOCBAQA0KIAkgICAmaXBw b3J0X2hpbGFzdGF1dG8sIDAsICZzeXNjdGxfbmV0X2lwcG9ydF9jaGVjaywg IkkiLCAiIik7DQogU1lTQ1RMX0lOVChfbmV0X2luZXRfaXBfcG9ydHJhbmdl LCBPSURfQVVUTywgcmFuZG9taXplZCwgQ1RMRkxBR19SVywNCiAJICAgJmlw 
cG9ydF9yYW5kb21pemVkLCAwLCAiIik7DQorU1lTQ1RMX0lOVChfbmV0X2lu ZXRfaXBfcG9ydHJhbmdlLCBPSURfQVVUTywgcmFuZG9tY3BzLA0KKyAgICAg ICAgICBDVExGTEFHX1JXLCAmaXBwb3J0X3JhbmRvbWNwcywgMCwgIiIpOw0K IA0KIC8qDQogICogaW5fcGNiLmM6IG1hbmFnZSB0aGUgUHJvdG9jb2wgQ29u dHJvbCBCbG9ja3MuDQpAQCAtMjAwLDYgKzIwOCw3IEBADQogCXVfc2hvcnQg bHBvcnQgPSAwOw0KIAlpbnQgd2lsZCA9IDAsIHJldXNlcG9ydCA9IChzby0+ c29fb3B0aW9ucyAmIFNPX1JFVVNFUE9SVCk7DQogCWludCBlcnJvciwgcHJp c29uID0gMDsNCisJaW50IGRvcmFuZG9tOw0KIA0KIAlpZiAoVEFJTFFfRU1Q VFkoJmluX2lmYWRkcmhlYWQpKSAvKiBYWFggYnJva2VuISAqLw0KIAkJcmV0 dXJuIChFQUREUk5PVEFWQUlMKTsNCkBAIC0zMTMsNiArMzIyLDIwIEBADQog CQkJbGFzdHBvcnQgPSAmcGNiaW5mby0+bGFzdHBvcnQ7DQogCQl9DQogCQkv Kg0KKwkJKiBGb3IgVURQLCB1c2UgcmFuZG9tIHBvcnQgYWxsb2NhdGlvbiBh cyBsb25nIGFzIHRoZSB1c2VyDQorCQkqIGFsbG93cyBpdC4gIEZvciBUQ1Ag KGFuZCBhcyBvZiB5ZXQgdW5rbm93bikgY29ubmVjdGlvbnMsDQorCQkqIHVz ZSByYW5kb20gcG9ydCBhbGxvY2F0aW9uIG9ubHkgaWYgdGhlIHVzZXIgYWxs b3dzIGl0IEFORA0KKwkJKiBpcHBvcnRfdGljayBhbGxvd3MgaXQuDQorCQkq Lw0KKwkJaWYgKGlwcG9ydF9yYW5kb21pemVkICYmDQorCQkJKCFpcHBvcnRf c3RvcHJhbmRvbSB8fCBwY2JpbmZvID09ICZ1ZGJpbmZvKSkNCisJCQlkb3Jh bmRvbSA9IDE7DQorCQllbHNlDQorCQkJZG9yYW5kb20gPSAwOw0KKwkJLyog TWFrZSBzdXJlIHRvIG5vdCBpbmNsdWRlIFVEUCBwYWNrZXRzIGluIHRoZSBj b3VudC4gKi8NCisJCWlmIChwY2JpbmZvICE9ICZ1ZGJpbmZvKQ0KKwkJCWlw cG9ydF90Y3BhbGxvY3MrKzsNCisJCS8qDQogCQkgKiBTaW1wbGUgY2hlY2sg dG8gZW5zdXJlIGFsbCBwb3J0cyBhcmUgbm90IHVzZWQgdXAgY2F1c2luZw0K IAkJICogYSBkZWFkbG9jayBoZXJlLg0KIAkJICoNCkBAIC0zMjMsNyArMzQ2 LDcgQEANCiAJCQkvKg0KIAkJCSAqIGNvdW50aW5nIGRvd24NCiAJCQkgKi8N Ci0JCQlpZiAoaXBwb3J0X3JhbmRvbWl6ZWQpDQorCQkJaWYgKGRvcmFuZG9t KQ0KIAkJCQkqbGFzdHBvcnQgPSBmaXJzdCAtDQogCQkJCQkgICAgKGFyYzRy YW5kb20oKSAlIChmaXJzdCAtIGxhc3QpKTsNCiAJCQljb3VudCA9IGZpcnN0 IC0gbGFzdDsNCkBAIC0zNDMsNyArMzY2LDcgQEANCiAJCQkvKg0KIAkJCSAq IGNvdW50aW5nIHVwDQogCQkJICovDQotCQkJaWYgKGlwcG9ydF9yYW5kb21p emVkKQ0KKwkJCWlmIChkb3JhbmRvbSkNCiAJCQkJKmxhc3Rwb3J0ID0gZmly c3QgKw0KIAkJCQkJICAgIChhcmM0cmFuZG9tKCkgJSAobGFzdCAtIGZpcnN0 
KSk7DQogCQkJY291bnQgPSBsYXN0IC0gZmlyc3Q7DQpAQCAtMTA0Niw0ICsx MDY5LDMwIEBADQogCWlmIChudG9obChpbnAtPmlucF9sYWRkci5zX2FkZHIp ID09IHAtPnBfcHJpc29uLT5wcl9pcCkNCiAJCXJldHVybiAoMCk7DQogCXJl dHVybiAoMSk7DQorfQ0KKw0KKy8qDQorICogaXBwb3J0X3RpY2sgcnVucyBv bmNlIHBlciBzZWNvbmQsIGRldGVybWluaW5nIGlmIHJhbmRvbSBwb3J0DQor ICogYWxsb2NhdGlvbiBzaG91bGQgYmUgY29udGludWVkLiAgSWYgbW9yZSB0 aGFuIGlwcG9ydF9yYW5kb21jcHMNCisgKiBwb3J0cyBoYXZlIGJlZW4gYWxs b2NhdGVkIGluIHRoZSBsYXN0IHNlY29uZCwgdGhlbiB3ZSByZXR1cm4gdG8N CisgKiBzZXF1ZW50aWFsIHBvcnQgYWxsb2NhdGlvbi4gV2UgcmV0dXJuIHRv IHJhbmRvbSBhbGxvY2F0aW9uIG9ubHkNCisgKiBvbmNlIHdlIGRyb3AgYmVs b3cgaXBwb3J0X3JhbmRvbWNwcyBmb3IgYXQgbGVhc3QgNSBzZWNvbmRzLg0K KyAqLw0KKw0KK3ZvaWQNCitpcHBvcnRfdGljayh4dHApDQorCXZvaWQgKnh0 cDsNCit7DQorCWlmIChpcHBvcnRfdGNwYWxsb2NzID4gaXBwb3J0X3RjcGxh c3Rjb3VudCArIGlwcG9ydF9yYW5kb21jcHMpIHsNCisJCWlmIChpcHBvcnRf c3RvcHJhbmRvbSA9PSAwKQ0KKwkJCXByaW50ZigiU3RvcHBpbmcgcmFuZG9t IGFsbG9jYXRpb25cbiIpOw0KKwkJaXBwb3J0X3N0b3ByYW5kb20gPSA1Ow0K Kwl9IGVsc2Ugew0KKwkJaWYgKGlwcG9ydF9zdG9wcmFuZG9tID09IDEpDQor CQkJcHJpbnRmKCJHb2luZyBiYWNrIHRvIHJhbmRvbSBhbGxvY2F0aW9uXG4i KTsNCisJCWlmIChpcHBvcnRfc3RvcHJhbmRvbSA+IDApDQorCQkJaXBwb3J0 X3N0b3ByYW5kb20tLTsNCisJfQ0KKwlpcHBvcnRfdGNwbGFzdGNvdW50ID0g aXBwb3J0X3RjcGFsbG9jczsNCisJY2FsbG91dF9yZXNldCgmaXBwb3J0X3Rp Y2tfY2FsbG91dCwgaHosIGlwcG9ydF90aWNrLCBOVUxMKTsNCiB9DQpkaWZm IC11IC1yIC91c3Ivc3JjL3N5cy5vbGQvbmV0aW5ldC9pbl9wY2IuaCAvdXNy L3NyYy9zeXMvbmV0aW5ldC9pbl9wY2IuaA0KLS0tIC91c3Ivc3JjL3N5cy5v bGQvbmV0aW5ldC9pbl9wY2IuaAlUaHUgRGVjIDE2IDAzOjI2OjExIDIwMDQN CisrKyAvdXNyL3NyYy9zeXMvbmV0aW5ldC9pbl9wY2IuaAlTYXQgRGVjIDI1 IDE3OjA5OjAxIDIwMDQNCkBAIC0zMTAsNiArMzEwLDcgQEANCiBleHRlcm4g aW50CWlwcG9ydF9sYXN0YXV0bzsNCiBleHRlcm4gaW50CWlwcG9ydF9oaWZp cnN0YXV0bzsNCiBleHRlcm4gaW50CWlwcG9ydF9oaWxhc3RhdXRvOw0KK2V4 dGVybiBzdHJ1Y3QgY2FsbG91dCBpcHBvcnRfdGlja19jYWxsb3V0Ow0KIA0K IHZvaWQJaW5fcGNicHVyZ2VpZjAgX19QKChzdHJ1Y3QgaW5wY2IgKiwgc3Ry dWN0IGlmbmV0ICopKTsNCiB2b2lkCWluX2xvc2luZyBfX1AoKHN0cnVjdCBp 
bnBjYiAqKSk7DQpAQCAtMzM1LDYgKzMzNiw3IEBADQogaW50CWluX3NldHBl ZXJhZGRyIF9fUCgoc3RydWN0IHNvY2tldCAqc28sIHN0cnVjdCBzb2NrYWRk ciAqKm5hbSkpOw0KIGludAlpbl9zZXRzb2NrYWRkciBfX1AoKHN0cnVjdCBz b2NrZXQgKnNvLCBzdHJ1Y3Qgc29ja2FkZHIgKipuYW0pKTsNCiB2b2lkCWlu X3BjYnJlbWxpc3RzIF9fUCgoc3RydWN0IGlucGNiICppbnApKTsNCit2b2lk CWlwcG9ydF90aWNrKHZvaWQgKnh0cCk7DQogaW50CXByaXNvbl94aW5wY2Ig X19QKChzdHJ1Y3QgcHJvYyAqcCwgc3RydWN0IGlucGNiICppbnApKTsNCiAj ZW5kaWYgLyogX0tFUk5FTCAqLw0KIA0KZGlmZiAtdSAtciAvdXNyL3NyYy9z eXMub2xkL25ldGluZXQvaXBfaW5wdXQuYyAvdXNyL3NyYy9zeXMvbmV0aW5l dC9pcF9pbnB1dC5jDQotLS0gL3Vzci9zcmMvc3lzLm9sZC9uZXRpbmV0L2lw X2lucHV0LmMJVGh1IERlYyAxNiAwMzoyNjoxMiAyMDA0DQorKysgL3Vzci9z cmMvc3lzL25ldGluZXQvaXBfaW5wdXQuYwlTYXQgRGVjIDI1IDE3OjE2OjA4 IDIwMDQNCkBAIC00Nyw2ICs0Nyw4IEBADQogDQogI2luY2x1ZGUgPHN5cy9w YXJhbS5oPg0KICNpbmNsdWRlIDxzeXMvc3lzdG0uaD4NCisjaW5jbHVkZSA8 c3lzL2NhbGxvdXQuaD4NCisjaW5jbHVkZSA8c3lzL2V2ZW50aGFuZGxlci5o Pg0KICNpbmNsdWRlIDxzeXMvbWJ1Zi5oPg0KICNpbmNsdWRlIDxzeXMvbWFs bG9jLmg+DQogI2luY2x1ZGUgPHN5cy9kb21haW4uaD4NCkBAIC0xODMsNiAr MTg1LDcgQEANCiAJKCgoKCh4KSAmIDB4RikgfCAoKCgoeCkgPj4gOCkgJiAw eEYpIDw8IDQpKSBeICh5KSkgJiBJUFJFQVNTX0hNQVNLKQ0KIA0KIHN0YXRp YyBzdHJ1Y3QgaXBxIGlwcVtJUFJFQVNTX05IQVNIXTsNCitzdHJ1Y3QgY2Fs bG91dCBpcHBvcnRfdGlja19jYWxsb3V0Ow0KIGNvbnN0ICBpbnQgICAgaXBp bnRycV9wcmVzZW50ID0gMTsNCiANCiAjaWZkZWYgSVBDVExfREVGTVRVDQpA QCAtMjY3LDYgKzI3MCwxMiBAQA0KIAltYXhuaXBxID0gbm1iY2x1c3RlcnMg LyAzMjsNCiAJbWF4ZnJhZ3NwZXJwYWNrZXQgPSAxNjsNCiANCisJLyogU3Rh cnQgaXBwb3J0X3RpY2suICovDQorCWNhbGxvdXRfaW5pdCgmaXBwb3J0X3Rp Y2tfY2FsbG91dCk7DQorCWlwcG9ydF90aWNrKE5VTEwpOw0KKwlFVkVOVEhB TkRMRVJfUkVHSVNURVIoc2h1dGRvd25fcHJlX3N5bmMsIGlwX2ZpbmksIE5V TEwsDQorCQlTSFVURE9XTl9QUklfREVGQVVMVCk7DQorDQogI2lmbmRlZiBS QU5ET01fSVBfSUQNCiAJaXBfaWQgPSB0aW1lX3NlY29uZCAmIDB4ZmZmZjsN CiAjZW5kaWYNCkBAIC0yNzQsNiArMjgzLDEzIEBADQogDQogCXJlZ2lzdGVy X25ldGlzcihORVRJU1JfSVAsIGlwaW50cik7DQogfQ0KKw0KK3ZvaWQgaXBf ZmluaSh4dHApDQorCXZvaWQgKnh0cDsNCit7DQorCWNhbGxvdXRfc3RvcCgm 
aXBwb3J0X3RpY2tfY2FsbG91dCk7DQorfQ0KKw0KIA0KIC8qDQogICogWFhY IHdhdGNoIG91dCB0aGlzIG9uZS4gSXQgaXMgcGVyaGFwcyB1c2VkIGFzIGEg Y2FjaGUgZm9yDQpkaWZmIC11IC1yIC91c3Ivc3JjL3N5cy5vbGQvbmV0aW5l dC9pcF92YXIuaCAvdXNyL3NyYy9zeXMvbmV0aW5ldC9pcF92YXIuaA0KLS0t IC91c3Ivc3JjL3N5cy5vbGQvbmV0aW5ldC9pcF92YXIuaAlUaHUgRGVjIDE2 IDAzOjI2OjEyIDIwMDQNCisrKyAvdXNyL3NyYy9zeXMvbmV0aW5ldC9pcF92 YXIuaAlTYXQgRGVjIDI1IDE3OjEyOjEyIDIwMDQNCkBAIC0xNjAsNiArMTYw LDcgQEANCiANCiBpbnQJIGlwX2N0bG91dHB1dChzdHJ1Y3Qgc29ja2V0ICos IHN0cnVjdCBzb2Nrb3B0ICpzb3B0KTsNCiB2b2lkCSBpcF9kcmFpbih2b2lk KTsNCit2b2lkCSBpcF9maW5pKHZvaWQgKnh0cCk7DQogaW50CSBpcF9mcmFn bWVudChzdHJ1Y3QgaXAgKmlwLCBzdHJ1Y3QgbWJ1ZiAqKm1fZnJhZywgaW50 IG10dSwNCiAJICAgIHVfbG9uZyBpZl9od2Fzc2lzdF9mbGFncywgaW50IHN3 X2NzdW0pOw0KIHZvaWQJIGlwX2ZyZWVtb3B0aW9ucyhzdHJ1Y3QgaXBfbW9w dGlvbnMgKik7DQo= --0-1414219215-1104310690=:26249 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="portrandom-gen4.patch" Content-Transfer-Encoding: BASE64 Content-ID: <20041229025810.S26249@odysseus.silby.com> Content-Description: Content-Disposition: ATTACHMENT; FILENAME="portrandom-gen4.patch" ZGlmZiAtdSAtciAvdXNyL3NyYy9zeXMub2xkL25ldGluZXQvaW5fcGNiLmMg L3Vzci9zcmMvc3lzL25ldGluZXQvaW5fcGNiLmMNCi0tLSAvdXNyL3NyYy9z eXMub2xkL25ldGluZXQvaW5fcGNiLmMJRnJpIERlYyAyNCAxOTo0NToxNSAy MDA0DQorKysgL3Vzci9zcmMvc3lzL25ldGluZXQvaW5fcGNiLmMJU2F0IERl YyAyNSAxMzo1MToyNCAyMDA0DQpAQCAtNTksNiArNTksOCBAQA0KICNpbmNs dWRlIDxuZXRpbmV0L2luX3Zhci5oPg0KICNpbmNsdWRlIDxuZXRpbmV0L2lw X3Zhci5oPg0KICNpbmNsdWRlIDxuZXRpbmV0L3RjcF92YXIuaD4NCisjaW5j bHVkZSA8bmV0aW5ldC91ZHAuaD4NCisjaW5jbHVkZSA8bmV0aW5ldC91ZHBf dmFyLmg+DQogI2lmZGVmIElORVQ2DQogI2luY2x1ZGUgPG5ldGluZXQvaXA2 Lmg+DQogI2luY2x1ZGUgPG5ldGluZXQ2L2lwNl92YXIuaD4NCkBAIC05Nyw4 ICs5OSwxMiBAQA0KIGludAlpcHBvcnRfcmVzZXJ2ZWRoaWdoID0gSVBQT1JU X1JFU0VSVkVEIC0gMTsJLyogMTAyMyAqLw0KIGludAlpcHBvcnRfcmVzZXJ2 ZWRsb3cgPSAwOw0KIA0KLS8qIFNoYWxsIHdlIGFsbG9jYXRlIGVwaGVtZXJh bCBwb3J0cyBpbiByYW5kb20gb3JkZXI/ICovDQotaW50CWlwcG9ydF9yYW5k 
b21pemVkID0gMTsNCisvKiBWYXJpYWJsZXMgZGVhbGluZyB3aXRoIHJhbmRv bSBlcGhlbWVyYWwgcG9ydCBhbGxvY2F0aW9uLiAqLw0KK2ludAlpcHBvcnRf cmFuZG9taXplZCA9IDE7CS8qIHVzZXIgY29udHJvbGxlZCB2aWEgc3lzY3Rs ICovDQoraW50CWlwcG9ydF9yYW5kb21jcHMgPSAyMDsJLyogdXNlciBjb250 cm9sbGVkIHZpYSBzeXNjdGwgKi8NCitpbnQJaXBwb3J0X3N0b3ByYW5kb20g PSAwOwkvKiB0b2dnbGVkIGJ5IGlwcG9ydF90aWNrICovDQoraW50CWlwcG9y dF90Y3BhbGxvY3M7DQoraW50CWlwcG9ydF90Y3BsYXN0Y291bnQ7DQogDQog I2RlZmluZSBSQU5HRUNISyh2YXIsIG1pbiwgbWF4KSBcDQogCWlmICgodmFy KSA8IChtaW4pKSB7ICh2YXIpID0gKG1pbik7IH0gXA0KQEAgLTE0Myw2ICsx NDksOCBAQA0KIAkgICBDVExGTEFHX1JXfENUTEZMQUdfU0VDVVJFLCAmaXBw b3J0X3Jlc2VydmVkbG93LCAwLCAiIik7DQogU1lTQ1RMX0lOVChfbmV0X2lu ZXRfaXBfcG9ydHJhbmdlLCBPSURfQVVUTywgcmFuZG9taXplZCwNCiAJICAg Q1RMRkxBR19SVywgJmlwcG9ydF9yYW5kb21pemVkLCAwLCAiIik7DQorU1lT Q1RMX0lOVChfbmV0X2luZXRfaXBfcG9ydHJhbmdlLCBPSURfQVVUTywgcmFu ZG9tY3BzLA0KKwkgICBDVExGTEFHX1JXLCAmaXBwb3J0X3JhbmRvbWNwcywg MCwgIiIpOw0KIA0KIC8qDQogICogaW5fcGNiLmM6IG1hbmFnZSB0aGUgUHJv dG9jb2wgQ29udHJvbCBCbG9ja3MuDQpAQCAtMjY2LDYgKzI3NCw3IEBADQog CXVfc2hvcnQgbHBvcnQgPSAwOw0KIAlpbnQgd2lsZCA9IDAsIHJldXNlcG9y dCA9IChzby0+c29fb3B0aW9ucyAmIFNPX1JFVVNFUE9SVCk7DQogCWludCBl cnJvciwgcHJpc29uID0gMDsNCisJaW50IGRvcmFuZG9tOw0KIA0KIAlJTlBf SU5GT19XTE9DS19BU1NFUlQocGNiaW5mbyk7DQogCUlOUF9MT0NLX0FTU0VS VChpbnApOw0KQEAgLTM5NCw2ICs0MDMsMjAgQEANCiAJCQlsYXN0cG9ydCA9 ICZwY2JpbmZvLT5sYXN0cG9ydDsNCiAJCX0NCiAJCS8qDQorCQkgKiBGb3Ig VURQLCB1c2UgcmFuZG9tIHBvcnQgYWxsb2NhdGlvbiBhcyBsb25nIGFzIHRo ZSB1c2VyDQorCQkgKiBhbGxvd3MgaXQuICBGb3IgVENQIChhbmQgYXMgb2Yg eWV0IHVua25vd24pIGNvbm5lY3Rpb25zLA0KKwkJICogdXNlIHJhbmRvbSBw b3J0IGFsbG9jYXRpb24gb25seSBpZiB0aGUgdXNlciBhbGxvd3MgaXQgQU5E DQorCQkgKiBpcHBvcnRfdGljayBhbGxvd3MgaXQuDQorCQkgKi8NCisJCWlm IChpcHBvcnRfcmFuZG9taXplZCAmJg0KKwkJCSghaXBwb3J0X3N0b3ByYW5k b20gfHwgcGNiaW5mbyA9PSAmdWRiaW5mbykpDQorCQkJZG9yYW5kb20gPSAx Ow0KKwkJZWxzZQ0KKwkJCWRvcmFuZG9tID0gMDsNCisJCS8qIE1ha2Ugc3Vy ZSB0byBub3QgaW5jbHVkZSBVRFAgcGFja2V0cyBpbiB0aGUgY291bnQuICov 
DQorCQlpZiAocGNiaW5mbyAhPSAmdWRiaW5mbykNCisJCQlpcHBvcnRfdGNw YWxsb2NzKys7DQorCQkvKg0KIAkJICogU2ltcGxlIGNoZWNrIHRvIGVuc3Vy ZSBhbGwgcG9ydHMgYXJlIG5vdCB1c2VkIHVwIGNhdXNpbmcNCiAJCSAqIGEg ZGVhZGxvY2sgaGVyZS4NCiAJCSAqDQpAQCAtNDA0LDcgKzQyNyw3IEBADQog CQkJLyoNCiAJCQkgKiBjb3VudGluZyBkb3duDQogCQkJICovDQotCQkJaWYg KGlwcG9ydF9yYW5kb21pemVkKQ0KKwkJCWlmIChkb3JhbmRvbSkNCiAJCQkJ Kmxhc3Rwb3J0ID0gZmlyc3QgLQ0KIAkJCQkJICAgIChhcmM0cmFuZG9tKCkg JSAoZmlyc3QgLSBsYXN0KSk7DQogCQkJY291bnQgPSBmaXJzdCAtIGxhc3Q7 DQpAQCAtNDIyLDcgKzQ0NSw3IEBADQogCQkJLyoNCiAJCQkgKiBjb3VudGlu ZyB1cA0KIAkJCSAqLw0KLQkJCWlmIChpcHBvcnRfcmFuZG9taXplZCkNCisJ CQlpZiAoZG9yYW5kb20pDQogCQkJCSpsYXN0cG9ydCA9IGZpcnN0ICsNCiAJ CQkJCSAgICAoYXJjNHJhbmRvbSgpICUgKGxhc3QgLSBmaXJzdCkpOw0KIAkJ CWNvdW50ID0gbGFzdCAtIGZpcnN0Ow0KQEAgLTExODAsNCArMTIwMywzMCBA QA0KIAlTT0NLX1VOTE9DSyhzbyk7DQogCUlOUF9VTkxPQ0soaW5wKTsNCiAj ZW5kaWYNCit9DQorDQorLyoNCisgKiBpcHBvcnRfdGljayBydW5zIG9uY2Ug cGVyIHNlY29uZCwgZGV0ZXJtaW5pbmcgaWYgcmFuZG9tIHBvcnQNCisgKiBh bGxvY2F0aW9uIHNob3VsZCBiZSBjb250aW51ZWQuICBJZiBtb3JlIHRoYW4g aXBwb3J0X3JhbmRvbWNwcw0KKyAqIHBvcnRzIGhhdmUgYmVlbiBhbGxvY2F0 ZWQgaW4gdGhlIGxhc3Qgc2Vjb25kLCB0aGVuIHdlIHJldHVybiB0bw0KKyAq IHNlcXVlbnRpYWwgcG9ydCBhbGxvY2F0aW9uLiBXZSByZXR1cm4gdG8gcmFu ZG9tIGFsbG9jYXRpb24gb25seQ0KKyAqIG9uY2Ugd2UgZHJvcCBiZWxvdyBp cHBvcnRfcmFuZG9tY3BzIGZvciBhdCBsZWFzdCA1IHNlY29uZHMuDQorICov DQorDQordm9pZA0KK2lwcG9ydF90aWNrKHh0cCkNCisJdm9pZCAqeHRwOw0K K3sNCisJaWYgKGlwcG9ydF90Y3BhbGxvY3MgPiBpcHBvcnRfdGNwbGFzdGNv dW50ICsgaXBwb3J0X3JhbmRvbWNwcykgew0KKwkJaWYgKGlwcG9ydF9zdG9w cmFuZG9tID09IDApDQorCQkJcHJpbnRmKCJTdG9wcGluZyByYW5kb20gYWxs b2NhdGlvblxuIik7DQorCQlpcHBvcnRfc3RvcHJhbmRvbSA9IDU7DQorCX0g ZWxzZSB7DQorCQlpZiAoaXBwb3J0X3N0b3ByYW5kb20gPT0gMSkNCisJCQlw cmludGYoIkdvaW5nIGJhY2sgdG8gcmFuZG9tIGFsbG9jYXRpb25cbiIpOw0K KwkJaWYgKGlwcG9ydF9zdG9wcmFuZG9tID4gMCkNCisJCQlpcHBvcnRfc3Rv cHJhbmRvbS0tOw0KKwl9DQorCWlwcG9ydF90Y3BsYXN0Y291bnQgPSBpcHBv cnRfdGNwYWxsb2NzOw0KKwljYWxsb3V0X3Jlc2V0KCZpcHBvcnRfdGlja19j 
YWxsb3V0LCBoeiwgaXBwb3J0X3RpY2ssIE5VTEwpOw0KIH0NCmRpZmYgLXUg LXIgL3Vzci9zcmMvc3lzLm9sZC9uZXRpbmV0L2luX3BjYi5oIC91c3Ivc3Jj L3N5cy9uZXRpbmV0L2luX3BjYi5oDQotLS0gL3Vzci9zcmMvc3lzLm9sZC9u ZXRpbmV0L2luX3BjYi5oCUZyaSBEZWMgMjQgMTk6NDU6MTUgMjAwNA0KKysr IC91c3Ivc3JjL3N5cy9uZXRpbmV0L2luX3BjYi5oCUZyaSBEZWMgMjQgMjA6 MDI6MTQgMjAwNA0KQEAgLTMzMyw2ICszMzMsNyBAQA0KIGV4dGVybiBpbnQJ aXBwb3J0X2xhc3RhdXRvOw0KIGV4dGVybiBpbnQJaXBwb3J0X2hpZmlyc3Rh dXRvOw0KIGV4dGVybiBpbnQJaXBwb3J0X2hpbGFzdGF1dG87DQorZXh0ZXJu IHN0cnVjdCBjYWxsb3V0IGlwcG9ydF90aWNrX2NhbGxvdXQ7DQogDQogdm9p ZAlpbl9wY2JwdXJnZWlmMChzdHJ1Y3QgaW5wY2JpbmZvICosIHN0cnVjdCBp Zm5ldCAqKTsNCiBpbnQJaW5fcGNiYWxsb2Moc3RydWN0IHNvY2tldCAqLCBz dHJ1Y3QgaW5wY2JpbmZvICosIGNvbnN0IGNoYXIgKik7DQpAQCAtMzYyLDYg KzM2Myw3IEBADQogCWluX3NvY2thZGRyKGluX3BvcnRfdCBwb3J0LCBzdHJ1 Y3QgaW5fYWRkciAqYWRkcik7DQogdm9pZAlpbl9wY2Jzb3NldGxhYmVsKHN0 cnVjdCBzb2NrZXQgKnNvKTsNCiB2b2lkCWluX3BjYnJlbWxpc3RzKHN0cnVj dCBpbnBjYiAqaW5wKTsNCit2b2lkCWlwcG9ydF90aWNrKHZvaWQgKnh0cCk7 DQogI2VuZGlmIC8qIF9LRVJORUwgKi8NCiANCiAjZW5kaWYgLyogIV9ORVRJ TkVUX0lOX1BDQl9IXyAqLw0KZGlmZiAtdSAtciAvdXNyL3NyYy9zeXMub2xk L25ldGluZXQvaXBfaW5wdXQuYyAvdXNyL3NyYy9zeXMvbmV0aW5ldC9pcF9p bnB1dC5jDQotLS0gL3Vzci9zcmMvc3lzLm9sZC9uZXRpbmV0L2lwX2lucHV0 LmMJRnJpIERlYyAyNCAxOTo0NToxNSAyMDA0DQorKysgL3Vzci9zcmMvc3lz L25ldGluZXQvaXBfaW5wdXQuYwlTYXQgRGVjIDI1IDEzOjM3OjUxIDIwMDQN CkBAIC0zOCw2ICszOCw3IEBADQogDQogI2luY2x1ZGUgPHN5cy9wYXJhbS5o Pg0KICNpbmNsdWRlIDxzeXMvc3lzdG0uaD4NCisjaW5jbHVkZSA8c3lzL2Nh bGxvdXQuaD4NCiAjaW5jbHVkZSA8c3lzL21hYy5oPg0KICNpbmNsdWRlIDxz eXMvbWJ1Zi5oPg0KICNpbmNsdWRlIDxzeXMvbWFsbG9jLmg+DQpAQCAtMTg2 LDYgKzE4Nyw3IEBADQogDQogc3RhdGljIFRBSUxRX0hFQUQoaXBxaGVhZCwg aXBxKSBpcHFbSVBSRUFTU19OSEFTSF07DQogc3RydWN0IG10eCBpcHFsb2Nr Ow0KK3N0cnVjdCBjYWxsb3V0IGlwcG9ydF90aWNrX2NhbGxvdXQ7DQogDQog I2RlZmluZQlJUFFfTE9DSygpCW10eF9sb2NrKCZpcHFsb2NrKQ0KICNkZWZp bmUJSVBRX1VOTE9DSygpCW10eF91bmxvY2soJmlwcWxvY2spDQpAQCAtMjc5 LDExICsyODEsMjMgQEANCiAJbWF4bmlwcSA9IG5tYmNsdXN0ZXJzIC8gMzI7 
DQogCW1heGZyYWdzcGVycGFja2V0ID0gMTY7DQogDQorCS8qIFN0YXJ0IGlw cG9ydF90aWNrLiAqLw0KKwljYWxsb3V0X2luaXQoJmlwcG9ydF90aWNrX2Nh bGxvdXQsIENBTExPVVRfTVBTQUZFKTsNCisJaXBwb3J0X3RpY2soTlVMTCk7 DQorCUVWRU5USEFORExFUl9SRUdJU1RFUihzaHV0ZG93bl9wcmVfc3luYywg aXBfZmluaSwgTlVMTCwNCisJCVNIVVRET1dOX1BSSV9ERUZBVUxUKTsNCisN CiAJLyogSW5pdGlhbGl6ZSB2YXJpb3VzIG90aGVyIHJlbWFpbmluZyB0aGlu Z3MuICovDQogCWlwX2lkID0gdGltZV9zZWNvbmQgJiAweGZmZmY7DQogCWlw aW50cnEuaWZxX21heGxlbiA9IGlwcW1heGxlbjsNCiAJbXR4X2luaXQoJmlw aW50cnEuaWZxX210eCwgImlwX2lucSIsIE5VTEwsIE1UWF9ERUYpOw0KIAlu ZXRpc3JfcmVnaXN0ZXIoTkVUSVNSX0lQLCBpcF9pbnB1dCwgJmlwaW50cnEs IE5FVElTUl9NUFNBRkUpOw0KK30NCisNCit2b2lkIGlwX2ZpbmkoeHRwKQ0K Kwl2b2lkICp4dHA7DQorew0KKwljYWxsb3V0X3N0b3AoJmlwcG9ydF90aWNr X2NhbGxvdXQpOw0KIH0NCiANCiAvKg0KT25seSBpbiAvdXNyL3NyYy9zeXMv bmV0aW5ldDogaXBfaW5wdXQuYy5vcmlnDQpkaWZmIC11IC1yIC91c3Ivc3Jj L3N5cy5vbGQvbmV0aW5ldC9pcF92YXIuaCAvdXNyL3NyYy9zeXMvbmV0aW5l dC9pcF92YXIuaA0KLS0tIC91c3Ivc3JjL3N5cy5vbGQvbmV0aW5ldC9pcF92 YXIuaAlGcmkgRGVjIDI0IDE5OjQ1OjE1IDIwMDQNCisrKyAvdXNyL3NyYy9z eXMvbmV0aW5ldC9pcF92YXIuaAlTYXQgRGVjIDI1IDEzOjI5OjU0IDIwMDQN CkBAIC0xNTksNiArMTU5LDcgQEANCiANCiBpbnQJIGlwX2N0bG91dHB1dChz dHJ1Y3Qgc29ja2V0ICosIHN0cnVjdCBzb2Nrb3B0ICpzb3B0KTsNCiB2b2lk CSBpcF9kcmFpbih2b2lkKTsNCit2b2lkCSBpcF9maW5pKHZvaWQgKnh0cCk7 DQogaW50CSBpcF9mcmFnbWVudChzdHJ1Y3QgaXAgKmlwLCBzdHJ1Y3QgbWJ1 ZiAqKm1fZnJhZywgaW50IG10dSwNCiAJICAgIHVfbG9uZyBpZl9od2Fzc2lz dF9mbGFncywgaW50IHN3X2NzdW0pOw0KIHZvaWQJIGlwX2ZyZWVtb3B0aW9u cyhzdHJ1Y3QgaXBfbW9wdGlvbnMgKik7DQo= --0-1414219215-1104310690=:26249-- From owner-freebsd-net@FreeBSD.ORG Wed Dec 29 13:09:53 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B52D416A4CE for ; Wed, 29 Dec 2004 13:09:53 +0000 (GMT) Received: from mp2.macomnet.net (mp2.macomnet.net [195.128.64.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id E78AB43D31 for ; Wed, 29 Dec 2004 13:09:52 +0000 (GMT) (envelope-from 
maxim@FreeBSD.org) Received-SPF: pass (mp2.macomnet.net: domain of maxim@FreeBSD.org designates 127.0.0.1 as permitted sender) receiver=mp2.macomnet.net; client_ip=127.0.0.1; envelope-from=maxim@FreeBSD.org; Received: from localhost (localhost [127.0.0.1]) by mp2.macomnet.net (8.12.11/8.12.11) with ESMTP id iBTD9oGW075360; Wed, 29 Dec 2004 16:09:50 +0300 (MSK) (envelope-from maxim@FreeBSD.org) Date: Wed, 29 Dec 2004 16:09:50 +0300 (MSK) From: Maxim Konovalov To: Mike Silbersack In-Reply-To: <20041229025718.U26249@odysseus.silby.com> Message-ID: <20041229155419.I74642@mp2.macomnet.net> References: <20041218033226.L28788@odysseus.silby.com> <20041229025718.U26249@odysseus.silby.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SpamTest-Info: Profile: Formal (188/041227) X-SpamTest-Info: Profile: Detect Hard (4/030526) X-SpamTest-Info: Profile: SysLog X-SpamTest-Info: Profile: Marking - Keywords (2/030321) X-SpamTest-Status: Not detected X-SpamTest-Version: SMTP-Filter Version 2.0.0 [0124], SpamtestISP/Release cc: net@FreeBSD.org Subject: Re: Update: Alternate port randomization approaches X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Dec 2004 13:09:53 -0000 On Wed, 29 Dec 2004, 03:02-0600, Mike Silbersack wrote: > On Sat, 18 Dec 2004, Mike Silbersack wrote: > > > There have been a few reports by users of front end web proxies and other > > systems under FreeBSD that port randomization causes them problems under > > load. This seems to be due to a combination of port randomization and > > rapid connections to the same host causing ports to be recycled before > > the ISN has advanced past the end of the previous connection, thereby > > causing the TIME_WAIT socket on the receiving end to ignore the new SYN. 
> > Based on testing done by Igor Sysoev, I've found that my original patch is > insufficient; even as little as one randomization per second can cause problems > for some users. As a result, I've created the attached patch (versions for > both 6.x and 4.x are included). It implements a relatively simple algorithm: > Port randomization is turned off once the connection rate goes above 20 > connections per second, and it is not reenabled until the connection rate > falls below 20 cps for 5 seconds straight. > > This appears to work for Igor, and it seems safe enough to commit before > 4.11-RC2. But, if possible, I'd like a few more sets of eyes to doublecheck > the concept and code; please take a look at it if you have a chance. Again, it's not clear to me why we don't follow our usual development cycle here: commit & test in HEAD and then MFC to STABLE? -- Maxim Konovalov From owner-freebsd-net@FreeBSD.ORG Wed Dec 29 18:34:18 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1017) id EB74216A4DB; Wed, 29 Dec 2004 18:34:18 +0000 (GMT) Date: Wed, 29 Dec 2004 18:34:18 +0000 From: Tony Ackerman To: freebsd-net@freebsd.org Message-ID: <20041229183418.GA53016@hub.freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i Subject: Intel Pro/1000 Nic - no communication with network X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Dec 2004 18:34:19 -0000 If you have multiple adapters in the system, there could be some confusion caused by the order in which the adapters were enumerated. Try "pciconf -l | grep 20000" to view all of the Ethernet adapters in the system and which drivers are attached to them. What is your output from this command? 
Are any of the link indicator LEDs lit? From owner-freebsd-net@FreeBSD.ORG Thu Dec 30 10:42:18 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8508C16A4CE for ; Thu, 30 Dec 2004 10:42:18 +0000 (GMT) Received: from relay01.pair.com (relay01.pair.com [209.68.5.15]) by mx1.FreeBSD.org (Postfix) with SMTP id 067CF43D4C for ; Thu, 30 Dec 2004 10:42:18 +0000 (GMT) (envelope-from silby@silby.com) Received: (qmail 23465 invoked from network); 30 Dec 2004 10:42:16 -0000 Received: from unknown (HELO localhost) (unknown) by unknown with SMTP; 30 Dec 2004 10:42:16 -0000 X-pair-Authenticated: 209.68.2.70 Date: Thu, 30 Dec 2004 04:42:15 -0600 (CST) From: Mike Silbersack To: Maxim Konovalov In-Reply-To: <20041229155419.I74642@mp2.macomnet.net> Message-ID: <20041230042939.L35911@odysseus.silby.com> References: <20041218033226.L28788@odysseus.silby.com> <20041229155419.I74642@mp2.macomnet.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed cc: net@FreeBSD.org Subject: Re: Update: Alternate port randomization approaches X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Dec 2004 10:42:18 -0000 On Wed, 29 Dec 2004, Maxim Konovalov wrote: > On Wed, 29 Dec 2004, 03:02-0600, Mike Silbersack wrote: >> This appears to work for Igor, and it seems safe enough to commit before >> 4.11-RC2. But, if possible, I'd like a few more sets of eyes to doublecheck >> the concept and code; please take a look at it if you have a chance. > > Again, it's not clear to me why we don't follow our usual > development cycle here: commit & test in HEAD and then MFC to STABLE? 
> > -- > Maxim Konovalov The problems that random port allocation exposes only occur when machine A makes repeated connections to machine B, so they're limited to setups like front-end web proxies, connections to database servers, and a few other things. General web servers, ftp servers, SMTP servers, etc, aren't affected. So, committing to -current won't teach us anything; specific testers are needed. I should have worked on this issue months ago, but I didn't, so I'm trying to come up with something safe as quickly as possible. This is necessary because 4.11 is going to be the last release in the 4.x series, so this can't be pushed off until after 4.11 is published - there'd be little point in bothering at that time. Igor has been generous enough to test the various iterations of this patch on a production system as I've developed them, to see if they work for him. Based on his results, I think we're pretty close to an acceptable compromise between security (full randomization) and proper operation (no randomization). We're now looking at settings more along the lines of a ceiling of 10 connections per second and a 45-second threshold before randomization is reenabled, FWIW. I'm not too concerned about general testing because these patches are quite simple; they're modifications of the previous behavior, so they shouldn't create any new problems. As far as bugs in the implementation go, well, anyone is welcome to do a quick review. :) Mike "Silby" Silbersack
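
For anyone who wants to review the concept without reading the diff, the rate gate described in this thread can be sketched roughly as below. This is a userland illustration only, not the code from the attached patch: the struct and function names (port_rand_gate, gate_init, gate_on_connect, gate_tick) are hypothetical, and it assumes that a second counts as "quiet" when it saw no more connections than the ceiling. gate_tick() is meant to be called once per second, gate_on_connect() once per outgoing connection.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of a rate-gated port randomization switch.
 * None of these names come from the actual FreeBSD patch.
 */
struct port_rand_gate {
	unsigned ceiling_cps;    /* rate above which randomization is turned off */
	unsigned quiet_secs;     /* consecutive quiet seconds needed to turn it back on */
	unsigned conns_this_sec; /* connections counted since the last tick */
	unsigned quiet_run;      /* consecutive quiet seconds seen so far */
	bool randomize;          /* current decision */
};

static void
gate_init(struct port_rand_gate *g, unsigned ceiling_cps, unsigned quiet_secs)
{
	g->ceiling_cps = ceiling_cps;
	g->quiet_secs = quiet_secs;
	g->conns_this_sec = 0;
	g->quiet_run = 0;
	g->randomize = true;
}

/* Call for every outgoing connection; returns true if the port should be random. */
static bool
gate_on_connect(struct port_rand_gate *g)
{
	g->conns_this_sec++;
	if (g->conns_this_sec > g->ceiling_cps) {
		/* Rate went above the ceiling: fall back to sequential ports. */
		g->randomize = false;
		g->quiet_run = 0;
	}
	return (g->randomize);
}

/* Call once per second. */
static void
gate_tick(struct port_rand_gate *g)
{
	if (g->conns_this_sec > g->ceiling_cps) {
		/* Busy second: restart the quiet period. */
		g->quiet_run = 0;
	} else if (!g->randomize && ++g->quiet_run >= g->quiet_secs) {
		/* Enough quiet seconds in a row: randomize again. */
		g->randomize = true;
	}
	g->conns_this_sec = 0;
}
```

With gate_init(&g, 20, 5) this follows the numbers from the first version of the patch; gate_init(&g, 10, 45) would correspond to the settings currently being tried. The hysteresis is the point of the design: a single burst turns randomization off immediately, but it only comes back after a sustained quiet period, so a proxy hovering around the ceiling does not flip back and forth.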