From owner-freebsd-net@FreeBSD.ORG Wed Sep 26 18:11:44 2007 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 91BCC16A41A for ; Wed, 26 Sep 2007 18:11:44 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outE.internet-mail-service.net (outE.internet-mail-service.net [216.240.47.228]) by mx1.freebsd.org (Postfix) with ESMTP id 5C28613C474 for ; Wed, 26 Sep 2007 18:11:44 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com (HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Wed, 26 Sep 2007 11:11:43 -0700 X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (nat.ironport.com [63.251.108.100]) by idiom.com (Postfix) with ESMTP id 90D16126503; Wed, 26 Sep 2007 11:11:42 -0700 (PDT) Message-ID: <46FAA0DD.6040804@elischer.org> Date: Wed, 26 Sep 2007 11:11:41 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Oleg Bulyzhin References: <20070926060241.GA3945@lath.rinet.ru> In-Reply-To: <20070926060241.GA3945@lath.rinet.ru> Content-Type: multipart/mixed; boundary="------------090905030908090805090805" Cc: freebsd-net@freebsd.org Subject: Re: new mbuf flag proposal X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 Sep 2007 18:11:44 -0000 This is a multi-part message in MIME format. --------------090905030908090805090805 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Oleg Bulyzhin wrote: > Hi all. > > Recently, i discovered following problem (though it was already discussed, see > http://freebsd.rambler.ru/bsdmail/freebsd-ipfw_2006/msg00491.html): > pfil handlers (like ipfw or pf) sometime need to create packets (like tcp rst > or icmp errors). In order to avoid loops M_SKIP_FIREWALL flag is used. > Unfortunately, this behaviour is not always correct. > There are configurations when you need to reinject such packets into pfil(4) > handlers (in order to translate them using NAT or apply routing policy > or divert them somewhere, etc). In my case i had to modify kernel > in order to translate tcp keepalive packets(generated by ipfw) using pfnat. HA! I just implemeted this for us at IRONPORT. I added a second flag which allowed the M_SKIP_FIREWALL to be ignored for the first run.. Basically if you generate a RESET packet from within the firewall, and you are doing NAT then you need to allow the outgoing RESET to be translated just like all the other packets of that session.. In the Firewall the presence of the seconf bit negates the existance of M_SKIP_FIREWALL, but it is cleared. meaning that it will do exactly ONE pass through the firewall after being sent out (the pass will be done in ip_output). it is however a specific solution for us.. and probably not general. The trouble is that there is no real fully general solution. I attach my patch. (wonder if it will be stripped for the list?) (if it is I'll cut-n-paste a version later). > > I have a proposal how to solve this: > 1) Introduce new mbuf flag, something like M_PFIL_CREATED, which should be > used to mark packets created by pfil handler. If packet is not supposed > to reenter pfil handlers M_SKIP_FIREWALL can be used instead. > 2) When pfil handler generate packet it should be marked either with > M_SKIP_FIREWALL or M_PFIL_CREATED. In latter case, pfil handler should add > mbuf_tag for distinguishing source of M_PFIL_CREATED flag. > > So, for packet creation code should be like this: > > m->m_flags |= M_PFIL_CREATED; > mtag = m_tag_alloc(MTAG_PFIL_CREATED, PFIL_IPFW, 0, M_NOWAIT); > if (mtag) { > m_tag_prepend(m, mtag); > } else { > goto drop_pkt; > } > > at the beginning of pfil handler we should have something like this: > > int dont_emit_pkt = 0; > > if (m->m_flags & M_PFIL_CREATED) { > dont_emit_pkt = 1; > mtag = m_tag_locate(m, MTAG_PFIL_CREATED, PFIL_IPFW, NULL); > if (mtag) { /* pkt was created by myself */ > /* my own packet, handle it with care. */ > goto specal_handler; > } else { /* pkt was created by other pfil(4) handler */ > > /* do normal processing but do not emit new packets. */ > goto normal_handler; > } > } > > This functionality can be archived with mbuf_tag only (without new mbuf flag), > but it would be ineffective: > calling m_tag_locate() (unsuccessful most of the time!) for every packet is > rather expensive. > > What do you think about this? > not bad, and tries to address my comments about generality. Comments still needed from others however. --------------090905030908090805090805 Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="ipfw2_x_reset.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ipfw2_x_reset.patch" Index: sys/netinet/ip_fw2.c =================================================================== RCS file: /usr/local/cvsroot/freebsd/src/sys/netinet/ip_fw2.c,v retrieving revision 1.106.2.12 diff -d -u -r1.106.2.12 ip_fw2.c --- sys/netinet/ip_fw2.c 9 Mar 2006 13:42:44 -0000 1.106.2.12 +++ sys/netinet/ip_fw2.c 24 Sep 2007 21:10:21 -0000 @@ -96,6 +96,14 @@ #include /* XXX for in_cksum */ +#if defined M_IRONPORT_FWD +/* generally skip the firewall after the first time, + which will be for redirection. Role on vimage! */ +#define M_FIREWALL_IMMUNITY (M_SKIP_FIREWALL | M_IRONPORT_FWD) +#else +#define M_FIREWALL_IMMUNITY (M_SKIP_FIREWALL) +#endif + /* * set_disable contains one bit per set value (0..31). * If the bit is set, all rules with the corresponding set @@ -750,6 +758,8 @@ flags = TH_RST|TH_ACK; } bcopy(&ti, ip6, sizeof(ti)); + /* tcp_respond keeps the mbuf (and hense flags) */ + args->m->m_flags |= M_FIREWALL_IMMUNITY ; tcp_respond(NULL, ip6, (struct tcphdr *)(ip6 + 1), args->m, ack, seq, flags); @@ -1628,7 +1638,7 @@ */ ip->ip_ttl = ip_defttl; ip->ip_len = m->m_pkthdr.len; - m->m_flags |= M_SKIP_FIREWALL; + m->m_flags |= M_FIREWALL_IMMUNITY; return (m); } @@ -1646,6 +1656,8 @@ ip->ip_len = ntohs(ip->ip_len); ip->ip_off = ntohs(ip->ip_off); } + /* the flags get copied */ + args->m->m_flags |= M_FIREWALL_IMMUNITY; icmp_error(args->m, ICMP_UNREACH, code, 0L, 0); } else if (offset == 0 && args->f_id.proto == IPPROTO_TCP) { struct tcphdr *const tcp = @@ -2139,6 +2169,12 @@ /* end of ipv6 variables */ int is_ipv4 = 0; +#if defined(M_IRONPORT_FWD) + if (m->m_flags & M_IRONPORT_FWD) + /* We have a free pass.. but we only get to use it once */ + m->m_flags &= ~M_IRONPORT_FWD; + else +#endif if (m->m_flags & M_SKIP_FIREWALL) return (IP_FW_PASS); /* accept */ Index: sys/netinet/ip_icmp.c =================================================================== RCS file: /usr/local/cvsroot/freebsd/src/sys/netinet/ip_icmp.c,v retrieving revision 1.101.2.2 diff -d -u -r1.101.2.2 ip_icmp.c --- sys/netinet/ip_icmp.c 16 Feb 2006 17:50:57 -0000 1.101.2.2 +++ sys/netinet/ip_icmp.c 24 Sep 2007 21:10:21 -0000 @@ -269,7 +269,7 @@ * If the original mbuf was meant to bypass the firewall, the error * reply should bypass as well. */ - m->m_flags |= n->m_flags & M_SKIP_FIREWALL; + m->m_flags |= n->m_flags & (M_SKIP_FIREWALL|M_IRONPORT_FWD); m->m_data -= sizeof(struct ip); m->m_len += sizeof(struct ip); m->m_pkthdr.len = m->m_len; Index: sys/netinet/ip_output.c =================================================================== RCS file: /usr/local/cvsroot/freebsd/src/sys/netinet/ip_output.c,v retrieving revision 1.242.2.8 diff -d -u -r1.242.2.8 ip_output.c --- sys/netinet/ip_output.c 31 Jan 2006 16:06:05 -0000 1.242.2.8 +++ sys/netinet/ip_output.c 24 Sep 2007 21:10:21 -0000 @@ -661,6 +661,9 @@ /* Jump over all PFIL processing if hooks are not active. */ if (inet_pfil_hook.ph_busy_count == -1) goto passout; + /* Or if we were told to */ + if ((m->m_flags & (M_IRONPORT_FWD|M_SKIP_FIREWALL)) == M_SKIP_FIREWALL) + goto passout; /* Run through list of hooks for output packets. */ odst.s_addr = ip->ip_dst.s_addr; @@ -672,6 +675,7 @@ /* See if destination IP address was changed by packet filter. */ if (odst.s_addr != ip->ip_dst.s_addr) { + m->m_flags &= ~M_IRONPORT_FWD; m->m_flags |= M_SKIP_FIREWALL; /* If destination is now ourself drop to ip_input(). */ if (in_localip(ip->ip_dst)) { @@ -716,6 +720,8 @@ #endif dst = (struct sockaddr_in *)&ro->ro_dst; bcopy((fwd_tag+1), dst, sizeof(struct sockaddr_in)); + /* If we had a free pass for forewarding, lose it */ + m->m_flags &= ~M_IRONPORT_FWD; m->m_flags |= M_SKIP_FIREWALL; m_tag_delete(m, fwd_tag); goto again; Index: sys/sys/mbuf.h =================================================================== RCS file: /usr/local/cvsroot/freebsd/src/sys/sys/mbuf.h,v retrieving revision 1.170.2.6 diff -d -u -r1.170.2.6 mbuf.h --- sys/sys/mbuf.h 23 Mar 2006 23:24:32 -0000 1.170.2.6 +++ sys/sys/mbuf.h 24 Sep 2007 21:10:21 -0000 @@ -167,6 +167,7 @@ #define M_PROTO3 0x0040 /* protocol-specific */ #define M_PROTO4 0x0080 /* protocol-specific */ #define M_PROTO5 0x0100 /* protocol-specific */ +#define M_IRONPORT_FWD 0x2000 /* Allow one (and only one) firewall FWD pass */ #define M_SKIP_FIREWALL 0x4000 /* skip firewall processing */ #define M_FREELIST 0x8000 /* mbuf is on the free list */ @@ -200,7 +201,7 @@ #define M_COPYFLAGS (M_PKTHDR|M_EOR|M_RDONLY|M_PROTO1|M_PROTO1|M_PROTO2|\ M_PROTO3|M_PROTO4|M_PROTO5|M_SKIP_FIREWALL|\ M_BCAST|M_MCAST|M_FRAG|M_FIRSTFRAG|M_LASTFRAG|\ - M_VLANTAG) + M_VLANTAG|M_IRONPORT_FWD) /* * Flags indicating hw checksum support and sw checksum requirements. --------------090905030908090805090805--