Date: Wed, 15 Feb 2006 21:27:38 +0100 From: Andre Oppermann <andre@freebsd.org> To: Ruslan Ermilov <ru@FreeBSD.org> Cc: net@FreeBSD.org Subject: Re: [FIX] dummynet breaks IP reassembly Message-ID: <43F38EBA.F96DE9C5@freebsd.org> References: <20060210221415.GC33588@ip.net.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
Ruslan Ermilov wrote: > > Hi Andre, > > If I'm not mistaken, *you* converted ipfw to use pfil(9). > During the conversion, the following bug was introduced. > > When forwarding fragmented packets through a dummynet pipe > (ip_input -> ip_forward -> ip_output -> pipe -> ip_output) > the last ip_output() in the chain that does the actual IP > delivery sets ip_id of all fragments to different values, > making it impossible to reassemble the packet at receive > side. > > Example of forwarding a fragmented IP datagram from em0 > to xl0: > > # tcpdump -nvi em0 net 192.168.2 > tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes > 00:04:35.170994 IP (tos 0x0, ttl 64, id 2186, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 2819, seq 0, length 1480 > 00:04:35.171242 IP (tos 0x0, ttl 64, id 2186, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp > > # tcpdump -nvi xl0 net 192.168.2 > tcpdump: listening on xl0, link-type EN10MB (Ethernet), capture size 96 bytes > 00:04:35.171028 IP (tos 0x0, ttl 63, id 10560, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 2819, seq 0, length 1480 > 00:04:35.171266 IP (tos 0x0, ttl 63, id 10561, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp > > Note that IDs were rewritten from the original 2186 and > worse, they are different. > > In 4,x, the "flags" field (see the patch below) was used > to keep the IP_FORWARDING bit passed to ip_output() by > ip_forward(). This bit was kept in the dummynet packet > tag (in the "flags" field) and later passed to the second > ip_output() call, causing the latter to NOT touch the IP > header again. This feature was lost, resulting in a bug. > > Since dummynet never calls ip_output() with unfilled > header, it's safe to always call ip_output() from dummynet > with the IP_FORWARDING bit set, to indicate it's forwarded > from another ip_output() and so it shouldn't attempt to > modify the header. That is correct. > Surprisingly, it "seemed" to work for packets exceeding > MTU and originating from the dummynetting host, mainly > because the packet wasn't fragmented while in dummynet. > Yet, the ip_id field was always incremented in my tests > (pipe with no bandwidth limitation), comparing it before > and after the dummynet processing. Now, the ip_id is > always preserved. Packets that originated on the dummynetting host don't get fragmented until they actually hit the if_output() in ip_output(). The entire over-sized packet stays intact while in dummynet and only get fragmented later. > # tcpdump -nvi em0 net 192.168.2 > tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes > 00:12:04.654669 IP (tos 0x0, ttl 64, id 2192, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 7939, seq 0, length 1480 > 00:12:04.654917 IP (tos 0x0, ttl 64, id 2192, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp > > # tcpdump -nvi xl0 net 192.168.2 > tcpdump: listening on xl0, link-type EN10MB (Ethernet), capture size 96 bytes > 00:12:04.654703 IP (tos 0x0, ttl 63, id 2192, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 7939, seq 0, length 1480 > 00:12:04.654939 IP (tos 0x0, ttl 63, id 2192, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp > > (Note the ip_id is preserved when forwarding.) > > I'm not sure about the IPv6 portion but it's consistent > with the normal forwarding path so I believe it's correct. > Comments? Looks correct. Dunno about the effect on IPv6. IIRC IPv6 doesn't support fragmenting packets and always has to do PMTU discovery. Thanks for tracking down and fixing this bug. -- Andre > %%% > Index: ip_dummynet.c > =================================================================== > RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.c,v > retrieving revision 1.98 > diff -u -p -r1.98 ip_dummynet.c > --- ip_dummynet.c 3 Feb 2006 11:38:19 -0000 1.98 > +++ ip_dummynet.c 10 Feb 2006 21:20:59 -0000 > @@ -769,7 +769,7 @@ dummynet_send(struct mbuf *m) > pkt = dn_tag_get(m); > switch (pkt->dn_dir) { > case DN_TO_IP_OUT: > - ip_output(m, NULL, NULL, pkt->flags, NULL, NULL); > + ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL); > break ; > case DN_TO_IP_IN : > ip = mtod(m, struct ip *); > @@ -783,7 +783,7 @@ dummynet_send(struct mbuf *m) > break; > > case DN_TO_IP6_OUT: > - ip6_output(m, NULL, NULL, pkt->flags, NULL, NULL, NULL); > + ip6_output(m, NULL, NULL, IPV6_FORWARDING, NULL, NULL, NULL); > break; > #endif > case DN_TO_IFB_FWD: > @@ -1129,7 +1129,6 @@ locate_pipe(int pipe_nr) > * ifp the 'ifp' parameter from the caller. > * NULL in ip_input, destination interface in ip_output, > * rule matching rule, in case of multiple passes > - * flags flags from the caller, only used in ip_output > * > */ > static int > @@ -1213,8 +1212,6 @@ dummynet_io(struct mbuf *m, int dir, str > pkt->dn_dir = dir ; > > pkt->ifp = fwa->oif; > - if (dir == DN_TO_IP_OUT || dir == DN_TO_IP6_OUT) > - pkt->flags = fwa->flags; > > if (q->head == NULL) > q->head = m; > Index: ip_dummynet.h > =================================================================== > RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.h,v > retrieving revision 1.38 > diff -u -p -r1.38 ip_dummynet.h > --- ip_dummynet.h 29 Nov 2005 00:11:01 -0000 1.38 > +++ ip_dummynet.h 10 Feb 2006 21:13:45 -0000 > @@ -130,7 +130,6 @@ struct dn_pkt_tag { > > dn_key output_time; /* when the pkt is due for delivery */ > struct ifnet *ifp; /* interface, for ip_output */ > - int flags ; /* flags, for ip_output (IPv6 ?) */ > struct _ip6dn_args ip6opt; /* XXX ipv6 options */ > }; > #endif /* _KERNEL */ > Index: ip_fw.h > =================================================================== > RCS file: /home/ncvs/src/sys/netinet/ip_fw.h,v > retrieving revision 1.103 > diff -u -p -r1.103 ip_fw.h > --- ip_fw.h 13 Dec 2005 12:16:02 -0000 1.103 > +++ ip_fw.h 10 Feb 2006 21:15:41 -0000 > @@ -510,8 +510,6 @@ struct ip_fw_args { > struct ip_fw *rule; /* matching rule */ > struct ether_header *eh; /* for bridged packets */ > > - int flags; /* for dummynet */ > - > struct ipfw_flow_id f_id; /* grabbed from IP header */ > u_int32_t cookie; /* a cookie depending on rule action */ > struct inpcb *inp; > %%% > > Cheers, > -- > Ruslan Ermilov > ru@FreeBSD.org > FreeBSD committer > > -------------------------------------------------------------------------------- > Part 1.2Type: application/pgp-signature
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?43F38EBA.F96DE9C5>