Date: Thu, 17 Oct 2002 17:46:37 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Julian Elischer <julian@elischer.org>
Cc: Vincent Jardin <vjardin@wanadoo.fr>, freebsd-net@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Netgraph TCP/IP
Message-ID: <3DAF59ED.D14BD089@mindspring.com>
References: <Pine.BSF.4.21.0210171450440.2971-100000@InterJet.elischer.org>
Julian Elischer wrote:
> > There is also the m_pullup() issue of the TCP protocol that is
> > being passed IP datagrams which may be frags of TCP packets, in
> > order to get the full TCP header, with options.
>
> The tcp code should handle this anyway.

It should, but it won't. The issue is when you need to make a decision based on TCP packet contents, but you don't have a complete packet. The expected behaviour is to call m_pullup(). For a Netgraph version of this, you will either need a context (there isn't one at that point -- it runs at NETISR), or you will need to be able to restart tcp_input(). The problem with that is that it's expensive.

Effectively, you almost need to separate out the frag code and assemble whole packets before going into the traditional tcp_input(). Lemon has some good ideas in this area; so do I. I've got code here where I've moved some of the operations around to delay computations until the full data is available (which avoids recomputation).

> > Minimally, the approach has to be a separate TCP stack, which is
> > given a different protocol number for the purposes of experiment,
> > so that you can have a duplicate TCP stack on both sides using
> > the normal mechanism, and replace it on one side with the Netgraph
> > version equivalent.
>
> Not necessarily.. if each stack can 'reject' a packet.. ("not mine").

The problem with this one is that the packet is TCP in either case, so the protocol number alone cannot say which stack should take it. Ideally, what you would like for the development case is to be able to tag particular flows as going to one TCP stack vs. the other; then, by examining the flow tag, you would be able to decide where to handle the packet (this also implies moving the frag reassembly to a separate "layer"). Then you could flag a flow and have it dealt with that way. In FreeBSD's lower level code, though, IP flows aren't really treated as flows; this is partly an artifact of the routing code itself, and partly an artifact of inpcbhash(), which is actually broken.
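A minimal user-space sketch of the flow-tagging idea, assuming fragments have already been reassembled so the ports are visible. The names `flow_key`, `flow_tag()`, `flow_classify()` and the two-stack enum are hypothetical illustrations, not FreeBSD kernel interfaces. Each flow is identified once by its 5-tuple, and anything not explicitly tagged falls through to the default TCP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the IPv4 5-tuple identifying one flow. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Which TCP implementation should handle packets of a flow. */
enum stack_id { STACK_DEFAULT = 0, STACK_EXPERIMENTAL = 1 };

#define FLOW_TABLE_BITS 6
#define FLOW_TABLE_SIZE (1u << FLOW_TABLE_BITS)   /* 64 slots, illustrative */

struct flow_entry {
    int in_use;
    struct flow_key key;
    enum stack_id stack;
};

static struct flow_entry flow_table[FLOW_TABLE_SIZE];

/* Field-by-field compare (memcmp would also compare struct padding). */
static int key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = k->src_ip;
    h = h * 31u + k->dst_ip;
    h = h * 31u + (((uint32_t)k->src_port << 16) | k->dst_port);
    h = h * 31u + k->proto;
    h *= 2654435761u;                      /* multiplicative mixing */
    return h >> (32 - FLOW_TABLE_BITS);    /* top bits pick the slot */
}

/* Mark a flow as belonging to a given stack (open addressing, linear probe). */
void flow_tag(const struct flow_key *k, enum stack_id stack)
{
    uint32_t i = flow_hash(k);
    for (unsigned n = 0; n < FLOW_TABLE_SIZE; n++, i = (i + 1) & (FLOW_TABLE_SIZE - 1)) {
        struct flow_entry *e = &flow_table[i];
        if (!e->in_use || key_equal(&e->key, k)) {
            e->in_use = 1;
            e->key = *k;
            e->stack = stack;
            return;
        }
    }
    assert(!"flow table full");
}

/* One lookup decides which stack gets the packet; untagged flows go to the default TCP. */
enum stack_id flow_classify(const struct flow_key *k)
{
    uint32_t i = flow_hash(k);
    for (unsigned n = 0; n < FLOW_TABLE_SIZE; n++, i = (i + 1) & (FLOW_TABLE_SIZE - 1)) {
        const struct flow_entry *e = &flow_table[i];
        if (!e->in_use)
            return STACK_DEFAULT;
        if (key_equal(&e->key, k))
            return e->stack;
    }
    return STACK_DEFAULT;
}
```

Because both stacks speak protocol 6, the decision hangs on the flow rather than the protocol number, which is why the lookup has to happen below either TCP implementation.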
The hash on input for a flow vs. on output for a flow allocation is also broken. Consider that you can make a connection on an outbound socket which is not bound, from a specific source IP address, with no specific source port, and the source port contention is handled globally rather than locally. This limits you to 65535 maximum outbound connections on a single machine (the number of ports in a single, globally contended port space), despite the fact that your source IP was specified.

What this adds up to is that if you want to run stacks in parallel, they can't share protocol numbers, because the code does not really distinguish them at the proper layer, but instead distinguishes them off-by-one.

This actually makes processing slower overall, as well. Consider that the fast forwarding code does a lookup which, on a miss, is then passed up to TCP to do another lookup, rather than passing the lookup result as part of the context. What this basically means is that the hash entries are not shared, so you don't get a single hit per flow, and the more "fast forwarding" you do, the slower normal processing goes. The same happens for the SYN cache context lookup. Not doing the hash lower down, and then making the decision on the basis of the result of identifying the flow, really slows down the code.

In a general sense, to do what you suggest, you would need to pass all packets through all possible stacks until you hit the "default" one -- the standard TCP -- and rely on all the other stacks to not eat the packet. In practice, this means that every IP protocol, except TCP, ends up getting rewritten for Netgraph use, until TCP gets rewritten, and then your lookup is O(N*MAX(1,flow_count/hash_size)), where N is the number of IP protocols (TCP, UDP, RTP, etc.).
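To make the cost of the "not mine" chain concrete, here is a toy model that just counts flow-table lookups. Everything in it is hypothetical and not FreeBSD code: `flow_lookup()` stands in for whatever hash lookup (PCB hash, fast-forwarding cache, SYN cache) a stack would do, and flow ownership is faked as `flow_id % nstacks`, with the last stack playing the default TCP. In the chained version each stack repeats the lookup before rejecting the packet; in the other, one lookup is done below the protocol layer and its result is passed up as context.

```c
#include <assert.h>

/* Hypothetical per-packet context: the result of one flow lookup,
 * computed below the protocol layer and handed upward. */
struct pkt_ctx {
    int flow_id;
    int owner;   /* index of the stack that owns this flow */
};

static unsigned long lookups;   /* number of flow-table lookups performed */

/* Stand-in for a real hash lookup. For illustration only, flow f is
 * owned by stack (f % nstacks); the last stack is the default TCP. */
static int flow_lookup(int flow_id, int nstacks)
{
    assert(nstacks > 0);
    lookups++;
    return flow_id % nstacks;
}

/* "Not mine" chain: every stack tried repeats the lookup and rejects
 * the packet unless it owns the flow; the default stack is tried last. */
static int dispatch_chained(int flow_id, int nstacks)
{
    for (int s = 0; s < nstacks; s++) {
        int owner = flow_lookup(flow_id, nstacks);
        if (owner == s || s == nstacks - 1)
            return s;
    }
    return nstacks - 1;
}

/* Stack entry point that trusts the context instead of re-hashing. */
static int deliver(const struct pkt_ctx *ctx)
{
    return ctx->owner;
}

/* Look the flow up once, then pass the result up with the packet. */
static int dispatch_once(int flow_id, int nstacks)
{
    struct pkt_ctx ctx = { flow_id, flow_lookup(flow_id, nstacks) };
    return deliver(&ctx);
}
```

Both dispatchers pick the same stack for every flow. With 4 stacks and flows 0 through 99 spread evenly across them, the chained version performs 250 lookups against 100 for the single lookup, and every packet bound for the default TCP pays all 4, which is the N factor in the bound above.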
Also, in practice, this still doubles the overhead for things that need to be pre-decided (flow identification for IP fast forwarding, DSR, splicing, etc.), and none of these things can really be implemented safely as Netgraph modules.

Yeah, it can be made to function correctly that way, but it won't function quickly.

-- Terry