Date:      Sun, 1 Feb 2009 11:49:57 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        "Per Hurtig (work)" <per.hurtig@kau.se>
Cc:        freebsd-net@freebsd.org
Subject:   Re: TCP gets special treatment?
Message-ID:  <alpine.BSF.2.00.0902011141400.47005@fledge.watson.org>
In-Reply-To: <a846cbcf0901280249w31265880x88c60762a111b7d8@mail.gmail.com>
References:  <a846cbcf0901280249w31265880x88c60762a111b7d8@mail.gmail.com>


On Wed, 28 Jan 2009, Per Hurtig (work) wrote:

> How differently are TCP packets treated compared to e.g. SCTP packets, while 
> traversing the FreeBSD network stack (up to and including the IP-layer when 
> using ipfw)?. I do not assume that the firewall (ipfw) is explicitly 
> configured to check for established sessions or any TCP specifics. Are there 
> a lot of TCP-specific optimizations conducted by lower layers anyways 
> (besides possible checksum offloading)?

Hi Per:

On the whole, TCP packets are treated like any other packet until they reach 
tcp_input() on the input path, and from the point they enter ip_output() on 
the output path.  There are some exceptions that I'm aware of, including:

- ipfw(4) has special knowledge of the layout and semantics of TCP packets,
   including stateful tracking of TCP connections, etc.  ipfw(4) is able to
   use (output) or to look up (input) the local socket for the purposes of
   identifying the credential that was or may be associated with the packet.
   Many of us consider this highly dubious behavior, subject to race
   conditions and unexpected semantics, but it appears to be popular
   functionality.  Other firewall packages, including pf(4), have this
   functionality as well.

- The IP input protocol dispatch (in_proto.c) doesn't set PR_LASTHDR for TCP
   (and UDP for that matter) because IPSEC policy is aware of TCP-level
   properties, meaning that some IPSEC processing (policy checking) isn't
   performed in the normal IPSEC input path and is instead deferred to the
   TCP input path.  See ip_ipsec.c.

- Various sorts of checksum offload and segmentation offload require TCP
   segments to be handled outside of the core TCP routines.  This includes
   ip_output(), where deferred checksum calculations will be performed if it
   turns out the output interface doesn't support hardware checksumming, and
   where TSO segments may be rejected.  It also includes device drivers that
   perform (for example) TSO and LRO and are therefore aware (in some form)
   of TCP processing.  tcp_lro.c, for example, is called entirely from the
   device driver in order to perform early reassembly, if the device driver
   supports it (primarily 10Gbps drivers).
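
To make the deferred checksum case concrete, here is a rough userland sketch
of the idea, not the kernel code itself.  All of the names (sketch_pkt,
sketch_ifnet, SKETCH_CSUM_TCP, sketch_output) are hypothetical, and it
assumes a flat buffer and ignores the TCP pseudo-header, which a real stack
has to account for.  The transport layer marks the packet as still needing a
checksum, and the output routine only falls back to software if the
interface doesn't advertise that capability:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: the packet still needs a transport checksum. */
#define SKETCH_CSUM_TCP 0x1u

/* Hypothetical interface: a bitmask of checksums the hardware can do. */
struct sketch_ifnet {
    uint32_t hwassist;
};

/* Hypothetical packet: a flat buffer plus pending-checksum flags. */
struct sketch_pkt {
    uint32_t csum_flags;   /* checksums not yet computed */
    uint8_t *data;         /* bytes covered by the checksum */
    size_t   len;          /* number of bytes in data */
    size_t   csum_offset;  /* even offset of the 16-bit checksum field */
};

/* Internet checksum: one's-complement sum of big-endian 16-bit words. */
static uint16_t
sketch_in_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;   /* pad the odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Output path: finish in software whatever the interface cannot offload.
 * Returns 1 if the checksum was computed here, 0 if left to the hardware.
 */
static int
sketch_output(struct sketch_pkt *m, const struct sketch_ifnet *ifp)
{
    uint32_t sw_csum = m->csum_flags & ~ifp->hwassist;
    uint16_t c;

    if ((sw_csum & SKETCH_CSUM_TCP) == 0)
        return 0;

    assert(m->csum_offset + 2 <= m->len);
    m->data[m->csum_offset] = 0;
    m->data[m->csum_offset + 1] = 0;
    c = sketch_in_cksum(m->data, m->len);
    m->data[m->csum_offset] = (uint8_t)(c >> 8);
    m->data[m->csum_offset + 1] = (uint8_t)(c & 0xff);
    m->csum_flags &= ~SKETCH_CSUM_TCP;
    return 1;
}
```

The point of the sketch is only that the decision is made at output time,
once the interface is known, which is why this bit of TCP-related work ends
up outside the TCP code proper.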

Robert N M Watson
Computer Laboratory
University of Cambridge


