From: Robert Watson
Date: Sun, 1 Feb 2009 11:49:57 +0000 (GMT)
To: "Per Hurtig (work)"
Cc: freebsd-net@freebsd.org
Subject: Re: TCP gets special treatment?

On Wed, 28 Jan 2009, Per Hurtig (work) wrote:

> How differently are TCP packets treated compared to e.g. SCTP packets, while
> traversing the FreeBSD network stack (up to and including the IP-layer when
> using ipfw)?. I do not assume that the firewall (ipfw) is explicitly
> configured to check for established sessions or any TCP specifics. Are there
> a lot of TCP-specific optimizations conducted by lower layers anyways
> (besides possible checksum offloading)?

Hi Per:

On the whole, TCP packets are treated like any other packet until they reach
the tcp_input() function during the input path, and once they've entered
ip_output() in the output path.
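To make the dispatch point concrete, here is a minimal sketch in C. It is not
the FreeBSD code: every name in it (sketch_ip_input, struct pkt, proto_table,
the H_* values) is made up for illustration, and the real protocol switch is
considerably more involved. It only shows the shape of the idea: the generic
stage runs identically for every packet, and the protocol number matters only
at the final hand-off.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA-assigned IP protocol numbers. */
enum { PROTO_TCP = 6, PROTO_UDP = 17, PROTO_SCTP = 132 };

/* Which (hypothetical) transport handler ended up with the packet. */
enum handler { H_RAW, H_TCP, H_UDP, H_SCTP };

/* Hypothetical, heavily simplified packet descriptor. */
struct pkt {
	uint8_t	ip_proto;	/* protocol field from the IP header */
	int	generic_done;	/* set once protocol-agnostic work has run */
};

typedef enum handler (*input_fn)(struct pkt *);

/* Stand-ins for the per-protocol input routines. */
static enum handler sketch_tcp_input(struct pkt *p)  { (void)p; return H_TCP; }
static enum handler sketch_udp_input(struct pkt *p)  { (void)p; return H_UDP; }
static enum handler sketch_sctp_input(struct pkt *p) { (void)p; return H_SCTP; }
static enum handler sketch_raw_input(struct pkt *p)  { (void)p; return H_RAW; }

/* Illustrative dispatch table keyed on the IP protocol number. */
static const struct {
	uint8_t		proto;
	input_fn	input;
} proto_table[] = {
	{ PROTO_TCP,  sketch_tcp_input },
	{ PROTO_UDP,  sketch_udp_input },
	{ PROTO_SCTP, sketch_sctp_input },
};

enum handler
sketch_ip_input(struct pkt *p)
{
	size_t i;

	/*
	 * Protocol-agnostic stage: header validation, firewall hooks and
	 * the like would run here, the same way for every transport.
	 */
	p->generic_done = 1;

	/* Only now does the transport protocol make a difference. */
	for (i = 0; i < sizeof(proto_table) / sizeof(proto_table[0]); i++)
		if (proto_table[i].proto == p->ip_proto)
			return (proto_table[i].input(p));

	/* Unknown protocols fall through to a raw handler. */
	return (sketch_raw_input(p));
}
```

Calling sketch_ip_input() on a packet with ip_proto set to PROTO_TCP or
PROTO_SCTP runs the same generic stage in both cases and differs only in
which handler is returned.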

There are some exceptions that I'm aware of, including:

- ipfw(4) has special knowledge of the layout and semantics of TCP packets,
  including stateful tracking of TCP connections, etc. ipfw(4) is also able
  to use (output) or look up (input) the local socket in order to identify
  the credential that was, or may be, associated with the packet. Many of us
  consider this highly dubious behavior, subject to race conditions and
  unexpected semantics, but it appears to be popular functionality. Other
  firewall packages, including pf(4), have this functionality as well.

- The IP input protocol dispatch (in_proto.c) doesn't set PR_LASTHDR for TCP
  (or UDP, for that matter) because IPSEC policy is aware of TCP-level
  properties, meaning that some IPSEC processing (policy checking) isn't
  performed in the normal IPSEC input path and is instead deferred to the
  TCP input path. See ip_ipsec.c.

- Various sorts of checksum offload and segmentation offload require TCP
  segments to be handled outside of the core TCP routines. This includes
  ip_output(), where deferred checksum calculations will be performed if it
  turns out the output interface doesn't support hardware checksumming, and
  where TSO segments may be rejected. It also includes device drivers that
  perform (for example) TSO and LRO and are therefore aware (in some form)
  of TCP processing. tcp_lro.c, for example, is called entirely from the
  device driver in order to perform early reassembly, if the device driver
  supports it (primarily 10 Gbps drivers).

Robert N M Watson
Computer Laboratory
University of Cambridge