From owner-svn-src-all@freebsd.org Tue Jun 28 13:37:03 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 819DCB85A2C; Tue, 28 Jun 2016 13:37:03 +0000 (UTC) (envelope-from jtl@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4FF132217; Tue, 28 Jun 2016 13:37:03 +0000 (UTC) (envelope-from jtl@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id u5SDb28I082558; Tue, 28 Jun 2016 13:37:02 GMT (envelope-from jtl@FreeBSD.org) Received: (from jtl@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id u5SDb2rh082554; Tue, 28 Jun 2016 13:37:02 GMT (envelope-from jtl@FreeBSD.org) Message-Id: <201606281337.u5SDb2rh082554@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: jtl set sender to jtl@FreeBSD.org using -f From: "Jonathan T. Looney" Date: Tue, 28 Jun 2016 13:37:02 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r302247 - in head: share/man/man4 share/man/man9 tools/build/options X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Jun 2016 13:37:03 -0000 Author: jtl Date: Tue Jun 28 13:37:01 2016 New Revision: 302247 URL: https://svnweb.freebsd.org/changeset/base/302247 Log: Document support for alternate TCP stacks. Differential Revision: https://reviews.freebsd.org/D6940 Reviewed by: hiren Approved by: re (gjb) Sponsored by: Juniper Networks Added: head/share/man/man9/tcp_functions.9 (contents, props changed) head/tools/build/options/WITH_EXTRA_TCP_STACKS (contents, props changed) Modified: head/share/man/man4/tcp.4 head/share/man/man9/Makefile Modified: head/share/man/man4/tcp.4 ============================================================================== --- head/share/man/man4/tcp.4 Tue Jun 28 07:47:42 2016 (r302246) +++ head/share/man/man4/tcp.4 Tue Jun 28 13:37:01 2016 (r302247) @@ -34,7 +34,7 @@ .\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93 .\" $FreeBSD$ .\" -.Dd May 19, 2016 +.Dd June 28, 2016 .Dt TCP 4 .Os .Sh NAME @@ -119,7 +119,7 @@ supports a number of socket options whic .Xr setsockopt 2 and tested with .Xr getsockopt 2 : -.Bl -tag -width ".Dv TCP_CONGESTION" +.Bl -tag -width ".Dv TCP_FUNCTION_BLK" .It Dv TCP_INFO Information about a socket's underlying TCP session may be retrieved by passing the read-only option @@ -148,6 +148,20 @@ connection. See .Xr mod_cc 4 for details. +.It Dv TCP_FUNCTION_BLK +Select or query the set of functions that TCP will use for this connection. +This allows a user to select an alternate TCP stack. +The alternate TCP stack must already be loaded in the kernel. +To list the available TCP stacks, see +.Va functions_available +in the +.Sx MIB Variables +section further down. +To list the default TCP stack, see +.Va functions_default +in the +.Sx MIB Variables +section. .It Dv TCP_KEEPINIT This .Xr setsockopt 2 @@ -568,6 +582,10 @@ Number of times default MSS was used in .It Va pmtud_blackhole_failed Number of connections for which retransmits continued even after MSS downshift. +.It Va functions_available +List of available TCP function blocks (TCP stacks). +.It Va functions_default +The default TCP function block (TCP stack). .El .Sh ERRORS A socket operation may fail with one of the following errors returned: @@ -599,6 +617,10 @@ exists; .It Bq Er EAFNOSUPPORT when an attempt is made to bind or connect a socket to a multicast address. +.It Bq Er EINVAL +when trying to change TCP function blocks at an invalid point in the session; +.It Bq Er ENOENT +when trying to use a TCP function block that is not available; .El .Sh SEE ALSO .Xr getsockopt 2 , Modified: head/share/man/man9/Makefile ============================================================================== --- head/share/man/man9/Makefile Tue Jun 28 07:47:42 2016 (r302246) +++ head/share/man/man9/Makefile Tue Jun 28 13:37:01 2016 (r302247) @@ -284,6 +284,7 @@ MAN= accept_filter.9 \ sysctl_ctx_init.9 \ SYSINIT.9 \ taskqueue.9 \ + tcp_functions.9 \ thread_exit.9 \ time.9 \ timeout.9 \ @@ -1734,6 +1735,8 @@ MLINKS+=taskqueue.9 TASK_INIT.9 \ taskqueue.9 taskqueue_start_threads_pinned.9 \ taskqueue.9 taskqueue_unblock.9 \ taskqueue.9 TIMEOUT_TASK_INIT.9 +MLINKS+=tcp_functions.9 register_tcp_functions.9 \ + tcp_functions.9 deregister_tcp_functions.9 MLINKS+=time.9 boottime.9 \ time.9 time_second.9 \ time.9 time_uptime.9 Added: head/share/man/man9/tcp_functions.9 ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/share/man/man9/tcp_functions.9 Tue Jun 28 13:37:01 2016 (r302247) @@ -0,0 +1,285 @@ +.\" +.\" Copyright (c) 2016 Jonathan Looney +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR +.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd June 28, 2016 +.Dt TCP_FUNCTIONS 9 +.Os +.Sh NAME +.Nm tcp_functions +.Nd Alternate TCP Stack Framework +.Sh SYNOPSIS +.In netinet/tcp.h +.In netinet/tcp_var.h +.Ft int +.Fn register_tcp_functions "struct tcp_function_block *blk" "int wait" +.Ft int +.Fn deregister_tcp_functions "struct tcp_function_block *blk" +.Sh DESCRIPTION +The +.Nm +framework allows a kernel developer to implement alternate TCP stacks. +The alternate stacks can be compiled in the kernel or can be implemented in +loadable kernel modules. +This functionality is intended to encourage experimentation with the TCP stack +and to allow alternate behaviors to be deployed for different TCP connections +on a single system. +.Pp +A system administrator can set a system default stack. +By default, all TCP connections will use the system default stack. +Additionally, users can specify a particular stack to use on a per-connection +basis. +(See +.Xr tcp 4 +for details on setting the system default stack, or selecting a specific stack +for a given connection.) +.Pp +This man page treats "TCP stacks" as synonymous with "function blocks". +This is intentional. +A "TCP stack" is a collection of functions that implement a set of behavior. +Therefore, an alternate "function block" defines an alternate "TCP stack". +.Pp +.Nm +modules must call the +.Fn register_tcp_functions +function during initialization and successfully call the +.Fn deregister_tcp_functions +function prior to allowing the module to be unloaded. +.Pp +The +.Fn register_tcp_functions +function requests that the system add a specified function block to the system. +.Pp +The +.Fn deregister_tcp_functions +function requests that the system remove a specified function block from the +system. +If the call fails because sockets are still using the specified function block, +the system will mark the function block as being in the process of being +removed. +This will prevent additional sockets from using the specified function block. +However, it will not impact sockets that are already using the function block. +.Pp +The +.Fa blk +argument is a pointer to a +.Vt "struct tcp_function_block" , +which is explained below (see +.Sx Function Block Structure ) . +The +.Fa wait +argument is used as the +.Fa flags +argument to +.Xr malloc 9 , +and must be set to one of the valid values defined in that man page. +.Ss Function Block Structure +The +.Fa blk argument is a pointer to a +.Vt "struct tcp_function_block" , +which has the following members: +.Bd -literal -offset indent +struct tcp_function_block { + char tfb_tcp_block_name[TCP_FUNCTION_NAME_LEN_MAX]; + int (*tfb_tcp_output)(struct tcpcb *); + void (*tfb_tcp_do_segment)(struct mbuf *, struct tcphdr *, + struct socket *, struct tcpcb *, + int, int, uint8_t, + int); + int (*tfb_tcp_ctloutput)(struct socket *so, + struct sockopt *sopt, + struct inpcb *inp, struct tcpcb *tp); + /* Optional memory allocation/free routine */ + void (*tfb_tcp_fb_init)(struct tcpcb *); + void (*tfb_tcp_fb_fini)(struct tcpcb *); + /* Optional timers, must define all if you define one */ + int (*tfb_tcp_timer_stop_all)(struct tcpcb *); + void (*tfb_tcp_timer_activate)(struct tcpcb *, + uint32_t, u_int); + int (*tfb_tcp_timer_active)(struct tcpcb *, uint32_t); + void (*tfb_tcp_timer_stop)(struct tcpcb *, uint32_t); + void (*tfb_tcp_rexmit_tmr)(struct tcpcb *); + volatile uint32_t tfb_refcnt; + uint32_t tfb_flags; +}; +.Ed +.Pp +The +.Va tfb_tcp_block_name +field identifies the unique name of the TCP stack, and should be no longer than +TCP_FUNCTION_NAME_LEN_MAX-1 characters in length. +.Pp +The +.Va tfb_tcp_output , +.Va tfb_tcp_do_segment , +and +.Va tfb_tcp_ctloutput +fields are pointers to functions that perform the equivalent actions +as the default +.Fn tcp_output , +.Fn tcp_do_segment , +and +.Fn tcp_default_ctloutput +functions, respectively. +Each of these function pointers must be non-NULL. +.Pp +If a TCP stack needs to initialize data when a socket first selects the TCP +stack (or, when the socket is first opened), it should set a non-NULL +pointer in the +.Va tfb_tcp_fb_init +field. +Likewise, if a TCP stack needs to cleanup data when a socket stops using the +TCP stack (or, when the socket is closed), it should set a non-NULL pointer +in the +.Va tfb_tcp_fb_fini +field. +.Pp +If the TCP stack implements additional timers, the TCP stack should set a +non-NULL pointer in the +.Va tfb_tcp_timer_stop_all , +.Va tfb_tcp_timer_activate , +.Va tfb_tcp_timer_active , +and +.Va tfb_tcp_timer_stop +fields. +These fields should all be +.Dv NULL +or should all contain pointers to functions. +The +.Va tfb_tcp_timer_activate , +.Va tfb_tcp_timer_active , +and +.Va tfb_tcp_timer_stop +functions will be called when the +.Fn tcp_timer_activate , +.Fn tcp_timer_active , +and +.Fn tcp_timer_stop +functions, respectively, are called with a timer type other than the standard +types. +The functions defined by the TCP stack have the same semantics (both for +arguments and return values) as the normal timer functions they supplement. +.Pp +Additionally, a stack may define its own actions to take when the retransmit +timer fires by setting a non-NULL function pointer in the +.Va tfb_tcp_rexmit_tmr +field. +This function is called very early in the process of handling a retransmit +timer. +However, care must be taken to ensure the retransmit timer leaves the +TCP control block in a valid state for the remainder of the retransmit +timer logic. +.Pp +The +.Va tfb_refcnt +and +.Va tfb_flags +fields are used by the kernel's TCP code and will be initialized when the +TCP stack is registered. +.Ss Requirements for Alternate TCP Stacks +If the TCP stack needs to store data beyond what is stored in the default +TCP control block, the TCP stack can initialize its own per-connection storage. +The +.Va t_fb_ptr +field in the +.Vt "struct tcpcb" +control block structure has been reserved to hold a pointer to this +per-connection storage. +If the TCP stack uses this alternate storage, it should understand that the +value of the +.Va t_fb_ptr +pointer may not be initialized to +.Dv NULL . +Therefore, it should use a +.Va tfb_tcp_fb_init +function to initialize this field. +Additionally, it should use a +.Va tfb_tcp_fb_fini +function to deallocate storage when the socket is closed. +.Pp +It is understood that alternate TCP stacks may keep different sets of data. +However, in order to ensure that data is available to both the user and the +rest of the system in a standardized format, alternate TCP stacks must +update all fields in the TCP control block to the greatest extent practical. +.Sh RETURN VALUES +The +.Fn register_tcp_functions +and +.Fn deregister_tcp_functions +functions return zero on success and non-zero on failure. +In particular, the +.Fn deregister_tcp_functions +will return +.Er EBUSY +until no more connections are using the specified TCP stack. +A module calling +.Fn deregister_tcp_functions +must be prepared to wait until all connections have stopped using the +specified TCP stack. +.Sh ERRORS +The +.Fn register_tcp_functions +function will fail if: +.Bl -tag -width Er +.It Bq Er EINVAL +Any of the members of the +.Fa blk +argument are set incorrectly. +.It Bq Er ENOMEM +The function could not allocate memory for its internal data. +.It Bq Er EALREADY +A function block is already registered with the same name. +.El +The +.Fn deregister_tcp_functions +function will fail if: +.Bl -tag -width Er +.It Bq Er EPERM +The +.Fa blk +argument references the kernel's compiled-in default function block. +.It Bq Er EBUSY +The function block is still in use by one or more sockets, or is defined as +the current default function block. +.It Bq Er ENOENT +The +.Fa blk +argument references a function block that is not currently registered. +.Sh SEE ALSO +.Xr malloc 9 , +.Xr tcp 4 +.Sh HISTORY +This framework first appeared in +.Fx 11.0 . +.Sh AUTHORS +.An -nosplit +The +.Nm +framework was written by +.An Randall Stewart Aq Mt rrs@FreeBSD.org . +.Pp +This manual page was written by +.An Jonathan Looney Aq Mt jtl@FreeBSD.org . Added: head/tools/build/options/WITH_EXTRA_TCP_STACKS ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/tools/build/options/WITH_EXTRA_TCP_STACKS Tue Jun 28 13:37:01 2016 (r302247) @@ -0,0 +1,2 @@ +.\" $FreeBSD$ +Set to build extra TCP stack modules.