From owner-freebsd-net@FreeBSD.ORG Sun Jun 3 20:59:38 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D8DA51065674 for ; Sun, 3 Jun 2012 20:59:38 +0000 (UTC) (envelope-from bounces+73574-866e-freebsd-net=freebsd.org@sendgrid.me) Received: from o3.shared.sendgrid.net (o3.shared.sendgrid.net [208.117.48.85]) by mx1.freebsd.org (Postfix) with SMTP id 923BB8FC14 for ; Sun, 3 Jun 2012 20:59:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sendgrid.info; h= message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=smtpapi; bh=gVzXeyUzNIzMszoS01iUAHe hjN8=; b=BQOYBRoRtikd4zTpT9EC7GxyL6ZL6CB5eSWC+d1oFFrjPDpq7vQp0VV +YGRzD1svZBCcDpFRM15qOkRiQkF+ihb1/NLer8Cq2RyELKkYmm9vzLJaVqAwwNK TAU9RQMzt4MvsL6wQKK6pyoYdEwbNtPhi7wSyzJxTgvKhN6vhFdQ= Received: by 10.41.149.111 with SMTP id f04-08.10408.4FCBD039B Sun, 03 Jun 2012 20:59:37 +0000 (UTC) Received: from mail.tarsnap.com (unknown [10.9.180.5]) by mi4 (SG) with ESMTP id 4fcbd039.462b.29a473f for ; Sun, 03 Jun 2012 15:59:37 -0500 (CST) Received: (qmail 34513 invoked from network); 3 Jun 2012 20:51:55 -0000 Received: from unknown (HELO clamshell.daemonology.net) (127.0.0.1) by mail.tarsnap.com with ESMTP; 3 Jun 2012 20:51:55 -0000 Received: (qmail 35456 invoked from network); 3 Jun 2012 20:59:26 -0000 Received: from unknown (HELO clamshell.daemonology.net) (127.0.0.1) by clamshell.daemonology.net with SMTP; 3 Jun 2012 20:59:26 -0000 Message-ID: <4FCBD02D.2020500@freebsd.org> Date: Sun, 03 Jun 2012 13:59:25 -0700 From: Colin Percival User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120509 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Gallatin References: <4FC635CC.5030608@freebsd.org> <4FC63D27.70807@cs.duke.edu> <4FCB95F6.30204@freebsd.org> <4FCBB56D.2080800@cs.duke.edu> In-Reply-To: <4FCBB56D.2080800@cs.duke.edu> X-Enigmail-Version: 1.5pre Content-Type: multipart/mixed; boundary="------------030204000100060308060202" X-Sendgrid-EID: ISGKkmHlRE12gnWy0TjFyGQSFR1WnpjQdICm3qu6YxEJ7pQio6/S9Skj7KnmnF2556962czSUSXk201SDMJj4EwI/+Dz4EewBDE/zOOkSHhasKA03wSZdlURjWTCtD6k6+CPVzfqKufn63SJ1CARs3zr+vKUe+sMxIYWCZAC6ys= X-SendGrid-Contentd-ID: {"test_id":1338757177} Cc: freebsd-net@freebsd.org Subject: Re: [please review] TSO mbuf chain length limiting patch X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jun 2012 20:59:38 -0000 This is a multi-part message in MIME format. --------------030204000100060308060202 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit On 06/03/12 12:05, Andrew Gallatin wrote: > On 06/03/12 12:51, Colin Percival wrote: >> I've attached a new patch which: >> 1. adds a IFCAP_TSO_MSS "capability" and a if_tx_tso_mss field to struct ifnet, >> 2. sets these in netfront when the IFCAP_TSO4 flag is set, >> 3. extends tcp_maxmtu to read this value, >> 4. adds a tx_tso_mss field to struct tcpcb, >> 5. makes tcp_mss_update set tx_tso_mss using tcp_maxmtu, and >> 6. limits TSO lengths to tx_tso_mss in tcp_output. > > Looks good to me. I don't pretend to understand the bowels of > the TCP stack, so I can't comment on the "sendalot" stuff to > force segmentation. The "sendalot" bit is rather confusing, and I'm not at all certain that it's doing what it's supposed to be doing, but my reading of the code is that it really means "there's more data buffered than what we're sending right now so after we issue this write we need to go back to the top and do another". > I assume it works as well as your > previous patch to solve the problems you were seeing with Xen? Yes. > One minor nit is that I envision this limit to always be > 64K or less, so you could probably get away with stealing > 16 bits from ifcspare. With IPv6 I could imagine larger limits happening, so I figured it was better to take one of the ints. > The other trivial nit is that I would have made these new > if_tx_tso_mss and t_tx_tso_mss fields unsigned. Oops, quite right. I was trying to make sure I didn't break ABI, but of course int and u_int should have the same size no matter what platform we're on. > Thanks so much for doing this! Thanks for reviewing. Updated patch attached with s/int/u_int/ and some other minor cleanups. Can anyone review this for ABI safety? -- Colin Percival Security Officer Emeritus, FreeBSD | The power to serve Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid --------------030204000100060308060202 Content-Type: text/plain; charset=us-ascii; name="tx_tso_mss_2.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="tx_tso_mss_2.patch" Index: netinet/tcp_input.c =================================================================== --- netinet/tcp_input.c (revision 235780) +++ netinet/tcp_input.c (working copy) @@ -3298,6 +3298,7 @@ { int mss = 0; u_long maxmtu = 0; + u_int tx_tso_mss = 0; struct inpcb *inp = tp->t_inpcb; struct hc_metrics_lite metrics; int origoffer; @@ -3321,8 +3322,10 @@ /* Initialize. */ #ifdef INET6 if (isipv6) { - maxmtu = tcp_maxmtu6(&inp->inp_inc, mtuflags); + maxmtu = tcp_maxmtu6_ext(&inp->inp_inc, mtuflags, + &tx_tso_mss); tp->t_maxopd = tp->t_maxseg = V_tcp_v6mssdflt; + tp->t_tx_tso_mss = tx_tso_mss; } #endif #if defined(INET) && defined(INET6) @@ -3330,8 +3333,10 @@ #endif #ifdef INET { - maxmtu = tcp_maxmtu(&inp->inp_inc, mtuflags); + maxmtu = tcp_maxmtu_ext(&inp->inp_inc, mtuflags, + &tx_tso_mss); tp->t_maxopd = tp->t_maxseg = V_tcp_mssdflt; + tp->t_tx_tso_mss = tx_tso_mss; } #endif Index: netinet/tcp_subr.c =================================================================== --- netinet/tcp_subr.c (revision 235780) +++ netinet/tcp_subr.c (working copy) @@ -1708,7 +1708,7 @@ * tcp_mss_update to get the peer/interface MTU. */ u_long -tcp_maxmtu(struct in_conninfo *inc, int *flags) +tcp_maxmtu_ext(struct in_conninfo *inc, int *flags, u_int * tx_tso_mss) { struct route sro; struct sockaddr_in *dst; @@ -1738,15 +1738,26 @@ ifp->if_hwassist & CSUM_TSO) *flags |= CSUM_TSO; } + if (tx_tso_mss != NULL) { + if (ifp->if_capabilities & IFCAP_TSO_MSS) + *tx_tso_mss = ifp->if_tx_tso_mss; + } RTFREE(sro.ro_rt); } return (maxmtu); } + +u_long +tcp_maxmtu(struct in_conninfo *inc, int *flags) +{ + + return (tcp_maxmtu_ext(inc, flags, NULL)); +} #endif /* INET */ #ifdef INET6 u_long -tcp_maxmtu6(struct in_conninfo *inc, int *flags) +tcp_maxmtu6_ext(struct in_conninfo *inc, int *flags, u_int * tx_tso_mss) { struct route_in6 sro6; struct ifnet *ifp; @@ -1775,11 +1786,22 @@ ifp->if_hwassist & CSUM_TSO) *flags |= CSUM_TSO; } + if (tx_tso_mss != NULL) { + if (ifp->if_capabilities & IFCAP_TSO_MSS) + *tx_tso_mss = ifp->if_tx_tso_mss; + } RTFREE(sro6.ro_rt); } return (maxmtu); } + +u_long +tcp_maxmtu6(struct in_conninfo *inc, int *flags) +{ + + return (tcp_maxmtu6_ext(inc, flags, NULL)); +} #endif /* INET6 */ #ifdef IPSEC Index: netinet/tcp_var.h =================================================================== --- netinet/tcp_var.h (revision 235780) +++ netinet/tcp_var.h (working copy) @@ -208,7 +208,9 @@ u_int t_keepintvl; /* interval between keepalives */ u_int t_keepcnt; /* number of keepalives before close */ - uint32_t t_ispare[8]; /* 5 UTO, 3 TBD */ + uint32_t t_ispare[5]; /* 5 UTO */ + uint32_t t_tx_tso_mss; /* max segment size for TSO offload */ + uint32_t t_ispare2[2]; /* 2 TBD */ void *t_pspare2[4]; /* 4 TBD */ uint64_t _pad[6]; /* 6 TBD (1-2 CC/RTT?) */ }; @@ -674,7 +676,9 @@ #endif void tcp_input(struct mbuf *, int); u_long tcp_maxmtu(struct in_conninfo *, int *); +u_long tcp_maxmtu_ext(struct in_conninfo *, int *, u_int *); u_long tcp_maxmtu6(struct in_conninfo *, int *); +u_long tcp_maxmtu6_ext(struct in_conninfo *, int *, u_int *); void tcp_mss_update(struct tcpcb *, int, int, struct hc_metrics_lite *, int *); void tcp_mss(struct tcpcb *, int); Index: netinet/tcp_output.c =================================================================== --- netinet/tcp_output.c (revision 235780) +++ netinet/tcp_output.c (working copy) @@ -748,6 +748,15 @@ } /* + * Some drivers want an even shorter limit to the + * length sent via TSO; respect their wishes. + */ + if (tp->t_tx_tso_mss != 0 && len > tp->t_tx_tso_mss) { + len = tp->t_tx_tso_mss; + sendalot = 1; + } + + /* * Prevent the last segment from being * fractional unless the send sockbuf can * be emptied. Index: dev/xen/netfront/netfront.c =================================================================== --- dev/xen/netfront/netfront.c (revision 235780) +++ dev/xen/netfront/netfront.c (working copy) @@ -135,6 +135,7 @@ * to mirror the Linux MAX_SKB_FRAGS constant. */ #define MAX_TX_REQ_FRAGS (65536 / PAGE_SIZE + 2) +#define TX_TSO_MSS (MAX_TX_REQ_FRAGS - 2) * MCLBYTES #define RX_COPY_THRESHOLD 256 @@ -2040,6 +2041,9 @@ if (val) { np->xn_ifp->if_capabilities |= IFCAP_TSO4|IFCAP_LRO; printf(" feature-gso-tcp4"); + + np->xn_ifp->if_capabilities |= IFCAP_TSO_MSS; + np->xn_ifp->if_tx_tso_mss = TX_TSO_MSS; } printf("\n"); Index: net/if.h =================================================================== --- net/if.h (revision 235780) +++ net/if.h (working copy) @@ -230,6 +230,7 @@ #define IFCAP_VLAN_HWTSO 0x40000 /* can do IFCAP_TSO on VLANs */ #define IFCAP_LINKSTATE 0x80000 /* the runtime link state is dynamic */ #define IFCAP_NETMAP 0x100000 /* netmap mode supported/enabled */ +#define IFCAP_TSO_MSS 0x200000 /* TSO size is limited */ #define IFCAP_HWCSUM (IFCAP_RXCSUM | IFCAP_TXCSUM) #define IFCAP_TSO (IFCAP_TSO4 | IFCAP_TSO6) Index: net/if_var.h =================================================================== --- net/if_var.h (revision 235780) +++ net/if_var.h (working copy) @@ -205,7 +205,8 @@ * be used with care where binary compatibility is required. */ char if_cspare[3]; - int if_ispare[4]; + u_int if_tx_tso_mss; /* if IFCAP_TSO_MSS */ + int if_ispare[3]; void *if_pspare[8]; /* 1 netmap, 7 TDB */ }; --------------030204000100060308060202--