From owner-freebsd-net@FreeBSD.ORG Sun Jun 3 16:51:19 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B3D5F1065672 for ; Sun, 3 Jun 2012 16:51:19 +0000 (UTC) (envelope-from bounces+73574-866e-freebsd-net=freebsd.org@sendgrid.me) Received: from o3.shared.sendgrid.net (o3.shared.sendgrid.net [208.117.48.85]) by mx1.freebsd.org (Postfix) with SMTP id 6B5628FC16 for ; Sun, 3 Jun 2012 16:51:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sendgrid.info; h= message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=smtpapi; bh=Mi9/DbovhHiry6yybyyzN4P f6pM=; b=wyv0B/FvMStarTwgBKIIqLyFk7tmuJnyljejr1P0gEPu4oY090Gk0Zq oOrovMl/y7IzUf8qozm457goCr/OSIeapuhb/cmuaQvWcm7zPXgQdkO62d8WEhsC IgH//xYq3g2Flpf0CHMBd1+c2FwGhonD1hZk5kqYjRURV1hbu148= Received: by 10.41.149.156 with SMTP id f04-21.20172.4FCB960111 Sun, 03 Jun 2012 16:51:13 +0000 (UTC) Received: from mail.tarsnap.com (unknown [10.8.49.124]) by mi4 (SG) with ESMTP id 4fcb9601.461d.3bec0c2 for ; Sun, 03 Jun 2012 11:51:13 -0500 (CST) Received: (qmail 32518 invoked from network); 3 Jun 2012 16:43:31 -0000 Received: from unknown (HELO clamshell.daemonology.net) (127.0.0.1) by mail.tarsnap.com with ESMTP; 3 Jun 2012 16:43:31 -0000 Received: (qmail 32186 invoked from network); 3 Jun 2012 16:51:02 -0000 Received: from unknown (HELO clamshell.daemonology.net) (127.0.0.1) by clamshell.daemonology.net with SMTP; 3 Jun 2012 16:51:02 -0000 Message-ID: <4FCB95F6.30204@freebsd.org> Date: Sun, 03 Jun 2012 09:51:02 -0700 From: Colin Percival User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120509 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Gallatin References: <4FC635CC.5030608@freebsd.org> <4FC63D27.70807@cs.duke.edu> In-Reply-To: <4FC63D27.70807@cs.duke.edu> X-Enigmail-Version: 1.5pre Content-Type: multipart/mixed; boundary="------------010705070900080106050908" X-Sendgrid-EID: ISGKkmHlRE12gnWy0TjFyGQSFR1WnpjQdICm3qu6YxHlg7RAud3TTYZr9QVSeo9IJ9+l/mitYMw78g4lVdZtunuW9cVVCosro439GFVLCwSHePeBYXzH9/Yvm2gDY0ThL8NkaxEdOMCcoklYGaoXQcD3wzUI9n5XW8h2+C0RrxQ= Cc: freebsd-net@freebsd.org Subject: Re: [please review] TSO mbuf chain length limiting patch X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jun 2012 16:51:19 -0000 This is a multi-part message in MIME format. --------------010705070900080106050908 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit On 05/30/12 08:30, Andrew Gallatin wrote: > On 05/30/12 10:59, Colin Percival wrote: >> The Xen virtual network interface has an issue (ok, really the issue is with >> the linux back-end, but that's what most people are using) where it can't >> handle scatter-gather writes with lots of pieces, aka. long mbuf chains. >> This currently bites us hard with TSO enabled, since it produces said long >> mbuf chains. > > I think a better approach would be to have a limit on the size of the > pre-segmented TCP payload size sent to the driver. I tend to think > that this would be more generically useful, and it is a better match > for the NDIS APIs, where a driver must specify the max TSO size. I > think the changes to the TCP stack might be simpler (eg, they > would seem to jive better with the existing "maxmtu" approach). > > I think this could work for you as well. You could set the Xen max > tso size to be 32K (derived from 18 pages/skb, multiplied by a typical > 2KB mbuf size, with some slack built in). If the chain was too large, > you could m_defrag it down to size. I've attached a new patch which: 1. adds a IFCAP_TSO_MSS "capability" and a if_tx_tso_mss field to struct ifnet, 2. sets these in netfront when the IFCAP_TSO4 flag is set, 3. extends tcp_maxmtu to read this value, 4. adds a tx_tso_mss field to struct tcpcb, 5. makes tcp_mss_update set tx_tso_mss using tcp_maxmtu, and 6. limits TSO lengths to tx_tso_mss in tcp_output. -- Colin Percival Security Officer Emeritus, FreeBSD | The power to serve Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid --------------010705070900080106050908 Content-Type: text/plain; charset=us-ascii; name="tx_tso_mss.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="tx_tso_mss.patch" Index: netinet/tcp_input.c =================================================================== --- netinet/tcp_input.c (revision 235780) +++ netinet/tcp_input.c (working copy) @@ -3298,6 +3298,7 @@ { int mss = 0; u_long maxmtu = 0; + int tx_tso_mss = 0; struct inpcb *inp = tp->t_inpcb; struct hc_metrics_lite metrics; int origoffer; @@ -3321,8 +3322,10 @@ /* Initialize. */ #ifdef INET6 if (isipv6) { - maxmtu = tcp_maxmtu6(&inp->inp_inc, mtuflags); + maxmtu = tcp_maxmtu6_ext(&inp->inp_inc, mtuflags, + &tx_tso_mss); tp->t_maxopd = tp->t_maxseg = V_tcp_v6mssdflt; + tp->t_tx_tso_mss = tx_tso_mss; } #endif #if defined(INET) && defined(INET6) @@ -3330,8 +3333,10 @@ #endif #ifdef INET { - maxmtu = tcp_maxmtu(&inp->inp_inc, mtuflags); + maxmtu = tcp_maxmtu_ext(&inp->inp_inc, mtuflags, + &tx_tso_mss); tp->t_maxopd = tp->t_maxseg = V_tcp_mssdflt; + tp->t_tx_tso_mss = tx_tso_mss; } #endif Index: netinet/tcp_subr.c =================================================================== --- netinet/tcp_subr.c (revision 235780) +++ netinet/tcp_subr.c (working copy) @@ -1708,7 +1708,7 @@ * tcp_mss_update to get the peer/interface MTU. */ u_long -tcp_maxmtu(struct in_conninfo *inc, int *flags) +tcp_maxmtu_ext(struct in_conninfo *inc, int *flags, int * tx_tso_mss) { struct route sro; struct sockaddr_in *dst; @@ -1738,15 +1738,26 @@ ifp->if_hwassist & CSUM_TSO) *flags |= CSUM_TSO; } + if (tx_tso_mss != NULL) { + if (ifp->if_capabilities & IFCAP_TSO_MSS) + *tx_tso_mss = ifp->if_tx_tso_mss; + } RTFREE(sro.ro_rt); } return (maxmtu); } + +u_long +tcp_maxmtu(struct in_conninfo *inc, int *flags) +{ + + return (tcp_maxmtu_ext(inc, flags, NULL)); +} #endif /* INET */ #ifdef INET6 u_long -tcp_maxmtu6(struct in_conninfo *inc, int *flags) +tcp_maxmtu6_ext(struct in_conninfo *inc, int *flags, int * tx_tso_mss) { struct route_in6 sro6; struct ifnet *ifp; @@ -1775,11 +1786,22 @@ ifp->if_hwassist & CSUM_TSO) *flags |= CSUM_TSO; } + if (tx_tso_mss != NULL) { + if (ifp->if_capabilities & IFCAP_TSO_MSS) + *tx_tso_mss = ifp->if_tx_tso_mss; + } RTFREE(sro6.ro_rt); } return (maxmtu); } + +u_long +tcp_maxmtu6(struct in_conninfo *inc, int *flags) +{ + + return (tcp_maxmtu6_ext(inc, flags, NULL)); +} #endif /* INET6 */ #ifdef IPSEC Index: netinet/tcp_var.h =================================================================== --- netinet/tcp_var.h (revision 235780) +++ netinet/tcp_var.h (working copy) @@ -208,7 +208,8 @@ u_int t_keepintvl; /* interval between keepalives */ u_int t_keepcnt; /* number of keepalives before close */ - uint32_t t_ispare[8]; /* 5 UTO, 3 TBD */ + uint32_t t_tx_tso_mss; /* max segment size for TSO offload */ + uint32_t t_ispare[7]; /* 5 UTO, 3 TBD */ void *t_pspare2[4]; /* 4 TBD */ uint64_t _pad[6]; /* 6 TBD (1-2 CC/RTT?) */ }; @@ -674,7 +675,9 @@ #endif void tcp_input(struct mbuf *, int); u_long tcp_maxmtu(struct in_conninfo *, int *); +u_long tcp_maxmtu_ext(struct in_conninfo *, int *, int *); u_long tcp_maxmtu6(struct in_conninfo *, int *); +u_long tcp_maxmtu6_ext(struct in_conninfo *, int *, int *); void tcp_mss_update(struct tcpcb *, int, int, struct hc_metrics_lite *, int *); void tcp_mss(struct tcpcb *, int); Index: netinet/tcp_output.c =================================================================== --- netinet/tcp_output.c (revision 235780) +++ netinet/tcp_output.c (working copy) @@ -748,6 +748,15 @@ } /* + * Some drivers want an even shorter limit to the + * length sent via TSO; respect their wishes. + */ + if (tp->t_tx_tso_mss != 0 && len > tp->t_tx_tso_mss) { + len = tp->t_tx_tso_mss; + sendalot = 1; + } + + /* * Prevent the last segment from being * fractional unless the send sockbuf can * be emptied. Index: dev/xen/netfront/netfront.c =================================================================== --- dev/xen/netfront/netfront.c (revision 235780) +++ dev/xen/netfront/netfront.c (working copy) @@ -135,6 +135,7 @@ * to mirror the Linux MAX_SKB_FRAGS constant. */ #define MAX_TX_REQ_FRAGS (65536 / PAGE_SIZE + 2) +#define TX_TSO_MSS (MAX_TX_REQ_FRAGS - 2) * MCLBYTES #define RX_COPY_THRESHOLD 256 @@ -2040,6 +2041,9 @@ if (val) { np->xn_ifp->if_capabilities |= IFCAP_TSO4|IFCAP_LRO; printf(" feature-gso-tcp4"); + + np->xn_ifp->if_capabilities |= IFCAP_TSO_MSS; + np->xn_ifp->if_tx_tso_mss = TX_TSO_MSS; } printf("\n"); Index: net/if.h =================================================================== --- net/if.h (revision 235780) +++ net/if.h (working copy) @@ -230,6 +230,7 @@ #define IFCAP_VLAN_HWTSO 0x40000 /* can do IFCAP_TSO on VLANs */ #define IFCAP_LINKSTATE 0x80000 /* the runtime link state is dynamic */ #define IFCAP_NETMAP 0x100000 /* netmap mode supported/enabled */ +#define IFCAP_TSO_MSS 0x200000 /* TSO size is limited */ #define IFCAP_HWCSUM (IFCAP_RXCSUM | IFCAP_TXCSUM) #define IFCAP_TSO (IFCAP_TSO4 | IFCAP_TSO6) Index: net/if_var.h =================================================================== --- net/if_var.h (revision 235780) +++ net/if_var.h (working copy) @@ -205,7 +205,8 @@ * be used with care where binary compatibility is required. */ char if_cspare[3]; - int if_ispare[4]; + int if_tx_tso_mss; /* if IFCAP_TSO_MSS */ + int if_ispare[3]; void *if_pspare[8]; /* 1 netmap, 7 TDB */ }; --------------010705070900080106050908--