From owner-freebsd-hackers@FreeBSD.ORG Tue May 6 16:07:25 2014
Message-ID: <53690885.1010704@FreeBSD.org>
Date: Tue, 06 May 2014 20:06:29 +0400
From: "Alexander V. Chernikov"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.0.1
MIME-Version: 1.0
To: FreeBSD Net, hackers@freebsd.org
Subject: Use of contiguous physical memory in ixgbe driver
Content-Type: multipart/mixed; boundary="------------060607030109030205060500"
Cc: jfv@FreeBSD.org, Adrian Chadd, wollman@freebsd.org, nparhar@gmail.com

This is a multi-part message in MIME format.
--------------060607030109030205060500
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hello guys.
(bootstrapping people involved in previous version of this topic, sorry for that)

There were several problem descriptions/discussions on using 9k+ mbufs with the current allocator:

if_em: kern/183381
cxgbe: http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037834.html
general one: http://lists.freebsd.org/pipermail/freebsd-net/2014-January/037673.html

I'd like to add ixgbe (and i40e with igb) to the list. We have been facing the same problem for a long time.

As far as I can understand:
a) everyone (tm) is aware of the current 9k/16k allocation problems leading to sudden network failures.
b) such mbuf sizes are not absolute evil and can be useful on 40/100G and for TSO cases.
c) however, no one is able or willing to fix our allocator to pre-allocate a special arena for mbufs >= 4k page size.
d) so most people have written their own local hacks to disable 9k mbufs and use 4k ones.
e) our list is not complete: people with Mellanox/Solarflare/Broadcom/Emulex/etc. are still not on it (and most if not all 10G NICs support scatter/gather).

Can we add a more generic hack that moves the default mbuf size decision from the NIC driver to the OS and makes it tunable by the user?

An example patch for the Intel drivers is attached.
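
To make the intended behaviour concrete, here is a rough userland sketch of the selection logic the patch centralizes. It is only a model, not kernel code: struct mbuf_policy, preferred_size() and the two policy constants are hypothetical names invented for illustration, and MJUMPAGESIZE is assumed to be 4096 (a 4k page, as on amd64).

```c
/*
 * Userland model of the cluster-size decision (not kernel code).
 * Cluster sizes are assumed values for a 4k-page platform.
 */
#define MCLBYTES	2048		/* standard mbuf cluster */
#define MJUMPAGESIZE	4096		/* page-size jumbo cluster (assumed 4k page) */
#define MJUM9BYTES	(9 * 1024)	/* 9k jumbo cluster */
#define MJUM16BYTES	(16 * 1024)	/* 16k jumbo cluster */

/* Hypothetical container for the three proposed tunables. */
struct mbuf_policy {
	int nojumbobuf;		/* non-zero: always use MCLBYTES */
	int nojumbo9buf;	/* non-zero: never go above MJUMPAGESIZE */
	int nojumbo16buf;	/* non-zero: never go above MJUM9BYTES */
};

/* Defaults proposed in the patch: 9k and 16k clusters disabled. */
const struct mbuf_policy patch_defaults = { 0, 1, 1 };
/* All restrictions lifted: the behaviour ixgbe has today. */
const struct mbuf_policy no_limits = { 0, 0, 0 };

/* Pick the receive cluster size for a given maximum frame size. */
int
preferred_size(const struct mbuf_policy *p, int max_frame_size)
{
	if (max_frame_size <= 2048 || p->nojumbobuf != 0)
		return (MCLBYTES);
	if (max_frame_size <= 4096 || p->nojumbo9buf != 0)
		return (MJUMPAGESIZE);
	if (max_frame_size <= 9216 || p->nojumbo16buf != 0)
		return (MJUM9BYTES);
	return (MJUM16BYTES);
}
```

With patch_defaults, a 9018-byte frame gets page-size (4096-byte) clusters and relies on scatter/gather; with no_limits it gets a 9k cluster as before. With the patch applied, setting kern.ipc.nojumbo9buf=0 in /boot/loader.conf should presumably restore the old 9k behaviour.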
--------------060607030109030205060500
Content-Type: text/x-patch; name="mbuf_sizes.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="mbuf_sizes.diff"

Index: sys/kern/kern_mbuf.c
===================================================================
--- sys/kern/kern_mbuf.c	(revision 265236)
+++ sys/kern/kern_mbuf.c	(working copy)
@@ -103,6 +103,11 @@ int nmbjumbop;			/* limits number of page size jum
 int nmbjumbo9;			/* limits number of 9k jumbo clusters */
 int nmbjumbo16;			/* limits number of 16k jumbo clusters */
 
+static int nojumbobuf;		/* Use MCLBYTES mbufs */
+static int nojumbo9buf;		/* Use either MCLBYTES or MJUMPAGESIZE */
+static int nojumbo16buf;	/* Use any mbuf size less than MJUM16BYTES */
+
+
 static quad_t maxmbufmem;	/* overall real memory limit for all mbufs */
 
 SYSCTL_QUAD(_kern_ipc, OID_AUTO, maxmbufmem, CTLFLAG_RDTUN, &maxmbufmem, 0,
@@ -151,6 +156,17 @@ tunable_mbinit(void *dummy)
 	if (nmbufs < nmbclusters + nmbjumbop + nmbjumbo9 + nmbjumbo16)
 		nmbufs = lmax(maxmbufmem / MSIZE / 5,
 		    nmbclusters + nmbjumbop + nmbjumbo9 + nmbjumbo16);
+
+	/*
+	 * Defaults to disable 9/16-kbyte pages
+	 */
+	nojumbobuf = 0;
+	nojumbo9buf = 1;
+	nojumbo16buf = 1;
+
+	TUNABLE_INT_FETCH("kern.ipc.nojumbobuf", &nojumbobuf);
+	TUNABLE_INT_FETCH("kern.ipc.nojumbo9buf", &nojumbo9buf);
+	TUNABLE_INT_FETCH("kern.ipc.nojumbo16buf", &nojumbo16buf);
 }
 SYSINIT(tunable_mbinit, SI_SUB_KMEM, SI_ORDER_MIDDLE, tunable_mbinit, NULL);
 
@@ -261,6 +277,27 @@ SYSCTL_PROC(_kern_ipc, OID_AUTO, nmbufs, CTLTYPE_I
     "Maximum number of mbufs allowed");
 
 /*
+ * Determine the correct mbuf pool
+ * for given mtu size
+ */
+int
+m_preferredsize(int mtu)
+{
+	int size;
+
+	if (mtu <= 2048 || nojumbobuf != 0)
+		size = MCLBYTES;
+	else if (mtu <= 4096 || nojumbo9buf != 0)
+		size = MJUMPAGESIZE;
+	else if (mtu <= 9216 || nojumbo16buf != 0)
+		size = MJUM9BYTES;
+	else
+		size = MJUM16BYTES;
+
+	return (size);
+}
+
+/*
  * Zones from which we allocate.
  */
 uma_zone_t	zone_mbuf;
Index: sys/dev/ixgbe/ixgbe.c
===================================================================
--- sys/dev/ixgbe/ixgbe.c	(revision 265236)
+++ sys/dev/ixgbe/ixgbe.c	(working copy)
@@ -1138,14 +1138,7 @@ ixgbe_init_locked(struct adapter *adapter)
 	** Determine the correct mbuf pool
 	** for doing jumbo frames
 	*/
-	if (adapter->max_frame_size <= 2048)
-		adapter->rx_mbuf_sz = MCLBYTES;
-	else if (adapter->max_frame_size <= 4096)
-		adapter->rx_mbuf_sz = MJUMPAGESIZE;
-	else if (adapter->max_frame_size <= 9216)
-		adapter->rx_mbuf_sz = MJUM9BYTES;
-	else
-		adapter->rx_mbuf_sz = MJUM16BYTES;
+	adapter->rx_mbuf_sz = m_preferredsize(adapter->max_frame_size);
 
 	/* Prepare receive descriptors and buffers */
 	if (ixgbe_setup_receive_structures(adapter)) {
Index: sys/dev/e1000/if_em.c
===================================================================
--- sys/dev/e1000/if_em.c	(revision 265236)
+++ sys/dev/e1000/if_em.c	(working copy)
@@ -1342,12 +1342,7 @@ em_init_locked(struct adapter *adapter)
 	** Figure out the desired mbuf
 	** pool for doing jumbos
 	*/
-	if (adapter->hw.mac.max_frame_size <= 2048)
-		adapter->rx_mbuf_sz = MCLBYTES;
-	else if (adapter->hw.mac.max_frame_size <= 4096)
-		adapter->rx_mbuf_sz = MJUMPAGESIZE;
-	else
-		adapter->rx_mbuf_sz = MJUM9BYTES;
+	adapter->rx_mbuf_sz = m_preferredsize(adapter->hw.mac.max_frame_size);
 
 	/* Prepare receive descriptors and buffers */
 	if (em_setup_receive_structures(adapter)) {
Index: sys/dev/e1000/if_igb.c
===================================================================
--- sys/dev/e1000/if_igb.c	(revision 265236)
+++ sys/dev/e1000/if_igb.c	(working copy)
@@ -1335,12 +1335,7 @@ igb_init_locked(struct adapter *adapter)
 	** Figure out the desired mbuf pool
 	** for doing jumbo/packetsplit
 	*/
-	if (adapter->max_frame_size <= 2048)
-		adapter->rx_mbuf_sz = MCLBYTES;
-	else if (adapter->max_frame_size <= 4096)
-		adapter->rx_mbuf_sz = MJUMPAGESIZE;
-	else
-		adapter->rx_mbuf_sz = MJUM9BYTES;
+	adapter->rx_mbuf_sz = m_preferredsize(adapter->max_frame_size);
 
 	/* Prepare receive descriptors and buffers */
 	if (igb_setup_receive_structures(adapter)) {

--------------060607030109030205060500--