Date: Thu, 30 Jan 2014 15:45:19 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: Big physically contiguous mbuf clusters
Message-ID: <20140130134519.GU24664@kib.kiev.ua>
References: <21225.20047.947384.390241@khavrinen.csail.mit.edu>
 <20140129231121.GA18434@ox>
In-Reply-To: <20140129231121.GA18434@ox>

On Wed, Jan 29, 2014 at 03:11:21PM -0800, Navdeep Parhar wrote:
> On Wed, Jan 29, 2014 at 02:21:21PM -0800, Adrian Chadd wrote:
> > Hi,
> >
> > On 29 January 2014 10:54, Garrett Wollman wrote:
> > > Resolved: that mbuf clusters longer than one page ought not be
> > > supported.  There is too much physical-memory fragmentation for
> > > them to be of use on a moderately active server.  9k mbufs are
> > > especially bad, since in the fragmented case they waste 3k per
> > > allocation.
> >
> > I've been wondering whether it'd be feasible to teach the physical
> > memory allocator about >page-sized allocations and to create zones
> > of slightly more physically contiguous memory.
>
> I think this would be very useful.  For example, a zone_jumbo32 would
> hit a sweet spot -- enough to fit 3 jumbo frames and some loose change
> for metadata.  I'd like to see us improve our allocators and VM system
> to work better with larger contiguous allocations, rather than
> deprecating the larger zones.  It seems backwards to push towards
> smaller allocation units when installed physical memory in a typical
> system continues to rise.
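
A rough sketch of what the backend of such a zone could look like,
modeled on mbuf_jumbo_alloc() in sys/kern/kern_mbuf.c; the "jumbo32"
name, the 32KB size and the missing ctor/dtor are only illustrative,
and the kmem/UMA interfaces assumed are the current ones:

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_extern.h>
#include <vm/vm_kern.h>
#include <vm/uma.h>

#define	JUMBO32_SIZE	(32 * 1024)	/* 3 x 9K frames plus change */

static uma_zone_t zone_jumbo32;

/* Back each slab with physically contiguous, kernel-mapped pages. */
static void *
jumbo32_alloc(uma_zone_t zone, int bytes, uint8_t *flags, int wait)
{

	*flags = UMA_SLAB_KERNEL;
	return ((void *)kmem_alloc_contig(kernel_arena, bytes, wait,
	    (vm_paddr_t)0, ~(vm_paddr_t)0, 1, 0, VM_MEMATTR_DEFAULT));
}

/* Would be created from mbuf_init(), next to the 9K and 16K zones. */
static void
jumbo32_zone_create(void)
{

	zone_jumbo32 = uma_zcreate("mbuf_jumbo_32k", JUMBO32_SIZE,
	    NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
	uma_zone_set_allocf(zone_jumbo32, jumbo32_alloc);
}

The zone itself is the easy part; the hard part is keeping enough free
contiguous runs around for kmem_alloc_contig() to succeed once the
machine has been up for a while, which is exactly the fragmentation
problem Garrett describes.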

>
> Allocating 3 x 4K instead of 1 x 9K for a jumbo frame means 3x the
> number of vtophys translations, 3x the phys_addr/len traffic on the
> PCIe bus (the scatter list has to be fed to the chip and is now 3x
> larger than it needs to be), 3x the number of "wrapper" mbuf
> allocations (one for each 4K cluster) which then have to be stitched
> together to form a frame, etc. etc.

If the platform has an IOMMU, the physical contiguity of the pages
could be ignored: with a suitable busdma tag, the VT-d driver allocates
a contiguous bus address range for the device-visible mapping.  Of
course, this is moot right now, because drivers have no idea whether an
IOMMU is present, and because the IOMMU busdma backend is both disabled
by default and carries a non-trivial setup cost.

>
> Regards,
> Navdeep
>
> >
> > For servers with lots of memory we could then keep these around and
> > only dip into them for temporary allocations (e.g. not VM pages that
> > may be held for some unknown amount of time.)
> >
> > Question is - can we enforce that kind of behaviour?
> >
> > -a
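
As an illustration of the busdma point above: a driver that wants a
jumbo frame delivered as a single DMA segment would express that in its
tag roughly as in the sketch below (the frame size and the helper name
are made up; error handling is omitted).  With the DMAR-backed busdma
such a request can be satisfied by remapping scattered 4K pages into a
contiguous bus address range; with the plain bounce implementation the
buffer has to be physically contiguous already, otherwise the load
fails with EFBIG.

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

#define	JUMBO_FRAME_SIZE	9018	/* illustrative only */

/* Hypothetical helper: one tag, one segment per jumbo receive buffer. */
static int
jumbo_dma_tag_create(device_t dev, bus_dma_tag_t *tagp)
{

	return (bus_dma_tag_create(
	    bus_get_dma_tag(dev),	/* parent */
	    1, 0,			/* alignment, boundary */
	    BUS_SPACE_MAXADDR,		/* lowaddr */
	    BUS_SPACE_MAXADDR,		/* highaddr */
	    NULL, NULL,			/* filter, filterarg */
	    JUMBO_FRAME_SIZE,		/* maxsize */
	    1,				/* nsegments: one contiguous run */
	    JUMBO_FRAME_SIZE,		/* maxsegsz */
	    0,				/* flags */
	    NULL, NULL,		/* lockfunc, lockarg */
	    tagp));
}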