From owner-freebsd-net@freebsd.org Sat Jul 28 20:33:52 2018
From: Adrian Chadd <adrian.chadd@gmail.com>
Date: Sat, 28 Jul 2018 13:33:36 -0700
Subject: Re: 9k jumbo clusters
To: ryan@ixsystems.com, FreeBSD Net <freebsd-net@freebsd.org>

On Fri, 27 Jul 2018 at 15:19, John-Mark Gurney wrote:

> Ryan Moeller wrote this message on Fri, Jul 27, 2018 at 12:45 -0700:
> > There is a long-standing issue with 9k mbuf jumbo clusters in FreeBSD.
> > For example:
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183381
> > https://lists.freebsd.org/pipermail/freebsd-net/2013-March/034890.html
> >
> > This comment suggests the 16k pool does not have the fragmentation problem:
> > https://reviews.freebsd.org/D11560#239462
> > I'm curious whether that has been confirmed.
> >
> > Is anyone working on the pathological case with 9k jumbo clusters in the
> > physical memory allocator? There was an interesting discussion started a
> > few years ago but I'm not sure what ever came of it:
> > http://docs.freebsd.org/cgi/mid.cgi?21225.20047.947384.390241
> >
> > I have seen some work in the direction of avoiding larger-than-page-size
> > jumbo clusters in 12-CURRENT. Many existing drivers avoid the 9k cluster
> > size already. The code for larger cluster sizes in iflib is #ifdef'd out,
> > so it maxes out at page-size jumbo clusters until "CONTIGMALLOC_WORKS"
> > (apparently it doesn't).
> >
> > With all the changes due to iflib, is there any chance some of this will
> > get MFC'd to address the serious problem that remains in 11-STABLE?
> >
> > Otherwise, would it be feasible to disable the use of the 9k cluster pool
> > in at least some of the popular NIC drivers as a solution for the stable
> > branches?
> >
> > Finally, I have studied some of the driver code in 11-STABLE and posted the
> > gist of my notes in relation to this problem. If anyone spots a mistake or
> > has something else to contribute, comments on the gist would be greatly
> > appreciated!
> > https://gist.github.com/freqlabs/eba9b755f17a223260246becfbb150a1
>
> Drivers need to be fixed to use 4k pages instead of clusters. I really hope
> no one is using a card that can't do 4k pages, or if they are, then they
> should get a real card that can do scatter/gather on 4k pages for jumbo
> frames.

Yeah, but it's 2018 and your server has, at minimum, a dozen million 4k pages.
So if you're doing stuff like lots of network packet kerchunking, why not have
specialised allocator paths that can do things like "hey, always give me 64k
of physically contiguous pages for storage/mbufs, because you know what?
they're going to be allocated/freed together always."

There has always been a race between bus bandwidth, memory bandwidth and
bus/memory latencies. I'm not currently on the disk/packet-pushing side of
things, but the last couple of times I was, it was at different points in that
4d space, and almost every single time there was a benefit from having a
couple of specialised allocators so you didn't have to try to manage a few
dozen million 4k pages based on your changing workload.

I enjoy the 4k page size management stuff for my 128MB routers. Your 128G
server has a lot of 4k pages. It's a bit silly.

-adrian
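
[For illustration, a minimal sketch of the kind of driver change being
discussed: an RX refill path that caps the cluster size at MJUMPAGESIZE so
the 9k zone is never touched, relying on the NIC's scatter/gather support to
spread larger frames across several page-size buffers. The helper names
(rx_cluster_size, rx_alloc_buf) are made up for this example and do not come
from any in-tree driver; m_getjcl() and the cluster-size constants are the
standard mbuf(9) interfaces.]

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>
    #include <net/ethernet.h>

    /*
     * Pick an RX cluster size that never comes from the 9k (or 16k)
     * jumbo zones.  Frames larger than one page are assumed to be split
     * across multiple descriptors by the hardware and handed up as an
     * mbuf chain.
     */
    static int
    rx_cluster_size(int mtu)
    {
            if (mtu + ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN <= MCLBYTES)
                    return (MCLBYTES);      /* standard 2k cluster is enough */
            return (MJUMPAGESIZE);          /* cap at one page, never MJUM9BYTES */
    }

    static struct mbuf *
    rx_alloc_buf(int mtu)
    {
            struct mbuf *m;
            int size;

            size = rx_cluster_size(mtu);
            m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, size);
            if (m == NULL)
                    return (NULL);
            m->m_len = m->m_pkthdr.len = size;
            return (m);
    }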
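
[And a rough sketch of what the "always give me 64k physically contiguous"
idea could look like on top of the existing contigmalloc(9) interface. This
is illustrative only: the names are hypothetical, and a real specialised
allocator along the lines suggested above would presumably carve these
buffers out in bulk and recycle them rather than hitting the VM system on
every allocation, precisely because they are allocated and freed together.]

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>

    static MALLOC_DEFINE(M_NETBUF64K, "netbuf64k",
        "64k contiguous network buffers");

    #define NETBUF64K_SIZE  (64 * 1024)

    /* Allocate one 64k physically contiguous, 64k-aligned buffer. */
    static void *
    netbuf64k_alloc(void)
    {
            return (contigmalloc(NETBUF64K_SIZE, M_NETBUF64K, M_NOWAIT,
                0,                  /* low: any physical address */
                ~(vm_paddr_t)0,     /* high: no upper bound */
                NETBUF64K_SIZE,     /* alignment: naturally aligned */
                0));                /* boundary: may cross any boundary */
    }

    static void
    netbuf64k_free(void *buf)
    {
            contigfree(buf, NETBUF64K_SIZE, M_NETBUF64K);
    }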