From owner-freebsd-net@freebsd.org Sat Jul 28 20:33:52 2018
From: Adrian Chadd <adrian.chadd@gmail.com>
Date: Sat, 28 Jul 2018 13:33:36 -0700
Subject: Re: 9k jumbo clusters
To: ryan@ixsystems.com, FreeBSD Net <freebsd-net@freebsd.org>

On Fri, 27 Jul 2018 at 15:19, John-Mark Gurney wrote:

> Ryan Moeller wrote this message on Fri, Jul 27, 2018 at 12:45 -0700:
> > There is a long-standing issue with 9k mbuf jumbo clusters in FreeBSD.
> > For example:
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183381
> > https://lists.freebsd.org/pipermail/freebsd-net/2013-March/034890.html
> >
> > This comment suggests the 16k pool does not have the fragmentation problem:
> > https://reviews.freebsd.org/D11560#239462
> > I'm curious whether that has been confirmed.
> >
> > Is anyone working on the pathological case with 9k jumbo clusters in the
> > physical memory allocator? There was an interesting discussion started a
> > few years ago but I'm not sure what ever came of it:
> > http://docs.freebsd.org/cgi/mid.cgi?21225.20047.947384.390241
> >
> > I have seen some work in the direction of avoiding larger-than-page-size
> > jumbo clusters in 12-CURRENT. Many existing drivers avoid the 9k cluster
> > size already. The code for larger cluster sizes in iflib is #ifdef'd out,
> > so it maxes out at page-size jumbo clusters until "CONTIGMALLOC_WORKS"
> > (apparently it doesn't).
> >
> > With all the changes due to iflib, is there any chance some of this will
> > get MFC'd to address the serious problem that remains in 11-STABLE?
> >
> > Otherwise, would it be feasible to disable the use of the 9k cluster pool
> > in at least some of the popular NIC drivers as a solution for the stable
> > branches?
> >
> > Finally, I have studied some of the driver code in 11-STABLE and posted the
> > gist of my notes in relation to this problem. If anyone spots a mistake or
> > has something else to contribute, comments on the gist would be greatly
> > appreciated!
> > https://gist.github.com/freqlabs/eba9b755f17a223260246becfbb150a1
>
> Drivers need to be fixed to use 4k pages instead of clusters. I really hope
> no one is using a card that can't do 4k pages, or if they are, then they
> should get a real card that can do scatter/gather on 4k pages for jumbo
> frames.

Yeah, but it's 2018 and your server has, at minimum, a dozen million 4k pages.
So if you're doing stuff like lots of network packet kerchunking, why not have
specialised allocator paths that can do things like "hey, always give me 64k
of physically contiguous pages for storage/mbufs, because you know what?
they're going to be allocated/freed together always."

There has always been a race between bus bandwidth, memory bandwidth and
bus/memory latencies. I'm not currently on the disk/packet-pushing side of
things, but the last couple of times I was, it was at different points in that
4d space, and almost every single time there was a benefit from having a
couple of specialised allocators so you didn't have to try to manage a few
dozen million 4k pages based on your changing workload.

I enjoy the 4k page size management stuff for my 128MB routers. Your 128G
server has a lot of 4k pages. It's a bit silly.

-adrian
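
[For illustration, a minimal sketch of the kind of driver change being
discussed: an RX refill path that caps the cluster size at MJUMPAGESIZE so
the 9k zone is never touched, relying on the NIC's scatter/gather support to
spread larger frames across several page-size buffers. The helper names
(rx_cluster_size, rx_alloc_buf) are made up for this example and do not come
from any in-tree driver; m_getjcl() and the cluster-size constants are the
standard mbuf(9) interfaces.]

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>
    #include <net/ethernet.h>

    /*
     * Pick an RX cluster size that never comes from the 9k (or 16k)
     * jumbo zones.  Frames larger than one page are assumed to be split
     * across multiple descriptors by the hardware and handed up as an
     * mbuf chain.
     */
    static int
    rx_cluster_size(int mtu)
    {
            if (mtu + ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN <= MCLBYTES)
                    return (MCLBYTES);      /* standard 2k cluster is enough */
            return (MJUMPAGESIZE);          /* cap at one page, never MJUM9BYTES */
    }

    static struct mbuf *
    rx_alloc_buf(int mtu)
    {
            struct mbuf *m;
            int size;

            size = rx_cluster_size(mtu);
            m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, size);
            if (m == NULL)
                    return (NULL);
            m->m_len = m->m_pkthdr.len = size;
            return (m);
    }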
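
[And a rough sketch of what the "always give me 64k physically contiguous"
idea could look like on top of the existing contigmalloc(9) interface. This
is illustrative only: the names are hypothetical, and a real specialised
allocator along the lines suggested above would presumably carve these
buffers out in bulk and recycle them rather than hitting the VM system on
every allocation, precisely because they are allocated and freed together.]

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>

    static MALLOC_DEFINE(M_NETBUF64K, "netbuf64k",
        "64k contiguous network buffers");

    #define NETBUF64K_SIZE  (64 * 1024)

    /* Allocate one 64k physically contiguous, 64k-aligned buffer. */
    static void *
    netbuf64k_alloc(void)
    {
            return (contigmalloc(NETBUF64K_SIZE, M_NETBUF64K, M_NOWAIT,
                0,                  /* low: any physical address */
                ~(vm_paddr_t)0,     /* high: no upper bound */
                NETBUF64K_SIZE,     /* alignment: naturally aligned */
                0));                /* boundary: may cross any boundary */
    }

    static void
    netbuf64k_free(void *buf)
    {
            contigfree(buf, NETBUF64K_SIZE, M_NETBUF64K);
    }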