From: Warner Losh
Date: Fri, 13 Nov 2020 19:16:30 -0700
Subject: Re: MAXPHYS bump for FreeBSD 13
To: Scott Long
Cc: freebsd-arch@freebsd.org

On Fri, Nov 13, 2020 at 6:23 PM Scott Long wrote:

> I have mixed feelings on this. The Netflix workload isn’t typical, and
> this change represents a fairly substantial increase in memory usage
> for bufs. It’s also a config tunable, so it’s not like this represents
> a meaningful diff reduction for Netflix.

This isn't motivated at all by Netflix's workload, nor by any need to
minimize diffs. In fact, Netflix had nothing to do with the proposal
apart from me writing it up. This is motivated more by the needs of more
people to do I/Os larger than 128k, though maybe 1MB is too large.
Alexander Motin proposed it today during the Vendor Summit and I wrote up
the idea for arch@.

> The upside is that it will likely help benchmarks out of the box. Is
> that enough of an upside for the downsides of memory pressure on small
> memory and high iops systems? I’m not convinced. I really would like
> to see the years of talk about fixing this correctly put into action.

I'd love years of inaction to end too. I'd also like FreeBSD to perform a
bit better out of the box. Would your calculation have changed had the
size been 256k or 512k? Both those options use/waste substantially fewer
bytes per I/O than 1MB.

Warner

> Scott
>
> > On Nov 13, 2020, at 11:33 AM, Warner Losh wrote:
> >
> > Greetings,
> >
> > We currently have a MAXPHYS of 128k.
> > This is the maximum size of I/Os that we normally use (though there
> > are exceptions).
> >
> > I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> > DFLTPHYS to 1MB.
> >
> > 128k was good back in the 90s/2000s when memory was smaller, drives did
> > smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> > devices can easily do 1MB or more and there are performance benefits
> > from scheduling larger I/Os.
> >
> > Bumping this will mean larger struct buf and struct bio. Without some
> > concerted effort, it's hard to make this be a sysctl tunable. While
> > that's desirable, perhaps, it shouldn't gate this bump. The increase in
> > size for 1MB is modest enough.
> >
> > The NVMe driver currently is limited to 1MB transfers due to
> > limitations in the NVMe scatter gather lists and a desire to
> > preallocate as much as possible up front. Most NVMe drivers have
> > maximum transfer sizes between 128k and 1MB, with larger being the
> > trend.
> >
> > The mp[rs] drivers can use larger MAXPHYS, though resource limitations
> > on some cards hamper bumping it beyond about 2MB.
> >
> > The AHCI driver is happy with 1MB and larger sizes.
> >
> > Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too
> > large even for our needs due to limiting factors in the upper layers
> > making it hard to schedule I/Os larger than 3-4MB reliably.
> >
> > So this should be a relatively low risk, and high benefit.
> >
> > I don't think other kernel tunables need to change, but I always run
> > into trouble with runningbufs :)
> >
> > Comments? Anything I forgot?
> >
> > Warner
> > _______________________________________________
> > freebsd-arch@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
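
For anyone wanting to put rough numbers on the 256k / 512k / 1MB question
raised above, here is a back-of-the-envelope sketch. It is not taken from
the thread and rests on assumptions: that the per-buffer growth is
dominated by an array holding one page pointer per page of MAXPHYS, that
pages are 4 KiB, and that pointers are 8 bytes. The helper name
page_array_bytes is hypothetical, and the real struct buf and struct bio
layouts may differ.

```python
# Rough estimate only. Assumptions (not from the thread): 4 KiB pages,
# 8-byte pointers, and one page pointer per page of MAXPHYS per buffer.
PAGE_SIZE = 4096
POINTER_SIZE = 8


def page_array_bytes(maxphys, page_size=PAGE_SIZE, pointer_size=POINTER_SIZE):
    """Bytes needed to hold one page pointer per page of a maxphys-sized I/O."""
    pages = -(-maxphys // page_size)  # ceiling division
    return pages * pointer_size


if __name__ == "__main__":
    candidates = [
        ("128k", 128 * 1024),
        ("256k", 256 * 1024),
        ("512k", 512 * 1024),
        ("1MB", 1024 * 1024),
    ]
    for label, size in candidates:
        print(f"MAXPHYS {label}: {page_array_bytes(size)} bytes of page pointers per buffer")
```

Under those assumptions the pointer array grows linearly with MAXPHYS, so
256k or 512k would cost a quarter or half of what 1MB does per buffer.
Total memory impact also depends on how many buffers a system keeps
around, which this sketch does not try to model.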