From: Ka Ho Ng
Date: Tue, 10 Nov 2020 00:59:09 +0800
To: freebsd-arch@freebsd.org
Subject: A draft of file-sparsifying/hole-punching/deallocation for fzero API

I am working on APIs that allow making a range of a file sparse. The link to the branch is here: https://github.com/khng300/freebsd/tree/khng/current/fdealloc . The draft describing the current shape of the API is here: https://hackmd.io/xQolDBDMQKq5_0n_Ej1WfQ?view

Thanks,
Ka Ho

From: Alexander V. Chernikov
To: John Baldwin, Konstantin Belousov
Cc: freebsd-arch
Date: Mon, 09 Nov 2020 22:12:54 +0000
Subject: Re: Versioning support for kernel<>userland sysctl interface

02.11.2020, 18:30, "John Baldwin" :
> On 11/1/20 10:39 AM, Konstantin Belousov wrote:
>> On Sun, Nov 01, 2020 at 12:47:17PM +0000, Alexander V. Chernikov wrote:
>>> Hey folks,
>>>
>>> I would like to propose a change [1] that introduces versioning support for the data structures exposed to userland by the sysctl interface.
>>>
>>> We have dozens of interfaces exposing various statistics and control data by filling in and exporting structures.
>>> net.inet6.icmp6.stats or net.inet6.icmp6.nd6_prlist are good examples of such interaction.
>>>
>>> Most of these structures do not have version information embedded, which requires us to break compatibility when changing them.
>>>
>>> The idea behind the change is really simple: append the current structure version number to the sysctl OID to get the desired version of the structure.
>>>
>>> For example, fetching "net.inet6.icmp6.stats" becomes "net.inet6.icmp6.stats.1" (or, code-wise, something like "net.inet6.icmp6.stats." __XSTRING(ICMP6STAT_VER)).
>>>
>>> The interface satisfies the following properties:
>>> 1) preserving backward compatibility
>>> 2) allowing for low-cost kernel ABI maintenance
>>> 3) allowing for forward compatibility - an application can fetch the list of all supported versions of a structure.
>>>
>>> Example:
>>> 11:25 [1] m@devel0 sysctl -o net.inet6.icmp6.stats
>>> net.inet6.icmp6.stats.0: Format:S Length:4328 Dump:0x00000000000000000000000000000000...
>>> net.inet6.icmp6.stats.1: Format:S Length:4624 Dump:0x00000000000000000000000000000000...
>>>
>>> 12:42 [1] m@devel0 ~/test net.inet6.icmp6.stats
>>> sysctlnametomib("net.inet6.icmp6.stats")=0 -> 4.28.58.1.
>>> sysctl("net.inet6.icmp6.stats")=-1 sz=512
>>>
>>> 12:43 [1] m@devel0 ~/test net.inet6.icmp6.stats.1
>>> sysctlnametomib("net.inet6.icmp6.stats.1")=0 -> 4.28.58.1.1.
>>> sysctl("net.inet6.icmp6.stats.1")=-1 sz=512
>>>
>>> One downside of this change would be the potential need to duplicate structure definitions to be 100% sure we don't break the API. For example, rebuilding & running 3rd-party software may result in an error fetching the necessary structure.
>>> Unmodified applications built against the latest structure version will request the oldest version of a structure.
>>>
>>> I see multiple approaches to address it:
>>> 1) duplicate the structure under a new name (appending a postfix like _v) - works best for small structures
>>> 2) do nothing specific - will mostly work for append-only statistics structures
>>> 3) rely on a kernel warning on calls to unversioned sysctls to identify & fix the problematic consumers
>>>
>>> Please take a look at [1] for a more detailed technical description of the change.
>>>
>>> Any feedback is highly appreciated.
>>>
>>> [1] https://reviews.freebsd.org/D27035
>>
>> There was some desire to provide backward ABI-compat shims for sysctls during
>> ino64 work, https://reviews.freebsd.org/D10439.
>>
>> The most prominent idea from that time, AFAIR, was to have another MIB tree
>> that would have all the same MIBs but rooted with osrel. In other words,
>> if you accessed e.g. MIB 1.2.3.4, libc internally translates that to MIB
>> 1024..1.2.3.4, and the kernel applies whatever shims it knows about for that
>> osrel version. If there is no compat, the call goes directly to the 1.2.3.4 handler.
>> The osrel value can be taken from the binary ABI note, as an example.
>>
>> There was some discussion, but after more work was done on this, it appeared
>> that not many sysctls need ABI shims at all, and the interesting cases could
>> be adequately handled simply by checking the passed buffer length.
>>
>> The osrel approach has a drawback: it ignores the possibly different ABI
>> of the loaded shared library which might make the call. On the other hand,
>> it avoids introducing the additional burden of requiring consumers to learn
>> new MIBs and manually handle versions.
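Konstantin's point that the interesting cases could be handled "simply by checking the passed buffer length" can be illustrated with a small userland model. This is a sketch under stated assumptions, not kernel code: `struct stats_v0`, `struct stats_v1` and `stats_export()` are hypothetical names, and a real sysctl handler would copy out through the request (e.g. `SYSCTL_OUT()`) rather than calling `memcpy()` directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical v0 layout, as old binaries expect it. */
struct stats_v0 {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* Hypothetical current (v1) layout: appends one counter to v0. */
struct stats_v1 {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t dropped;
};

/*
 * Copy the current statistics into the caller's buffer, picking the
 * layout from the buffer length the caller supplied.
 * Returns 0 on success, or EINVAL if the length matches no known layout.
 */
static int
stats_export(const struct stats_v1 *cur, void *buf, size_t len)
{
	if (len == sizeof(struct stats_v1)) {
		memcpy(buf, cur, sizeof(*cur));
		return (0);
	}
	if (len == sizeof(struct stats_v0)) {
		/* Old caller: hand back only the fields v0 knows about. */
		struct stats_v0 old = {
			.rx_packets = cur->rx_packets,
			.tx_packets = cur->tx_packets,
		};
		memcpy(buf, &old, sizeof(old));
		return (0);
	}
	return (EINVAL);
}
```

The trick only works while every historical layout has a distinct size; two versions of equal length could not be told apart by length alone.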
>
> Some other thoughts were along the lines of having a kind of "sysctl tree"
> version that would get bumped when there was an ABI breakage of a node
> and then have associated versioned symbols of sysctl() and related symbols
> in libc to handle the shared library problem. However, you'd want to avoid
> an explosion of symbol versions.
>
> One of the goals was to keep API compat as much as possible. I think we
> might have also considered having 'sysctl_ver()' that takes an explicit
> __FreeBSD_version value and having 'sysctl()' become a macro that passes
> in '__FreeBSD_version' to sysctl_ver() so that you encode the desired ABI
> in each invocation. You would have to keep existing symbols in libc that
> would pass in a version of 0. One of the concerns with this approach is
> that it removes the public 'sysctl' symbol, which might break configure, etc.
> scripts.

Hi Konstantin, John,

Sorry for the belated reply.
Thank you for sharing the outcome of previous discussions on the topic. It's extremely useful and, generally, re-using osrel is an interesting approach I hadn't considered.
Let me try to look at it a bit further and summarise my understanding:

So, generally there are 2 different parts: choosing the structure versioning scheme and passing this information to the kernel.

For the former, osrel has the awesome property of being an increasing counter already attached to an application. It potentially allows us to avoid any API/ABI changes in userland programs. However, if we choose to go that way, we will be coupling every structure change to an osrel change. Will it scale when we have more structures moved to this mechanism? Do we want 200 bumps per release? How do we bump it for 3rd-party modules?

The latter, the version passing mechanism, consists of sysctlbyname() and sysctl() calls.
Prepending a string / OIDs will add overhead to each and every sysctl call, even though multiple versions are needed only for a tiny number of sysctls in the kernel.
I'm wondering if there is any alternative way to natively pass osrel to the kernel. For example, is p_osrel something that can potentially be made reliable (e.g. return a non-zero value) for all "new" binaries?

I thought about it a bit more and I still prefer the approach described in the initial messages. It:
* is ABI compatible
* requires minimal API changes (sysctlbyname("net.inet6.icmp6.stats") becomes sysctlbyname(SYSCTL_VERSIONED("net.inet6.icmp6.stats", ICMP6STAT_VER)))
* does not have any performance impact / symbol magic
* solves any issues with linking - multiple libraries within a binary can request/use different versions of a struct
* has no problem with 3rd-party kernel modules

I'd love to get a bit more feedback/critique on that approach as well :-)

> --
> John Baldwin
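The `SYSCTL_VERSIONED()` call mentioned above suggests a purely compile-time mechanism. A minimal sketch, assuming the macro simply appends "." and the stringified version number to the name (its actual definition in D27035 may differ), and using a hypothetical `ICMP6STAT_VER` of 1:

```c
/*
 * Two-level stringification, similar in spirit to __XSTRING() from
 * FreeBSD's <sys/cdefs.h>: the argument is macro-expanded first, so
 * EX_XSTRING(ICMP6STAT_VER) yields "1", not "ICMP6STAT_VER".
 */
#define EX_STRING(x)	#x
#define EX_XSTRING(x)	EX_STRING(x)

/*
 * Hypothetical SYSCTL_VERSIONED(): builds "<name>.<ver>" through
 * adjacent string-literal concatenation, entirely at compile time.
 */
#define SYSCTL_VERSIONED(name, ver)	name "." EX_XSTRING(ver)

/* Hypothetical structure version constant exported by a header. */
#define ICMP6STAT_VER	1

/* Expands to "net.inet6.icmp6.stats.1". */
static const char icmp6_stats_name[] =
    SYSCTL_VERSIONED("net.inet6.icmp6.stats", ICMP6STAT_VER);
```

Because the versioned name is an ordinary string literal, it adds no run-time cost; on FreeBSD it would be handed to sysctlbyname(3) exactly like an unversioned name.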

From: Alexander V. Chernikov
To: John-Mark Gurney
Cc: freebsd-arch
Date: Mon, 09 Nov 2020 22:28:56 +0000
Subject: Re: Versioning support for kernel<>userland sysctl interface

02.11.2020, 22:13, "John-Mark Gurney" :
> Alexander V. Chernikov wrote this message on Sun, Nov 01, 2020 at 12:47 +0000:
>> I would like to propose a change [1] that introduces versioning support for the data structures exposed to userland by the sysctl interface.
>>
>> We have dozens of interfaces exposing various statistics and control data by filling in and exporting structures.
>> net.inet6.icmp6.stats or net.inet6.icmp6.nd6_prlist are good examples of such interaction.
>
> We also need to decide the policy on dealing w/ support for these
> data structures going forward... Because if we do the simple, default
> policy of all userland apps can handle all structures, and kernel can
> produce all structures, we now have an unbounded growth of complexity
> and testing...

I totally agree. While backward compatibility is important, it should not impose notable technical debt. I had the following as my mental model:
* the code should be organised to support output for the latest version.
* there should be a separate, isolatable piece of code that converts from the latest version to n-1 (which can be chained: from n-1 to n-2 and so on).
* when introducing changes, we should guard older versions with COMPAT_X defines.

> I do understand the desire to solve this problem, but IMO, this solution
> is too simple, and dangerous to unbounded growth above.
>
> While I do like its simplicity, one idea that I've had, while being a
> bit more complex, has the ability to handle modification in a more
> compatible way.
>
> Since we have dtrace, one of the outputs of dtrace is ctf, which allows
> us to convey the type and structure information in a machine parseable
> format. The idea is that each sysctl oid (that supports this) would
> have the ability to fetch the ctf data for that oid. The userland would
> then be able to convert the members to the local members of a similar
> struct. A set of defaults could also be provided, allowing new fields
> to have sane initial values.
>
> As long as the name of a structure member is never reused for a different
> meaning, this will get us most of the way there, in a much cleaner
> method...
>
> I do realize that this isn't the easiest thing, but the tools to do this
> are in the tree, and would solve this problem, IMO, in a way that is a
> lot more maintainable, and long term than the current proposal.
>
> Other solution, use ctf data to produce nvlist generation/consumption
> code for a structure... The data transferred would be larger, but also
> more compatible...

I do like the idea of the self-documenting approach. It addresses the append-only case nicely, but that's not always the case.
For example, in the initially-discussed icmp6 stats we have 256 64-bit counters representing the icmp6 protocol histogram, resulting in a 4k frame being allocated on the stack in the current kernel implementation. If our icmp6 kernel implementation changes in the future and we are no longer able to provide these counters, we would eventually want to remove all of them from the structure. I'm not sure how this can be addressed without some sort of versioning scheme.

> Overall, using bare structures is an ABI compatibility nightmare that
> should be fixed in a better method.
>
> --
> John-Mark Gurney Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."
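The "latest version plus chained n to n-1 converters" model described in this message can be sketched as follows. The structures, field names and `ex_export()` are hypothetical; in a real tree each converter would additionally sit behind a COMPAT_X define, which is omitted here. Version 2 drops field `x`, mirroring the case of counters that can no longer be provided.

```c
#include <stdint.h>

/* Hypothetical history of one exported structure. */
struct ex_v0 { uint32_t x; uint32_t y; };
struct ex_v1 { uint32_t x; uint32_t y; uint32_t z; };	/* z appended */
struct ex_v2 { uint32_t y; uint32_t z; };		/* latest: x removed */

/* Each converter knows only how to step down by one version. */
static void
ex_v2_to_v1(const struct ex_v2 *in, struct ex_v1 *out)
{
	out->x = 0;	/* no longer collected; report a neutral value */
	out->y = in->y;
	out->z = in->z;
}

static void
ex_v1_to_v0(const struct ex_v1 *in, struct ex_v0 *out)
{
	out->x = in->x;
	out->y = in->y;	/* z did not exist in v0 */
}

/*
 * Only the latest layout is filled directly; older layouts are
 * produced by chaining single-step converters.
 * Returns 0 on success, -1 for an unknown version.
 */
static int
ex_export(const struct ex_v2 *latest, int ver, void *out)
{
	struct ex_v1 tmp;

	switch (ver) {
	case 2:
		*(struct ex_v2 *)out = *latest;
		return (0);
	case 1:
		ex_v2_to_v1(latest, out);
		return (0);
	case 0:
		ex_v2_to_v1(latest, &tmp);
		ex_v1_to_v0(&tmp, out);
		return (0);
	default:
		return (-1);
	}
}
```

Adding a v3 would then require only one new v3-to-v2 converter. The older steps are reused unchanged, which is what keeps the cost of each structure change bounded.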

From: John-Mark Gurney
To: "Alexander V. Chernikov"
Cc: freebsd-arch
Date: Mon, 9 Nov 2020 15:15:29 -0800
Subject: Re: Versioning support for kernel<>userland sysctl interface

Alexander V. Chernikov wrote this message on Mon, Nov 09, 2020 at 22:28 +0000:
> 02.11.2020, 22:13, "John-Mark Gurney" :
> > Alexander V. Chernikov wrote this message on Sun, Nov 01, 2020 at 12:47 +0000:
> >> I would like to propose a change [1] that introduces versioning support for the data structures exposed to userland by the sysctl interface.
> >>
> >> We have dozens of interfaces exposing various statistics and control data by filling in and exporting structures.
> >> net.inet6.icmp6.stats or net.inet6.icmp6.nd6_prlist are good examples of such interaction.
> >
> > We also need to decide the policy on dealing w/ support for these
> > data structures going forward... Because if we do the simple, default
> > policy of all userland apps can handle all structures, and kernel can
> > produce all structures, we now have an unbounded growth of complexity
> > and testing...
> I totally agree. While backward compatibility is important, it should not impose notable technical debt. I had the following as my mental model:
> * the code should be organised to support output for the latest version.
> * there should be a separate, isolatable piece of code that converts from the latest version to n-1 (which can be chained: from n-1 to n-2 and so on).
> * when introducing changes, we should guard older versions with COMPAT_X defines.

Yeah, if we restrict the code to COMPAT_x for the existing versions,
and ensure that it doesn't change, it isn't TOO terrible, but still,
the likelihood of people writing tests to verify that the compat code
works for all n-x versions isn't great... It's doable, but I doubt most
people are going to put in the effort.
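One inexpensive way to enforce the "ensure that it doesn't change" part for frozen COMPAT_x layouts is to pin their size and member offsets at compile time. A minimal sketch, assuming C11 `_Static_assert`; `COMPAT_EXAMPLE_V0` and `struct ex_stats_v0` are hypothetical names. It catches accidental edits to an old layout, but it does not replace the run-time tests discussed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option; normally set through the kernel configuration. */
#define COMPAT_EXAMPLE_V0	1

#ifdef COMPAT_EXAMPLE_V0
/* Frozen v0 layout: must never change once released. */
struct ex_stats_v0 {
	uint64_t in_packets;
	uint64_t out_packets;
};

/*
 * Pin the ABI: editing the frozen struct now breaks the build instead
 * of silently changing what old binaries receive.
 */
_Static_assert(sizeof(struct ex_stats_v0) == 16,
    "ex_stats_v0 size changed");
_Static_assert(offsetof(struct ex_stats_v0, out_packets) == 8,
    "ex_stats_v0 layout changed");
#endif /* COMPAT_EXAMPLE_V0 */
```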

> > I do understand the desire to solve this problem, but IMO, this solution
> > is too simple, and dangerous to unbounded growth above.
> >
> > While I do like its simplicity, one idea that I've had, while being a
> > bit more complex, has the ability to handle modification in a more
> > compatible way.
> >
> > Since we have dtrace, one of the outputs of dtrace is ctf, which allows
> > us to convey the type and structure information in a machine parseable
> > format. The idea is that each sysctl oid (that supports this) would
> > have the ability to fetch the ctf data for that oid. The userland would
> > then be able to convert the members to the local members of a similar
> > struct. A set of defaults could also be provided, allowing new fields
> > to have sane initial values.
> >
> > As long as the name of a structure member is never reused for a different
> > meaning, this will get us most of the way there, in a much cleaner
> > method...
> >
> > I do realize that this isn't the easiest thing, but the tools to do this
> > are in the tree, and would solve this problem, IMO, in a way that is a
> > lot more maintainable, and long term than the current proposal.
> >
> > Other solution, use ctf data to produce nvlist generation/consumption
> > code for a structure... The data transferred would be larger, but also
> > more compatible...
> I do like the idea of the self-documenting approach. It addresses the append-only case nicely, but that's not always the case.
> For example, in the initially-discussed icmp6 stats we have 256 64-bit counters representing the icmp6 protocol histogram, resulting in a 4k frame being allocated on the stack in the current kernel implementation. If our icmp6 kernel implementation changes in the future and we are no longer able to provide these counters, we would eventually want to remove all of them from the structure. I'm not sure how this can be addressed without some sort of versioning scheme.

So the bit that gets the ctf data would also have an nvlist (or
something) that contains the defaults for when fields are removed...

So, initially:
struct foo {
	int x;
	int y;
};

Then it gets changed to:
struct foo {
	int x;
	int y;
	int z;
};

This is easy: z will be included and transmitted, but ignored by
older code. Then, when it changes to:
struct foo {
	int y;
	int z;
};

The ctf data would be something like: , where the nvlist defaults are:
x: -1

So, the consuming code would set the defaults from the nvlist first,
then set the fetched data, so that x gets set to some default value.

With a few simple rules like this, handling deletions is not a problem
when the older code is expecting it. If a variable must always have a
value that "must" be correct (and a default cannot be set), then another
member needs to be added (I think ctf handles bit fields properly) that
says whether that member is valid... When the member gets removed, that
valid flag gets set to zero, and the old code knows not to handle it.

> > Overall, using bare structures is an ABI compatibility nightmare that
> > should be fixed in a better method.

--
John-Mark Gurney Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
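The "apply the defaults first, then overlay the fetched data" rule can be modelled with a decoder that matches members by name. This sketch deliberately does not use the real CTF or nvlist(9) APIs: `struct named_field` stands in for whatever self-describing encoding the kernel would send, and all names are hypothetical. The scenario is the last step of the struct foo example above: the producer's `struct foo` is now `{ y, z }`, the defaults say `x: -1`, and an older consumer still expects `{ x, y }`.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for one self-described member sent by the producer. */
struct named_field {
	const char *name;
	int value;
};

/* The older consumer's local view of struct foo. */
struct foo_local {
	int x;
	int y;
};

/* Local member names and offsets, so members can be set by name. */
static const struct {
	const char *name;
	size_t off;
} foo_local_members[] = {
	{ "x", offsetof(struct foo_local, x) },
	{ "y", offsetof(struct foo_local, y) },
};

#define FOO_NMEMBERS \
	(sizeof(foo_local_members) / sizeof(foo_local_members[0]))

static void
set_member(struct foo_local *dst, const struct named_field *f)
{
	for (size_t i = 0; i < FOO_NMEMBERS; i++) {
		if (strcmp(foo_local_members[i].name, f->name) == 0) {
			memcpy((char *)dst + foo_local_members[i].off,
			    &f->value, sizeof(f->value));
			return;
		}
	}
	/* Unknown to this consumer (e.g. the newly added "z"): ignore. */
}

/* Defaults first, then fetched members; anything else stays zero. */
static void
foo_decode(struct foo_local *dst,
    const struct named_field *defaults, size_t ndefaults,
    const struct named_field *fetched, size_t nfetched)
{
	memset(dst, 0, sizeof(*dst));
	for (size_t i = 0; i < ndefaults; i++)
		set_member(dst, &defaults[i]);
	for (size_t i = 0; i < nfetched; i++)
		set_member(dst, &fetched[i]);
}
```

With defaults `{ "x", -1 }` and fetched members `{ "y", 10 }, { "z", 20 }`, the old consumer ends up with `x == -1` and `y == 10`, and silently skips `z`.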

From: Alexander V. Chernikov
To: John-Mark Gurney
Cc: freebsd-arch
Date: Thu, 12 Nov 2020 22:15:16 +0000
Subject: Re: Versioning support for kernel<>userland sysctl interface

09.11.2020, 23:15, "John-Mark Gurney" :
> Alexander V. Chernikov wrote this message on Mon, Nov 09, 2020 at 22:28 +0000:
>> 02.11.2020, 22:13, "John-Mark Gurney" :
>> > Alexander V. Chernikov wrote this message on Sun, Nov 01, 2020 at 12:47 +0000:
>> >> I would like to propose a change [1] that introduces versioning support for the data structures exposed to userland by the sysctl interface.
>> >>
>> >> We have dozens of interfaces exposing various statistics and control data by filling in and exporting structures.
>> >> net.inet6.icmp6.stats or net.inet6.icmp6.nd6_prlist are good examples of such interaction.
>> >
>> > We also need to decide the policy on dealing w/ support for these
>> > data structures going forward... Because if we do the simple, default
>> > policy of all userland apps can handle all structures, and kernel can
>> > produce all structures, we now have an unbounded growth of complexity
>> > and testing...
>> I totally agree. While backward compatibility is important, it should not impose notable technical debt. I had the following as my mental model:
>> * the code should be organised to support output for the latest version.
>> * there should be a separate, isolatable piece of code that converts from the latest version to n-1 (which can be chained: from n-1 to n-2 and so on).
>> * when introducing changes, we should guard older versions with COMPAT_X defines.
>
> Yeah, if we restrict the code to COMPAT_x for the existing versions,
> and ensure that it doesn't change, it isn't TOO terrible, but still,
> the likelihood of people writing tests to verify that the compat code
> works for all n-x versions isn't great... It's doable, but I doubt most
> people are going to put in the effort.
>
>> > I do understand the desire to solve this problem, but IMO, this solution
>> > is too simple, and dangerous to unbounded growth above.
>> >
>> > While I do like its simplicity, one idea that I've had, while being a
>> > bit more complex, has the ability to handle modification in a more
>> > compatible way.
>> >
>> > Since we have dtrace, one of the outputs of dtrace is ctf, which allows
>> > us to convey the type and structure information in a machine parseable
>> > format. The idea is that each sysctl oid (that supports this) would
>> > have the ability to fetch the ctf data for that oid. The userland would
>> > then be able to convert the members to the local members of a similar
>> > struct. A set of defaults could also be provided, allowing new fields
>> > to have sane initial values.
>> >
>> > As long as the name of a structure member is never reused for a different
>> > meaning, this will get us most of the way there, in a much cleaner
>> > method...
>> >
>> > I do realize that this isn't the easiest thing, but the tools to do this
>> > are in the tree, and would solve this problem, IMO, in a way that is a
>> > lot more maintainable, and long term than the current proposal.
>> >
>> > Other solution, use ctf data to produce nvlist generation/consumption
>> > code for a structure... The data transferred would be larger, but also
>> > more compatible...
>> I do like the idea of the self-documenting approach. It addresses the append-only case nicely, but that's not always the case.
>> For example, in the initially-discussed icmp6 stats we have 256 64-bit counters representing the icmp6 protocol histogram, resulting in a 4k frame being allocated on the stack in the current kernel implementation. If in the future our icmp6 kernel implementation changes and we are no longer able to provide these counters, we would eventually want to remove all of them from the structure. I'm not sure how this can be addressed without some sort of versioning scheme.
>
> So the bit that gets the ctf data would also have an nvlist (or
> something) that contains the defaults for when fields are removed...
>
> So, initially:
> struct foo {
>         int x;
>         int y;
> };
>
> Then it gets changed to:
> struct foo {
>         int x;
>         int y;
>         int z;
> };
>
> This is easy, the z will be included, and transmitted, but be ignored
> by older code, then when it changes to:
> struct foo {
>         int y;
>         int z;
> };
>
> The ctf data would be something like:
> ,
>
> Where nvlist defaults is:
> x: -1
>
> So, the consuming code would set the defaults from the nvlist first,
> then set the fetched data, so that x gets set to some default value.
>
> With a few simple rules like this, handling deletions is not a problem
> when the older code is expecting it. If a variable must always have
> a value that "must" be correct (and a default cannot be set), then
> another member needs to be added (I think ctf handles bit fields
> properly) that says if that member is valid... when it gets removed,
> that valid flag gets set to zero, and then the old code knows not
> to handle it.

Yep, thank you for clarifying! It looks pretty close to thrift/protobuf definitions with optional fields and default values. We get much better support for "common" operations like additions and some removals. However, field renaming (or, more broadly, changing a field's path, if we're talking about nested structures) is still problematic. For example, https://docs.microsoft.com/en-us/aspnet/core/grpc/versioning?view=aspnetcore-5.0 lists the common practices and problems associated with this approach.

>
>> > Overall, using bare structures is an ABI compatibility nightmare that
>> > should be fixed in a better method.

I don't disagree, and I'd love to see some variation of the interface described above in base; it would make development much easier. I'm trying to state two things:
1) Versioning is mostly orthogonal to the encoding scheme. There are cases where we need some sort of versioning to keep the ABI stable even with a variation of the interface above.
2) We need to move from the current state of things to _some_ newer interface, which is itself _currently_ a breaking change.

So what I'm advocating for is a generic mechanism that allows passing a version from userland to the kernel in sysctl. Adopting it will allow us to move gradually to nvlist / ctf or whatever approach we prefer, at a pace not bound to any release cycle. It will also allow us to switch to another mechanism later if so desired. This versioning scheme already exists in the code, is lightweight, and can easily be adopted before the FreeBSD 13 release.

From: Warner Losh
To: "freebsd-arch@freebsd.org"
Date: Fri, 13 Nov 2020 11:33:30 -0700
Subject: MAXPHYS bump for FreeBSD 13

Greetings,

We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
we normally use (though there are exceptions).

I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
DFLTPHYS to 1MB.

128k was good back in the 90s/2000s when memory was smaller, drives did
smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
devices can easily do 1MB or more, and there are performance benefits from
scheduling larger I/Os.

Bumping this will mean a larger struct buf and struct bio. Without some
concerted effort, it's hard to make this a sysctl tunable. While that's
desirable, perhaps, it shouldn't gate this bump. The increase in size for
1MB is modest enough.

The NVMe driver is currently limited to 1MB transfers due to limitations in
the NVMe scatter-gather lists and a desire to preallocate as much as
possible up front. Most NVMe drives have maximum transfer sizes between
128k and 1MB, with larger being the trend.

The mp[rs] drivers can use a larger MAXPHYS, though resource limitations on
some cards hamper bumping it beyond about 2MB.

The AHCI driver is happy with 1MB and larger sizes.

Netflix has run a MAXPHYS of 8MB for years, though that's likely 2x too large
even for our needs, because limiting factors in the upper layers make it
hard to schedule I/Os larger than 3-4MB reliably.

So this should be relatively low risk and high benefit.

I don't think other kernel tunables need to change, but I always run into
trouble with runningbufs :)

Comments? Anything I forgot?

Warner

From: Gary Jennejohn
To: "freebsd-arch@freebsd.org"
Date: Fri, 13 Nov 2020 20:09:26 +0100
Subject: Re: MAXPHYS bump for FreeBSD 13

On Fri, 13 Nov 2020 11:33:30 -0700 Warner Losh wrote:

> [...]
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
> [...]
> Comments? Anything I forgot?
>

Seems like a good idea to me. I tried 1MB a few months ago and saw no problems, although that change had little effect on transfers to my SSDs. Still, it could be useful for spinning rust.

--
Gary Jennejohn

From: Konstantin Belousov
To: Warner Losh
Cc: "freebsd-arch@freebsd.org"
Date: Fri, 13 Nov 2020 21:09:37 +0200
Subject: Re: MAXPHYS bump for FreeBSD 13

On Fri, Nov 13, 2020 at 11:33:30AM -0700, Warner Losh wrote:
> [...]
> Bumping this will mean a larger struct buf and struct bio. Without some
> concerted effort, it's hard to make this a sysctl tunable. While that's
> desirable, perhaps, it shouldn't gate this bump. The increase in size for
> 1MB is modest enough.

To put specific numbers on it: for struct buf it means an increase of 1792 bytes. For struct bio it does not, because bio does not embed a vm_page_t[] array in the structure. Worse, the typical struct buf addition for the excess vm_page pointers is going to be unused, because the normal size of a UFS block is 32K. It is going to be used only by clusters and physbufs.

So I object to bumping this value without reworking the buffer handling of b_pages[]. The most straightforward approach is to stop using MAXPHYS to size this array and to use an external array for clusters. Pbufs can embed a large array.

From: Greg 'groggy' Lehey
To: Warner Losh
Cc: "freebsd-arch@freebsd.org"
Date: Sat, 14 Nov 2020 09:47:06 +1100
Subject: Re: MAXPHYS bump for FreeBSD 13
On Friday, 13 November 2020 at 11:33:30 -0700, Warner Losh wrote:
> [...]
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.

Sounds long overdue to me.

Greg

From: Ian Lepore
To: Warner Losh, "freebsd-arch@freebsd.org"
Date: Fri, 13 Nov 2020 16:06:46 -0700
Subject: Re: MAXPHYS bump for FreeBSD 13

On Fri, 2020-11-13 at 11:33 -0700, Warner Losh wrote:
> [...]
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
> [...]
> > Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too large > even for our needs due to limiting factors in the upper layers making it > hard to schedule I/Os larger than 3-4MB reliably. > > So this should be a relatively low risk, and high benefit. > > I don't think other kernel tunables need to change, but I always run into > trouble with runningbufs :) > > Comments? Anything I forgot? > > Warner > Will this have any negative implications for embedded systems running slow storage such as sdcard? -- Ian From owner-freebsd-arch@freebsd.org Sat Nov 14 00:41:01 2020 Return-Path: Delivered-To: freebsd-arch@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 04F6746359E for ; Sat, 14 Nov 2020 00:41:01 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4CXxMv4Z4Yz3Nw5 for ; Sat, 14 Nov 2020 00:40:59 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-qk1-x743.google.com with SMTP id d9so11080678qke.8 for ; Fri, 13 Nov 2020 16:40:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=jZLfUR3aI4oqj6Iy3N+s5NibQ8PEzx0hr49Ii0uYbmw=; b=0bl1jz9ulPutfl1+rwqQwCPW4iOVEZNsh+oPQ2Qkn2+yd0OvbV8zXxyOUt7/Jemb4V oS8oJOB2AaAUUdVT4nqrqDTaYB21puYk0f3oj7fZUw43+aqA+17S8CouSUH5Lp1KR70U fFFskt5l2edq1YHkIoFk1zb5mLoNziv9RH8BOiDSDHgOw9WuRJBPp5thJjC5iH6iPzif ARGW5XwUdbEV9GxgrMIdJTOYSs2kawBfwUqjv5VYxYprgpPA8mTfDYPSg9PRrnoIUS+w 

From: Warner Losh
Date: Fri, 13 Nov 2020 17:40:47 -0700
Subject: Re: MAXPHYS bump for FreeBSD 13
To: Ian Lepore
Cc: "freebsd-arch@freebsd.org"

On Fri, Nov 13, 2020 at 4:06 PM Ian Lepore wrote:

> Will this have any negative implications for embedded systems running
> slow storage such as sdcard?
>

It will work. If you have memory pressure, you may need to compile with a
smaller MAXPHYS. The savings is about 1700 bytes per struct buf.

Warner

Subject: Re: MAXPHYS bump for FreeBSD 13
From: Scott Long
Date: Fri, 13 Nov 2020 18:23:29 -0700
Cc: "freebsd-arch@freebsd.org"
Message-Id: <926C3A98-03BF-46FD-9B22-9EFBDC0F44A4@samsco.org>
To: Warner Losh

I have mixed feelings on this. The Netflix workload isn’t typical, and this
change represents a fairly substantial increase in memory usage for bufs.
It’s also a config tunable, so it’s not like this represents a meaningful
diff reduction for Netflix.

The upside is that it will likely help benchmarks out of the box. Is that
enough of an upside for the downsides of memory pressure on small memory
and high iops systems? I’m not convinced. I really would like to see the
years of talk about fixing this correctly put into action.

Scott

From: Warner Losh
Date: Fri, 13 Nov 2020 19:16:30 -0700
Subject: Re: MAXPHYS bump for FreeBSD 13
To: Scott Long
Cc: "freebsd-arch@freebsd.org"

On Fri, Nov 13, 2020 at 6:23 PM Scott Long wrote:

> I have mixed feelings on this. The Netflix workload isn’t typical, and
> this change represents a fairly substantial increase in memory usage for
> bufs. It’s also a config tunable, so it’s not like this represents a
> meaningful diff reduction for Netflix.
>

This isn't motivated at all by Netflix's work load nor any needs to
minimize diffs at all. In fact, Netflix had nothing to do with the proposal
apart from me writing it up. This is motivated more by the needs of more
people to do larger I/Os than 128k, though maybe 1MB is too large. Alexander
Motin proposed it today during the Vendor Summit and I wrote up the idea for
arch@.

> The upside is that it will likely help benchmarks out of the box. Is that
> enough of an upside for the downsides of memory pressure on small memory
> and high iops systems? I’m not convinced. I really would like to see the
> years of talk about fixing this correctly put into action.
>

I'd love years of inaction to end too. I'd also like FreeBSD to perform a
bit better out of the box. Would your calculation have changed had the size
been 256k or 512k? Both those options use/waste substantially fewer bytes
per I/O than 1MB.

Warner

To: "freebsd-arch@freebsd.org"
From: Alexander Motin
Subject: Re: MAXPHYS bump for FreeBSD 13
Message-ID: <1bff381f-3d6e-b20c-28f9-1403a9dfe0f6@FreeBSD.org>
Date: Fri, 13 Nov 2020 23:14:07 -0500

> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
> that we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.

I am all for the MAXPHYS change; as Warner said, it was my proposal in a
chat. ZFS already uses blocks and aggregates I/O of up to 1MB, and
potentially more; having an I/O size lower than this just overflows disk
queues, increases processing overhead, complicates scheduling and in some
cases causes starvation.

I'd just like to note that DFLTPHYS should probably not be changed so
directly (if at all), since it is used as a fallback for legacy code. If
it is used for anything else, that should be reviewed and probably
migrated to some other constant(s).

-- 
Alexander Motin

Subject: Re: MAXPHYS bump for FreeBSD 13
To: Alexander Motin, "freebsd-arch@freebsd.org"
From: Hans Petter Selasky
Message-ID: <07a4ca53-da9d-e7b2-9af3-c5098f15a5c7@selasky.org>
Date: Sat, 14 Nov 2020 12:22:36 +0100

On 11/14/20 5:14 AM, Alexander Motin wrote:
>> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
>> that we normally use (though there are exceptions).
>>
>> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
>> DFLTPHYS to 1MB.
>
> I am all for the MAXPHYS change; as Warner said, it was my proposal in
> a chat.
> ZFS already uses blocks and aggregates I/O of up to 1MB, and could
> potentially go further, and having an I/O size lower than this just
> overflows disk queues, increases processing overhead, complicates
> scheduling and in some cases causes starvation.
>
> I'd just like to note that DFLTPHYS should probably not be changed that
> directly (if at all), since it is used as a fallback for legacy code.
> If it is used for anything else, that should be reviewed and probably
> migrated to some other constant(s).
>

Beware that many USB 2.0 devices will break if you try to transfer more
than 64K. Buggy SCSI implementations!

--HPS

From owner-freebsd-arch@freebsd.org Sat Nov 14 13:26:19 2020
Subject: Re: MAXPHYS bump for FreeBSD 13
To: Hans Petter Selasky , "freebsd-arch@freebsd.org"
From: Alexander Motin
Date: Sat, 14 Nov 2020 08:26:14 -0500

On 14.11.2020 06:22, Hans Petter Selasky wrote:
> On 11/14/20 5:14 AM, Alexander Motin wrote:
>>> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
>>> that we normally use (though there are exceptions).
>>>
>>> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
>>> DFLTPHYS to 1MB.
>>
>> I am all for the MAXPHYS change; as Warner said, it was my proposal in
>> a chat. ZFS already uses blocks and aggregates I/O of up to 1MB, and
>> could potentially go further, and having an I/O size lower than this
>> just overflows disk queues, increases processing overhead, complicates
>> scheduling and in some cases causes starvation.
>>
>> I'd just like to note that DFLTPHYS should probably not be changed that
>> directly (if at all), since it is used as a fallback for legacy code.
>> If it is used for anything else, that should be reviewed and probably
>> migrated to some other constant(s).
>
> Beware that many USB 2.0 devices will break if you try to transfer more
> than 64K. Buggy SCSI implementations!

Yes, thanks, I remember. The code reports MAXPHYS only for USB_SPEED_SUPER
devices, relying on the DFLTPHYS fallback in CAM otherwise. I think slower
ones could just be hardcoded to 64KB to be certain.
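The fallback described in this reply can be modelled in a few lines of C. This is a minimal sketch, assuming hypothetical names (`umass_reported_maxio()`, `effective_maxio()`, the `SK_SPEED_*` enum) that only mirror the logic described in the thread, not the actual umass(4) or CAM code. DFLTPHYS is taken here as 64 KiB. The sketch shows why bumping DFLTPHYS would silently raise the limit for USB 2.0 devices that report no limit, while a hardcoded 64 KiB cap would not.

```c
#include <assert.h>
#include <stddef.h>

#define KIB 1024u

/* Hypothetical speed classes, modelled on the USB_SPEED_* names. */
enum sk_usb_speed { SK_SPEED_LOW, SK_SPEED_FULL, SK_SPEED_HIGH, SK_SPEED_SUPER };

/* Safe cap for slower devices, as suggested in the thread. */
#define SK_USB2_MAXIO (64u * KIB)

/*
 * What the (hypothetical) driver reports as its maximum I/O size.
 * 0 means "no limit given", leaving the choice to the CAM-style fallback.
 */
static size_t
umass_reported_maxio(enum sk_usb_speed speed, size_t maxphys, int cap_slow)
{
	if (speed == SK_SPEED_SUPER)
		return (maxphys);
	return (cap_slow ? SK_USB2_MAXIO : 0);
}

/*
 * CAM-style resolution: a report of 0 falls back to dfltphys,
 * anything else is clamped to maxphys.
 */
static size_t
effective_maxio(size_t reported, size_t dfltphys, size_t maxphys)
{
	assert(dfltphys <= maxphys);
	if (reported == 0)
		return (dfltphys);
	return (reported < maxphys ? reported : maxphys);
}
```

Under these assumptions, a high-speed device with no cap resolves to 64 KiB while `dfltphys` is 64 KiB, but to 1 MiB once `dfltphys` is raised to 1 MiB, which is exactly the case HPS warns about. With `cap_slow` set, the result stays at 64 KiB either way.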

--
Alexander Motin

From owner-freebsd-arch@freebsd.org Sat Nov 14 15:01:09 2020
To: Konstantin Belousov , "freebsd-arch@freebsd.org"
From: Alexander Motin
Subject: Re: MAXPHYS bump for FreeBSD 13
Date: Sat, 14 Nov 2020 10:01:05 -0500

On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> To put the specific numbers, for struct buf it means increase by 1792
> bytes. For bio it does not, because it does not embed vm_page_t[] into
> the structure.
>
> Worse, typical struct buf addend for excess vm_page pointers is going
> to be unused, because normal size of the UFS block is 32K. It is
> going to be only used by clusters and physbufs.
>
> So I object against bumping this value without reworking buffers
> handling of b_pages[].
> Most straightforward approach is stop using MAXPHYS to size this
> array, and use external array for clusters.
> Pbufs can embed large array.

I am not very familiar with struct buf usage, so I'd appreciate some
help there.

Quickly looking at pbuf, it seems trivial to allocate an external b_pages
array of any size in pbuf_init, which should easily satisfy all pbuf
descendants. The cluster and vnode/swap pager code are pbuf descendants
too. The vnode pager, I guess, may only need a replacement for
nitems(bp->b_pages) in a few places.

Could you or somebody help with the vfs/ffs code, where I suppose the
smaller page lists are used?

--
Alexander Motin

From owner-freebsd-arch@freebsd.org Sat Nov 14 18:37:21 2020
Date: Sat, 14 Nov 2020 20:37:05 +0200
From: Konstantin Belousov
To: Alexander Motin
Cc: "freebsd-arch@freebsd.org"
Subject: Re: MAXPHYS bump for FreeBSD 13

On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> > To put the specific numbers, for struct buf it means increase by 1792
> > bytes. For bio it does not, because it does not embed vm_page_t[] into
> > the structure.
> >
> > Worse, typical struct buf addend for excess vm_page pointers is going
> > to be unused, because normal size of the UFS block is 32K. It is
> > going to be only used by clusters and physbufs.
> >
> > So I object against bumping this value without reworking buffers
> > handling of b_pages[]. Most straightforward approach is stop using
> > MAXPHYS to size this array, and use external array for clusters.
> > Pbufs can embed large array.
>
> I am not very familiar with struct buf usage, so I'd appreciate some
> help there.
>
> Quickly looking at pbuf, it seems trivial to allocate an external b_pages
> array of any size in pbuf_init, which should easily satisfy all pbuf
> descendants. The cluster and vnode/swap pager code are pbuf descendants
> too.
> The vnode pager, I guess, may only need a replacement for
> nitems(bp->b_pages) in a few places.
I planned to look at making MAXPHYS a tunable.

You are right, we would need:
1. Move b_pages to the end of struct buf and declare it as a flexible array.
This would make the KBI worse, because struct buf depends on some debugging
options, and then the b_pages offset depends on the config.

Another option could be to change b_pages to a pointer, if we are fine with
one more indirection. But in my plan, the real array is always allocated
past struct buf, so a flexible array is even more correct.

2. Preallocate both normal bufs and pbufs together with the arrays.

3. I considered adding a B_SMALLPAGES flag to b_flags and using it to
indicate that the buffer has 'small' b_pages. All buffers rotated through
getnewbuf()/buf_alloc() should have it set.

4. There could be some places which either malloc() or allocate struct buf
on the stack (I tend to believe that I converted all the latter places to
the former). They would need special handling.

md(4) uses pbufs.

5. My larger concern is, in fact, CAM and drivers.

>
> Could you or somebody help with the vfs/ffs code, where I suppose the
> smaller page lists are used?
Do you plan to work on this? I can help, sure.

Still, I wanted to make MAXPHYS (and a 'small' MAXPHYS, which is not the
same as DFLTPHYS) a tunable, in the scope of this work.
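Items 1 to 3 of the plan can be illustrated in user-space C. This is a minimal sketch, assuming a heavily trimmed, hypothetical `struct sk_buf` and a hypothetical `SK_B_SMALLPAGES` flag standing in for the proposed B_SMALLPAGES. The real struct buf has many more fields, and the kernel would preallocate these buffers rather than call calloc(). `b_pages` is a C99 flexible array member, so each buffer is allocated together with exactly as many page-pointer slots as its intended I/O size needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_PAGE_SIZE	4096u
#define SK_B_SMALLPAGES	0x1u	/* hypothetical stand-in for B_SMALLPAGES */

struct vm_page;			/* opaque: only pointers are stored */

/* Hypothetical, trimmed struct buf with b_pages[] moved to the end. */
struct sk_buf {
	unsigned	b_flags;
	size_t		b_maxpages;	/* slots available in b_pages[] */
	size_t		b_npages;	/* slots in use */
	struct vm_page	*b_pages[];	/* flexible array member */
};

/* Allocate a buf together with room for maxio bytes of page pointers. */
static struct sk_buf *
sk_buf_alloc(size_t maxio, int small)
{
	size_t npages = (maxio + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;
	struct sk_buf *bp;

	bp = calloc(1, offsetof(struct sk_buf, b_pages) +
	    npages * sizeof(struct vm_page *));
	if (bp == NULL)
		return (NULL);
	bp->b_maxpages = npages;
	if (small)
		bp->b_flags |= SK_B_SMALLPAGES;
	return (bp);
}

/* Can this buf describe a transfer of len bytes? */
static int
sk_buf_fits(const struct sk_buf *bp, size_t len)
{
	return ((len + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE <= bp->b_maxpages);
}
```

A 32 KiB, UFS-block-sized buffer from this sketch gets 8 slots instead of MAXPHYS / PAGE_SIZE. A check like `sk_buf_fits()` is the kind of test a 'small' buffer would presumably need before it is handed to code that builds larger transfers.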

From owner-freebsd-arch@freebsd.org Sat Nov 14 18:48:36 2020
Subject: Re: MAXPHYS bump for FreeBSD 13
From: Scott Long
Date: Sat, 14 Nov 2020 11:48:34 -0700
Cc: Alexander Motin , "freebsd-arch@freebsd.org"
To: Konstantin Belousov

> On Nov 14, 2020, at 11:37 AM, Konstantin Belousov wrote:
>
> On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
>> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
>>> To put the specific numbers, for struct buf it means increase by 1792
>>> bytes. For bio it does not, because it does not embed vm_page_t[] into
>>> the structure.
>>>
>>> Worse, typical struct buf addend for excess vm_page pointers is going
>>> to be unused, because normal size of the UFS block is 32K. It is
>>> going to be only used by clusters and physbufs.
>>>
>>> So I object against bumping this value without reworking buffers
>>> handling of b_pages[]. Most straightforward approach is stop using
>>> MAXPHYS to size this array, and use external array for clusters.
>>> Pbufs can embed large array.
>>
>> I am not very familiar with struct buf usage, so I'd appreciate some
>> help there.
>>
>> Quickly looking at pbuf, it seems trivial to allocate an external b_pages
>> array of any size in pbuf_init, which should easily satisfy all pbuf
>> descendants. The cluster and vnode/swap pager code are pbuf descendants
>> too. The vnode pager, I guess, may only need a replacement for
>> nitems(bp->b_pages) in a few places.
> I planned to look at making MAXPHYS a tunable.
>
> You are right, we would need:
> 1. Move b_pages to the end of struct buf and declare it as a flexible array.
> This would make the KBI worse, because struct buf depends on some debugging
> options, and then the b_pages offset depends on the config.
>
> Another option could be to change b_pages to a pointer, if we are fine with
> one more indirection. But in my plan, the real array is always allocated
> past struct buf, so a flexible array is even more correct.
>

I like this, and I was in the middle of writing up an email that
described it. There could be multiple malloc types or UMA zones of
different sizes, depending on the intended I/O size, or just a runtime
change to the size of a single allocation.

> 2. Preallocate both normal bufs and pbufs together with the arrays.
>
> 3. I considered adding a B_SMALLPAGES flag to b_flags and using it to
> indicate that the buffer has 'small' b_pages. All buffers rotated through
> getnewbuf()/buf_alloc() should have it set.
>

This would work nicely with a variable-sized allocator, yes.

> 4. There could be some places which either malloc() or allocate struct buf
> on the stack (I tend to believe that I converted all the latter places to
> the former). They would need special handling.
>

I couldn't find any places that allocated a buf on the stack or
embedded it into another structure.

> md(4) uses pbufs.
>
> 5. My larger concern is, in fact, CAM and drivers.
>

Can you describe your concern?

>>
>> Could you or somebody help with the vfs/ffs code, where I suppose the
>> smaller page lists are used?
> Do you plan to work on this? I can help, sure.
>
> Still, I wanted to make MAXPHYS (and a 'small' MAXPHYS, which is not the
> same as DFLTPHYS) a tunable, in the scope of this work.

Sounds great, thank you for looking at it.
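A variable-sized allocator of the kind described here could choose among a few fixed size classes. This is a minimal sketch, assuming three hypothetical classes of 8, 32 and 256 pages (32 KiB, 128 KiB and 1 MiB with 4 KiB pages). In the kernel each class would presumably be backed by its own UMA zone created with uma_zcreate(9), which this user-space code does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical b_pages[] size classes, in pages. */
static const size_t sk_page_classes[] = { 8, 32, 256 };
#define SK_NCLASSES (sizeof(sk_page_classes) / sizeof(sk_page_classes[0]))

/*
 * Return the index of the smallest class whose b_pages[] can hold
 * npages entries, or -1 if the request exceeds every class.
 */
static int
sk_pick_class(size_t npages)
{
	for (size_t i = 0; i < SK_NCLASSES; i++)
		if (npages <= sk_page_classes[i])
			return ((int)i);
	return (-1);
}
```

With few, fixed classes, only clusters and physical I/O would pay for a 256-slot array, while ordinary 32 KiB buffers stay in the smallest class.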

Scott

From owner-freebsd-arch@freebsd.org Sat Nov 14 18:50:20 2020
Date: Sat, 14 Nov 2020 10:50:16 -0800
From: John-Mark Gurney
To: Warner Losh
Cc: Scott Long , "freebsd-arch@freebsd.org"
Subject: Re: MAXPHYS bump for FreeBSD 13
User-Agent: Mutt/1.6.1 (2016-04-27) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (gold.funkthat.com [127.0.0.1]); Sat, 14 Nov 2020 10:50:17 -0800 (PST) X-Rspamd-Queue-Id: 4CYPXq4Nx4z3n4N X-Spamd-Bar: - Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=none (mx1.freebsd.org: domain of jmg@gold.funkthat.com has no SPF policy when checking 208.87.223.18) smtp.mailfrom=jmg@gold.funkthat.com X-Spamd-Result: default: False [-1.80 / 15.00]; TO_DN_EQ_ADDR_SOME(0.00)[]; ARC_NA(0.00)[]; FREEFALL_USER(0.00)[jmg]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; TO_DN_SOME(0.00)[]; RCVD_TLS_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[funkthat.com]; RBL_DBL_DONT_QUERY_IPS(0.00)[208.87.223.18:from]; AUTH_NA(1.00)[]; MID_RHS_MATCH_FROM(0.00)[]; SPAMHAUS_ZRD(0.00)[208.87.223.18:from:127.0.2.255]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-0.999]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; R_SPF_NA(0.00)[no SPF record]; FORGED_SENDER(0.30)[jmg@funkthat.com,jmg@gold.funkthat.com]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:32354, ipnet:208.87.216.0/21, country:US]; FROM_NEQ_ENVFROM(0.00)[jmg@funkthat.com,jmg@gold.funkthat.com]; MAILMAN_DEST(0.00)[freebsd-arch]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Nov 2020 18:50:20 -0000 Warner Losh wrote this message on Fri, Nov 13, 2020 at 19:16 -0700: > On Fri, Nov 13, 2020 at 6:23 PM Scott Long wrote: > > > I have mixed feelings on this. The Netflix workload isn???t typical, and > > this > > change represents a fairly substantial increase in memory usage for > > bufs. It???s also a config tunable, so it???s not like this represents a > > meaningful > > diff reduction for Netflix. 
> >
>
> This isn't motivated at all by Netflix's work load nor any needs to
> minimize diffs at all. In fact, Netflix had nothing to do with the proposal
> apart from me writing it up.
>
> This is motivated more by the needs of more people to do larger I/Os than
> 128k, though maybe 1MB is too large. Alexander Motin proposed it today
> during the Vendor Summit and I wrote up the idea for arch@.

I ran into this problem recently w/ my work on ggate. I was doing testing
using dd bs=1m. Because of MAXPHYS, the physio for devices breaks down the
request into 128kB segments, which are scheduled serially... This means that
if there is request latency, it is multiplied 8x because of the smaller
requests...

Also, some file systems, like ZFS, ignore the MAXPHYS limit, and pass down
larger IOs anyways, which clearly work well enough that no one complains
about ZFS not working on their devices...

I talked briefly w/ Warner about increasing MAXPHYS not too long ago.

> The upside is that it will likely help benchmarks out of the box. Is that
> > enough of an upside for the downsides of memory pressure on small memory
> > and high iops systems? I’m not convinced. I really would like to see the
> > years of talk about fixing this correctly put into action.
>
> I'd love years of inaction to end too. I'd also like FreeBSD to perform a
> bit better out of the box. Would your calculation have changed had the size
> been 256k or 512k? Both those options use/waste substantially fewer bytes
> per I/O than 1MB.

--
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
From owner-freebsd-arch@freebsd.org Sat Nov 14 19:43:08 2020
Date: Sat, 14 Nov 2020 21:43:00 +0200
From: Konstantin Belousov
To: Scott Long
Cc: Alexander Motin, "freebsd-arch@freebsd.org"
Subject: Re: MAXPHYS bump for FreeBSD 13

On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote:
>
>
> > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov wrote:
> >
> > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> >>> To put the specific numbers, for struct buf it means increase by 1792
> >>> bytes. For bio it does not, because it does not embed vm_page_t[] into
> >>> the structure.
> >>>
> >>> Worse, typical struct buf addend for excess vm_page pointers is going
> >>> to be unused, because normal size of the UFS block is 32K. It is
> >>> going to be only used by clusters and physbufs.
> >>>
> >>> So I object against bumping this value without reworking buffers
> >>> handling of b_pages[]. Most straightforward approach is stop using
> >>> MAXPHYS to size this array, and use external array for clusters.
> >>> Pbufs can embed large array.
> >>
> >> I am not very familiar with struct buf usage, so I'd appreciate some
> >> help there.
> >>
> >> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> >> array of any size in pbuf_init, that should easily satisfy all of pbuf
> >> descendants. Cluster and vnode/swap pagers code are pbuf descendants
> >> also. Vnode pager I guess may only need replacement for
> >> nitems(bp->b_pages) in few places.
> > I planned to look at making MAXPHYS a tunable.
> >
> > You are right, we would need:
> > 1. move b_pages to the end of struct buf and declaring it as flexible.
> > This would make KBI worse because struct buf depends on some debugging
> > options, and then b_pages offset depends on config.
> >
> > Another option could be to change b_pages to pointer, if we are fine with
> > one more indirection. But in my plan, real array is always allocated past
> > struct buf, so flexible array is more correct even.
> >
>
> I like this, and I was in the middle of writing up an email that described it.
> There could be multiple malloc types or UMA zones of different sizes,
> depending on the intended i/o size, or just a runtime change to the size of
> a single allocation size.
I do not think we need new/many zones.

Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated
from pbuf_zone. That should be fixed alloc size, with small b_pages[]
for buf_zone, and large (MAXPHYS) for pbuf.

Everything else, if any, would need to pre-calculate malloc size.

>
> > 2. Preallocating both normal bufs and pbufs together with the arrays.
> >
> > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to indicate
> > that buffer has 'small' b_pages. All buffers rotated through getnewbuf()/
> > buf_alloc() should have it set.
> >
>
> This would work nicely with a variable sized allocator, yes.
>
> > 4. There could be some places which either malloc() or allocate struct buf
> > on stack (I tend to believe that I converted all later places to formed).
> > They would need to get special handling.
> >
>
> I couldn’t find any places that allocated a buf on the stack or embedded it
> into another structure.
As I said, I did a pass to eliminate stack allocations for bufs.
As result, for instance flushbufqueues() mallocs struct buf, but it does
not use b_pages[] of the allocated sentinel.

>
> > md(4) uses pbufs.
> >
> > 4. My larger concern is, in fact, cam and drivers.
> >
>
> Can you describe your concern?
My opinion is that during this work all uses of MAXPHYS should be reviewed,
and there are a lot of drivers that reference the constant.
From the past experience, I expect some evil ingenious (ab)use.

Same for bufs, but apart from application of pbufs in cam_periph.c, I do not
think drivers have much use of it.

>
> >>
> >> Could you or somebody help with vfs/ffs code, where I suppose the
> >> smaller page lists are used?
> > Do you plan to work on this ? I can help, sure.
> >
> > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as
> > DFLTPHYS), a tunable, in the scope of this work.
>
> Sounds great, thank you for looking at it.
>
> Scott
>

From owner-freebsd-arch@freebsd.org Sat Nov 14 20:39:46 2020
From: Warner Losh
Date: Sat, 14 Nov 2020 13:39:32 -0700
Subject: Re: MAXPHYS bump for FreeBSD 13
To: Konstantin Belousov
Cc: Scott Long, Alexander Motin, "freebsd-arch@freebsd.org"

On Sat, Nov 14, 2020, 12:43 PM Konstantin Belousov wrote:

> On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote:
> >
> >
> > > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov
> wrote:
> > >
> > > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> > >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> > >>> To put the specific numbers, for struct buf it means increase by 1792
> > >>> bytes. For bio it does not, because it does not embed vm_page_t[]
> into
> > >>> the structure.
> > >>>
> > >>> Worse, typical struct buf addend for excess vm_page pointers is going
> > >>> to be unused, because normal size of the UFS block is 32K. It is
> > >>> going to be only used by clusters and physbufs.
> > >>>
> > >>> So I object against bumping this value without reworking buffers
> > >>> handling of b_pages[]. Most straightforward approach is stop using
> > >>> MAXPHYS to size this array, and use external array for clusters.
> > >>> Pbufs can embed large array.
> > >>
> > >> I am not very familiar with struct buf usage, so I'd appreciate some
> > >> help there.
> > >>
> > >> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> > >> array of any size in pbuf_init, that should easily satisfy all of pbuf
> > >> descendants. Cluster and vnode/swap pagers code are pbuf descendants
> > >> also. Vnode pager I guess may only need replacement for
> > >> nitems(bp->b_pages) in few places.
> > > I planned to look at making MAXPHYS a tunable.
> > >
> > > You are right, we would need:
> > > 1. move b_pages to the end of struct buf and declaring it as flexible.
> > > This would make KBI worse because struct buf depends on some debugging
> > > options, and then b_pages offset depends on config.
> > >
> > > Another option could be to change b_pages to pointer, if we are fine
> with
> > > one more indirection. But in my plan, real array is always allocated
> past
> > > struct buf, so flexible array is more correct even.
> > >
> >
> > I like this, and I was in the middle of writing up an email that
> described it.
> > There could be multiple malloc types or UMA zones of different sizes,
> > depending on the intended i/o size, or just a runtime change to the size
> of
> > a single allocation size.
> I do not think we need new/many zones.
>
> Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated
> from pbuf_zone. That should be fixed alloc size, with small b_pages[]
> for buf_zone, and large (MAXPHYS) for pbuf.
>
> Everything else, if any, would need to pre-calculate malloc size.
>

How will this affect clustered reads for things like read ahead?

>
> > > 2. Preallocating both normal bufs and pbufs together with the arrays.
> > >
> > > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to
> indicate
> > > that buffer has 'small' b_pages. All buffers rotated through
> getnewbuf()/
> > > buf_alloc() should have it set.
> > >
> >
> > This would work nicely with a variable sized allocator, yes.
> >
> > > 4. There could be some places which either malloc() or allocate struct
> buf
> > > on stack (I tend to believe that I converted all later places to
> formed).
> > > They would need to get special handling.
> > >
> >
> > I couldn’t find any places that allocated a buf on the stack or embedded
> it
> > into another structure.
> As I said, I did a pass to eliminate stack allocations for bufs.
> As result, for instance flushbufqueues() mallocs struct buf, but it does
> not use b_pages[] of the allocated sentinel.
>

Yea. I recall both the pass and looking for them later and not finding any
either...

>
> > > md(4) uses pbufs.
> > >
> > > 4. My larger concern is, in fact, cam and drivers.
> > >
> >
> > Can you describe your concern?
> My opinion is that during this work all uses of MAXPHYS should be reviewed,
> and there are a lot of drivers that reference the constant. From the past
> experience, I expect some evil ingenious (ab)use.
>
> Same for bufs, but apart from application of pbufs in cam_periph.c, I do
> not
> think drivers have much use of it.
>

Do you have precise definitions for DFLTPHYS and MAXPHYS? That might help
ferret out the differences between the two. I have seen several places that
use one or the other of these that seem incorrect, but that I can't quite
articulate precisely why... having a good definition articulated would
help. There are some places that likely want a fixed constant to reflect
hardware, not a FreeBSD tuning parameter.

As an aside, there are times I want to do transfers of arbitrary sizes for
certain pass through commands that are vendor specific and that have no way
to read the results in chunks. Thankfully most newer drives don't have this
restriction, but it still comes up. But that's way below the buf layer and
handled today by cam_periph and the pbufs there. These types of operations
are rare and typically when the system is mostly idle, so low memory
situations can be ignored beyond error handling and retry in the user
program. Would this work make those possible? Or would MAXPHYS, however
set, still limit them?

Warner

>
> > >>
> > >> Could you or somebody help with vfs/ffs code, where I suppose the
> > >> smaller page lists are used?
> > > Do you plan to work on this ? I can help, sure.
> > >
> > > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same
> as
> > > DFLTPHYS), a tunable, in the scope of this work.
> >
> > Sounds great, thank you for looking at it.
> >
> > Scott
> >
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>

From owner-freebsd-arch@freebsd.org Sat Nov 14 21:57:15 2020
Date: Sat, 14 Nov 2020 23:57:00 +0200
From: Konstantin Belousov
To: Warner Losh
Cc: Scott Long, Alexander Motin, "freebsd-arch@freebsd.org"
Subject: Re: MAXPHYS bump for FreeBSD 13

On Sat, Nov 14, 2020 at 01:39:32PM -0700, Warner Losh wrote:
> On Sat, Nov 14, 2020, 12:43 PM Konstantin Belousov
> wrote:
>
> > On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote:
> > >
> > >
> > > > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov
> > wrote:
> > > >
> > > > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> > > >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> > > >>> To put the specific numbers, for struct buf it means increase by 1792
> > > >>> bytes. For bio it does not, because it does not embed vm_page_t[]
> > into
> > > >>> the structure.
> > > >>>
> > > >>> Worse, typical struct buf addend for excess vm_page pointers is going
> > > >>> to be unused, because normal size of the UFS block is 32K. It is
> > > >>> going to be only used by clusters and physbufs.
> > > >>>
> > > >>> So I object against bumping this value without reworking buffers
> > > >>> handling of b_pages[]. Most straightforward approach is stop using
> > > >>> MAXPHYS to size this array, and use external array for clusters.
> > > >>> Pbufs can embed large array.
> > > >>
> > > >> I am not very familiar with struct buf usage, so I'd appreciate some
> > > >> help there.
> > > >>
> > > >> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> > > >> array of any size in pbuf_init, that should easily satisfy all of pbuf
> > > >> descendants. Cluster and vnode/swap pagers code are pbuf descendants
> > > >> also. Vnode pager I guess may only need replacement for
> > > >> nitems(bp->b_pages) in few places.
> > > > I planned to look at making MAXPHYS a tunable.
> > > >
> > > > You are right, we would need:
> > > > 1. move b_pages to the end of struct buf and declaring it as flexible.
> > > > This would make KBI worse because struct buf depends on some debugging
> > > > options, and then b_pages offset depends on config.
> > > >
> > > > Another option could be to change b_pages to pointer, if we are fine
> > with
> > > > one more indirection. But in my plan, real array is always allocated
> > past
> > > > struct buf, so flexible array is more correct even.
> > > >
> > >
> > > I like this, and I was in the middle of writing up an email that
> > described it.
> > > There could be multiple malloc types or UMA zones of different sizes,
> > > depending on the intended i/o size, or just a runtime change to the size
> > of
> > > a single allocation size.
> > I do not think we need new/many zones.
> >
> > Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated
> > from pbuf_zone. That should be fixed alloc size, with small b_pages[]
> > for buf_zone, and large (MAXPHYS) for pbuf.
> >
> > Everything else, if any, would need to pre-calculate malloc size.
> >
>
> How will this affect clustered reads for things like read ahead?
kern/vfs_cluster.c uses pbufs to create a temporary buf by combining pages
from the constituent normal (queued) buffers. According to the discussion,
pbufs would have b_pages[]/KVA reserved by MAXPHYS. This allows cluster
to fill the large request for read-ahead or background write.

>
> >
> > > > 2. Preallocating both normal bufs and pbufs together with the arrays.
> > > >
> > > > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to
> > indicate
> > > > that buffer has 'small' b_pages. All buffers rotated through
> > getnewbuf()/
> > > > buf_alloc() should have it set.
> > > >
> > >
> > > This would work nicely with a variable sized allocator, yes.
> > >
> > > > 4. There could be some places which either malloc() or allocate struct
> > buf
> > > > on stack (I tend to believe that I converted all later places to
> > formed).
> > > > They would need to get special handling.
> > > >
> > >
> > > I couldn’t find any places that allocated a buf on the stack or embedded
> > it
> > > into another structure.
> > As I said, I did a pass to eliminate stack allocations for bufs.
> > As result, for instance flushbufqueues() mallocs struct buf, but it does
> > not use b_pages[] of the allocated sentinel.
> >
>
> Yea. I recall both the pass and looking for them later and not finding any
> either...
>
> >
> > > > md(4) uses pbufs.
> > > >
> > > > 4. My larger concern is, in fact, cam and drivers.
> > > >
> > >
> > > Can you describe your concern?
> > My opinion is that during this work all uses of MAXPHYS should be reviewed,
> > and there are a lot of drivers that reference the constant. From the past
> > experience, I expect some evil ingenious (ab)use.
> >
> > Same for bufs, but apart from application of pbufs in cam_periph.c, I do
> > not
> > think drivers have much use of it.
> >
>
> Do you have precise definitions for DFLTPHYS and MAXPHYS? That might help
> ferret out the differences between the two. I have seen several places that
> use one or the other of these that seem incorrect, but that I can't quite
> articulate precisely why... having a good definition articulated would
> help. There are some places that likely want a fixed constant to reflect
> hardware, not a FreeBSD tuning parameter.
Right now VFS guarantees that it never creates io request (bio ?) larger
than MAXPHYS.
In fact, VMIO buffers simply do not allow to express such request because
there is no place to put more pages.

DFLTPHYS seems to be only used by drivers (and some geoms), and typical
drivers' usage of it is to clamp the max io request more tightly than MAXPHYS.
I see that dump code tries to not write more than DFLTPHYS one time, to
ease life of drivers, and physio() sanitizes maxio at DFLTPHYS, but this
is for really broken drivers.

>
> As an aside, there are times I want to do transfers of arbitrary sizes for
> certain pass through commands that are vendor specific and that have no way
> to read the results in chunks. Thankfully most newer drives don't have this
> restriction, but it still comes up. But that's way below the buf layer and
> handled today by cam_periph and the pbufs there. These types of operations
> are rare and typically when the system is mostly idle, so low memory
> situations can be ignored beyond error handling and retry in the user
> program. Would this work make those possible? Or would MAXPHYS, however
> set, still limit them?
MAXPHYS would still limit them, at least in the scope of work we are discussing.

>
> Warner
>
> >
> > > >>
> > > >> Could you or somebody help with vfs/ffs code, where I suppose the
> > > >> smaller page lists are used?
> > > > Do you plan to work on this ? I can help, sure.
> > > >
> > > > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same
> > as
> > > > DFLTPHYS), a tunable, in the scope of this work.
> > >
> > > Sounds great, thank you for looking at it.
> > >
> > > Scott
> > >

From owner-freebsd-arch@freebsd.org Sat Nov 14 22:50:21 2020
From: "Poul-Henning Kamp"
Date: Sat, 14 Nov 2020 22:50:10 +0000
To: Konstantin Belousov
cc: Warner Losh, Scott Long, Alexander Motin, "freebsd-arch@freebsd.org"
Subject: Re: MAXPHYS bump for FreeBSD 13

--------
Konstantin Belousov writes:

> DFLTPHYS seems to be only used by drivers (and some geoms), and typical
> drivers' usage of it is to clamp the max io request more tightly than MAXPHYS.
> I see that dump code tries to not write more than DFLTPHYS one time, to
> ease life of drivers, and physio() sanitizes maxio at DFLTPHYS, but this
> is for really broken drivers.

DFLTPHYS is the antique version of g_provider->stripesize, and should
be replaced by it throughout.

The history behind DFLTPHYS is that tape-drives were limited to MAXPHYS
sized tape-blocks, so you wanted it large.

For performance reasons disk operations should not span cylinders,
a topic I'm sure Kirk can elaborate on if provoked, so DFLTPHYS was
used to reduce them to a tunable size.

Peak performance was when fs-blocks divided DFLTPHYS and DFLTPHYS
divided the cylinder of the disk.

Seagate ST82500[1] with standard formatting had 0x616 sectors per
cylinder, (19 heads, 82 sectors each).

Formatting with a generous 22 spare sectors per cylinder brought
the "usable" cylinder size down to precisely 0x600 sectors, which
resulted in around 5-10% higher overall system performance on a
heavily loaded Tahoe.

Poul-Henning

[1] The STI82500 "Sabre" is amusingly available for order in certain
web-shops but, alas, "not currently in stock".

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
From owner-freebsd-arch@freebsd.org Sat Nov 14 23:40:48 2020
From: Alexander Motin
Date: Sat, 14 Nov 2020 18:40:44 -0500
To: Konstantin Belousov
Cc: "freebsd-arch@freebsd.org"
Subject: Re: MAXPHYS bump for FreeBSD 13
[-4.00 / 15.00]; REPLY(-4.00)[] X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Nov 2020 23:40:48 -0000

On 14.11.2020 13:37, Konstantin Belousov wrote:
> On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> 4. My larger concern is, in fact, cam and drivers.

I am actually the least concerned about this part.  I've already
reviewed/cleaned it once, and can do so again if needed.  We have some
drivers unaware of MAXPHYS, and they should safely be limited to
DFLTPHYS; the others should properly adapt.  And if you would like to
make MAXPHYS tunable, I'd be happy to take this part.

>> Could you or somebody help with vfs/ffs code, where I suppose the
>> smaller page lists are used?
> Do you plan to work on this ? I can help, sure.

Honestly, I haven't planned it.  But if that is the price to finally
close this topic forever, I could probably figure out something.
Otherwise I was mostly looking for somebody to take this part of the
project into capable hands.

> Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as
> DFLPHYS), a tunable, in the scope of this work.

+1

-- 
Alexander Motin