From owner-freebsd-stable@freebsd.org  Mon Jul 13 08:42:17 2020
Return-Path: <owner-freebsd-stable@freebsd.org>
Delivered-To: freebsd-stable@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id B9BD935CE2D
 for <freebsd-stable@mailman.nyi.freebsd.org>;
 Mon, 13 Jul 2020 08:42:17 +0000 (UTC)
 (envelope-from bsd-lists@BSDforge.com)
Received: from udns.ultimatedns.net (static-24-113-41-81.wavecable.com
 [24.113.41.81])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "ultimatedns.net",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4B4xwT2Xqcz4cC3
 for <freebsd-stable@freebsd.org>; Mon, 13 Jul 2020 08:42:16 +0000 (UTC)
 (envelope-from bsd-lists@BSDforge.com)
Received: from udns.ultimatedns.net (localhost [IPv6:0:0:0:0:0:0:0:1])
 by udns.ultimatedns.net (8.15.2/8.15.2) with ESMTPS id 06D8gXBG086110
 (version=TLSv1.2 cipher=DHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO);
 Mon, 13 Jul 2020 01:42:46 -0700 (PDT)
 (envelope-from bsd-lists@BSDforge.com)
X-Mailer: Cypht
MIME-Version: 1.0
Cc: <freebsd-stable@freebsd.org>, <dwilde1@gmail.com>
In-Reply-To: <202007130545.06D5jaOj023832@sdf.org>
From: Chris <bsd-lists@BSDforge.com>
Reply-To: bsd-lists@BSDforge.com
To: Scott Bennett <bennett@sdf.org>
Subject: Re: swap space issues
Date: Mon, 13 Jul 2020 01:42:39 -0700
Message-Id: <e602bfe867c0e6fe270442b6809d40b3@udns.ultimatedns.net>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Queue-Id: 4B4xwT2Xqcz4cC3
X-Spamd-Bar: /
Authentication-Results: mx1.freebsd.org;
	none
X-Spamd-Result: default: False [0.00 / 15.00];
 ASN(0.00)[asn:11404, ipnet:24.113.0.0/16, country:US];
 local_wl_ip(0.00)[24.113.41.81]
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.33
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-stable>, 
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Jul 2020 08:42:17 -0000

On Mon, 13 Jul 2020 00:45:36 -0500 Scott Bennett bennett@sdf.org said

> Don Wilde <dwilde1@gmail.com> wrote:
>
> >
> > On 7/11/20 11:28 PM, Scott Bennett via freebsd-stable wrote:
> > >       I have read this entire thread to date with growing dismay, and I
> > > thank Donald Wilde for reporting his ongoing troubles, although they
> > > spoil my hopes that the kernel's memory management bugs that first became
> > > apparent in 11.2-RELEASE (and -STABLE around the same time) were not
> > > propagated into 12.x.  A recent update to stable/12 source tree made it
> > > finally possible for me to build 12.1-STABLE under 11.4-PRERELEASE, and I
> > > was just about to install the upgrade when this thread appeared.
> > Spoiler alert. Since I gave up on Synth, I haven't had a single swap
> > issue. It does appear to be one particular port that drove it nuts
> > (apparently, one of the 'Google performance' bits, with a
> > mismatched-brackets problem). I have rebuilt the machine several times,
> > but that's more for my sense of tidiness than anything.
> >
> > I've got a little Crystal script that walks the installed packages and
> > ports and updates them with system() calls.
> > The machine is very slow, but it's not swapping at all.
>
>     That's good.  I use portmaster, but not often at present because a
> "portmaster -a" run can only be done two or three times per boot before real
> memory is locked down to the extent that the system is no longer functional
> (i.e., even a scrub of ZFS pools comes to a halt in mid scrub due to lack of
> a sufficient supply of free page frames).
>     The build procedures of certain ports consistently get killed by the OOM
> killer, along with much collateral damage.  I've noticed that lang/golang and
> lang/rust are prime examples now, although both used to build without
> problems.
> >
> > It is quite usable now with 12-STABLE.
>
>     I don't see any good reason to go through the hassle and lost time of an
> upgrade across a major release boundary if I still won't have a production OS
> afterward.  I'm already dealing with a graphics stack rendered unsafe to use
> by the ongoing churn in X11 code.  (See PR #247441, kindly filed for me by Pau
> Amma.)
> > >
> > >       On Fri, 26 Jun 2020 03:55:04 -0700 : Donald Wilde <dwilde1@gmail.com>
> > > wrote:
> > >
> > >> On 6/26/20, Peter Jeremy <peter@rulingia.com> wrote:
> > >>>
> > [snip]
> > >>> I strongly suggest you don't have more than one swap device on spinning
> > >>> rust - the VM system will stripe I/O across the available devices and
> > >>> that will give particularly poor results when it has to seek between the
> > >>> partitions.
> > >       True.  The only reason I can think of to use more than one swapping/
> > > paging area on the same device for the same OS instance is for emergencies
> > > or highly unusual, temporary situations in which more space is needed until
> > > those situations conclude, and even in such situations, if the space can be
> > > found on another device, it should be placed there.  Interleaving of swap
> > > space across multiple devices is intended as a performance enhancement
> > > akin to striping (a.k.a. RAID0), although the virtual memory isn't
> > > necessarily always actually striped across those devices.  Adding a paging
> > > area on the same device as an existing one is an abhorrent situation, as
> > > Peter Jeremy noted, and it should be eliminated via swapoff(8) as soon as
> > > the extraordinary situation has passed.  N.B. the GENERIC kernel sets a
> > > limit of four swap devices, although it can be rebuilt with a different
> > > limit.
> > That's good data, Scott, thanks! The only reason I got into that
> > situation of trying to add another swap device was that it was crashing
> > with OO swap messages.
>
>     I don't recall you posting those messages, but it sounds like exactly the
> *temporary* situation in which adding an inappropriately placed paging area
> can be used long enough to get you out of a bind without a reboot, even though
> performance will probably suffer until you have removed it again.  Poor
> performance is usually preferable to no performance if it is only temporary.
>     One cautionary note in such situations, though, applies to remote paging
> areas.  Sparse files allocated on the remote system should not be used as
> paging areas.  For example, I discovered the hard way (i.e., the problem was
> not documented) that SunOS would crash if a sparse file via NFS were added as
> a paging area and the SunOS system tried to write a page out to an unallocated
> region of the file, which was essentially all of the file at first.
>
> > >> My intent is to make this machine function -- getting the bear
> > >> dancing. How deftly she dances is less important than that she dances
> > >> at all. My for-real boxen will have real HP and real cores and RAM.
> > >>
> > >>> Also, you can't actually use 64GB swap with 4GB RAM.  If you look back
> > >>> through your boot messages, I expect you'll find messages like:
> > >>> warning: total configured swap (524288 pages) exceeds maximum recommended
> > >>> amount (498848 pages).
> > >>> warning: increase kern.maxswzone or reduce amount of swap.
> > >       Also true.  Unfortunately, no guidance whatsoever is provided to advise
> > > system administrators who need more space as to how to increase the relevant
> > > table sizes and limits.  However, that is a documentation bug, not a code
> > > bug.
> > I've got both my kern.max* and CCACHE set up mostly correctly.
> > Everything builds and runs well, although I've found that it's helpful
> > to only use -j3 while building, not -j4 which would be appropriate for
> > my HAMMER i3. I'd much rather have the bear *dancing* than running into
> > walls. :D
>
>     I have encountered many ports where MAKE_JOBS_UNSAFE should have been set,
> but hadn't been.  If you have installed ports-mgmt/portconf, you can set this
> on a per-port basis as you encounter these ports.  There are others that fail
> to build with MAKE_JOBS_NUMBER >= 4, but will build just fine with
> MAKE_JOBS_NUMBER=3 or 2.
> However, such failures to build are usually timing problems where one process
> tries to put a file into a directory that doesn't exist yet or to read a file
> that hasn't yet been created.  These are not situations involving the OOM
> killer.
> If you'd like the lines from my /usr/local/etc/ports.conf file for those I've
> encountered to date, just email me privately for them.
>
> > >> Yes, as I posted, those were part of the failure stream from the synth
> > >> program. When I had kern.maxswzone increased, it got through boot
> > >> without complaining.
> > >>
> > >>> or maybe:
> > >>> WARNING: reducing swap size to maximum of xxxxMB per unit
> > >> The warnings were there, in the as-it-failed complaints.
> > >>
> > >>> The absolute limit on swap space is vm.swap_maxpages pages but the
> > >>> realistic limit is about half that.  By default the realistic limit is
> > >>> about 4×RAM (on 64-bit architectures), but this can be adjusted via
> > >>> kern.maxswzone (which defines the #bytes of RAM to allocate to swzone
> > >>> structures - the actual space allocated is vm.swzone).
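A rough sketch of the arithmetic Peter describes, for anyone who wants to
sanity-check a swap layout. It assumes 4 KiB pages (my assumption for amd64)
and uses the factor of four quoted above as a default; the function names are
made up for illustration and are not a FreeBSD interface. The real limits come
from vm.swap_maxpages and kern.maxswzone on the running system.

```python
# Rough swap-sizing arithmetic; illustrative only, not a FreeBSD API.
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on amd64
GIB = 1024 ** 3


def realistic_limit_from_maxpages(swap_maxpages, page_size=PAGE_SIZE):
    """About half of the absolute limit (vm.swap_maxpages), in bytes."""
    return swap_maxpages * page_size // 2


def default_realistic_limit(ram_bytes, factor=4):
    """Rule of thumb from the thread: roughly factor x RAM, in bytes."""
    return ram_bytes * factor


def swap_is_oversized(swap_bytes, ram_bytes, factor=4):
    """True if the configured swap exceeds the rule-of-thumb limit."""
    return swap_bytes > default_realistic_limit(ram_bytes, factor)


if __name__ == "__main__":
    ram = 4 * GIB
    swap = 64 * GIB
    print(default_realistic_limit(ram) // GIB, "GiB rule-of-thumb limit")
    print("64 GiB swap oversized:", swap_is_oversized(swap, ram))
```

With 4 GB of RAM the rule of thumb gives about 16 GB, which is why 64 GB of
swap draws the warnings unless kern.maxswzone is raised.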
> > >>>
> > >>> As a further piece of arcana, vm.pageout_oom_seq is a count that controls
> > >>> the number of passes before the pageout daemon gives up and starts killing
> > >>> processes when it can't free up enough RAM.  "out of swap space" messages
>
>     Yeah, those messages are half truth and half lie.  The true part is that
> the processes mentioned have indeed been killed.  The lie is that the system
> is out of swap space.  (I have seen these messages issued with as little as
> 217 MB in use out of 24 GB available on my system.)  The kernel might not
> always provide all relevant information in error messages, but it should
> *never* LIE to us.
>
> > >>> generally mean that this number is too low, rather than there being a
> > >>> shortage of swap - particularly if your swap device is rather slow.
> > >>>
> > >> Thanks, Peter!
> > >       A second round of thanks to Peter Jeremy for pointing out this sysctl
> > > variable (vm.pageout_oom_seq), although thus far I have yet to see that it
> > > is actually effective in working around the memory management bugs.  I have
> > > added the following lines to /etc/sysctl.conf.
> > >
> > > # Because FreeBSD 11.{2,3,4} tie up page frames unnecessarily, set value high
> > > #vm.pageout_wakeup_thresh=14124 # Default value
> > > vm.pageout_wakeup_thresh=112640 # 410 MB
> >
> > [snip]
> >
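For anyone checking the numbers in the quoted sysctl.conf fragment, here is a
small sketch that converts a page count to a size. It assumes 4 KiB pages,
which is my assumption for amd64 and not something stated in the thread; the
helper name is hypothetical. Under that assumption 112640 pages works out to
440 MiB rather than the 410 MB in the quoted comment, so the comment may be a
typo or may rest on a different assumption.

```python
# Convert page counts to sizes; illustrative only.
PAGE_SIZE = 4096  # assumption: 4 KiB pages (amd64)


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to mebibytes."""
    return pages * page_size / (1024 * 1024)


if __name__ == "__main__":
    # Values taken from the quoted sysctl.conf lines.
    for name, pages in (("default", 14124), ("raised", 112640)):
        print(f"{name}: {pages} pages = {pages_to_mib(pages):.1f} MiB")
```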
> > I do totally agree that these are crucial issues for both operation and
> > documentation, although my issues stemmed from bad _userland_ stack
> > control.
>
>     Yes, this is a frequent problem I've observed in the attitudes of
> programmers who never experienced working with real-memory-only OS.  They
> often lack any awareness of wasteful memory usage, ordering of array accesses,
> locality of reference issues, etc., resulting in truly ridiculous amounts of
> bloat and lost performance, not to mention the failures to perform at all such
> as you encountered.
> In their minds, virtual memory frees them from all concerns about these
> issues, so their schoolteachers, now brought up the same way, don't even teach
> them about such things and perhaps still don't know about them themselves.
Feeling the same way. C++ IMHO was the beginning of the end -- abstraction /
objects do not lead to a better understanding of what you're doing, if you've
never worked on "bare metal" (at the "chip" level). Those w/o knowledge of
assembler never really fully understand what they're doing.
Sorry. Couldn't resist.

>     Another problem, especially with programmers whose memories have not yet
> accumulated many painful experiences, is the attraction toward newer, more
> exciting features accompanied by a disinterest in tracking down and fixing
> existing bugs, even fairly critical bugs.  This problem, if left unchecked by
> management, can lead to terrible predicaments like the one FreeBSD is in now,
> namely, having no production releases being supported.  DragonflyBSD, NetBSD,
> and OpenBSD do not, AFAIK, suffer from this predicament at present.  They are
> behind to varying degrees in terms of newer, more exciting features, but at
> least they appear to work.  For example, sdf.org has well over 70,000 users
> and runs quite a few servers to do so.
> It runs
>
> NetBSD miku 8.1_STABLE NetBSD 8.1_STABLE (GENERIC) #0: Wed Sep 11 03:47:45
> UTC 2019  root@ol:/sdf/sys/NetBSD-8/sys/arch/amd64/compile/GENERIC amd64
>
> at present.  (miku.sdf.org is one of the servers.)  Its uptime is currently
> 306 days.
> They run several VMs of FreeBSD, OpenBSD, LINUX, and possibly others on some
> of the servers.  ZFS appeared in NetBSD 9.0.  I don't know the sysadmin's
> reasons for not upgrading to it so far, but I suspect they have to do with the
> number of systems to upgrade, the fact that it is a .0 release, and that root
> on ZFS and ZFS boot environments are not yet supported, as used to be the case
> with FreeBSD.  I'm not ready to switch to NetBSD quite yet and would not enjoy
> doing so, but it has been a steadily improving alternative to FreeBSD of late,
> and if FreeBSD does not release a production system in the meantime, NetBSD
> may become a better choice for many of us who want to run a production OS.  It
> also offers an alternative to Micro$lop for the so-called "Internet of
> Things", which no other FOSS OS does, AFAIK, although I don't know enough
> about LINUX to be sure.
> >
> > Those who live on -CURRENT are used to OOPS, but the rest of us get paid
> > not to have them.
>
>     I've been using -STABLE for the last several major releases, but because
> of the vast numbers of conflicts and failures buried throughout the ports tree
> and the horrendous amount of time it takes to rebuild most of my installed
> ports I am considering surrendering to using -RELEASE and using quarterly
> packages, in spite of the loss of features that doing so entails.  That would
> still not deal with the dependency conflicts or the installation of
> identically named files by different ports, but it would reduce the time spent
> on building ports that fail to install.
> >
> > I am happy with what the Core Team gives us, AND of course we want
> > ['more','better','faster','STABLE']. :D
> >
>     As Mark Linimon pointed out, the Core Team only does that indirectly.
> However, it is the Core Team's job to give firm direction or redirection to
> those who do the designing and coding to avoid regressions, avoid ignoring the
> introduction of bugs, especially those that render a system unfit for
> production use, enhance testing, and so on.
>
>
>                                  Scott Bennett, Comm. ASMELG, CFIAG
> **********************************************************************
> * Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
> *--------------------------------------------------------------------*
> * "A well regulated and disciplined militia, is at all times a good  *
> * objection to the introduction of that bane of all free governments *
> * -- a standing army."                                               *
> *    -- Gov. John Hancock, New York Journal, 28 January 1790         *
> **********************************************************************

--Chris