From: Chris
Reply-To: bsd-lists@BSDforge.com
To: Scott Bennett
Subject: Re: swap space issues
Date: Mon, 13 Jul 2020 01:42:39 -0700
In-Reply-To: <202007130545.06D5jaOj023832@sdf.org>
X-BeenThere: freebsd-stable@freebsd.org

On Mon, 13 Jul 2020 00:45:36 -0500 Scott Bennett bennett@sdf.org said

> Don Wilde wrote:
>
> >
> > On 7/11/20 11:28 PM, Scott Bennett via
> > freebsd-stable wrote:
> > >     I have read this entire thread to date with growing dismay, and I
> > > thank Donald Wilde for reporting his ongoing troubles, although they
> > > spoil my hopes that the kernel's memory management bugs that first became
> > > apparent in 11.2-RELEASE (and -STABLE around the same time) were not
> > > propagated into 12.x. A recent update to the stable/12 source tree made it
> > > finally possible for me to build 12.1-STABLE under 11.4-PRERELEASE, and I
> > > was just about to install the upgrade when this thread appeared.
> > Spoiler alert. Since I gave up on Synth, I haven't had a single swap
> > issue. It does appear to be one particular port that drove it nuts
> > (apparently, one of the 'Google performance' bits, with a
> > mismatched-brackets problem). I have rebuilt the machine several times,
> > but that's more for my sense of tidiness than anything.
> >
> > I've got a little Crystal script that walks the installed packages and
> > ports and updates them with system() calls.
> > The machine is very slow, but it's not swapping at all.
>
>     That's good. I use portmaster, but not often at present because a
> "portmaster -a" run can only be done two or three times per boot before real
> memory is locked down to the extent that the system is no longer functional
> (i.e., even a scrub of ZFS pools comes to a halt in mid-scrub due to lack of
> a sufficient supply of free page frames).
>     The build procedures of certain ports consistently get killed by the
> OOM killer, along with much collateral damage. I've noticed that lang/golang
> and lang/rust are prime examples now, although both used to build without
> problems.
> >
> > It is quite usable now with 12-STABLE.
>
>     I don't see any good reason to go through the hassle and lost time of
> an upgrade across a major release boundary if I still won't have a
> production OS afterward. I'm already dealing with
> a graphics stack rendered unsafe to use by the ongoing churn in X11 code.
> (See PR #247441, kindly filed for me by Pau Amma.)
> >
> > >
> > > On Fri, 26 Jun 2020 03:55:04 -0700 : Donald Wilde
> > > wrote:
> > >
> > >> On 6/26/20, Peter Jeremy wrote:
> > >>>
> > [snip]
> > >>> I strongly suggest you don't have more than one swap device on spinning
> > >>> rust - the VM system will stripe I/O across the available devices and
> > >>> that will give particularly poor results when it has to seek between the
> > >>> partitions.
> > >     True. The only reason I can think of to use more than one
> > > swapping/paging area on the same device for the same OS instance is for
> > > emergencies or highly unusual, temporary situations in which more space
> > > is needed until those situations conclude, and even in such situations,
> > > if the space can be found on another device, it should be placed there.
> > > Interleaving of swap space across multiple devices is intended as a
> > > performance enhancement akin to striping (a.k.a. RAID0), although the
> > > virtual memory isn't necessarily always actually striped across those
> > > devices. Adding a paging area on the same device as an existing one is
> > > an abhorrent situation, as Peter Jeremy noted, and it should be
> > > eliminated via swapoff(8) as soon as the extraordinary situation has
> > > passed. N.B. the GENERIC kernel sets a limit of four swap devices,
> > > although it can be rebuilt with a different limit.
> > That's good data, Scott, thanks!
> > The only reason I got into that
> > situation of trying to add another swap device was that it was crashing
> > with OO swap messages.
>
>     I don't recall you posting those messages, but it sounds like exactly
> the *temporary* situation in which adding an inappropriately placed paging
> area can be used long enough to get you out of a bind without a reboot, even
> though performance will probably suffer until you have removed it again.
> Poor performance is usually preferable to no performance if it is only
> temporary.
>     One cautionary note in such situations, though, applies to remote paging
> areas. Sparse files allocated on the remote system should not be used as
> paging areas. For example, I discovered the hard way (i.e., the problem was
> not documented) that SunOS would crash if a sparse file accessed via NFS were
> added as a paging area and the SunOS system tried to write a page out to an
> unallocated region of the file, which was essentially all of the file at
> first.
>
> > >> My intent is to make this machine function -- getting the bear
> > >> dancing. How deftly she dances is less important than that she dances
> > >> at all. My for-real boxen will have real HP and real cores and RAM.
> > >>
> > >>> Also, you can't actually use 64GB swap with 4GB RAM. If you look back
> > >>> through your boot messages, I expect you'll find messages like:
> > >>> warning: total configured swap (524288 pages) exceeds maximum recommended
> > >>> amount (498848 pages).
> > >>> warning: increase kern.maxswzone or reduce amount of swap.
> > >     Also true. Unfortunately, no guidance whatsoever is provided to advise
> > > system administrators who need more space as to how to increase the
> > > relevant table sizes and limits. However, that is a documentation bug,
> > > not a code bug.
> > I've got both my kern.max* and CCACHE set up mostly correctly.
> > Everything builds and runs well, although
> > I've found that it's helpful
> > to only use -j3 while building, not -j4 which would be appropriate for
> > my HAMMER i3. I'd much rather have the bear *dancing* than running into
> > walls. :D
>
>     I have encountered many ports where MAKE_JOBS_UNSAFE should have been
> set, but hadn't been. If you have installed ports-mgmt/portconf, you can set
> this on a per-port basis as you encounter these ports. There are others that
> fail to build with MAKE_JOBS_NUMBER >= 4, but will build just fine with
> MAKE_JOBS_NUMBER=3 or 2. However, such failures to build are usually timing
> problems where one process tries to put a file into a directory that doesn't
> exist yet or to read a file that hasn't yet been created. These are not
> situations involving the OOM killer.
> If you'd like the lines from my /usr/local/etc/ports.conf file for those
> I've encountered to date, just email me privately for them.
>
> > >> Yes, as I posted, those were part of the failure stream from the synth
> > >> program. When I had kern.maxswzone increased, it got through boot
> > >> without complaining.
> > >>
> > >>> or maybe:
> > >>> WARNING: reducing swap size to maximum of xxxxMB per unit
> > >> The warnings were there, in the as-it-failed complaints.
> > >>
> > >>> The absolute limit on swap space is vm.swap_maxpages pages but the
> > >>> realistic limit is about half that. By default the realistic limit is
> > >>> about 4×RAM (on 64-bit architectures), but this can be adjusted via
> > >>> kern.maxswzone (which defines the #bytes of RAM to allocate to swzone
> > >>> structures - the actual space allocated is vm.swzone).
> > >>>
> > >>> As a further piece of arcana, vm.pageout_oom_seq is a count that
> > >>> controls the number of passes before the pageout daemon gives up and
> > >>> starts killing processes when it can't free up enough RAM. "out of swap
> > >>> space" messages
>
>     Yeah, those messages are half truth and half lie. The true part is that
> the processes mentioned have indeed been killed. The lie is that the system
> is out of swap space. (I have seen these messages issued with as little as
> 217 MB in use out of 24 GB available on my system.) The kernel might not
> always provide all relevant information in error messages, but it should
> *never* LIE to us.
>
> > >>> generally mean that this number is too low, rather than there being a
> > >>> shortage of swap - particularly if your swap device is rather slow.
> > >>>
> > >> Thanks, Peter!
> > >     A second round of thanks to Peter Jeremy for pointing out this sysctl
> > > variable (vm.pageout_oom_seq), although thus far I have yet to see that
> > > it is actually effective in working around the memory management bugs.
> > > I have added the following lines to /etc/sysctl.conf.
> > >
> > > # Because FreeBSD 11.{2,3,4} tie up page frames unnecessarily, set value high
> > > #vm.pageout_wakeup_thresh=14124       # Default value
> > > vm.pageout_wakeup_thresh=112640       # 410 MB
> >
> > [snip]
> >
> > I do totally agree that these are crucial issues for both operation and
> > documentation, although my issues stemmed from bad _userland_ stack
> > control.
>
>     Yes, this is a frequent problem I've observed in the attitudes of
> programmers who never experienced working with real-memory-only operating
> systems. They often lack any awareness of wasteful memory usage, ordering of
> array accesses, locality of reference issues, etc., resulting in truly
> ridiculous amounts of bloat and lost performance, not to mention the failures
> to perform at all such as you encountered. In their minds, virtual memory
> frees them from all concerns about these issues, so their schoolteachers, now
> brought up the same way, don't even teach them about such things and perhaps
> still don't know about them themselves.

Feeling the
same way. C++ IMHO was the beginning of the end -- abstraction / objects
do not lead to a better understanding of what you're doing, if you've never
worked on "bare metal" (at the "chip" level). Those w/o knowledge of
assembler never really fully understand what they're doing.
Sorry. Couldn't resist.

>     Another problem, especially with programmers whose memories have not
> yet accumulated many painful experiences, is the attraction toward newer,
> more exciting features accompanied by a disinterest in tracking down and
> fixing existing bugs, even fairly critical bugs. This problem, if left
> unchecked by management, can lead to terrible predicaments like the one
> FreeBSD is in now, namely, having no production releases being supported.
> DragonflyBSD, NetBSD, and OpenBSD do not, AFAIK, suffer from this predicament
> at present. They are behind to varying degrees in terms of newer, more
> exciting features, but at least they appear to work. For example, sdf.org
> has well over 70,000 users and runs quite a few servers to do so. It runs
>
> NetBSD miku 8.1_STABLE NetBSD 8.1_STABLE (GENERIC) #0: Wed Sep 11 03:47:45
> UTC 2019 root@ol:/sdf/sys/NetBSD-8/sys/arch/amd64/compile/GENERIC amd64
>
> at present. (miku.sdf.org is one of the servers.) Its uptime is currently
> 306 days. They run several VMs of FreeBSD, OpenBSD, LINUX, and possibly
> others on some of the servers. ZFS appeared in NetBSD 9.0. I don't know the
> sysadmin's reasons for not upgrading to it so far, but I suspect they have to
> do with the number of systems to upgrade, the fact that it is a .0 release,
> and that root on ZFS and ZFS boot environments are not yet supported, as used
> to be the case with FreeBSD. I'm not ready to switch to NetBSD quite yet and
> would not enjoy doing so, but it has been a steadily improving alternative
> to FreeBSD of late, and if FreeBSD does not release
> a production system in the meantime, NetBSD may become a better choice for
> many of us who want to run a production OS. It also offers an alternative to
> Micro$lop for the so-called "Internet of Things", which no other FOSS OS
> does, AFAIK, although I don't know enough about LINUX to be sure.
> >
> > Those who live on -CURRENT are used to OOPS, but the rest of us get paid
> > not to have them.
>
>     I've been using -STABLE for the last several major releases, but
> because of the vast numbers of conflicts and failures buried throughout the
> ports tree and the horrendous amount of time it takes to rebuild most of my
> installed ports, I am considering surrendering to using -RELEASE and using
> quarterly packages, in spite of the loss of features that doing so entails.
> That would still not deal with the dependency conflicts or the installation
> of identically named files by different ports, but it would reduce the time
> spent on building ports that fail to install.
> >
> > I am happy with what the Core Team gives us, AND of course we want
> > ['more','better','faster','STABLE']. :D
> >
>     As Mark Linimon pointed out, the Core Team only does that indirectly.
> However, it is the Core Team's job to give firm direction or redirection to
> those who do the designing and coding to avoid regressions, avoid ignoring
> the introduction of bugs, especially those that render a system unfit for
> production use, enhance testing, and so on.
>
>
> Scott Bennett, Comm. ASMELG, CFIAG
> **********************************************************************
> * Internet: bennett at sdf.org *xor* bennett at freeshell.org *
> *--------------------------------------------------------------------*
> * "A well regulated and disciplined militia, is at all times a good *
> * objection to the introduction of that bane of all free governments *
> * -- a standing army." *
> *    -- Gov. John Hancock,
> *       New York Journal, 28 January 1790 *
> **********************************************************************

--Chris
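The swap-sizing arithmetic Peter Jeremy describes in the thread (64GB of swap being unusable with 4GB of RAM because the default realistic ceiling is about 4×RAM) can be checked with a few lines of Python. This is a minimal sketch, not FreeBSD's actual calculation: it assumes 4 KiB pages (the amd64 default) and takes the "about 4×RAM" rule of thumb at face value, while the real limit depends on kern.maxswzone and vm.swap_maxpages. The function names are hypothetical.

```python
# Minimal sketch of the swap-sizing rule of thumb quoted in the thread.
# Assumptions: 4 KiB pages (amd64) and a default usable ceiling of
# roughly 4 x RAM; the real kernel limit depends on kern.maxswzone.
PAGE_SIZE = 4096          # bytes per page (assumed, amd64)
MIB = 1024 ** 2
GIB = 1024 ** 3


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count (as printed in the boot warnings) to MiB."""
    return pages * page_size / MIB


def rough_swap_ceiling_bytes(ram_bytes: int, factor: int = 4) -> int:
    """Hypothetical helper: approximate usable swap as factor x RAM."""
    return ram_bytes * factor


def swap_exceeds_ceiling(ram_bytes: int, swap_bytes: int) -> bool:
    """True if the configured swap is larger than the rough ceiling."""
    return swap_bytes > rough_swap_ceiling_bytes(ram_bytes)


if __name__ == "__main__":
    ram = 4 * GIB
    swap = 64 * GIB
    print(f"ceiling ~ {rough_swap_ceiling_bytes(ram) // GIB} GiB")   # 16 GiB
    print("64 GiB swap exceeds it:", swap_exceeds_ceiling(ram, swap))  # True
    # Page counts from the sample boot warning, converted to MiB:
    print(pages_to_mib(524288), pages_to_mib(498848))  # 2048.0 1948.625
```

With these assumptions, 4 GiB of RAM gives a rough ceiling of 16 GiB, so most of a 64 GiB swap configuration would go unused unless kern.maxswzone is raised.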