Date: Wed, 24 Feb 2021 15:53:37 -0800
From: Mark Millard <marklmi@yahoo.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Alan Somers <asomers@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: The out-of-swap killer makes poor choices
Message-ID: <90EC4887-A29A-4829-B75B-1D88303791A4@yahoo.com>
In-Reply-To: <EA37F4D3-BCED-405B-BF70-2A97B19A9444@yahoo.com>
References: <CAOtMX2jYmrK7ftx62_NEfNCWS7O=giHKL1p9kXCqq1t5E1arxA@mail.gmail.com> <CAOtMX2i3Njo=KBP=99_G0+KuSa00CVgNvacmzhTaoZUYEhwPPA@mail.gmail.com> <YDYyQ1V/hEAGV+yJ@kib.kiev.ua> <1984125.0OzZcVfBr4@ravel> <CAOtMX2iYr4NDYE0xHSa_w1hA5XQ2m9cA28NzPoGbfzAKKox9aQ@mail.gmail.com> <YDacl5/AFzFA4nkg@kib.kiev.ua> <EA37F4D3-BCED-405B-BF70-2A97B19A9444@yahoo.com>
On 2021-Feb-24, at 11:59, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-Feb-24, at 10:36, Konstantin Belousov <kostikbel at gmail.com> wrote:
>
>> On Wed, Feb 24, 2021 at 10:34:23AM -0700, Alan Somers wrote:
>>> There's another silly problem that I didn't mention in my original post.
>>> The old rule of thumb is that the swap partition's size should be twice as
>>> large as the amount of RAM. However, that's no longer possible in many
>>> cases. The kernel imposes a hard limit of 64 GiB (on amd64 at least) on
>>> the usable size of any swap partition, and many servers now have far more
>>> than 64 GiB of RAM. So the advice needs to change with the times. I don't
>> I do not think so. The usable size of the swap is determined by the
>> amount of swap metadata we pre-configure at boot time. Usually it is
>> sized proportionally to the available physical memory, but you can
>> override swap zones size manually with the knob.
>
> There was a period of time when the 128 GiByte RAM ThreadRipper
> had its previous 192 GiByte swap partition use rejected and I
> had to split it into 3 64 GiByte ones. Later I saw a checkin that
> was a correction to some calculation (vague memory) and I retried
> having one 192 GiByte swap partition and it was again allowed.
>
> The ability to dump to a swap partition when there was a
> 64 GiByte limitation with 128 GiByte of RAM had implications
> for the configuration. I actually arranged having a partition
> that was only used for dump's potential use. That took some
> rearrangement to form a large enough space, making other
> tradeoffs to do so.
>
>
> (I'm not sure if I can find the commit that lead to me switching
> back to more than 64 GiByte for a swap file on the large memory
> machine. I do not remember details any more.)
The 64 GiByte size limit (as seen in my environment) was replaced in:

https://cgit.freebsd.org/src/commit/sys/vm/swap_pager.c?id=00fd73d2dabdee2638203dd1145f007787f05be9

a.k.a.:

https://svnweb.freebsd.org/base?view=revision&revision=363532

QUOTE
author     Doug Moore <dougm@FreeBSD.org>  2020-07-25 18:29:10 +0000
committer  Doug Moore <dougm@FreeBSD.org>  2020-07-25 18:29:10 +0000
. . .
Fix an overflow bug in the blist allocator that needlessly capped max
swap size by dividing a value, which was always a multiple of 64, by
64. Remove the code that reduced max swap size down to that cap.

Eliminate the distinction between BLIST_BMAP_RADIX and
BLIST_META_RADIX. Call them both BLIST_RADIX.

Make improvments to the blist self-test code to silence compiler
warnings and to test larger blists.

Reported by:	jmallett
Reviewed by:	alc
Discussed with:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D25736

Notes:
    svn path=/head/; revision=363532
END QUOTE

Evidence sequence leading me there:

I established a large swap partition on a device with an old snapshot
of my ThreadRipper environment, resulting in:

# gpart show -pl nvd1
=>        40  937703008  nvd1    GPT           (447G)
          40       1024  nvd1p1  FBSDFSSDboot  (512K)
        1064  746586112  nvd1p2  FBSDFSSDroot  (356G)
   746587176  191115872  nvd1p3  FBSDFSSDswap  (91G)

I got a kernel from the ci.freebsd.org artifacts and put it in place
on the old snapshot of my ThreadRipper environment (which could no
longer even boot with its own kernel because of ACPI
incompatibilities), so the old failing kernel was updated but the rest
was left unchanged:

# uname -apKU
FreeBSD FBSDFSSD 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r358314: Tue Feb 25 18:08:20 UTC 2020     root@FreeBSD-head-amd64-build.jail.ci.FreeBSD.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64 amd64 1300081 1300037

So: an old head (13) environment booted on the 128 GiByte ThreadRipper.

From /var/log/messages:

WARNING: reducing swap size to maximum of 65536MB per unit

# swapinfo
Device                 1K-blocks     Used     Avail  Capacity
/dev/gpt/FBSDFSSDswap   67108864        0  67108864        0%

The code that produced the message and limited the size was in
sys/vm/swap_pager.c back in that time frame:

static void
swaponsomething(struct vnode *vp, void *id, u_long nblks,
    sw_strategy_t *strategy, sw_close_t *close, dev_t dev, int flags)
{
	struct swdevt *sp, *tsp;
	swblk_t dvbase;
	u_long mblocks;

	/*
	 * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
	 * First chop nblks off to page-align it, then convert.
	 *
	 * sw->sw_nblks is in page-sized chunks now too.
	 */
	nblks &= ~(ctodb(1) - 1);
	nblks = dbtoc(nblks);

	/*
	 * If we go beyond this, we get overflows in the radix
	 * tree bitmap code.
	 */
	mblocks = 0x40000000 / BLIST_META_RADIX;
	if (nblks > mblocks) {
		printf(
    "WARNING: reducing swap size to maximum of %luMB per unit\n",
		    mblocks / 1024 / 1024 * PAGE_SIZE);
		nblks = mblocks;
	}
. . .

Then I used blame to find the fix in git, looking at:

https://cgit.freebsd.org/src/blame/sys/vm/swap_pager.c

>>> know what the best size would be for a modern server, but I would guess
>>> that it must be at least several times the RSS of your largest process, and
>>> also at least one tenth of RAM (for use as a dump device with compressed
>>> core dumps).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)