From: Mark Millard
Date: Wed, 24 Feb 2021 15:53:37 -0800
Subject: Re: The out-of-swap killer makes poor choices
To: Konstantin Belousov
Cc: Alan Somers, FreeBSD Hackers
Message-Id: <90EC4887-A29A-4829-B75B-1D88303791A4@yahoo.com>
References: <1984125.0OzZcVfBr4@ravel>
List-Id: Technical discussions relating to FreeBSD

On 2021-Feb-24, at 11:59, Mark Millard wrote:

> On 2021-Feb-24, at 10:36, Konstantin Belousov wrote:
> 
>> On Wed, Feb 24, 2021 at 10:34:23AM -0700, Alan Somers wrote:
>>> There's another silly problem that I didn't mention in my original post.
>>> The old rule of thumb is that the swap partition's size should be twice as
>>> large as the amount of RAM.  However, that's no longer possible in many
>>> cases.  The kernel imposes a hard limit of 64 GiB (on amd64 at least) on
>>> the usable size of any swap partition, and many servers now have far more
>>> than 64 GiB of RAM.  So the advice needs to change with the times.  I don't
>> I do not think so.  The usable size of the swap is determined by the
>> amount of swap metadata we pre-configure at boot time.  Usually it is
>> sized proportionally to the available physical memory, but you can
>> override swap zones size manually with the knob.
> 
> There was a period of time when the 128 GiByte RAM ThreadRipper
> had its previous 192 GiByte swap partition use rejected and I
> had to split it into 3 64 GiByte ones. Later I saw a checkin that
> was a correction to some calculation (vague memory) and I retried
> having one 192 GiByte swap partition and it was again allowed.
> 
> The ability to dump to a swap partition when there was a
> 64 GiByte limitation with 128 GiByte of RAM had implications
> for the configuration. I actually arranged having a partition
> that was only used for dump's potential use. That took some
> rearrangement to form a large enough space, making other
> tradeoffs to do so.
> 
> 
> (I'm not sure if I can find the commit that lead to me switching
> back to more than 64 GiByte for a swap file on the large memory
> machine. I do not remember details any more.)

The 64 GiByte size limit (as seen in my environment) was replaced in:

https://cgit.freebsd.org/src/commit/sys/vm/swap_pager.c?id=00fd73d2dabdee2638203dd1145f007787f05be9

a.k.a.:

https://svnweb.freebsd.org/base?view=revision&revision=363532

QUOTE
author	Doug Moore	2020-07-25 18:29:10 +0000
committer	Doug Moore	2020-07-25 18:29:10 +0000
. . .
Fix an overflow bug in the blist allocator that needlessly capped max
swap size by dividing a value, which was always a multiple of 64, by
64.  Remove the code that reduced max swap size down to that cap.

Eliminate the distinction between BLIST_BMAP_RADIX and
BLIST_META_RADIX.  Call them both BLIST_RADIX.

Make improvments to the blist self-test code to silence compiler
warnings and to test larger blists.

Reported by:	jmallett
Reviewed by:	alc
Discussed with:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D25736

Notes:
    svn path=/head/; revision=363532
END QUOTE

Evidence sequence leading me there:

Establish a large swap partition on a device with an old
snapshot of my ThreadRipper environment, resulting in:

# gpart show -pl nvd1
=>       40  937703008    nvd1  GPT  (447G)
         40       1024  nvd1p1  FBSDFSSDboot  (512K)
       1064  746586112  nvd1p2  FBSDFSSDroot  (356G)
  746587176  191115872  nvd1p3  FBSDFSSDswap  (91G)

I got a kernel from the ci.freebsd.org artifacts and put
it in place on the old snapshot of my ThreadRipper
environment (that no longer could even boot --ACPI
incompatibilities), so updating the old failing kernel
but leaving the rest unchanged:

# uname -apKU
FreeBSD FBSDFSSD 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r358314: Tue Feb 25 18:08:20 UTC 2020 root@FreeBSD-head-amd64-build.jail.ci.FreeBSD.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 amd64 1300081 1300037

So: old head (13) environment booted on the 128 GiByte
ThreadRipper:

From /var/log/messages:

WARNING: reducing swap size to maximum of 65536MB per unit

# swapinfo
Device                 1K-blocks     Used    Avail Capacity
/dev/gpt/FBSDFSSDswap   67108864        0 67108864     0%

The code that produced the message and limited the
size was in sys/vm/swap_pager.c back in that time
frame:

static void
swaponsomething(struct vnode *vp, void *id, u_long nblks,
    sw_strategy_t *strategy, sw_close_t *close, dev_t dev, int flags)
{
	struct swdevt *sp, *tsp;
	swblk_t dvbase;
	u_long mblocks;

	/*
	 * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
	 * First chop nblks off to page-align it, then convert.
	 *
	 * sw->sw_nblks is in page-sized chunks now too.
	 */
	nblks &= ~(ctodb(1) - 1);
	nblks = dbtoc(nblks);

	/*
	 * If we go beyond this, we get overflows in the radix
	 * tree bitmap code.
	 */
	mblocks = 0x40000000 / BLIST_META_RADIX;
	if (nblks > mblocks) {
		printf(
    "WARNING: reducing swap size to maximum of %luMB per unit\n",
		    mblocks / 1024 / 1024 * PAGE_SIZE);
		nblks = mblocks;
	}
. . .

Then I used blame to find the fix in git via looking at:

https://cgit.freebsd.org/src/blame/sys/vm/swap_pager.c

>> know what the best size would be for a modern server, but I would guess
>>> that it must be at least several times the RSS of your largest process, and
>>> also at least one tenth of RAM (for use as a dump device with compressed
>>> core dumps).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)