From: Mark Millard
Subject: RE: Why is the process gets killed because "a thread waited too long to allocate a page"?
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
Date: Wed, 9 Oct 2024 19:21:02 -0700
To: Yuri, freebsd-hackers

Yuri wrote on
Date: Wed, 09 Oct 2024 16:12:50 UTC :

> When I tried to build lang/rust in the 14i386 poudriere VM the compiler
> got killed with this message in the kernel log:
>
> > Oct 9 05:21:11 yv kernel: pid 35188 (rustc), jid 1129, uid 65534, was killed: a thread waited too long to allocate a page
>
> The same system has no problem building lang/rust in the 14amd64 VM.
>
> What does it mean "waited too long"? Why is the process killed when
> something is slow?
> Shouldn't it just wait instead?

If you want to allow it to potentially wait forever,
you can use:

sysctl vm.pfault_oom_attempts=-1

(or analogous in appropriate *.conf files that would
later be executed).

You might end up with deadlock/livelock/. . . if you
do so. (I've not analyzed the details.)

Details:

Looking around, sys/vm/vm_pageout.c has:

        case VM_OOM_MEM_PF:
                reason = "a thread waited too long to allocate a page";
                break;

# grep -r VM_OOM_MEM_PF /usr/main-src/sys/
/usr/main-src/sys/vm/vm_pageout.h:#define VM_OOM_MEM_PF 2
/usr/main-src/sys/vm/vm_fault.c: vm_pageout_oom(VM_OOM_MEM_PF);
/usr/main-src/sys/vm/vm_pageout.c: if (shortage == VM_OOM_MEM_PF &&
/usr/main-src/sys/vm/vm_pageout.c: if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
/usr/main-src/sys/vm/vm_pageout.c: case VM_OOM_MEM_PF:

sys/vm/vm_fault.c :

(NOTE: official code has its variant of the printf under
an "if (bootverbose)" but I locally remove that conditional.)

/*
 * Initiate page fault after timeout.  Returns true if caller should
 * do vm_waitpfault() after the call.
 */
static bool
vm_fault_allocate_oom(struct faultstate *fs)
{
        struct timeval now;

        vm_fault_unlock_and_deallocate(fs);
        if (vm_pfault_oom_attempts < 0)
                return (true);
        if (!fs->oom_started) {
                fs->oom_started = true;
                getmicrotime(&fs->oom_start_time);
                return (true);
        }

        getmicrotime(&now);
        timevalsub(&now, &fs->oom_start_time);
        if (now.tv_sec < vm_pfault_oom_attempts * vm_pfault_oom_wait)
                return (true);

        printf("vm_fault_allocate_oom: proc %d (%s) failed to alloc page on fault, starting OOM\n",
            curproc->p_pid, curproc->p_comm);

        vm_pageout_oom(VM_OOM_MEM_PF);
        fs->oom_started = false;
        return (false);
}

This is associated with vm.pfault_oom_attempts and
vm.pfault_oom_wait . An old comment in my
/boot/loader.conf is:

#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

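To make "the multiplication is the total" concrete, here is a small userspace model of the give-up decision in vm_fault_allocate_oom() as I read the code quoted above. It is only a sketch and not kernel code: struct fault_model, oom_budget_secs() and keep_waiting() are hypothetical names of mine, and time is passed in as plain seconds instead of coming from getmicrotime().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model (NOT kernel code) of the give-up decision in
 * vm_fault_allocate_oom().  All names here are hypothetical.
 */
struct fault_model {
	bool oom_started;	/* has the waiting clock been started? */
	long oom_start_sec;	/* time of the first failed allocation */
};

/* Total waiting budget in seconds; only meaningful for attempts >= 0. */
long
oom_budget_secs(int attempts, int wait_secs)
{
	assert(wait_secs >= 0);
	return ((long)attempts * wait_secs);
}

/*
 * Returns true if the faulting thread should keep waiting for a page,
 * false if the budget is used up and the OOM kill would be started.
 */
bool
keep_waiting(struct fault_model *fm, long now_sec, int attempts, int wait_secs)
{
	if (attempts < 0)
		return (true);		/* e.g. -1: never give up */
	if (!fm->oom_started) {
		fm->oom_started = true;	/* first failure starts the clock */
		fm->oom_start_sec = now_sec;
		return (true);
	}
	if (now_sec - fm->oom_start_sec < oom_budget_secs(attempts, wait_secs))
		return (true);
	fm->oom_started = false;	/* reset for a later round */
	return (false);
}
```

With the values from my loader.conf comment (3 and 10) the budget is 30 seconds: a first failure at t=100 starts the clock, a retry at t=129 still waits, and a retry at t=130 gives up. In this model the "attempts" value is never counted down; it only scales the time budget.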
(Note: the "tradeoffs" is associated with:

sys/vm/vm_fault.c: vm_waitpfault(dset, vm_pfault_oom_wait * hz);
)

sys/vm/vm_pageout.c :

void
vm_pageout_oom(int shortage)
{
        const char *reason;
        struct proc *p, *bigproc;
        vm_offset_t size, bigsize;
        struct thread *td;
        struct vmspace *vm;
        int now;
        bool breakout;

        /*
         * For OOM requests originating from vm_fault(), there is a high
         * chance that a single large process faults simultaneously in
         * several threads.  Also, on an active system running many
         * processes of middle-size, like buildworld, all of them
         * could fault almost simultaneously as well.
         *
         * To avoid killing too many processes, rate-limit OOMs
         * initiated by vm_fault() time-outs on the waits for free
         * pages.
         */
        mtx_lock(&vm_oom_ratelim_mtx);
        now = ticks;
        if (shortage == VM_OOM_MEM_PF &&
            (u_int)(now - vm_oom_ratelim_last) < hz * vm_oom_pf_secs) {
                mtx_unlock(&vm_oom_ratelim_mtx);
                return;
        }
        vm_oom_ratelim_last = now;
        mtx_unlock(&vm_oom_ratelim_mtx);
. . .
                size = vmspace_swap_count(vm);
                if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
                        size += vm_pageout_oom_pagecount(vm);
. . .

Looks like time-based retries and giving up after about
the specified overall time for that many retries,
avoiding potentially waiting forever when
0 <= vm.pfault_oom_attempts .

===
Mark Millard
marklmi at yahoo.com