Date: Wed, 9 Oct 2024 19:21:02 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Yuri <yuri@FreeBSD.org>, freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: RE: Why is the process gets killed because "a thread waited too long to allocate a page"?
Message-ID: <CA420288-406A-4A96-BEF4-BEE48B1ABC1F@yahoo.com>
References: <CA420288-406A-4A96-BEF4-BEE48B1ABC1F.ref@yahoo.com>
Yuri <yuri_at_FreeBSD.org> wrote on
Date: Wed, 09 Oct 2024 16:12:50 UTC :

> When I tried to build lang/rust in the 14i386 poudriere VM the compiler
> got killed with this message in the kernel log:
>
> > Oct 9 05:21:11 yv kernel: pid 35188 (rustc), jid 1129, uid 65534,
> > was killed: a thread waited too long to allocate a page
>
> The same system has no problem building lang/rust in the 14amd64 VM.
>
> What does it mean "waited too long"? Why is the process killed when
> something is slow?
> Shouldn't it just wait instead?

If you want to allow it to potentially wait forever, you can use:

sysctl vm.pfault_oom_attempts=-1

(or the analogous assignment in the appropriate *.conf files that are
processed later, such as at boot). You might end up with
deadlock/livelock/. . . if you do so. (I've not analyzed the details.)

Details:

Looking around, sys/vm/vm_pageout.c has:

	case VM_OOM_MEM_PF:
		reason = "a thread waited too long to allocate a page";
		break;

# grep -r VM_OOM_MEM_PF /usr/main-src/sys/
/usr/main-src/sys/vm/vm_pageout.h:#define VM_OOM_MEM_PF 2
/usr/main-src/sys/vm/vm_fault.c:	vm_pageout_oom(VM_OOM_MEM_PF);
/usr/main-src/sys/vm/vm_pageout.c:	if (shortage == VM_OOM_MEM_PF &&
/usr/main-src/sys/vm/vm_pageout.c:		if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
/usr/main-src/sys/vm/vm_pageout.c:	case VM_OOM_MEM_PF:

sys/vm/vm_fault.c :

(NOTE: official code has its variant of the printf under an
"if (bootverbose)", but I locally removed that conditional.)

/*
 * Initiate page fault after timeout.  Returns true if caller should
 * do vm_waitpfault() after the call.
 */
static bool
vm_fault_allocate_oom(struct faultstate *fs)
{
	struct timeval now;

	vm_fault_unlock_and_deallocate(fs);
	if (vm_pfault_oom_attempts < 0)
		return (true);
	if (!fs->oom_started) {
		fs->oom_started = true;
		getmicrotime(&fs->oom_start_time);
		return (true);
	}

	getmicrotime(&now);
	timevalsub(&now, &fs->oom_start_time);
	if (now.tv_sec < vm_pfault_oom_attempts * vm_pfault_oom_wait)
		return (true);

	printf("vm_fault_allocate_oom: proc %d (%s) failed to alloc page on fault, starting OOM\n",
	    curproc->p_pid, curproc->p_comm);

	vm_pageout_oom(VM_OOM_MEM_PF);
	fs->oom_started = false;
	return (false);
}

This is associated with vm.pfault_oom_attempts and vm.pfault_oom_wait .
An old comment in my /boot/loader.conf is:

#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts= 3
#vm.pfault_oom_wait= 10
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied, even for nearly the same total.)

(Note: the "tradeoffs" part is associated with:

sys/vm/vm_fault.c:	vm_waitpfault(dset, vm_pfault_oom_wait * hz);
)

sys/vm/vm_pageout.c :

void
vm_pageout_oom(int shortage)
{
	const char *reason;
	struct proc *p, *bigproc;
	vm_offset_t size, bigsize;
	struct thread *td;
	struct vmspace *vm;
	int now;
	bool breakout;

	/*
	 * For OOM requests originating from vm_fault(), there is a high
	 * chance that a single large process faults simultaneously in
	 * several threads.  Also, on an active system running many
	 * processes of middle-size, like buildworld, all of them
	 * could fault almost simultaneously as well.
	 *
	 * To avoid killing too many processes, rate-limit OOMs
	 * initiated by vm_fault() time-outs on the waits for free
	 * pages.
	 */
	mtx_lock(&vm_oom_ratelim_mtx);
	now = ticks;
	if (shortage == VM_OOM_MEM_PF &&
	    (u_int)(now - vm_oom_ratelim_last) < hz * vm_oom_pf_secs) {
		mtx_unlock(&vm_oom_ratelim_mtx);
		return;
	}
	vm_oom_ratelim_last = now;
	mtx_unlock(&vm_oom_ratelim_mtx);
. . .
		size = vmspace_swap_count(vm);
		if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
			size += vm_pageout_oom_pagecount(vm);
. . .

So it looks like time-based retries that give up once about
vm.pfault_oom_attempts * vm.pfault_oom_wait seconds have passed since
the first failed page allocation (3 * 10 = 30 seconds with the defaults
noted above). That avoids potentially waiting forever whenever
0 <= vm.pfault_oom_attempts .

===
Mark Millard
marklmi at yahoo.com
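The retry-then-kill timing in vm_fault_allocate_oom() and the VM_OOM_MEM_PF rate limit in vm_pageout_oom() can be modeled in a few lines of user-space C. This is a hypothetical sketch, not kernel code: struct fault_model, model_allocate_oom() and model_pf_oom_ratelimited() are illustrative names, elapsed time is passed in as whole seconds (or ticks) instead of being read with getmicrotime(), and the actual vm_waitpfault() sleep and vm_pageout_oom() call are left out. In the kernel each retry follows a vm_waitpfault() sleep of up to vm.pfault_oom_wait seconds, so vm.pfault_oom_attempts acts roughly as a time multiplier rather than a strict retry count.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of vm_fault_allocate_oom() and of the
 * VM_OOM_MEM_PF rate limit in vm_pageout_oom().  Not kernel code: the
 * names are illustrative and time is passed in as whole seconds/ticks.
 */
struct fault_model {
	bool oom_started;	/* set on the first failed allocation */
	long oom_start_sec;	/* time of that first failure, in seconds */
};

/*
 * Returns true if the faulting thread should wait for free pages and
 * retry, false once the OOM killer would be started.  "attempts" and
 * "wait" stand in for vm.pfault_oom_attempts and vm.pfault_oom_wait.
 */
static bool
model_allocate_oom(struct fault_model *fs, long now_sec, int attempts,
    int wait)
{
	if (attempts < 0)		/* -1: never give up, keep waiting */
		return (true);
	if (!fs->oom_started) {		/* first failure starts the clock */
		fs->oom_started = true;
		fs->oom_start_sec = now_sec;
		return (true);
	}
	/* Keep retrying until attempts * wait seconds have elapsed. */
	if (now_sec - fs->oom_start_sec < (long)attempts * wait)
		return (true);
	/* The kernel calls vm_pageout_oom(VM_OOM_MEM_PF) at this point. */
	fs->oom_started = false;
	return (false);
}

/*
 * Returns true if a page-fault-initiated OOM request arriving at
 * now_ticks would be dropped because the previous one happened less
 * than hz * pf_secs ticks ago.  Unsigned subtraction keeps the result
 * correct when the tick counter wraps around.
 */
static bool
model_pf_oom_ratelimited(int now_ticks, int last_ticks, int hz, int pf_secs)
{
	return ((unsigned)now_ticks - (unsigned)last_ticks <
	    (unsigned)hz * (unsigned)pf_secs);
}
```

With the loader.conf defaults (attempts 3, wait 10), a first failure at t=0 starts the clock, calls at t=10 and t=29 still return true, and the call at t=30 returns false and resets the timer, so the page-fault OOM path is reached after roughly 30 seconds. Passing -1 for attempts makes every call return true, mirroring vm.pfault_oom_attempts=-1.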