Date: Mon, 15 Jun 2020 18:44:37 -0700
From: Ryan Libby <rlibby@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Konstantin Belousov <kostikbel@gmail.com>, Jeff Roberson <jeff@freebsd.org>,
    "freebsd-current@FreeBSD.org" <freebsd-current@freebsd.org>
Subject: Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Message-ID: <CAHgpiFwWMWfUEqnRcMJK7Froew_Q-x7fYg5n-cU858rWP7t4Xg@mail.gmail.com>
In-Reply-To: <QB1PR01MB3364EFFE7F96533866C2C770DD9D0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
References: <QB1PR01MB36496B9C1B6CEE512757B039DDB70@QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM>
    <CAHgpiFzfzF5jLvTP4fMnGhkerMUqgKgVeHWQEMkwg19naTUTwA@mail.gmail.com>
    <20200521101428.GC64045@kib.kiev.ua>
    <QB1PR01MB3649E55965E4C1A449A2D4D1DDB40@QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM>
    <20200523105601.GN64045@kib.kiev.ua>
    <QB1PR01MB364991F10933FA8D756D3852DDB00@QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM>
    <QB1PR01MB3364E67BF974C11816F40DECDD820@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
    <QB1PR01MB3364EFFE7F96533866C2C770DD9D0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> Rick Macklem wrote:
> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
> >However, I just got a hang with r358097, but it looks rather different.
> >The r358097 hang did not have any processes sleeping on btalloc. They
> >appeared to be waiting on two different locks in the buffer cache.
> >As such, I think it might be a different problem. (I'll admit I should have
> >made notes about this one before rebooting, but I was frustrated that
> >it happened and rebooted before looking at it in much detail.)
> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
> got a hang.
> --> It seems that r358097 is the culprit and r358098 makes it easier
>     to reproduce.
> --> Basically it runs out of kernel memory.
>
> It is not obvious if I can revert these two commits without reverting
> other ones, since there were a bunch of vm changes after these.
>
> I'll take a look, but if you guys have any ideas on how to fix this, please
> let me know.
>
> Thanks, rick

Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
zone to see if that rescues it, on whatever base revision gets you a
reliable repro?
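To be concrete, the change I have in mind is just restoring the flag where
the btag zone is created in sys/kern/subr_vmem.c.  Untested sketch, going
from memory of what r358097 dropped, so the surrounding context may not
match your tree exactly (if the existing flag there isn't UMA_ZONE_VM on
your revision, the only intended change is OR-ing in UMA_ZONE_NOFREE):

--- a/sys/kern/subr_vmem.c
+++ b/sys/kern/subr_vmem.c
@@ vmem_startup()
 	vmem_bt_zone = uma_zcreate("vmem btag",
 	    sizeof (struct vmem_btag), NULL, NULL, NULL, NULL,
-	    UMA_ALIGN_PTR, UMA_ZONE_VM);
+	    UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE);

UMA_ZONE_NOFREE keeps boundary-tag slabs from ever being released back,
so the zone never has to go back to vmem for a fresh slab at an awkward
time.  If that rescues your repro, it points the finger pretty squarely.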
> Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> (single core i386) with 1.25Gbytes RAM when doing kernel builds using
> head kernels from this winter. (I also saw one when doing a kernel build
> on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> and several processes holding the following lock:
>     exclusive sx lock @ vm/vm_map.c:4761
> - I have seen hangs where that is the only lock held by any process except
>   the interrupt thread.
> - I have also seen processes waiting on the following locks:
>     kern/subr_vmem.c:1343
>     kern/subr_vmem.c:633
>
> I can't be absolutely sure r358098 is the culprit, but it seems to make the
> problem more reproducible.
>
> If anyone has a patch suggestion, I can test it.
> Otherwise, I will continue to test r358097 and earlier, to try and see what
> hangs occur. (I've done 8 cycles of testing of r356776 without difficulties,
> but that doesn't guarantee it isn't broken.)
>
> There is a bunch more of the stuff I got for Kostik and Ryan below.
> I can do "ddb" when it is hung, but it is a screen console, so I need to
> transcribe the output to email by hand. (i.e. If you need something
> specific I can do that, but trying to do everything Kostik and Ryan asked
> for isn't easy.)
>
> rick
>
> Konstantin Belousov wrote:
> >On Fri, May 22, 2020 at 11:46:26PM +0000, Rick Macklem wrote:
> >> Konstantin Belousov wrote:
> >> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> >> > to bisect this, but r358252 seems to be the culprit.
> No longer true. I succeeded in reproducing the hang today running a
> r358251 kernel.
>
> I haven't had much luck so far, but see below for what I have learned.
>
> >> >> >
> >> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> >> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> >> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> >> >> > If I revert to r358251, I cannot reproduce this.
> As above, this is no longer true.
>
> >> >> >
> >> >> > Any ideas?
> >> >> >
> >> >> > I can easily test any change you might suggest to see if it fixes the
> >> >> > problem.
> >> >> >
> >> >> > If you want more debug info, let me know, since I can easily
> >> >> > reproduce it.
> >> >> >
> >> >> > Thanks, rick
> >> >>
> >> >> Nothing obvious to me.  I can maybe try a repro on a VM...
> >> >>
> >> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
> >> >>
> >> >> "btalloc" is "We're either out of address space or lost a fill race."
> From what I see, I think it is "out of address space".
> For one of the hangs, when I did "show alllocks", everything except the
> intr thread was waiting for the
>     exclusive sx lock @ vm/vm_map.c:4761
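(For anyone following along: the "btalloc" wmesg is the pause() at the
tail of vmem_bt_alloc() in sys/kern/subr_vmem.c.  Quoting roughly from
memory, so minor details may differ on your revision:

	/*
	 * We're either out of address space or lost a fill race.
	 */
	if (wait & M_WAITOK)
		pause("btalloc", 1);

	return (NULL);

so a thread sleeping there failed to carve KVA for a new boundary-tag
slab and is retrying once a tick.  If the address space never comes
back, it retries forever, which looks exactly like this hang.)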
>
> >> >Yes, I would not be surprised to be out of something on a 1G i386 machine.
> >> >Please also add 'show alllocks'.
> >> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
> Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
> learned. (The time it takes to reproduce one of these varies greatly, but I
> usually get one within 3 cycles of a full kernel build over NFS. I have had
> it happen once when doing a kernel build over UFS.)
>
> >> This time, none of the processes are stuck on "btalloc".
> > I'll try and give you most of the above, but since I have to type it in
> > by hand from the screen, I might not get it all. (I'm no real typist;-)
> > > show alllocks
> > exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> > exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> > exclusive sleep mutex kernel area domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> > exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> > exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> > exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> > Process 12 (intr) thread 0x.. (100008)
> > exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
> >
> > > ps
> > - Not going to list them all, but here are the ones that seem interesting...
> > 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
> > 16 0 0 0 DL (threaded) [bufdaemon]
> > 100069 D qsleep [bufdaemon]
> > 100074 D - [bufspacedaemon-0]
> > 100084 D sdflush 0x11923284 [/ worker]
> > - and more of these for the other UFS file systems
> > 9 0 0 0 DL psleep 0x1e2f830 [vmdaemon]
> > 8 0 0 0 DL (threaded) [pagedaemon]
> > 100067 D psleep 0x1e2e95c [dom0]
> > 100072 D launds 0x1e2e968 [laundry: dom0]
> > 100073 D umarcl 0x12cc720 [uma]
> > ... a bunch of usb and cam ones
> > 100025 D - 0x1b2ee40 [doneq0]
> > ...
> > 12 0 0 0 RL (threaded) [intr]
> > 100007 I [swi6: task queue]
> > 100008 Run CPU 0 [swi6: Giant taskq]
> > ...
> > 100000 D swapin 0x1d96dfc [swapper]
> > - and a bunch more in D state.
> > Does this mean the swapper was trying to swap in?
> >
> > > acttrace
> > - just shows the keyboard
> > kdb_enter() at kdb_enter+0x35/frame
> > vt_kbdevent() at vt_kbdevent+0x329/frame
> > kbdmux_intr() at kbdmux_intr+0x19/frame
> > taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
> > taskqueue_run() at taskqueue_run+0x44/frame
> > taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
> > ithread_loop() at ithread_loop+0x237/frame
> > fork_exit() at fork_exit+0x6c/frame
> > fork_trampoline() at 0x../frame
> >
> > > show all vmem
> > vmem 0x.. 'transient arena'
> > quantum: 4096
> > size: 23592960
> > inuse: 0
> > free: 23592960
> > busy tags: 0
> > free tags: 2
> > inuse size free size
> > 16777216 0 0 1 23592960
> > vmem 0x.. 'buffer arena'
> > quantum: 4096
> > size: 94683136
> > inuse: 94502912
> > free: 180224
> > busy tags: 1463
> > free tags: 3
> > inuse size free size
> > 16384 2 32768 1 16384
> > 32768 39 1277952 1 32768
> > 65536 1422 93192192 0 0
> > 131072 0 0 1 131072
> > vmem 0x.. 'i386trampoline'
> > quantum: 1
> > size: 24576
> > inuse: 20860
> > free: 3716
> > busy tags: 9
> > free tags: 3
> > inuse size free size
> > 32 1 48 1 52
> > 64 2 208 0 0
> > 128 2 280 0 0
> > 2048 1 2048 1 3664
> > 4096 2 8192 0 0
> > 8192 1 10084 0 0
> > vmem 0x.. 'kernel rwx arena'
> > quantum: 4096
> > size: 0
> > inuse: 0
> > free: 0
> > busy tags: 0
> > free tags: 0
> > vmem 0x.. 'kernel area dom'
> > quantum: 4096
> > size: 56623104
> > inuse: 56582144
> > free: 40960
> > busy tags: 11224
> > free tags: 3
> >I think this is the trouble.
> >
> >Did you try to reduce kern.maxvnodes? What is the default value for
> >the knob on your machine?
> The default is 84342.
> I have tried 64K, 32K and 128K and they all hung sooner or later.
> For the 32K case, I did see vnodes being recycled for a while before it
> hung, so it isn't just when it hits the limit.
>
> Although it is much easier for me to reproduce on an NFS mount, I did see
> a hang while doing a kernel build on UFS (no NFS mount on the machine at
> that time).
>
> So, I now know that the problem pre-dates r358252 and is not NFS specific.
>
> I'm now bisecting back further to try and isolate the commit that causes
> this. (Unfortunately, each test cycle can take days. I now know that I have
> to do several of these kernel builds, which take hours each, to see if a
> hang is going to happen.)
>
> I'll post if/when I have more, rick
>
> We scaled maxvnodes for ZFS and UFS; it might be that NFS is even more
> demanding, having a larger node size.
> > inuse size free size
> > 4096 11091 45428736 0 0
> > 8192 63 516096 0 0
> > 16384 12 196608 0 0
> > 32768 6 196608 0 0
> > 40960 2 81920 1 40960
> > 65536 16 1048576 0 0
> > 94208 1 94208 0 0
> > 110592 1 110592 0 0
> > 131072 15 2441216 0 0
> > 262144 15 3997696 0 0
> > 524288 1 524288 0 0
> > 1048576 1 1945600 0 0
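Doing the arithmetic on that 'kernel area dom' entry: 56623104 bytes total
(54 MB) with 56582144 in use leaves 40960 bytes free, and the dump shows
that as a single 40 KB span, about 0.07% of the arena, against 11224 live
allocations averaging roughly 5 KB each.  So that arena is genuinely
exhausted, not merely fragmented, which fits vmem_xalloc() failing and
threads backing up on btalloc.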
> > vmem 0x.. 'kernel arena'
> > quantum: 4096
> > size: 390070272
> > inuse: 386613248
> > free: 3457024
> > busy tags: 873
> > free tags: 3
> > inuse size free size
> > 4096 35 143360 1 4096
> > 8192 2 16384 2 16384
> > 12288 1 12288 0 0
> > 16384 30 491520 0 0
> > 20480 140 2867200 0 0
> > 65536 1 65536 0 0
> > 131072 631 82706432 0 0
> > 1048576 0 0 1 1339392
> > 2097152 27 56623104 1 2097152
> > 8388608 1 13774848 0 0
> > 16777216 3 74883072 0 0
> > 33554432 1 36753408 0 0
> > 67108864 1 118276096 0 0
> >
> > > alltrace
> > - I can't face typing too much more, but I'll put a few
> >   here that look interesting
> >
> > - for csh
> > sched_switch()
> > mi_switch()
> > kern_yield()
> > getblkx()
> > breadn_flags()
> > ffs_update()
> > ufs_inactive()
> > VOP_INACTIVE()
> > vinactivef()
> > vput_final()
> > vm_object_deallocate()
> > vm_map_process_deferred()
> > kern_munmap()
> > sys_munmap()
> >
> > - For cc
> > sched_switch()
> > mi_switch()
> > sleepq_switch()
> > sleepq_timedwait()
> > _sleep()
> > pause_sbt()
> > vmem_bt_alloc()
> > keg_alloc_slab()
> > zone_import()
> > cache_alloc()
> > cache_alloc_retry()
> > uma_zalloc_arg()
> > bt_fill()
> > vmem_xalloc()
> > vmem_alloc()
> > kmem_alloc()
> > kmem_malloc_domainset()
> > page_alloc()
> > keg_alloc_slab()
> > zone_import()
> > cache_alloc()
> > cache_alloc_retry()
> > uma_zalloc_arg()
> > nfscl_nget()
> > nfs_create()
> > vop_sigdefer()
> > nfs_vnodeops_bypass()
> > VOP_CREATE_APV()
> > vn_open_cred()
> > vn_open()
> > kern_openat()
> > sys_openat()
> >
> > Then there are a bunch of these... for cc, make
> > sched_switch()
> > mi_switch()
> > sleepq_switch()
> > sleepq_catch_signals()
> > sleepq_wait_sig()
> > kern_wait6()
> > sys_wait4()
> >
> > - for vnlru
> > sched_switch()
> > mi_switch()
> > sleepq_switch()
> > sleepq_timedwait()
> > _sleep()
> > vnlru_proc()
> > fork_exit()
> > fork_trampoline()
> >
> > - for syncer
> > sched_switch()
> > mi_switch()
> > critical_exit_preempt()
> > intr_event_handle()
> > intr_execute_handlers()
> > lapic_handle_intr()
> > Xapic_isr1()
> > - interrupt
> > memset()
> > cache_alloc()
> > cache_alloc_retry()
> > uma_zalloc_arg()
> > vmem_xalloc()
> > vmem_bt_alloc()
> > keg_alloc_slab()
> > zone_import()
> > cache_alloc()
> > cache_alloc_retry()
> > uma_zalloc_arg()
> > bt_fill()
> > vmem_xalloc()
> > vmem_alloc()
> > bufkva_alloc()
> > getnewbuf()
> > getblkx()
> > breadn_flags()
> > ffs_update()
> > ffs_sync()
> > sync_fsync()
> > VOP_FSYNC_APV()
> > sched_sync()
> > fork_exit()
> > fork_trampoline()
> >
> > - For bufdaemon (a bunch of these)
> > sched_switch()
> > mi_switch()
> > sleepq_switch()
> > sleepq_timedwait()
> > _sleep()
> > buf_daemon()
> > fork_exit()
> > fork_trampoline()
> >
> > vmdaemon and pagedaemon are basically just like above,
> > sleeping in
> > vm_daemon()
> > or
> > vm_pageout_worker()
> > or
> > vm_pageout_laundry_worker()
> > or
> > uma_reclaim_worker()
> >
> > That's all the typing I can take right now.
> > I can probably make this happen again if you want more specific stuff.
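One observation on the traces above: the cc thread was allocating an NFS
node (nfscl_nget -> uma_zalloc_arg -> ... -> kmem_malloc_domainset ->
vmem_alloc), and that allocation itself recursed through bt_fill ->
uma_zalloc_arg (the btag zone) -> vmem_bt_alloc, which is where it is
sleeping in pause_sbt(); the syncer is parked on the same path under
bufkva_alloc().  So both ordinary KVA allocation and boundary-tag refill
are stalling against nearly full arenas, which is the failure mode the
UMA_ZONE_NOFREE experiment above is meant to rule out.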
> >
> > rick

_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"