Date: Fri, 19 Jun 2020 03:42:36 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ryan Libby <rlibby@freebsd.org>
Cc: Konstantin Belousov <kib@FreeBSD.org>, Jeff Roberson <jeff@FreeBSD.org>,
    "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject: Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Message-ID: <QB1PR01MB33649F146649F6C8BA0788B5DD980@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <QB1PR01MB3364685F9EC2FB4B7DADBB5BDD980@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
References: <QB1PR01MB3364685F9EC2FB4B7DADBB5BDD980@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>

Ryan Libby wrote:
>On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>> Rick Macklem wrote:
>> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>> >However, I just got a hang with r358097, but it looks rather different.
>> >The r358097 hang did not have any processes sleeping on btalloc. They
>> >appeared to be waiting on two different locks in the buffer cache.
>> >As such, I think it might be a different problem. (I'll admit I should have
>> >made notes about this one before rebooting, but I was frustrated that
>> >it happened and rebooted before looking at it in much detail.)
>> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
>> got a hang.
>> --> It seems that r358097 is the culprit and r358098 makes it easier
>>     to reproduce.
>> --> Basically, it runs out of kernel memory.
>>
>> It is not obvious if I can revert these two commits without reverting
>> other ones, since there were a bunch of vm changes after these.
>>
>> I'll take a look, but if you guys have any ideas on how to fix this, please
>> let me know.
>>
>> Thanks, rick
>
>Interesting. Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
>zone to see if that rescues it, on whatever base revision gets you a
>reliable repro?
Good catch! That seems to fix it. I've done 8 cycles of the kernel build over
NFS without a hang (normally I'd get one in the first 1-3 cycles).

I don't know if the intent was to delete UMA_ZONE_VM and r358097
had a typo in it and deleted UMA_ZONE_NOFREE, or ???

Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
the hangs seem to have gone away.

The small patch I did is attached, in case that isn't what you meant.

I'll run a few more cycles just in case, but I think this fixes it.

Thanks, rick

>
> Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> (single core i386) with 1.25Gbytes RAM when doing kernel builds using
> head kernels from this winter. (I also saw one when doing a kernel build
> on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> and several processes holding the following lock:
>    exclusive sx lock @ vm/vm_map.c:4761
> - I have seen hangs where that is the only lock held by any process except
>   the interrupt thread.
> - I have also seen processes waiting on the following locks:
>    kern/subr_vmem.c:1343
>    kern/subr_vmem.c:633
>
> I can't be absolutely sure r358098 is the culprit, but it seems to make the
> problem more reproducible.
>
> If anyone has a patch suggestion, I can test it.
> Otherwise, I will continue to test r358097 and earlier, to try and see what
> hangs occur.
> (I've done 8 cycles of testing of r356776 without difficulties, but that
> doesn't guarantee it isn't broken.)
>
> There is a bunch more of the stuff I got for Kostik and Ryan below.
> I can use ddb when it is hung, but it is a screen console, so I need to
> transcribe the output to email by hand. (i.e. If you need something
> specific I can do that, but trying to do everything Kostik and Ryan asked
> for isn't easy.)
>
> rick
>
> Konstantin Belousov wrote:
> >On Fri, May 22, 2020 at 11:46:26PM +0000, Rick Macklem wrote:
> >> Konstantin Belousov wrote:
> >> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> >> > to bisect this, but r358252 seems to be the culprit.
> No longer true. I succeeded in reproducing the hang today running a
> r358251 kernel.
>
> I haven't had much luck so far, but see below for what I have learned.
>
> >> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> >> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> >> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> >> >> > If I revert to r358251, I cannot reproduce this.
> As above, this is no longer true.
>
> >> >> > Any ideas?
> >> >> >
> >> >> > I can easily test any change you might suggest to see if it fixes the
> >> >> > problem.
> >> >> >
> >> >> > If you want more debug info, let me know, since I can easily
> >> >> > reproduce it.
> >> >> >
> >> >> > Thanks, rick
> >> >>
> >> >> Nothing obvious to me. I can maybe try a repro on a VM...
> >> >>
> >> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
> >> >>
> >> >> "btalloc" is "We're either out of address space or lost a fill race."
> From what I see, I think it is "out of address space".
> For one of the hangs, when I did "show alllocks", everything except the
> intr thread was waiting for the
>    exclusive sx lock @ vm/vm_map.c:4761
>
> >> >Yes, I would not be surprised to be out of something on a 1G i386 machine.
> >> >Please also add 'show alllocks'.
> >> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
> Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
> learned. (The time it takes to reproduce one of these varies greatly, but I
> usually get one within 3 cycles of a full kernel build over NFS. I have had it
> happen once when doing a kernel build over UFS.)
>
> >> This time, none of the processes are stuck on "btalloc".
> > I'll try and give you most of the above, but since I have to type it in by hand
> > from the screen, I might not get it all.
> > (I'm no real typist;-)
> > > show alllocks
> > exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> > exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> > exclusive sleep mutex kernel arena domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> > exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> > exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> > exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> > Process 12 (intr) thread 0x.. (1000008)
> > exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
> >
> > > ps
> > - Not going to list them all, but here are the ones that seem interesting...
> >     18  0  0  0  DL   vlruwt  0x11d939cc  [vnlru]
> >     16  0  0  0  DL   (threaded)          [bufdaemon]
> > 100069           D    qsleep              [bufdaemon]
> > 100074           D    -                   [bufspacedaemon-0]
> > 100084           D    sdflush 0x11923284  [/ worker]
> > - and more of these for the other UFS file systems
> >      9  0  0  0  DL   psleep  0x1e2f830   [vmdaemon]
> >      8  0  0  0  DL   (threaded)          [pagedaemon]
> > 100067           D    psleep  0x1e2e95c   [dom0]
> > 100072           D    launds  0x1e2e968   [laundry: dom0]
> > 100073           D    umarcl  0x12cc720   [uma]
> > … a bunch of usb and cam ones
> > 100025           D    -       0x1b2ee40   [doneq0]
> > …
> >     12  0  0  0  RL   (threaded)          [intr]
> > 100007           I                        [swi6: task queue]
> > 100008           Run  CPU 0               [swi6: Giant taskq]
> > …
> > 100000           D    swapin  0x1d96dfc   [swapper]
> > - and a bunch more in D state.
> > Does this mean the swapper was trying to swap in?
> >
> > > acttrace
> > - just shows the keyboard
> > kdb_enter() at kdb_enter+0x35/frame
> > vt_kbdevent() at vt_kbdevent+0x329/frame
> > kbdmux_intr() at kbdmux_intr+0x19/frame
> > taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
> > taskqueue_run() at taskqueue_run+0x44/frame
> > taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
> > ithread_loop() at ithread_loop+0x237/frame
> > fork_exit() at fork_exit+0x6c/frame
> > fork_trampoline() at 0x../frame
> >
> > > show all vmem
> > vmem 0x.. 'transient arena'
> >   quantum:   4096
> >   size:      23592960
> >   inuse:     0
> >   free:      23592960
> >   busy tags: 0
> >   free tags: 2
> >              inuse  size      free  size
> >   16777216   0      0         1     23592960
> > vmem 0x.. 'buffer arena'
> >   quantum:   4096
> >   size:      94683136
> >   inuse:     94502912
> >   free:      180224
> >   busy tags: 1463
> >   free tags: 3
> >              inuse  size      free  size
> >   16384      2      32768     1     16384
> >   32768      39     1277952   1     32768
> >   65536      1422   93192192  0     0
> >   131072     0      0         1     131072
> > vmem 0x.. 'i386trampoline'
> >   quantum:   1
> >   size:      24576
> >   inuse:     20860
> >   free:      3716
> >   busy tags: 9
> >   free tags: 3
> >              inuse  size      free  size
> >   32         1      48        1     52
> >   64         2      208       0     0
> >   128        2      280       0     0
> >   2048       1      2048      1     3664
> >   4096       2      8192      0     0
> >   8192       1      10084     0     0
> > vmem 0x.. 'kernel rwx arena'
> >   quantum:   4096
> >   size:      0
> >   inuse:     0
> >   free:      0
> >   busy tags: 0
> >   free tags: 0
> > vmem 0x.. 'kernel arena dom'
> >   quantum:   4096
> >   size:      56623104
> >   inuse:     56582144
> >   free:      40960
> >   busy tags: 11224
> >   free tags: 3
> >I think this is the trouble.
> >
> >Did you try to reduce kern.maxvnodes? What is the default value for
> >the knob on your machine?
> The default is 84342.
> I have tried 64K, 32K and 128K and they all hung sooner or later.
> (A sysctlbyname(3) sketch for setting the knob follows the backtraces below.)
> For the 32K case, I did see vnodes being recycled for a while before it hung,
> so it isn't just when it hits the limit.
>
> Although it is much easier for me to reproduce on an NFS mount, I did see
> a hang while doing a kernel build on UFS (no NFS mount on the machine at
> that time).
>
> So, I now know that the problem pre-dates r358252 and is not NFS specific.
>
> I'm now bisecting back further to try and isolate the commit that causes this.
> (Unfortunately, each test cycle can take days. I now know that I have to do
> several of these kernel builds, which take hours each, to see if a hang is
> going to happen.)
>
> I'll post if/when I have more, rick
>
> We scaled maxvnodes for ZFS and UFS; it might be that NFS is even more
> demanding, having a larger node size.
>
> >              inuse  size       free  size
> >   4096       11091  45428736   0     0
> >   8192       63     516096     0     0
> >   16384      12     196608     0     0
> >   32768      6      196608     0     0
> >   40960      2      81920      1     40960
> >   65536      16     1048576    0     0
> >   94208      1      94208      0     0
> >   110592     1      110592     0     0
> >   131072     15     2441216    0     0
> >   262144     15     3997696    0     0
> >   524288     1      524288     0     0
> >   1048576    1      1945600    0     0
> > vmem 0x.. 'kernel arena'
> >   quantum:   4096
> >   size:      390070272
> >   inuse:     386613248
> >   free:      3457024
> >   busy tags: 873
> >   free tags: 3
> >              inuse  size       free  size
> >   4096       35     143360     1     4096
> >   8192       2      16384      2     16384
> >   12288      1      12288      0     0
> >   16384      30     491520     0     0
> >   20480      140    2867200    0     0
> >   65536      1      65536      0     0
> >   131072     631    82706432   0     0
> >   1048576    0      0          1     1339392
> >   2097152    27     56623104   1     2097152
> >   8388608    1      13774848   0     0
> >   16777216   3      74883072   0     0
> >   33554432   1      36753408   0     0
> >   67108864   1      118276096  0     0
> >
> > > alltrace
> > - I can't face typing too much more, but I'll put a few
> > here that look interesting
> >
> > - for csh
> > sched_switch()
> > mi_switch()
> > kern_yield()
> > getblkx()
> > breadn_flags()
> > ffs_update()
> > ufs_inactive()
> > VOP_INACTIVE()
> > vinactivef()
> > vput_final()
> > vm_object_deallocate()
> > vm_map_process_deferred()
> > kern_munmap()
> > sys_munmap()
> >
> > - for cc
> > sched_switch()
> > mi_switch()
> > sleepq_switch()
> > sleepq_timedwait()
> > _sleep()
> > pause_sbt()
> > vmem_bt_alloc()
> > keg_alloc_slab()
> > zone_import()
> > cache_alloc()
> > cache_alloc_retry()
> > uma_zalloc_arg()
> > bt_fill()
> > vmem_xalloc()
> > vmem_alloc()
> > kmem_alloc()
> > kmem_malloc_domainset()
> > page_alloc()
> > keg_alloc_slab()
> > zone_import()
> > cache_alloc()
> > cache_alloc_retry()
> > uma_zalloc_arg()
> > nfscl_nget()
> > nfs_create()
> > vop_sigdefer()
> > nfs_vnodeops_bypass()
> > VOP_CREATE_APV()
> > vn_open_cred()
> > vn_open()
> > kern_openat()
> > sys_openat()
> >
> > Then there are a bunch of these... for cc, make
> > sched_switch()
> > mi_switch()
> > sleepq_switch()
> > sleepq_catch_signals()
> > sleepq_wait_sig()
> > kern_wait6()
> > sys_wait4()
> >
> > - for vnlru
> > sched_switch()
> > mi_switch()
> > sleepq_switch()
> > sleepq_timedwait()
> > _sleep()
> > vnlru_proc()
> > fork_exit()
> > fork_trampoline()
> >
> > - for syncer
> > sched_switch()
> > mi_switch()
> > critical_exit_preempt()
> > intr_event_handle()
> > intr_execute_handlers()
> > lapic_handle_intr()
> > Xapic_isr1()
> > - interrupt
> > memset()
> > cache_alloc()
> > cache_alloc_retry()
> > uma_zalloc_arg()
> > vmem_xalloc()
> > vmem_bt_alloc()
> > keg_alloc_slab()
> > zone_import()
> > cache_alloc()
> > cache_alloc_retry()
> > uma_zalloc_arg()
> > bt_fill()
> > vmem_xalloc()
> > vmem_alloc()
> > bufkva_alloc()
> > getnewbuf()
> > getblkx()
> > breadn_flags()
> > ffs_update()
> > ffs_sync()
> > sync_fsync()
> > VOP_FSYNC_APV()
> > sched_sync()
> > fork_exit()
> > fork_trampoline()
> >
> > - for bufdaemon (a bunch of these)
> > sched_switch()
> > mi_switch()
> > sleepq_switch()
> > sleepq_timedwait()
> > _sleep()
> > buf_daemon()
> > fork_exit()
> > fork_trampoline()
> >
> > vmdaemon and pagedaemon are basically just like above,
> > sleeping in
> > vm_daemon()
> > or
> > vm_pageout_worker()
> > or
> > vm_pageout_laundry_worker()
> > or
> > uma_reclaim_worker()
> >
> > That's all the typing I can take right now.
> > I can probably make this happen again if you want more specific stuff.
> >
> > rick
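A note on the kern.maxvnodes experiments above: besides running
"sysctl kern.maxvnodes=65536" as root, the knob can be set from C via
sysctlbyname(3). Below is a minimal sketch; the file name
maxvnodes_test.c and the value 65536 are only illustrative, and it
assumes the knob is int-sized (int and long are the same width on the
i386 machine here, but verify the type before trying this on a 64-bit
machine).

	/* maxvnodes_test.c: read kern.maxvnodes and lower it (run as root). */
	#include <sys/types.h>
	#include <sys/sysctl.h>

	#include <stdio.h>

	int
	main(void)
	{
		int oldmax, newmax = 65536;	/* 64K, one of the values tried above */
		size_t oldlen = sizeof(oldmax);

		/* Fetch the old value and install the new one in a single call. */
		if (sysctlbyname("kern.maxvnodes", &oldmax, &oldlen,
		    &newmax, sizeof(newmax)) == -1) {
			perror("sysctlbyname");
			return (1);
		}
		printf("kern.maxvnodes: %d -> %d\n", oldmax, newmax);
		return (0);
	}

I believe lowering the limit below the current number of vnodes does
not free anything instantly; vnlru recycles vnodes toward the new
target over time, which fits the "vnodes being recycled for a while"
observation for the 32K case above.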
[Attachment: vmem.patch, 497 bytes]

--- kern/subr_vmem.c.nofree	2020-06-15 19:56:07.389187000 -0700
+++ kern/subr_vmem.c	2020-06-15 19:59:16.811036000 -0700
@@ -670,7 +670,7 @@ vmem_startup(void)
 	    UMA_ALIGN_PTR, 0);
 	vmem_bt_zone = uma_zcreate("vmem btag",
 	    sizeof(struct vmem_btag), NULL, NULL, NULL, NULL,
-	    UMA_ALIGN_PTR, UMA_ZONE_VM);
+	    UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE);
 #ifndef UMA_MD_SMALL_ALLOC
 	mtx_init(&vmem_bt_lock, "btag lock", NULL, MTX_DEF);
 	uma_prealloc(vmem_bt_zone, BT_MAXALLOC);
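For reference, with the attached patch applied the boundary-tag zone
setup in vmem_startup() reads roughly as below. This is only a sketch
of the context around the changed line in sys/kern/subr_vmem.c (nearby
code may differ between revisions), with my understanding of the
rationale added as a comment:

	/*
	 * Allocating a boundary tag can recurse into vmem itself: in the
	 * cc and syncer backtraces above, vmem_xalloc() calls bt_fill(),
	 * which goes through uma_zalloc_arg() to keg_alloc_slab(), whose
	 * slab memory comes from vmem_bt_alloc(), i.e. vmem again. The
	 * zone is therefore prepopulated and, with UMA_ZONE_NOFREE
	 * restored, its slabs are never given back, so a refill can
	 * always make progress.
	 */
	vmem_bt_zone = uma_zcreate("vmem btag",
	    sizeof(struct vmem_btag), NULL, NULL, NULL, NULL,
	    UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE);
	#ifndef UMA_MD_SMALL_ALLOC
	mtx_init(&vmem_bt_lock, "btag lock", NULL, MTX_DEF);
	uma_prealloc(vmem_bt_zone, BT_MAXALLOC);
	/* ... more !UMA_MD_SMALL_ALLOC setup ... */
	#endif

If I read the headers right, i386 does not define UMA_MD_SMALL_ALLOC,
so the prealloc path above is the one this machine uses, which may be
part of why a small i386 box is where the hangs show up.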