Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 16 Jun 2020 00:06:39 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>, Jeff Roberson <jeff@FreeBSD.org>
Cc:        Ryan Libby <rlibby@freebsd.org>, "freebsd-current@FreeBSD.org" <freebsd-current@FreeBSD.org>
Subject:   Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc
Message-ID:  <QB1PR01MB3364EFFE7F96533866C2C770DD9D0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <QB1PR01MB3364E67BF974C11816F40DECDD820@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
References:  <QB1PR01MB36496B9C1B6CEE512757B039DDB70@QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM> <CAHgpiFzfzF5jLvTP4fMnGhkerMUqgKgVeHWQEMkwg19naTUTwA@mail.gmail.com> <20200521101428.GC64045@kib.kiev.ua> <QB1PR01MB3649E55965E4C1A449A2D4D1DDB40@QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM>, <20200523105601.GN64045@kib.kiev.ua>, <QB1PR01MB364991F10933FA8D756D3852DDB00@QB1PR01MB3649.CANPRD01.PROD.OUTLOOK.COM>, <QB1PR01MB3364E67BF974C11816F40DECDD820@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>

next in thread | previous in thread | raw e-mail | index | archive | help
Rick Macklem wrote:=0A=
>r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NF=
S.=0A=
>I thought this was the culprit, since I did 6 cycles of r358097 without a =
hang.=0A=
>However, I just got a hang with r358097, but it looks rather different.=0A=
>The r358097 hang did not have any processes sleeping on btalloc. They=0A=
>appeared to be waiting on two different locks in the buffer cache.=0A=
>As such, I think it might be a different problem. (I'll admit I should hav=
e=0A=
>made notes about this one before rebooting, but I was flustrated that=0A=
>it happened and rebooted before looking at it mush detail.)=0A=
Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never=
=0A=
got a hang.=0A=
--> It seems that r358097 is the culprit and r358098 makes it easier=0A=
      to reproduce.=0A=
      --> Basically runs out of kernel memory.=0A=
=0A=
It is not obvious if I can revert these two commits without reverting=0A=
other ones, since there were a bunch of vm changes after these.=0A=
=0A=
I'll take a look, but if you guys have any ideas on how to fix this, please=
=0A=
let me know.=0A=
=0A=
Thanks, rick=0A=
=0A=
Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4=
=0A=
(single core i386) with 1.25Gbytes ram when doing kernel builds using=0A=
head kernels from this winter. (I also saw one when doing a kernel build=0A=
on UFS, so they aren't NFS specific, although easier to reproduce that way.=
)=0A=
After a typical hang, there will be a bunch of processes sleeping on "btall=
oc"=0A=
and several processes holding the following lock:=0A=
exclusive sx lock @ vm/vm_map.c:4761=0A=
- I have seen hangs where that is the only lock held by any process except=
=0A=
   the interrupt thread.=0A=
- I have also seen processes waiting on the following locks:=0A=
kern/subr_vmem.c:1343=0A=
kern/subr_vmem.c:633=0A=
=0A=
I can't be absolutely sure r358098 is the culprit, but it seems to make the=
=0A=
problem more reproducible.=0A=
=0A=
If anyone has a patch suggestion, I can test it.=0A=
Otherwise, I will continue to test r358097 and earlier, to try and see what=
 hangs=0A=
occur. (I've done 8 cycles of testing of r356776 without difficulties, but =
that=0A=
doesn't guarantee it isn't broken.)=0A=
=0A=
There is a bunch more of the stuff I got for Kostik and Ryan below.=0A=
I can do "db" when it is hung, but it is a screen console, so I need to=0A=
transcribe the output to email by hand. (ie. If you need something=0A=
specific I can do that, but trying to do everything Kostik and Ryan asked=
=0A=
for isn't easy.)=0A=
=0A=
rick=0A=
=0A=
=0A=
=0A=
Konstantin Belousov wrote:=0A=
>On Fri, May 22, 2020 at 11:46:26PM +0000, Rick Macklem wrote:=0A=
>> Konstantin Belousov wrote:=0A=
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:=0A=
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem <rmacklem@uoguelph.ca> w=
rote:=0A=
>> >> >=0A=
>> >> > Hi,=0A=
>> >> >=0A=
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a w=
hile=0A=
>> >> > to bisect this, but r358252 seems to be the culprit.=0A=
No longer true. I succeeded in reproducing the hang to-day running a=0A=
r358251 kernel.=0A=
=0A=
I haven't had much luck sofar, but see below for what I have learned.=0A=
=0A=
>> >> >=0A=
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (sing=
le core,=0A=
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.=0A=
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btall=
oc.=0A=
>> >> > If I revert to r358251, I cannot reproduce this.=0A=
As above, this is no longer true.=0A=
=0A=
>> >> >=0A=
>> >> > Any ideas?=0A=
>> >> >=0A=
>> >> > I can easily test any change you might suggest to see if it fixes t=
he=0A=
>> >> > problem.=0A=
>> >> >=0A=
>> >> > If you want more debug info, let me know, since I can easily=0A=
>> >> > reproduce it.=0A=
>> >> >=0A=
>> >> > Thanks, rick=0A=
>> >>=0A=
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...=0A=
>> >>=0A=
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome=
.=0A=
>> >>=0A=
>> >> "btalloc" is "We're either out of address space or lost a fill race."=
=0A=
>From what I see, I think it is "out of address space".=0A=
For one of the hangs, when I did "show alllocks", everything except the=0A=
intr thread, was waiting for the=0A=
exclusive sx lock @ vm/vm_map.c:4761=0A=
=0A=
>> >=0A=
>> >Yes, I would be not surprised to be out of something on 1G i386 machine=
.=0A=
>> >Please also add 'show alllocks'.=0A=
>> Ok, I used an up to date head kernel and it took longer to reproduce a h=
ang.=0A=
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've=
=0A=
learned. (The time it takes to reproduce one of these varies greatly, but I=
 usually=0A=
get one within 3 cycles of a full kernel build over NFS. I have had it happ=
en=0A=
once when doing a kernel build over UFS.)=0A=
=0A=
>> This time, none of the processes are stuck on "btalloc".=0A=
> I'll try and give you most of the above, but since I have to type it in b=
y hand=0A=
> from the screen, I might not get it all. (I'm no real typist;-)=0A=
> > show alllocks=0A=
> exclusive lockmgr ufs (ufs) r =3D 0 locked @ kern/vfs_subr.c: 3259=0A=
> exclusive lockmgr nfs (nfs) r =3D 0 locked @ kern/vfs_lookup.c:737=0A=
> exclusive sleep mutex kernel area domain (kernel arena domain) r =3D 0 lo=
cked @ kern/subr_vmem.c:1343=0A=
> exclusive lockmgr bufwait (bufwait) r =3D 0 locked @ kern/vfs_bio.c:1663=
=0A=
> exclusive lockmgr ufs (ufs) r =3D 0 locked @ kern/vfs_subr.c:2930=0A=
> exclusive lockmgr syncer (syncer) r =3D 0 locked @ kern/vfs_subr.c:2474=
=0A=
> Process 12 (intr) thread 0x.. (1000008)=0A=
> exclusive sleep mutex Giant (Giant) r =3D 0 locked @ kern/kern_intr.c:115=
2=0A=
>=0A=
> > ps=0A=
> - Not going to list them all, but here are the ones that seem interesting=
...=0A=
> 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]=0A=
> 16 0 0 0 DL (threaded)               [bufdaemon]=0A=
> 100069  D  qsleep                      [bufdaemon]=0A=
> 100074  D  -                               [bufspacedaemon-0]=0A=
> 100084  D  sdflush  0x11923284 [/ worker]=0A=
> - and more of these for the other UFS file systems=0A=
> 9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]=0A=
> 8 0 0 0   DL (threaded)               [pagedaemon]=0A=
> 100067  D   psleep 0x1e2e95c   [dom0]=0A=
> 100072  D   launds 0x1e2e968   [laundry: dom0]=0A=
> 100073  D   umarcl 0x12cc720   [uma]=0A=
> =85 a bunch of usb and cam ones=0A=
> 100025  D   -           0x1b2ee40  [doneq0]=0A=
> =85=0A=
> 12 0 0 0 RL  (threaded)               [intr]=0A=
> 100007  I                                     [swi6: task queue]=0A=
> 100008  Run           CPU 0           [swi6: Giant taskq]=0A=
> =85=0A=
> 100000  D   swapin 0x1d96dfc    [swapper]=0A=
> - and a bunch more in D state.=0A=
> Does this mean the swapper was trying to swap in?=0A=
>=0A=
> > acttrace=0A=
> - just shows the keyboard=0A=
> kdb_enter() at kdb_enter+0x35/frame=0A=
> vt_kbdevent() at vt_kdbevent+0x329/frame=0A=
> kdbmux_intr() at kbdmux_intr+0x19/frame=0A=
> taskqueue_run_locked() at taskqueue_run_locked+0x175/frame=0A=
> taskqueue_run() at taskqueue_run+0x44/frame=0A=
> taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame=0A=
> ithread_loop() at ithread_loop+0x237/frame=0A=
> fork_exit() at fork_exit+0x6c/frame=0A=
> fork_trampoline() at 0x../frame=0A=
>=0A=
> > show all vmem=0A=
> vmem 0x.. 'transient arena'=0A=
>   quantum: 4096=0A=
>   size:  23592960=0A=
>   inuse: 0=0A=
>   free: 23592960=0A=
>   busy tags:   0=0A=
>   free tags:    2=0A=
>                      inuse        size       free        size=0A=
>   16777216   0               0           1            23592960=0A=
> vmem 0x.. 'buffer arena'=0A=
>   quantum:  4096=0A=
>   size:   94683136=0A=
>   inuse: 94502912=0A=
>   free: 180224=0A=
>   busy tags:    1463=0A=
>   free tags:      3=0A=
>                        inuse      size         free        size=0A=
>   16384           2             32768     1             16384=0A=
>   32768           39           1277952 1              32768=0A=
>   65536            1422       93192192 0           0=0A=
>   131072          0             0            1              131072=0A=
> vmem 0x.. 'i386trampoline'=0A=
>   quantum:      1=0A=
>   size:               24576=0A=
>   inuse:             20860=0A=
>   free:               3716=0A=
>   busy tags:     9=0A=
>   free tags:      3=0A=
>                        inuse      size      free      size=0A=
>   32                 1             48        1           52=0A=
>   64                  2            208       0           0=0A=
>   128                2            280       0            0=0A=
>   2048              1            2048     1           3664=0A=
>   4096              2            8192     0           0=0A=
>   8192              1            10084   0           0=0A=
> vmem 0x.. 'kernel rwx arena'=0A=
>   quantum:    4096=0A=
>   size: 0=0A=
>   inuse: 0=0A=
>   free:   0=0A=
>   busy tags: 0=0A=
>   free tags:  0=0A=
> vmem 0x.. 'kernel area dom'=0A=
>   quantum:  4096=0A=
>   size: 56623104=0A=
>   inuse: 56582144=0A=
>>   free: 40960=0A=
>>   busy tags: 11224=0A=
>>   free tags: 3=0A=
>I think this is the trouble.=0A=
>=0A=
>Did you tried to reduce kern.maxvnodes ?  What is the default value for=0A=
>the knob on your machine ?=0A=
The default is 84342.=0A=
I have tried 64K, 32K and 128K and they all hung sooner or later.=0A=
For the 32K case, I did see vnodes being recycled for a while before it got=
 hung,=0A=
so it isn't just when it hits the limit.=0A=
=0A=
Although it is much easier for me to reproduce on an NFS mount, I did see=
=0A=
a hang while doing a kernel build on UFS (no NFS mount on the machine at=0A=
that time).=0A=
=0A=
So, I now know that the problem pre-dates r358252 and is not NFS specific.=
=0A=
=0A=
I'm not bisecting back further to try and isolate the commit that causes th=
is.=0A=
(Unfortunately, each test cycle can take days. I now know that I have to do=
=0A=
several of these kernel builds, which take hours each, to see if a hang is =
going=0A=
to happen.)=0A=
=0A=
I'll post if/when I have more, rick=0A=
=0A=
We scaled maxvnodes for ZFS and UFS, might be NFS is even more demanding,=
=0A=
having larger node size.=0A=
=0A=
>                 inuse     size          free              size=0A=
>   4096      11091    45428736 0                 0=0A=
>   8192       63         516096     0                 0=0A=
>   16384     12          196608    0                  0=0A=
>   32768      6           196608    0                  0=0A=
>   40960      2           81920      1                  40960=0A=
>   65536      16         1048576   0                 0=0A=
>   94208      1            94208      0                 0=0A=
>   110592     1          110592     0                 0=0A=
>   131072      15       2441216    0                0=0A=
>   262144      15       3997696    0                 0=0A=
>   524288      1         524288      0                 0=0A=
>   1048576     1        1945600     0                0=0A=
> vmem 0x.. 'kernel arena'=0A=
>   quantum: 4096=0A=
>   size: 390070272=0A=
>   inuse: 386613248=0A=
>   free: 3457024=0A=
>   busy tags:  873=0A=
>   free tags:   3=0A=
>                    inuse       size         free        size=0A=
> 4096           35           143360    1            4096=0A=
> 8192           2              16384      2           16384=0A=
> 12288         1              12288      0           0=0A=
> 16384         30             491520   0           0=0A=
> 20480         140           2867200  0          0=0A=
> 65536          1              65536     0           0=0A=
> 131072        631          82706432 0         0=0A=
> 1048576      0              0               1         1339392=0A=
> 2097152      27            56623104   1        2097152=0A=
> 8388608      1              13774848    0       0=0A=
> 16777216    3              74883072   0        0=0A=
> 33554432    1              36753408   0        0=0A=
> 67108864    1              118276096  0       0=0A=
>=0A=
> > alltrace=0A=
> - I can't face typing too much more, but I'll put a few=0A=
>   here that look interesting=0A=
>=0A=
> - for csh=0A=
> sched_switch()=0A=
> mi_switch()=0A=
> kern_yield()=0A=
> getblkx()=0A=
> breadn_flags()=0A=
> ffs_update()=0A=
> ufs_inactive()=0A=
> VOP_INACTIVE()=0A=
> vinactivef()=0A=
> vput_final()=0A=
> vm_object_deallocate()=0A=
> vm_map_process_deferred()=0A=
> kern_munmap()=0A=
> sys_munmap()=0A=
>=0A=
> - For cc=0A=
> sched_switch()=0A=
> mi_switch()=0A=
> sleepq_switch()=0A=
> sleepq_timedwait()=0A=
> _sleep()=0A=
> pause_sbt()=0A=
> vmem_bt_alloc()=0A=
> keg_alloc_slab()=0A=
> zone_import()=0A=
> cache_alloc()=0A=
> cache_alloc_retry()=0A=
> uma_zalloc_arg()=0A=
> bt_fill()=0A=
> vmem_xalloc()=0A=
> vmem_alloc()=0A=
> kmem_alloc()=0A=
> kmem_malloc_domainset()=0A=
> page_alloc()=0A=
> keg_alloc_slab()=0A=
> zone_import()=0A=
> cache_alloc()=0A=
> cache_alloc_retry()=0A=
> uma_zalloc_arg()=0A=
> nfscl_nget()=0A=
> nfs_create()=0A=
> vop_sigdefer()=0A=
> nfs_vnodeops_bypass()=0A=
> VOP_CREATE_APV()=0A=
> vn_open_cred()=0A=
> vn_open()=0A=
> kern_openat()=0A=
> sys_openat()=0A=
>=0A=
> Then there are a bunch of these... for cc, make=0A=
> sched_switch()=0A=
> mi_switch()=0A=
> sleepq_switch()=0A=
> sleepq_catch_signals()=0A=
> sleepq_wait_sig()=0A=
> kern_wait6()=0A=
> sys_wait4()=0A=
>=0A=
> - for vnlru=0A=
> sched_switch()=0A=
> mi_switch()=0A=
> sleepq_switch()=0A=
> sleepq_timedwait()=0A=
> _sleep()=0A=
> vnlru_proc()=0A=
> fork_exit()=0A=
> fork_trampoline()=0A=
>=0A=
> - for syncer=0A=
> sched_switch()=0A=
> mi_switch()=0A=
> critical_exit_preempt()=0A=
> intr_event_handle()=0A=
> intr_execute_handlers()=0A=
> lapic_handle_intr()=0A=
> Xapic_isr1()=0A=
> - interrupt=0A=
> memset()=0A=
> cache_alloc()=0A=
> cache_alloc_retry()=0A=
> uma_zalloc_arg()=0A=
> vmem_xalloc()=0A=
> vmem_bt_alloc()=0A=
> keg_alloc_slab()=0A=
> zone_import()=0A=
> cache_alloc()=0A=
> cache_alloc_retry()=0A=
> uma_zalloc_arg()=0A=
> bt_fill()=0A=
> vmem_xalloc()=0A=
> vmem_alloc()=0A=
> bufkva_alloc()=0A=
> getnewbuf()=0A=
> getblkx()=0A=
> breadn_flags()=0A=
> ffs_update()=0A=
> ffs_sync()=0A=
> sync_fsync()=0A=
> VOP_FSYNC_APV()=0A=
> sched_sync()=0A=
> fork_exit()=0A=
> fork_trampoline()=0A=
>=0A=
> - For bufdaemon (a bunch of these)=0A=
> sched_switch()=0A=
> mi_switch()=0A=
> sleepq_switch()=0A=
> sleepq_timedwait()=0A=
> _sleep()=0A=
> buf_daemon()=0A=
> fork_exit()=0A=
> fork_trampoline()=0A=
>=0A=
> vmdaemon and pagedaemon are basically just like above,=0A=
> sleeping in=0A=
> vm_daemon()=0A=
> or=0A=
> vm_pageout_worker()=0A=
> or=0A=
> vm_pageout_laundry_worker()=0A=
> or=0A=
> uma_reclaim_worker()=0A=
>=0A=
> That's all the typing I can take right now.=0A=
> I can probably make this happen again if you want more specific stuff.=0A=
>=0A=
> rick=0A=
>=0A=
>=0A=
>=0A=
>=0A=
_______________________________________________=0A=
freebsd-current@freebsd.org mailing list=0A=
https://lists.freebsd.org/mailman/listinfo/freebsd-current=0A=
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"=
=0A=
_______________________________________________=0A=
freebsd-current@freebsd.org mailing list=0A=
https://lists.freebsd.org/mailman/listinfo/freebsd-current=0A=
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"=
=0A=
_______________________________________________=0A=
freebsd-current@freebsd.org mailing list=0A=
https://lists.freebsd.org/mailman/listinfo/freebsd-current=0A=
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"=
=0A=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?QB1PR01MB3364EFFE7F96533866C2C770DD9D0>