Date:        Fri, 29 Nov 2019 13:40:29 +0200
From:        Konstantin Belousov <kostikbel@gmail.com>
To:          Willem Jan Withagen <wjw@digiware.nl>
Cc:          Eugene Grosbein <eugen@grosbein.net>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, Alexander Motin <mav@FreeBSD.org>, Andriy Gapon <avg@freebsd.org>
Subject:     Re: Process in T state does not want to die.....
Message-ID:  <20191129114029.GX10580@kib.kiev.ua>
In-Reply-To: <c9b8b5e9-93dc-0535-bb2c-5860f2f231dd@digiware.nl>
References:  <3c57e51d-fa36-39a3-9691-49698e8d2124@grosbein.net> <91490c30-45e9-3c38-c55b-12534fd09e28@digiware.nl> <20191128115122.GN10580@kib.kiev.ua> <296874db-40f0-c7c9-a573-410e4c86049a@digiware.nl> <20191128195013.GU10580@kib.kiev.ua> <1ae7ad65-902c-8e5f-bcf1-1e98448c64bb@digiware.nl> <20191128214633.GV10580@kib.kiev.ua> <a2daa66f-b073-8c20-3668-aceec25b4ba9@grosbein.net> <b7cf405a-ceaf-5d4c-214c-d7ad5c9557e7@grosbein.net> <c9b8b5e9-93dc-0535-bb2c-5860f2f231dd@digiware.nl>
On Fri, Nov 29, 2019 at 12:05:34PM +0100, Willem Jan Withagen wrote:
> On 29-11-2019 11:43, Eugene Grosbein wrote:
> > 29.11.2019 16:24, Eugene Grosbein wrote:
> >
> >> 29.11.2019 4:46, Konstantin Belousov wrote:
> >>
> >>>> sys_extattr_set_fd+0xee amd64_syscall+0x364 fast_syscall_common+0x101
> >>> This is an example of the cause for your problem.
> >>
> >> I observe this problem too, but my use case is different.
> >>
> >> I have several bhyve instances running Windows guests over ZVOLs on an
> >> SSD-only RAIDZ1 pool. "zfs destroy" for snapshots with large "used"
> >> numbers takes a long time (several minutes) due to slow TRIM.
> >> Sometimes this makes the virtual guest unresponsive, and an attempt to
> >> restart the bhyve instance may bring it to the Exiting (E) state for
> >> several minutes, after which it finishes successfully. But sometimes
> >> the bhyve process hangs in T state indefinitely.
> >>
> >> This is 11.3-STABLE/amd64 r354667. Should I try your patch too?
> >
> > OTOH, the same system has several FreeBSD jails on mounted ZFS file
> > systems on the same pool. These file systems have snapshots
> > created/removed too, and the snapshots are large (up to 10G).
> >
>
> From what I get from Konstantin, this problem is due to memory pressure
> built up by both ZFS and the buffer cache used by UFS, and the buffer
> cache is waiting for some buffer memory to be able to do its work.
>
> If wanted, I can try putting a ZFS fs on /dev/ggate0 so that any
> buffering would be in ZFS and not in UFS.
>
> But even with the patch I still now have:
> root 3471  0.0  5.8  646768 480276  -  TsJ  11:16   0:10.74 ceph-osd -i 0
> root 3530  0.0 11.8 1153860 985020  -  TsJ  11:17   0:11.51 ceph-osd -i 1
> root 3532  0.0  5.3  608760 438676  -  TsJ  11:17   0:07.31 ceph-osd -i 2
> root 3534  0.0  3.2  435564 266328  -  IsJ  11:17   0:07.35 ceph-osd -i 3
> root 3536  0.0  4.8  565792 398392  -  IsJ  11:17   0:08.73 ceph-osd -i 5
> root 3553  0.0  2.3  362892 192348  -  TsJ  11:17   0:04.21 ceph-osd -i 6
> root 3556  0.0  3.0  421516 246956  -  TsJ  11:17   0:04.81 ceph-osd -i 4
>
> And from the procstat -kk output below it looks like things are still
> stuck in bwillwrite, but now with another set of functions: I guess not
> writing an extattr but writing a file.

Yes, it should resolve after you end the load that starves the buffer
cache's dirty space, or wait some time until the thread gets its share,
which is unfair and could take a long time.

I will commit the VN_OPEN_INVFS patch shortly.
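The throttle in question is the dirty-buffer accounting done by bwillwrite()
in sys/kern/vfs_bio.c, which every buffered write path calls before dirtying
more buffers. Roughly, it looks like the sketch below; this is simplified and
paraphrased, not the verbatim source, and the names numdirtybuffers,
hidirtybuffers, bdirtylock and bdirtywait are the 12.x ones and may differ on
other branches.

void
bwillwrite(void)
{
	/* The real code also kicks the buf daemon so flushing starts at once. */
	if (numdirtybuffers >= hidirtybuffers) {
		mtx_lock(&bdirtylock);
		while (numdirtybuffers >= hidirtybuffers) {
			bdirtywait = 1;
			/*
			 * Uninterruptible sleep (no PCATCH): a pending
			 * SIGKILL or suspension request is only noticed
			 * after the wakeup, which is why such threads sit
			 * in _sleep()/bwillwrite() while the process shows
			 * up in T state.
			 */
			msleep(&bdirtywait, &bdirtylock, PRIBIO,
			    "flswai", 0);
		}
		mtx_unlock(&bdirtylock);
	}
}

Once the buf daemon brings the dirty count back below the threshold, the
sleepers are woken and only then do they honor the suspension request.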
>
> # ps -o pid,lwp,flags,flags2,state,tracer,command -p 3471
>  PID    LWP        F       F2 STAT TRACER COMMAND
> 3471 104097 11080081 00000000  TsJ      0 ceph-osd -i 0
>
> # procstat -kk 3471:
> 3471 104310 ceph-osd journal_write   mi_switch+0xe0 sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93 sys_writev+0x6e amd64_syscall+0x362 fast_syscall_common+0x101
> 3471 104311 ceph-osd fn_jrn_objstore mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104312 ceph-osd tp_fstore_op    mi_switch+0xe0 sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93 sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x101
> 3471 104313 ceph-osd tp_fstore_op    mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104314 ceph-osd fn_odsk_fstore  mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104315 ceph-osd fn_appl_fstore  mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104316 ceph-osd safe_timer      mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104355 ceph-osd ms_dispatch     mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104356 ceph-osd ms_local        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104357 ceph-osd safe_timer      mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104358 ceph-osd fn_anonymous    mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104359 ceph-osd safe_timer      mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104360 ceph-osd ms_dispatch     mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104361 ceph-osd ms_local        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104362 ceph-osd ms_dispatch     mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104363 ceph-osd ms_local        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104364 ceph-osd ms_dispatch     mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104365 ceph-osd ms_local        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104366 ceph-osd ms_dispatch     mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104367 ceph-osd ms_local        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104368 ceph-osd ms_dispatch     mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104369 ceph-osd ms_local        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104370 ceph-osd ms_dispatch     mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104371 ceph-osd ms_local        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104372 ceph-osd fn_anonymous    mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104373 ceph-osd finisher        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104374 ceph-osd safe_timer      mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104375 ceph-osd safe_timer      mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104376 ceph-osd osd_srv_agent   mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104377 ceph-osd tp_osd_tp       mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
> 3471 104378 ceph-osd tp_osd_tp       mi_switch+0xe0 thread_suspend_switch+0x140 thread_single+0x47b sigexit+0x53 postsig+0x304 ast+0x327 fast_syscall_common+0x198
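The last thread above (104378, in thread_suspend_switch -> thread_single ->
sigexit) is the one processing the fatal signal: it has to single-thread the
process, and every sibling only parks itself when it passes
thread_suspend_check() on its way back to user mode. The two threads sleeping
in bwillwrite() never get there until their sleep ends, which is what keeps
the whole process in T state. Condensed, the SINGLE_EXIT handling in
thread_single() (sys/kern/kern_thread.c) does roughly the following; this is
a simplified sketch from memory, not the verbatim source, and flag names and
helper signatures differ between branches.

	while (p->p_numthreads > 1) {
		FOREACH_THREAD_IN_PROC(p, td2) {
			if (td2 == td)
				continue;
			thread_lock(td2);
			/* Ask the sibling to stop at its next AST. */
			td2->td_flags |= TDF_ASTPENDING | TDF_NEEDSUSPCHK;
			if (TD_IS_SUSPENDED(td2))
				thread_unsuspend_one(td2, p, true);
			else if (TD_ON_SLEEPQ(td2) &&
			    (td2->td_flags & TDF_SINTR) != 0)
				/* Only interruptible sleeps can be aborted. */
				sleepq_abort(td2, EINTR);
			/*
			 * A thread in an uninterruptible sleep, e.g. "flswai"
			 * in bwillwrite(), cannot be woken here; the exiting
			 * thread simply has to wait it out.
			 */
			thread_unlock(td2);
		}
		/* Sleep until the remaining threads suspend or exit. */
		thread_suspend_switch(td, p);
	}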