Date:      Fri, 29 Nov 2019 13:40:29 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Willem Jan Withagen <wjw@digiware.nl>
Cc:        Eugene Grosbein <eugen@grosbein.net>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, Alexander Motin <mav@FreeBSD.org>, Andriy Gapon <avg@freebsd.org>
Subject:   Re: Process in T state does not want to die.....
Message-ID:  <20191129114029.GX10580@kib.kiev.ua>
In-Reply-To: <c9b8b5e9-93dc-0535-bb2c-5860f2f231dd@digiware.nl>
References:  <3c57e51d-fa36-39a3-9691-49698e8d2124@grosbein.net> <91490c30-45e9-3c38-c55b-12534fd09e28@digiware.nl> <20191128115122.GN10580@kib.kiev.ua> <296874db-40f0-c7c9-a573-410e4c86049a@digiware.nl> <20191128195013.GU10580@kib.kiev.ua> <1ae7ad65-902c-8e5f-bcf1-1e98448c64bb@digiware.nl> <20191128214633.GV10580@kib.kiev.ua> <a2daa66f-b073-8c20-3668-aceec25b4ba9@grosbein.net> <b7cf405a-ceaf-5d4c-214c-d7ad5c9557e7@grosbein.net> <c9b8b5e9-93dc-0535-bb2c-5860f2f231dd@digiware.nl>

On Fri, Nov 29, 2019 at 12:05:34PM +0100, Willem Jan Withagen wrote:
> On 29-11-2019 11:43, Eugene Grosbein wrote:
> > 29.11.2019 16:24, Eugene Grosbein wrote:
> > 
> >> 29.11.2019 4:46, Konstantin Belousov wrote:
> >>
> >>>> sys_extattr_set_fd+0xee amd64_syscall+0x364 fast_syscall_common+0x101
> >>> This is an example of the cause for your problem.
> >>
> >> I observe this problem too, but my use case is different.
> >>
> >> I have several bhyve instances running Windows guests over ZVOLs on an SSD-only RAIDZ1 pool.
> >> "zfs destroy" for snapshots with large "used" numbers takes a long time (several minutes) due to slow TRIM.
> >> Sometimes this makes the virtual guest unresponsive, and an attempt to restart the bhyve instance may bring it to the Exiting (E)
> >> state for several minutes; it finishes successfully after that. But sometimes the bhyve process hangs in T state indefinitely.
> >>
> >> This is 11.3-STABLE/amd64 r354667. Should I try your patch too?
> > 
> > OTOH, the same system has several FreeBSD jails on mounted ZFS file systems on the same pool.
> > These file systems have snapshots created/removed too, and the snapshots are large (up to 10G).
> > 
> 
>  From what I get from Konstantin, this problem is due to memory 
> pressure built up by both ZFS and the buffer cache used by UFS: 
> the buffer cache is waiting for some buffer memory to free up 
> before it can do its work.
> 
> If wanted, I can try putting a ZFS filesystem on /dev/ggate0 so that 
> any buffering would be in ZFS and not in UFS.
> 
> But even with the patch I still have:
> root 3471   0.0  5.8  646768 480276  - TsJ  11:16  0:10.74 ceph-osd -i 0
> root 3530   0.0 11.8 1153860 985020  - TsJ  11:17  0:11.51 ceph-osd -i 1
> root 3532   0.0  5.3  608760 438676  - TsJ  11:17  0:07.31 ceph-osd -i 2
> root 3534   0.0  3.2  435564 266328  - IsJ  11:17  0:07.35 ceph-osd -i 3
> root 3536   0.0  4.8  565792 398392  - IsJ  11:17  0:08.73 ceph-osd -i 5
> root 3553   0.0  2.3  362892 192348  - TsJ  11:17  0:04.21 ceph-osd -i 6
> root 3556   0.0  3.0  421516 246956  - TsJ  11:17  0:04.81 ceph-osd -i 4
> 
> And from the procstat -kk output below it looks like things are still 
> stuck in bwillwrite, but now via another call path: not 
> extattr_set_fd() this time, but plain file writes.
Yes, it should resolve after you end the load that starves the buffer
cache's dirty space, or after you wait until the thread gets its share
of that space; the hand-off is unfair and can take a long time (a
sketch below illustrates this).
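
In case it helps to picture it: bwillwrite() makes every writer check
the global dirty buffer accounting before queueing more work, and sleep
while usage is above the high-water mark; the buf daemon cleans buffers
and wakes the sleepers, who then race for the freed space.  Below is a
minimal userspace model of that shape.  will_write(), flusher() and
DIRTY_HI are invented for the example; the real code is in
sys/kern/vfs_bio.c and differs in detail.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define DIRTY_HI 8	/* high-water mark: writers block above this */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t dirty_cv = PTHREAD_COND_INITIALIZER;
static int dirty;	/* outstanding dirty "buffers" */

/* Analogue of bwillwrite(): block while dirty space is severe. */
static void
will_write(void)
{
	pthread_mutex_lock(&lock);
	while (dirty >= DIRTY_HI)
		pthread_cond_wait(&dirty_cv, &lock);
	dirty++;		/* take our share of the dirty space */
	pthread_mutex_unlock(&lock);
}

/* Analogue of the buf daemon: slowly clean buffers, wake waiters. */
static void *
flusher(void *arg)
{
	(void)arg;
	for (;;) {
		usleep(100000);	/* flushing is slow, like TRIM here */
		pthread_mutex_lock(&lock);
		if (dirty > 0)
			dirty--;
		/*
		 * All waiters are woken and race for the freed space;
		 * there is no FIFO hand-off, which is why service is
		 * unfair and a given writer can wait arbitrarily long.
		 */
		pthread_cond_broadcast(&dirty_cv);
		pthread_mutex_unlock(&lock);
	}
}

static void *
writer(void *arg)
{
	long id = (long)arg;

	for (;;) {
		will_write();
		printf("writer %ld dirtied a buffer\n", id);
	}
}

int
main(void)
{
	pthread_t f, w[4];
	long i;

	pthread_create(&f, NULL, flusher, NULL);
	for (i = 0; i < 4; i++)
		pthread_create(&w[i], NULL, writer, (void *)i);
	pthread_join(f, NULL);	/* model runs until interrupted */
	return (0);
}

Run it and you will see the writers throttled to the flusher's pace,
with no guarantee about which writer gets the next freed slot.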

I will commit the VN_OPEN_INVFS patch shortly.
> 
> # ps -o pid,lwp,flags,flags2,state,tracer,command -p 3471
>   PID    LWP        F       F2 STAT TRACER COMMAND
> 3471 104097 11080081 00000000 TsJ       0 ceph-osd -i 0
> 
> # procstat -kk 3471:
>   3471 104310 ceph-osd            journal_write       mi_switch+0xe0 sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93 sys_writev+0x6e amd64_syscall+0x362 fast_syscall_common+0x101
>   3471 104311 ceph-osd            fn_jrn_objstore     mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104312 ceph-osd            tp_fstore_op        mi_switch+0xe0 sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93 sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x101
>   3471 104313 ceph-osd            tp_fstore_op        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104314 ceph-osd            fn_odsk_fstore      mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104315 ceph-osd            fn_appl_fstore      mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104316 ceph-osd            safe_timer          mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104355 ceph-osd            ms_dispatch         mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104356 ceph-osd            ms_local            mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104357 ceph-osd            safe_timer          mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104358 ceph-osd            fn_anonymous        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104359 ceph-osd            safe_timer          mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104360 ceph-osd            ms_dispatch         mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104361 ceph-osd            ms_local            mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104362 ceph-osd            ms_dispatch         mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104363 ceph-osd            ms_local            mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104364 ceph-osd            ms_dispatch         mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104365 ceph-osd            ms_local            mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104366 ceph-osd            ms_dispatch         mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104367 ceph-osd            ms_local            mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104368 ceph-osd            ms_dispatch         mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104369 ceph-osd            ms_local            mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104370 ceph-osd            ms_dispatch         mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104371 ceph-osd            ms_local            mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104372 ceph-osd            fn_anonymous        mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104373 ceph-osd            finisher            mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104374 ceph-osd            safe_timer          mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104375 ceph-osd            safe_timer          mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104376 ceph-osd            osd_srv_agent       mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104377 ceph-osd            tp_osd_tp           mi_switch+0xe0 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>   3471 104378 ceph-osd            tp_osd_tp           mi_switch+0xe0 thread_suspend_switch+0x140 thread_single+0x47b sigexit+0x53 postsig+0x304 ast+0x327 fast_syscall_common+0x198
> 
> 
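
For the archive, the stacks above show the shape of the hang quite
clearly.  Thread 104378 is processing the fatal signal (postsig ->
sigexit -> thread_single) and must wait until every other thread parks
in thread_suspend_check.  Most threads did park, but 104310 and 104312
are asleep in bwillwrite, and that sleep is not interruptible, so they
cannot reach the suspend check until the dirty buffer space shortage
wakes them.  Below is a minimal userspace model of the interaction;
all names are invented for the illustration, this is not kernel code.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int stop_requested;	/* "single-threading" flag */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t resource_cv = PTHREAD_COND_INITIALIZER;

/* Analogue of thread_suspend_check(): threads park here. */
static void
suspend_check(void)
{
	while (atomic_load(&stop_requested))
		pause();		/* parked indefinitely */
}

/* This thread checks for stop requests at its "AST" points. */
static void *
good_citizen(void *arg)
{
	(void)arg;
	for (;;) {
		/* ... do some work ... */
		suspend_check();
	}
}

/*
 * This thread is stuck like the bwillwrite() sleepers: it waits for a
 * resource and never reaches a suspend_check() until it is woken.
 */
static void *
uninterruptible_sleeper(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	pthread_cond_wait(&resource_cv, &lock);	/* nobody signals */
	pthread_mutex_unlock(&lock);
	suspend_check();
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, good_citizen, NULL);
	pthread_create(&t2, NULL, uninterruptible_sleeper, NULL);
	sleep(1);
	atomic_store(&stop_requested, 1);	/* analogue of sigexit */
	pthread_join(t2, NULL);			/* hangs: the "T" state */
	printf("never reached\n");
	return (0);
}

Once something signals resource_cv (the analogue of the dirty space
draining), the join completes, which matches the observation that the
stuck state clears by itself once the flushing catches up.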


