Date:      Fri, 29 Nov 2019 10:08:39 +0100
From:      Willem Jan Withagen <wjw@digiware.nl>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>, Eugene Grosbein <eugen@grosbein.net>
Subject:   Re: Process in T state does not want to die.....
Message-ID:  <24afad3c-331c-099a-e5f2-32e1de74c985@digiware.nl>
In-Reply-To: <20191128214633.GV10580@kib.kiev.ua>
References:  <966f830c-bf09-3683-90da-e70aa343cc16@digiware.nl> <3c57e51d-fa36-39a3-9691-49698e8d2124@grosbein.net> <91490c30-45e9-3c38-c55b-12534fd09e28@digiware.nl> <20191128115122.GN10580@kib.kiev.ua> <296874db-40f0-c7c9-a573-410e4c86049a@digiware.nl> <20191128195013.GU10580@kib.kiev.ua> <1ae7ad65-902c-8e5f-bcf1-1e98448c64bb@digiware.nl> <20191128214633.GV10580@kib.kiev.ua>

On 28-11-2019 22:46, Konstantin Belousov wrote:
> On Thu, Nov 28, 2019 at 09:52:50PM +0100, Willem Jan Withagen wrote:
>>    # ps -o pid,lwp,flags,flags2,state,tracer,command -p 3532
>>  PID    LWP        F       F2 STAT TRACER COMMAND
>> 3532 103955 11080081 00000000 TsJ       0 ceph-osd -i 5
>>
>> # procstat -kk 3532
>>     PID    TID COMM                TDNAME              KSTACK
.......
>>    3532 104829 ceph-osd            filestore_sync      mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3532 104830 ceph-osd            journal_write       mi_switch+0xe2
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93
>> sys_writev+0x6e amd64_syscall+0x364 fast_syscall_common+0x101
>>    3532 104831 ceph-osd            fn_jrn_objstore     mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3532 104832 ceph-osd            tp_fstore_op        mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3532 104833 ceph-osd            tp_fstore_op        mi_switch+0xe2
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 vn_open_cred+0xc8
>> zfs_setextattr+0x216 VOP_SETEXTATTR_APV+0x7c extattr_set_vp+0x11d
>> sys_extattr_set_fd+0xee amd64_syscall+0x364 fast_syscall_common+0x101
> This is an example of the cause for your problem.
>
> The thread is executing some ZFS code, zfs_setextattr() VOP probably to
> do something with the ext attrs. There, it recurses into VFS to open a
> file, and vn_open_cred() waits for buffer space pressure because it is
> assumed the vn_open_cred() is called from top level, not from inside
> VFS/fs code.
>
> Until this thread finished its operation and safely returned back to
> kernel/user boundary, the process cannot exit.

> There are two problems.  One is this call to bwillwrite(), and it is easy
> to get rid of it, see the patch at the end of the message.  But I wonder
> why do you have so many dirty buffers and why it does not resolve itself.
> Note that ZFS does not use buffer cache, you must have some other very
> active fs, using buffer cache, that is somehow blocked on writes.

OK,
Thanks for the analysis. I'll try the patch.

I think the use of the buffer cache comes from the bonnie++ test that is
hammering the UFS filesystem mounted on a ceph rbd-ggate device.
rbd-ggate uses geom-gate to offer a disk device that is backed by an
rbd-image in the ceph cluster. Some of the nodes in the cluster run on
the same host as the test, so there is a lot of ZFS activity as well.
Likely this server's memory is a bit small for the load thrown at it,
but at the moment I do not have beefier hardware.
So far bonnie++ is the only way I have found to trigger this type of
problem.

This would probably also explain why the problem does not occur when
using small test sizes in bonnie++: the memory pressure never becomes
critical.
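
To spot the affected threads in a procstat -kk dump like the one quoted
above, a small filter can list every thread whose kernel stack contains
bwillwrite. This is only a sketch: the function name blocked_threads is
made up for illustration, and it assumes that COMM and TDNAME contain no
spaces and that wrapped stack lines belong to the preceding thread.

```python
def blocked_threads(procstat_text, frame="bwillwrite"):
    """Return (tid, thread_name) for threads whose kernel stack has `frame`.

    Assumes `procstat -kk` style rows: PID TID COMM TDNAME KSTACK...
    Leading mail quote markers ('>') are stripped, and lines that do not
    start with two integers are treated as continuations of the stack.
    """
    threads = []  # each entry: [tid, tdname, list of stack frames]
    for raw in procstat_text.splitlines():
        fields = raw.lstrip("> \t").split()
        if len(fields) >= 4 and fields[0].isdigit() and fields[1].isdigit():
            # New thread row: remember TID, thread name and first frames.
            threads.append([int(fields[1]), fields[3], fields[4:]])
        elif threads and fields and fields[0] != "PID":
            # Wrapped line: more frames for the most recent thread.
            threads[-1][2].extend(fields)
    return [
        (tid, name)
        for tid, name, frames in threads
        # Compare the function name only, ignoring the "+0x.." offset.
        if any(f.split("+")[0] == frame for f in frames)
    ]
```

Fed the output of "procstat -kk 3532" (or the quoted copy from this
mail), it would return the TIDs and thread names that are sleeping in
bwillwrite(), which are the ones keeping the process from exiting.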

--WjW



