From: Willem Jan Withagen <wjw@digiware.nl>
To: Konstantin Belousov
Cc: FreeBSD Hackers, Alexander Motin, Andriy Gapon, Eugene Grosbein
Subject: Re: Process in T state does not want to die.....
Date: Fri, 29 Nov 2019 16:06:10 +0100
Message-ID: <21b8b806-614b-cc9c-7b5f-496e0e8c541c@digiware.nl>
In-Reply-To: <20191129114029.GX10580@kib.kiev.ua>
On 29-11-2019 12:40, Konstantin Belousov wrote:
> On Fri, Nov 29, 2019 at 12:05:34PM +0100, Willem Jan Withagen wrote:
>> On 29-11-2019 11:43, Eugene Grosbein wrote:
>>> 29.11.2019 16:24, Eugene Grosbein wrote:
>>>
>>>> 29.11.2019 4:46, Konstantin Belousov wrote:
>>>>
>>>>>> sys_extattr_set_fd+0xee amd64_syscall+0x364 fast_syscall_common+0x101
>>>>> This is an example of the cause for your problem.
>>>>
>>>> I observe this problem too, but my use case is different.
>>>>
>>>> I have several bhyve instances running Windows guests over ZVOLs on an
>>>> SSD-only RAIDZ1 pool. "zfs destroy" for snapshots with large "used"
>>>> numbers takes a long time (several minutes) due to slow TRIM.
>>>> Sometimes this makes the virtual guest unresponsive, and an attempt to
>>>> restart the bhyve instance may leave it in the Exiting (E) state for
>>>> several minutes, after which it finishes successfully. But sometimes
>>>> the bhyve process hangs in T state indefinitely.
>>>>
>>>> This is 11.3-STABLE/amd64 r354667. Should I try your patch too?
>>>
>>> OTOH, the same system has several FreeBSD jails on mounted ZFS file
>>> systems on the same pool. These file systems have snapshots
>>> created/removed too, and the snapshots are large (up to 10G).
>>>
>>
>> From what I get from Konstantin, this problem is due to memory
>> pressure built up by both ZFS and the buffer cache used by UFS,
>> with the buffer cache waiting for buffer memory to become available
>> so it can do its work.
>>
>> If wanted, I can try to put a ZFS fs on /dev/ggate0 so that any
>> buffering would be in ZFS and not in UFS.
>>
>> But even with the patch I still now have:
>> root 3471 0.0  5.8  646768 480276  -  TsJ  11:16 0:10.74 ceph-osd -i 0
>> root 3530 0.0 11.8 1153860 985020  -  TsJ  11:17 0:11.51 ceph-osd -i 1
>> root 3532 0.0  5.3  608760 438676  -  TsJ  11:17 0:07.31 ceph-osd -i 2
>> root 3534 0.0  3.2  435564 266328  -  IsJ  11:17 0:07.35 ceph-osd -i 3
>> root 3536 0.0  4.8  565792 398392  -  IsJ  11:17 0:08.73 ceph-osd -i 5
>> root 3553 0.0  2.3  362892 192348  -  TsJ  11:17 0:04.21 ceph-osd -i 6
>> root 3556 0.0  3.0  421516 246956  -  TsJ  11:17 0:04.81 ceph-osd -i 4
>>
>> And from procstat -kk below it looks like things are still stuck in
>> bwillwrite, but now with another set of functions. I guess it is not
>> writing an extattrib() but writing a file.
> Yes, it should resolve after you end the load that starves the buffer
> cache's dirty space. Or wait some time until the thread gets its portion
> of the share, which is unfair and could take a long time.

Eh, right....

This pointed me in a direction to offer some stress relief. The other
process is:

root 3581 0.0 0.0 10724 2372 v1 D+ 11:20 0:00.91 bonnie -s 256

which is also stuck in disk I/O, in the kernel I guess, so killing it
only works once any of its writes succeed.

Luckily the geom gateway (rbd-ggate) is also in userspace and can be
killed. I guess that is where the buffers collected, because shooting
it down immediately allows the ceph-osd to continue crashing.

So are there any controls I can apply to make all these components
behave better?
- One thing would be more memory, but this board only allows 8G.
  (It's an oldie.)
- Don't run heavy UFS buffer consumers.

Are there any sysctl values I can monitor to check the buffer usage?
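One way to keep an eye on that pressure is simply to poll the dirty-buffer
counters; a minimal sketch (the vfs.* sysctls are FreeBSD-specific, and the
poll count/interval are arbitrary):

```shell
#!/bin/sh
# Poll dirty-buffer pressure: bwillwrite() throttles writers as
# vfs.numdirtybuffers climbs toward vfs.hidirtybuffers, so readings near
# the "hi" watermark mean write(2) callers will sleep in bwillwrite.
# Prints "n/a" where these FreeBSD-only sysctls do not exist.
for i in 1 2 3; do
    dirty=$(sysctl -n vfs.numdirtybuffers 2>/dev/null || echo n/a)
    hi=$(sysctl -n vfs.hidirtybuffers 2>/dev/null || echo n/a)
    echo "numdirtybuffers=${dirty} hidirtybuffers=${hi}"
    sleep 1
done
```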
I guess the system has these values, since top can find them:

# sysctl -a | grep buffer
vfs.hifreebuffers: 768
vfs.lofreebuffers: 512
vfs.numfreebuffers: 52820
vfs.hidirtybuffers: 13225
vfs.lodirtybuffers: 6612
vfs.numdirtybuffers: 0
vfs.altbufferflushes: 0
vfs.dirtybufferflushes: 0

--WjW

>
> I will commit the VN_OPEN_INVFS patch shortly.
>>
>> # ps -o pid,lwp,flags,flags2,state,tracer,command -p 3471
>> PID LWP F F2 STAT TRACER COMMAND
>> 3471 104097 11080081 00000000 TsJ 0 ceph-osd -i 0
>>
>> # procstat -kk 3471:
>> 3471 104310 ceph-osd journal_write mi_switch+0xe0
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93
>> sys_writev+0x6e amd64_syscall+0x362 fast_syscall_common+0x101
>> 3471 104311 ceph-osd fn_jrn_objstore mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104312 ceph-osd tp_fstore_op mi_switch+0xe0
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93
>> sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x101
>> 3471 104313 ceph-osd tp_fstore_op mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104314 ceph-osd fn_odsk_fstore mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104315 ceph-osd fn_appl_fstore mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104316 ceph-osd safe_timer mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104355 ceph-osd ms_dispatch mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104356 ceph-osd ms_local mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104357 ceph-osd safe_timer mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104358 ceph-osd fn_anonymous mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104359 ceph-osd safe_timer mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104360 ceph-osd ms_dispatch mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104361 ceph-osd ms_local mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104362 ceph-osd ms_dispatch mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104363 ceph-osd ms_local mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104364 ceph-osd ms_dispatch mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104365 ceph-osd ms_local mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104366 ceph-osd ms_dispatch mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104367 ceph-osd ms_local mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104368 ceph-osd ms_dispatch mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104369 ceph-osd ms_local mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104370 ceph-osd ms_dispatch mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104371 ceph-osd ms_local mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104372 ceph-osd fn_anonymous mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104373 ceph-osd finisher mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104374 ceph-osd safe_timer mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104375 ceph-osd safe_timer mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104376 ceph-osd osd_srv_agent mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104377 ceph-osd tp_osd_tp mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>> 3471 104378 ceph-osd tp_osd_tp mi_switch+0xe0
>> thread_suspend_switch+0x140 thread_single+0x47b sigexit+0x53
>> postsig+0x304 ast+0x327 fast_syscall_common+0x198
>>
>>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>