Subject: Re: Process in T state does not want to die.....
From: Willem Jan Withagen
To: Konstantin Belousov
Cc: FreeBSD Hackers, Eugene Grosbein
Date: Fri, 29 Nov 2019 10:08:39 +0100
Message-ID: <24afad3c-331c-099a-e5f2-32e1de74c985@digiware.nl>
In-Reply-To: <20191128214633.GV10580@kib.kiev.ua>

On 28-11-2019 22:46, Konstantin Belousov wrote:
> On Thu, Nov 28, 2019 at 09:52:50PM +0100, Willem Jan Withagen wrote:
>>  # ps -o pid,lwp,flags,flags2,state,tracer,command -p 3532
>>  PID    LWP        F       F2 STAT TRACER COMMAND
>> 3532 103955 11080081 00000000 TsJ       0 ceph-osd -i 5
>>
>> # procstat -kk 3532
>>   PID    TID COMM                TDNAME              KSTACK .......
>>  3532 104829 ceph-osd            filestore_sync      mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>  3532 104830 ceph-osd            journal_write       mi_switch+0xe2
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93
>> sys_writev+0x6e amd64_syscall+0x364 fast_syscall_common+0x101
>>  3532 104831 ceph-osd            fn_jrn_objstore     mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>  3532 104832 ceph-osd            tp_fstore_op        mi_switch+0xe2
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>  3532 104833 ceph-osd            tp_fstore_op        mi_switch+0xe2
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 vn_open_cred+0xc8
>> zfs_setextattr+0x216 VOP_SETEXTATTR_APV+0x7c extattr_set_vp+0x11d
>> sys_extattr_set_fd+0xee amd64_syscall+0x364 fast_syscall_common+0x101
> This is an example of the cause for your problem.
>
> The thread is executing some ZFS code, zfs_setextattr() VOP probably to
> do something with the ext attrs. There, it recurses into VFS to open a
> file, and vn_open_cred() waits for buffer space pressure because it is
> assumed the vn_open_cred() is called from top level, not from inside
> VFS/fs code.
>
> Until this thread finished its operation and safely returned back to
> kernel/user boundary, the process cannot exit.
> There are two problems. One is this call to bwillwrite(), and it is easy
> to get rid of it, see the patch at the end of the message.
> But I wonder why do you have so many dirty buffers and why it does not resolve itself.
> Note that ZFS does not use buffer cache, you must have some other very
> active fs, using buffer cache, that is somehow blocked on writes.

OK, thanks for the analysis. I'll try the patch.

I think the buffer-cache usage comes from the bonnie++ test that is hammering the UFS filesystem mounted on a Ceph rbd-ggate device. rbd-ggate uses GEOM Gate to offer a disk device that is backed by an RBD image in the Ceph cluster. Some of the cluster's nodes also run on the same machine as the test, so there is a lot of ZFS activity as well.

This server's memory is probably a bit small for the load thrown at it, but at the moment I do not have beefier hardware. So far, bonnie++ is actually the only way I have found to trigger this type of problem.

This would probably also explain why the problem does not occur with small test sizes in bonnie++: the memory pressure never becomes critical.

--WjW
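The `F` column in the quoted `ps` output is the process's kernel `p_flag` word, printed in hex. A minimal decoder is sketched below. It assumes a subset of flag values that I believe match `sys/sys/proc.h` on FreeBSD 12.x (ps(1) also lists them under the `flags` keyword); treat the table as an assumption and check it against your own tree before relying on the result.

```python
from typing import List

# Subset of FreeBSD p_flag bits. ASSUMPTION: values as found in
# sys/sys/proc.h on FreeBSD 12.x; verify against your own tree or ps(1).
P_FLAGS = {
    0x00000001: "P_ADVLOCK",
    0x00000002: "P_CONTROLT",
    0x00000080: "P_HADTHREADS",
    0x00000200: "P_SYSTEM",
    0x00000400: "P_SINGLE_EXIT",
    0x00000800: "P_TRACED",
    0x00002000: "P_WEXIT",
    0x00008000: "P_WKILLED",
    0x00020000: "P_STOPPED_SIG",
    0x00040000: "P_STOPPED_TRACE",
    0x00080000: "P_STOPPED_SINGLE",
    0x00400000: "P_SINGLE_BOUNDARY",
    0x01000000: "P_JAILED",
    0x10000000: "P_INMEM",
}


def decode_pflags(hexword: str) -> List[str]:
    """Turn the hex F column of `ps -o flags` into flag names.

    Bits missing from the table are reported as one "unknown:0x..." entry.
    """
    value = int(hexword, 16)
    names = []
    for bit in sorted(P_FLAGS):
        if value & bit:
            names.append(P_FLAGS[bit])
            value &= ~bit  # clear the bits we could name
    if value:
        names.append(f"unknown:{value:#x}")
    return names


print(decode_pflags("11080081"))
# ['P_ADVLOCK', 'P_HADTHREADS', 'P_STOPPED_SINGLE', 'P_JAILED', 'P_INMEM']
```

Under the assumed table, P_JAILED matches the `J` in the STAT column, and P_STOPPED_SINGLE indicates that a single-threading request is in progress. That fits the picture of one thread waiting for all the others to park at the kernel/user boundary.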
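Konstantin's explanation can be illustrated with a toy model: the process cannot exit until every other thread reaches the kernel/user boundary, and one thread is asleep in bwillwrite() inside a syscall. The sketch is hypothetical user-space Python, not kernel code. `ToyProcess`, `hidirty`, and all method names are invented for illustration. `_bwillwrite()` only loosely mimics the real bwillwrite() by sleeping while a dirty-buffer counter sits at its limit, and `_suspend_check()` stands in for thread_suspend_check() at the boundary.

```python
import threading


class ToyProcess:
    """Hypothetical toy model of a multithreaded process trying to exit."""

    def __init__(self, hidirty: int) -> None:
        self.cv = threading.Condition()
        self.exiting = False      # set when the exiting thread starts single-threading
        self.parked = 0           # threads that reached the kernel/user boundary
        self.hidirty = hidirty    # toy dirty-buffer limit
        self.dirty = hidirty      # start right at the limit, like a saturated cache

    def _bwillwrite(self) -> None:
        # Loosely mimics bwillwrite(): sleep while too many buffers are dirty.
        with self.cv:
            while self.dirty >= self.hidirty:
                self.cv.wait()

    def _suspend_check(self) -> None:
        # Stands in for thread_suspend_check(): once exit is requested,
        # park at the boundary and wake the exiting thread.
        with self.cv:
            self.cv.wait_for(lambda: self.exiting)
            self.parked += 1
            self.cv.notify_all()

    def thread_body(self, in_write: bool) -> None:
        if in_write:
            self._bwillwrite()    # blocked inside a "syscall"
        self._suspend_check()     # only reached after the syscall returns

    def exit_wait(self, nthreads: int, timeout: float) -> bool:
        # The exiting thread waits until every other thread has parked.
        with self.cv:
            self.exiting = True
            self.cv.notify_all()
            return self.cv.wait_for(lambda: self.parked == nthreads, timeout)

    def flush_dirty(self) -> None:
        # Something finally writes the dirty buffers out.
        with self.cv:
            self.dirty = 0
            self.cv.notify_all()


proc = ToyProcess(hidirty=100)
threads = [threading.Thread(target=proc.thread_body, args=(i == 0,)) for i in range(4)]
for t in threads:
    t.start()
stuck = proc.exit_wait(len(threads), timeout=0.5)  # thread 0 is still in _bwillwrite()
proc.flush_dirty()
done = proc.exit_wait(len(threads), timeout=5.0)
for t in threads:
    t.join()
print(stuck, done)  # False True
```

In the model, the exit only completes after `flush_dirty()` drains the counter. If nothing ever drains the dirty buffers, the exiting thread waits forever, which is why the open question above is why the dirty buffers do not resolve on their own.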