Date: Thu, 28 Nov 2019 23:46:33 +0200
From: Konstantin Belousov
To: Willem Jan Withagen
Cc: FreeBSD Hackers, Eugene Grosbein
Subject: Re: Process in T state does not want to die.....
Message-ID: <20191128214633.GV10580@kib.kiev.ua>
In-Reply-To: <1ae7ad65-902c-8e5f-bcf1-1e98448c64bb@digiware.nl>
List-Id: Technical Discussions relating to FreeBSD

On Thu, Nov 28, 2019 at 09:52:50PM +0100, Willem Jan Withagen wrote:
> # ps -o pid,lwp,flags,flags2,state,tracer,command -p 3532
>  PID    LWP        F       F2 STAT TRACER COMMAND
> 3532 103955 11080081 00000000 TsJ       0 ceph-osd -i 5
>
> # procstat -kk 3532
>   PID    TID COMM                TDNAME              KSTACK
>  3532 103166 ceph-osd            log                 mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103167 ceph-osd            service             mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103168 ceph-osd            admin_socket        mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103169 ceph-osd            msgr-worker-0       mi_switch+0xe2
> thread_suspend_switch+0x140 thread_single+0x47b sigexit+0x53
> postsig+0x304 ast+0x327 fast_syscall_common+0x198
>  3532 103170 ceph-osd            msgr-worker-1       mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103171 ceph-osd            msgr-worker-2       mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103172 ceph-osd            signal_handler      mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103406 ceph-osd            OpHistorySvc        mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103407 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103418 ceph-osd            safe_timer          mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103419 ceph-osd            safe_timer          mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103421 ceph-osd            safe_timer          mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103427 ceph-osd            safe_timer          mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103663 ceph-osd            fn_anonymous        mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103675 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103677 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103678 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103679 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103680 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103681 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103682 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103683 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103684 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103685 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 103955 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 104621 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 104826 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 104827 ceph-osd            -                   mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 104828 ceph-osd            wb_throttle         mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 104829 ceph-osd            filestore_sync      mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 104830 ceph-osd            journal_write       mi_switch+0xe2
> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93
> sys_writev+0x6e amd64_syscall+0x364 fast_syscall_common+0x101
>  3532 104831 ceph-osd            fn_jrn_objstore     mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 104832 ceph-osd            tp_fstore_op        mi_switch+0xe2
> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>  3532 104833 ceph-osd            tp_fstore_op        mi_switch+0xe2
> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 vn_open_cred+0xc8
> zfs_setextattr+0x216 VOP_SETEXTATTR_APV+0x7c extattr_set_vp+0x11d
> sys_extattr_set_fd+0xee amd64_syscall+0x364 fast_syscall_common+0x101

This is an example of the cause of your problem. The thread is
executing ZFS code, the zfs_setextattr() VOP, probably to do something
with the extended attributes. There it recurses into VFS to open a
file, and vn_open_cred() sleeps in bwillwrite() waiting for buffer
space pressure to ease, because vn_open_cred() is assumed to be called
from the top level, not from inside VFS/fs code. Until this thread
finishes its operation and safely returns to the kernel/user boundary,
the process cannot exit.

There are two problems. One is this call to bwillwrite(), which is
easy to get rid of; see the patch at the end of the message.

The other is that I wonder why you have so many dirty buffers, and why
the situation does not resolve itself. Note that ZFS does not use the
buffer cache, so you must have some other very active filesystem that
does use it and that is somehow blocked on writes.

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
index ebcc0ad92e0..ae37dd1fba1 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
@@ -5490,7 +5490,7 @@ vop_getextattr {
 	flags = FREAD;
 	NDINIT_ATVP(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, attrname,
 	    xvp, td);
-	error = vn_open_cred(&nd, &flags, 0, 0, ap->a_cred, NULL);
+	error = vn_open_cred(&nd, &flags, 0, VN_OPEN_INVFS, ap->a_cred, NULL);
 	vp = nd.ni_vp;
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	if (error != 0) {
@@ -5627,7 +5627,8 @@ vop_setextattr {
 	flags = FFLAGS(O_WRONLY | O_CREAT);
 	NDINIT_ATVP(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, attrname,
 	    xvp, td);
-	error = vn_open_cred(&nd, &flags, 0600, 0, ap->a_cred, NULL);
+	error = vn_open_cred(&nd, &flags, 0600, VN_OPEN_INVFS, ap->a_cred,
+	    NULL);
 	vp = nd.ni_vp;
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	if (error != 0) {
diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index a0c018deb32..c69010dd999 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -219,7 +219,8 @@ vn_open_cred(struct nameidata *ndp, int *flagp, int cmode, u_int vn_open_flags,
 			ndp->ni_cnd.cn_flags |= AUDITVNODE1;
 		if (vn_open_flags & VN_OPEN_NOCAPCHECK)
 			ndp->ni_cnd.cn_flags |= NOCAPCHECK;
-		bwillwrite();
+		if ((vn_open_flags & VN_OPEN_INVFS) == 0)
+			bwillwrite();
 		if ((error = namei(ndp)) != 0)
 			return (error);
 		if (ndp->ni_vp == NULL) {
diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
index 8472bc0fb7b..27dbcbc58b1 100644
--- a/sys/sys/vnode.h
+++ b/sys/sys/vnode.h
@@ -579,6 +579,7 @@ typedef void vop_getpages_iodone_t(void *, vm_page_t *, int, int);
 #define	VN_OPEN_NOAUDIT		0x00000001
 #define	VN_OPEN_NOCAPCHECK	0x00000002
 #define	VN_OPEN_NAMECACHE	0x00000004
+#define	VN_OPEN_INVFS		0x00000008
 
 /*
  * Public vnode manipulation functions.
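
For readers who want to see the effect of the new flag in isolation, here is a minimal user-space sketch of the gating that the vfs_vnops.c hunk introduces. It is not kernel code. The VN_OPEN_* values are copied from the vnode.h hunk of the patch, but model_bwillwrite(), model_vn_open_create(), model_self_check() and the throttle_calls counter are hypothetical names invented for illustration; the real bwillwrite() sleeps under dirty-buffer pressure, while the stand-in only counts how often it would have been entered.

```c
#include <assert.h>

/* Flag values as they appear in the patch's sys/sys/vnode.h hunk. */
#define VN_OPEN_NOAUDIT     0x00000001u
#define VN_OPEN_NOCAPCHECK  0x00000002u
#define VN_OPEN_NAMECACHE   0x00000004u
#define VN_OPEN_INVFS       0x00000008u

/* Hypothetical counter: how many times the throttle point was entered. */
static int throttle_calls;

/* Hypothetical stand-in for bwillwrite(); the real one may sleep. */
static void
model_bwillwrite(void)
{
	throttle_calls++;
}

/*
 * Hypothetical model of the create path of vn_open_cred() after the
 * patch: callers already inside VFS/fs code pass VN_OPEN_INVFS and
 * skip the dirty-buffer throttle.
 */
static void
model_vn_open_create(unsigned int vn_open_flags)
{
	if ((vn_open_flags & VN_OPEN_INVFS) == 0)
		model_bwillwrite();
}

/* Exercise both kinds of caller and check which ones were throttled. */
static void
model_self_check(void)
{
	throttle_calls = 0;
	model_vn_open_create(0);		/* top-level open: throttled */
	assert(throttle_calls == 1);
	model_vn_open_create(VN_OPEN_INVFS);	/* in-VFS open: not throttled */
	assert(throttle_calls == 1);
	model_vn_open_create(VN_OPEN_INVFS | VN_OPEN_NOAUDIT);
	assert(throttle_calls == 1);		/* other flags do not matter */
}
```

Calling model_self_check() from a small main() is enough to try it. In the model, the zfs_setextattr() case from the stack trace corresponds to the VN_OPEN_INVFS calls: the open proceeds without entering the throttle point, so the thread can return to the kernel/user boundary and let the process exit.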