From: Willem Jan Withagen
Subject: Process in T state does not want to die.....
To: FreeBSD Hackers
Date: Wed, 27 Nov 2019 16:11:33 +0100
Message-ID: <966f830c-bf09-3683-90da-e70aa343cc16@digiware.nl>

Hi,

Probably a "dumb" question, but I am still wondering what is going on...

I have this Ceph server running several OSDs (ceph-osd). When they do not
get certain responses within a time limit, they commit suicide.
That is a rather convoluted process, where they:
 - call abort()
 - which is then trapped by the SIGABRT signal handler, which will
     try to dump the logging state
     try to dump a stack trace
 - either call _exit() or call reraise_fatal
 - reraise_fatal does some logging and calls exit(1)

And then the process ends up as:

root 3433 0.0 4.2 699944 353716 - TsJ 11Nov19 38:10.17 ceph-osd -i 2

Here the T state should mean it is terminated, and no more CPU time is
consumed. But one way or another the process does not go away, and it keeps
resources locked, which prevents starting a new daemon.

It stays in that state for either:
 1) a few minutes, and then it is gone from the process table, or
 2) forever (>24h)

But why doesn't the process die (right away)?
Killing it with kill -9 does not help.
Trying to attach gdb brings nothing.
If it disappears from the process table, sometimes there is a core.

So how do I debug this?

--WjW
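
For anyone who wants to reproduce the exit sequence described above outside
of Ceph, here is a minimal sketch in Python. It is not the ceph-osd code:
run_demo, run_child and on_abort are hypothetical names, and the two "dump"
steps are placeholder writes to stderr. Python's os.abort() does not run a
Python-level SIGABRT handler, so the sketch raises SIGABRT directly as a
stand-in for abort(). It forks a child that takes the "handler, then _exit()"
branch and reports how the parent sees that child end.

```python
import os
import signal


def on_abort(signum, frame):
    """Hypothetical stand-in for the daemon's SIGABRT handler."""
    # Placeholders for "dump the logging state" and "dump stacktrace".
    os.write(2, b"handler: dumping logging state (placeholder)\n")
    os.write(2, b"handler: dumping stack trace (placeholder)\n")
    # Take the _exit() branch: leave at once, without atexit handlers
    # or stdio flushing.
    os._exit(1)


def run_child():
    """Install the handler, then raise SIGABRT in this process."""
    signal.signal(signal.SIGABRT, on_abort)
    # Assumption: raising the signal directly approximates abort();
    # os.abort() itself would bypass the Python-level handler.
    signal.raise_signal(signal.SIGABRT)
    # Only reached if the handler did not run.
    os._exit(2)


def run_demo():
    """Fork a child, let it 'abort', and decode its wait status."""
    pid = os.fork()
    if pid == 0:
        run_child()
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)


if __name__ == "__main__":
    print(run_demo())
```

When the sequence completes normally, the parent should see the child as
exited with status 1 rather than killed by a signal. A child that never
becomes waitable at this point would match the stuck behaviour described
above.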