Date:      Wed, 27 Nov 2019 16:11:33 +0100
From:      Willem Jan Withagen <wjw@digiware.nl>
To:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Process in T state does not want to die.....
Message-ID:  <966f830c-bf09-3683-90da-e70aa343cc16@digiware.nl>

Hi,

Probably a "dumb" question, but I am still wondering what is going on...

I have a Ceph server running several OSDs (ceph-osd). When they do not
get certain responses within a time limit, they commit suicide.

That is a rather convoluted process where they
  - call abort()
  - which is then trapped by the SIGABRT signal handler, which
    tries to dump the logging state
    tries to dump a stack trace
  - then either call _exit()
    or call reraise_fatal
  - reraise_fatal does some logging
    and calls exit(1)
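For readers less familiar with this pattern, the suicide path listed above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not Ceph's actual code: `abort_handler`, `reraise_fatal`, `run_suicide_path`, the `use_reraise_fatal` flag and the `_exit(2)` status are all hypothetical, and the "dumps" are reduced to a write(2) to stderr. It shows that on a normal run the handler's `_exit()` or `exit(1)` terminates the child and the parent reaps it with `waitpid()`, so the process should not linger in the process table afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: selects which exit path the handler takes. */
static volatile sig_atomic_t use_reraise_fatal = 0;

/* Hypothetical stand-in for reraise_fatal: "log", then exit(1).
 * exit() is not async-signal-safe; it is used here only to mirror
 * the sequence described in the mail. */
static void reraise_fatal(void)
{
    static const char msg[] = "reraise_fatal: logging, then exit(1)\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;
    exit(1);
}

/* SIGABRT handler: pretend to dump log state and a stack trace,
 * then leave the process via one of the two paths. */
static void abort_handler(int sig)
{
    static const char msg[] = "handler: dumping log state and stack trace\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;
    (void)sig;
    if (use_reraise_fatal)
        reraise_fatal();
    _exit(2); /* exit status chosen arbitrarily for this sketch */
}

/* Fork a child that follows the suicide path; return its wait status,
 * or -1 if fork() or waitpid() fails. */
int run_suicide_path(int reraise)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = abort_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGABRT, &sa, NULL);
        use_reraise_fatal = reraise;
        abort(); /* step 1: trapped by abort_handler */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) /* parent reaps the child */
        return -1;
    return status;
}
```

Driven from a `main()`, `run_suicide_path(0)` should report a child that exited with status 2 and `run_suicide_path(1)` one that exited with status 1; in neither case does the child remain stopped.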

And then the process ends up as:
root 3433 0.0  4.2  699944 353716 - TsJ 11Nov19   38:10.17 ceph-osd -i 2

Where the T state marks it as stopped, so it consumes no more CPU.
But somehow the process does not go away, and it keeps resources
locked, which prevents a new daemon from starting.

It stays in that state either
  1) a few minutes, after which it is gone from the process table, or
  2) forever (>24h).

But why doesn't the process die (right away)?
Killing it with -9 does not help.
Trying to attach gdb yields nothing.

If it disappears from the process table, sometimes there is a core.

So how do I debug this?

--WjW
