Date: Fri, 27 Oct 2017 13:37:29 +0200
From: Norbert Koch <nkoch@demig.de>
To: <kostikbel@gmail.com>
Cc: <freebsd-hackers@freebsd.org>
Subject: Re: crerating coredump of multithreaded process
Message-ID: <95ad25da-dc53-1c6a-030b-71cf9021a75b@demig.de>
In-Reply-To: <20171027093311.GF2566@kib.kiev.ua>
References: <e455d19c-72ac-3501-8764-415c4d154c74@demig.de> <20171027093311.GF2566@kib.kiev.ua>

Ok, thank you for your explanation.

demig Prozessautomatisierung GmbH
Address: Haardtstrasse 40, D-57076 Siegen
Registry court: Siegen HRB 2819
Managing directors: Joachim Herbst, Winfried Held
Phone: +49 271 772020
Fax: +49 271 74704
E-mail: info@demig.de
http://www.demig.de

On 2017-10-27 at 11:33, Konstantin Belousov wrote:
> On Fri, Oct 27, 2017 at 10:44:41AM +0200, Norbert Koch wrote:
>> Hello.
>>
>> When trying to create the coredump of a running
>> process (without killing it) under FreeBSD 10.3
>> I am seeing somewhat strange behaviour.
> Try this on HEAD or stable/11. There were a lot of changes and bugfixes
> in ptrace(2).
>
> I do not claim that the behaviour you see has changed, but 10.3 has
> diverged too far from the code that developers would be willing to look at.
>> As I want to see the state of all threads, the quick-and-dirty way
>> of fork() + SIGABRT does not work for me.
>>
>> So, what I do is have a supervisor program wait for SIGUSR1.
>> When my application signals the wish to be coredumped,
>> it sends SIGSTOP to itself immediately after sending SIGUSR1.
>> The supervisor then forks gcore.
>>
>> From what I can see using top, my application immediately starts
>> again as if SIGCONT had been received, while gcore hangs in wait.
> SIGCONT cannot be blocked, otherwise programs could create unkillable
> processes.
>
>> Gcore calls ptrace(PT_ATTACH) followed by waitpid().
>> So I assume that the ptrace call restarts my application
>> and waitpid hangs (why?).
>>
>> If I manually send SIGCONT to my stopped application
>> immediately before exec-ing gcore, the coredump is
>> created, but for obvious reasons not as consistent as
>> I want it to be.
>>
>> I should add that in my application most other signals are
>> blocked. Blocking (or not) SIGCONT seems to have no effect.
>>
>> Am I doing something wrong here? If yes, is there
>> a different/better/more elegant way of creating a consistent coredump?
> What is the purpose of sending SIGSTOP to itself? Practically, it is no
> different from the action of ptrace(PT_ATTACH): all threads are parked
> at some safe place in the kernel, or are forcibly moved into kernel
> mode by sending an IPI if they are executing in userspace on other cores.
> To get to the safe place in the kernel, threads often need to execute
> some more. IPI delivery is also not guaranteed to occur at a
> deterministic place ("at next instruction boundary"); it happens as the
> hardware reacts to it. As you see, the process is very asynchronous. It
> cannot guarantee that the final snapshot is consistent with an arbitrary
> thread state at the point of request, but it does represent a valid
> process state, assuming that the threads are executing asynchronously.
>
> Moreover, ptrace(PT_ATTACH) currently operates not only by a mechanism
> similar to SIGSTOP, it really sends SIGSTOP to the debuggee. We do not
> track nested SIGSTOPs; a process is either stopped or runnable. So I am
> not surprised that attaching to a stopped process does not occur until
> the stopped state established earlier passes away: the debugger waits
> for the confirmation from all threads that they are parked at a safe
> place, but there is none because the threads are already stopped. If
> the threads are made runnable, the acks are sent and the attach completes.
>
> I am explaining this to point out that sending SIGSTOP and
> then attaching with ptrace(PT_ATTACH) is just worse than doing
> ptrace(PT_ATTACH) alone. I think you need to have the supervisor either
> directly execute gcore(1) without SIGSTOP, or execute ptrace(PT_ATTACH)
> instead of kill(SIGSTOP) and have the gcore functionality embedded into
> it. The consistency of the generated core is actually the same.

-- 
Dipl.-Ing. Norbert Koch
Process Controller Development
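
The first alternative suggested above (the supervisor runs gcore(1) directly, and the application never sends SIGSTOP to itself) could look roughly like the following. This is only a minimal sketch, not code from the thread: the helper names `gcore_argv` and `dump_on_request`, the `/var/tmp` directory, and the core file naming are hypothetical, and it assumes the `-c` option described in FreeBSD's gcore(1) man page for choosing the output file.

```python
import os
import signal
import subprocess


def gcore_argv(pid, core_path):
    """Build the command line `gcore -c <core_path> <pid>` (hypothetical helper)."""
    return ["gcore", "-c", core_path, str(pid)]


def dump_on_request(core_dir="/var/tmp"):
    """Wait for one SIGUSR1 and dump the sending process with gcore.

    The application only sends SIGUSR1; it does not stop itself.
    """
    # Block SIGUSR1 so it stays pending until sigwaitinfo() collects it.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    info = signal.sigwaitinfo({signal.SIGUSR1})
    pid = info.si_pid  # PID of the process that asked to be dumped
    # Hypothetical naming scheme for the core file.
    core_path = os.path.join(core_dir, "app.%d.core" % pid)
    # Per the thread, gcore does ptrace(PT_ATTACH) + waitpid() itself,
    # so the target must be runnable when gcore starts.
    return subprocess.run(gcore_argv(pid, core_path)).returncode
```

A supervisor would call `dump_on_request()` in a loop. As explained above, the resulting core is no less consistent than one taken after a self-sent SIGSTOP.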