From: Tijl Coosemans
To: Dilip Chhetri
Cc: freebsd-threads@freebsd.org
Date: Mon, 29 Sep 2008 22:08:27 +0200
Subject: Re: getting stack trace for other thread on the same process : libthr

On Monday 29 September 2008 18:59:36 Dilip Chhetri wrote:
> Tijl Coosemans wrote:
>> On Friday 26 September 2008 21:06:58 Dilip Chhetri wrote:
>>
>>> Question
>>> --------
>>> My program is linked with libthr in FreeBSD-7.0. The program has
>>> in the order of 20 threads, and a designated monitoring thread at
>>> some point wants to know what are other/stuck threads doing. This
>>> needs to be done by printing stack backtrace for the thread to
>>> stdout.
>>>
>>> I understand pthread_t structure has pointer to the target
>>> thread's stack, but to get the trace I need to know value of
>>> stack-pointer register and base-pointer register. I looked at the
>>> code and I don't find any mechanism by which I could read the
>>> target threads register context (because it all resides within
>>> kernel thread structure). Further code study reveals that
>>> kernel_thread->td_frame contains the register context for a thread,
>>> but is valid only when the thread is executing/sleeping inside the
>>> kernel.
>>>
>>> Is there anything I'm missing here ? Is there an easy way to
>>> traverse stack for some thread with in the same process.
>>>
>>> I considered/considering following approaches,
>>> a) use PTRACE
>>>    ruled out, because you can't trace the process from within the
>>>    same process
>>>
>>> b) somehow temporarily stop the target-thread and read td_frame by
>>>    traversing kernel data structure through /dev/kmem. After doing
>>>    stack traversal resume the target thread.
>>>
>>>
>>> Detailed problem background
>>> --------------------------
>>> We have this process X with ~20 threads, each processing some
>>> requests. One of them is designated as monitoring/dispatcher
>>> thread. When a new request arrives, dispatcher thread tries to
>>> queue the task to idle thread. But if all threads are busy
>>> processing requests, the dispatcher thread is supposed to print the
>>> stack back trace for each of the busy thread. This is our
>>> *debugging* mechanism to find potential fault-points.
>>>
>>> In FreeBSD-4.6.2, we hacked libc_r:pthread_t to achieve our goal.
>>> But in FreeBSD-7.0, we decided to use libthr and hack doesn't seem
>>> to be easy.
>>>
>>> Target setup
>>> ------------
>>> * SMP : around 8 CPU
>>> * process : it's going to be run as root and have around ~20
>>>   threads
>>
>> You could try registering a signal handler for SIGUSR1 that prints a
>> stack backtrace using the stack pointer in the sigcontext and then
>> call pthread_kill(SIGUSR1) on whichever thread you want a backtrace
>> of.
>
> Thanks, but as I mentioned it's a network based program and it may be
> sleeping/stuck in syscall for some packets, in this case pthread_kill
> will not work because signals are delivered only when you return from
> syscall (that's what I have learned from old UNIX books in my
> college).

Those kinds of syscalls are usually interruptible though. The signal
handler runs, and depending on the SA_RESTART flag the syscall is then
either aborted and returns EINTR, or restarted (or it returns partial
success). See the sigaction(2) manpage.
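
To illustrate the point about interruptible syscalls, here is a minimal
sketch, not tested on FreeBSD 7.0 specifically. It assumes a pipe read as a
stand-in for a thread stuck in a network syscall. The names on_sigusr1,
worker, sleep_ms and interrupt_blocked_thread are made up for this example,
and the handler only writes a marker line where a real one would walk the
stack (for instance by installing it with SA_SIGINFO and starting from the
register state in the context argument).

```c
#define _XOPEN_SOURCE 700   /* expose sigaction, SA_RESTART, nanosleep */
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran;  /* set once the handler has run */
static int pipe_fds[2];                    /* [0] read end, [1] write end */

/* Hypothetical handler: a real one would print the backtrace here. */
static void on_sigusr1(int signo)
{
    static const char msg[] = "SIGUSR1: backtrace would be printed here\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1); /* signal-safe */
    (void)n;
    (void)signo;
    handler_ran = 1;
}

/* Blocks in read(2) on an empty pipe; stores errno (or 0) in *arg. */
static void *worker(void *arg)
{
    int *result = arg;
    char c;
    ssize_t n = read(pipe_fds[0], &c, 1);
    *result = (n == -1) ? errno : 0;
    return NULL;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Signals a thread that is blocked in read(2).
 * sa_flags is 0 or SA_RESTART.  Returns the errno the worker saw from
 * read(), 0 if read() succeeded, or -1 if the setup failed.
 */
int interrupt_blocked_thread(int sa_flags)
{
    struct sigaction sa;
    pthread_t tid;
    int worker_errno = -1;
    ssize_t n;

    handler_ran = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sa.sa_flags = sa_flags;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(pipe_fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, &worker_errno) != 0)
        return -1;

    sleep_ms(100);               /* crude: give the worker time to block */
    pthread_kill(tid, SIGUSR1);  /* handler runs in the worker thread */
    sleep_ms(100);

    /* Safety net: if read() was restarted (or the signal arrived before
     * the worker blocked), this byte lets it finish so join cannot hang. */
    n = write(pipe_fds[1], "x", 1);
    (void)n;

    pthread_join(tid, NULL);
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return worker_errno;
}
```

Calling interrupt_blocked_thread(0) should return EINTR, because the read
is aborted after the handler runs. With SA_RESTART the handler still runs,
but the read is restarted and only completes once the byte is written, so
the function should return 0. Either way the handler executes while the
thread is inside the syscall, which is all the backtrace needs. Build with
-pthread.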