Message-ID: <48E14BF0.1050108@panasas.com>
Date: Mon, 29 Sep 2008 14:43:12 -0700
From: Dilip Chhetri
To: Tijl Coosemans
Cc: freebsd-threads@freebsd.org
References: <48DD32D2.2060304@panasas.com> <200809262331.29353.tijl@ulyssis.org> <48E10978.2090907@panasas.com> <200809292208.29315.tijl@ulyssis.org>
In-Reply-To: <200809292208.29315.tijl@ulyssis.org>
Subject: Re: getting stack trace for other thread on the same process : libthr

Tijl Coosemans wrote:
> On Monday 29 September 2008 18:59:36 Dilip Chhetri wrote:
>
>>Tijl Coosemans wrote:
>>
>>>On Friday 26 September 2008 21:06:58 Dilip Chhetri wrote:
>>>
>>>
>>>>Question
>>>>--------
>>>>  My program is linked with libthr in FreeBSD-7.0.
>>>>The program has
>>>>in the order of 20 threads, and a designated monitoring thread at
>>>>some point wants to know what the other/stuck threads are doing.
>>>>This needs to be done by printing a stack backtrace for the thread
>>>>to stdout.
>>>>
>>>>  I understand the pthread_t structure has a pointer to the target
>>>>thread's stack, but to get the trace I need to know the value of
>>>>the stack-pointer register and base-pointer register.  I looked at
>>>>the code and I don't find any mechanism by which I could read the
>>>>target thread's register context (because it all resides within
>>>>the kernel thread structure).  Further code study reveals that
>>>>kernel_thread->td_frame contains the register context for a thread,
>>>>but is valid only when the thread is executing/sleeping inside the
>>>>kernel.
>>>>
>>>>  Is there anything I'm missing here?  Is there an easy way to
>>>>traverse the stack of some thread within the same process?
>>>>
>>>>  I have considered / am considering the following approaches:
>>>>a) use PTRACE
>>>>     ruled out, because you can't trace the process from within the
>>>>     same process
>>>>
>>>>b) somehow temporarily stop the target thread and read td_frame by
>>>>   traversing kernel data structures through /dev/kmem.  After
>>>>   doing the stack traversal, resume the target thread.
>>>>
>>>>
>>>>Detailed problem background
>>>>---------------------------
>>>>  We have this process X with ~20 threads, each processing some
>>>>requests.  One of them is designated as the monitoring/dispatcher
>>>>thread.  When a new request arrives, the dispatcher thread tries to
>>>>queue the task to an idle thread.  But if all threads are busy
>>>>processing requests, the dispatcher thread is supposed to print the
>>>>stack backtrace for each of the busy threads.  This is our
>>>>*debugging* mechanism to find potential fault-points.
>>>>
>>>>  In FreeBSD-4.6.2, we hacked libc_r:pthread_t to achieve our goal.
>>>>But in FreeBSD-7.0, we decided to use libthr and the hack doesn't
>>>>seem to be easy.
>>>>
>>>>Target setup
>>>>------------
>>>>  * SMP     : around 8 CPUs
>>>>  * process : it's going to be run as root and have around 20
>>>>              threads
>>>
>>>You could try registering a signal handler for SIGUSR1 that prints a
>>>stack backtrace using the stack pointer in the sigcontext and then
>>>call pthread_kill(SIGUSR1) on whichever thread you want a backtrace
>>>of.
>>
>>Thanks, but as I mentioned it's a network based program and it may be
>>sleeping/stuck in a syscall waiting for some packets.  In this case
>>pthread_kill will not work, because signals are delivered only when
>>you return from the syscall (that's what I have learned from old UNIX
>>books in my college).
>
>
> Those kinds of syscalls are usually interruptible though. Depending on
> the SA_RESTART flag they are then either aborted and return EINTR or
> restarted (or return partial success). See the sigaction(2) manpage.

Thanks. I will give that a try; maybe it will work 90% of the time for
us. That's much better than having nothing, or something that is too
complicated to implement.

Thanks once again.