From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Mel Flynn
Date: Tue, 16 Jun 2009 08:19:57 -0400
Subject: Re: How best to debug locking/scheduler problems
Message-Id: <200906160819.57658.jhb@freebsd.org>
In-Reply-To: <200906151353.06630.mel.flynn+fbsd.hackers@mailing.thruhere.net>

On Monday 15 June 2009 5:53:05 pm Mel Flynn wrote:
> Hi,
>
> I'm trying to get to the bottom of a bug with getpeername() and certain kde4
> applications which is probably as low-level as the libthr and the scheduler.
>
> From browsing various related files in sys/kern it seems KTR is a good bet to
> get the information needed, yet it isn't really well supported in userland.
> For one, I've got no clue other than logging console output(?) how to retrieve
> the lock info or filter it in userland from reading ktr(9) and alq(9). Gdb is
> useless as the process doesn't give the information gdb wants and gdb just
> hangs in wait. ktrace also does not provide anything as there are no more
> syscalls being made, so I'll have to get to the bottom of this by tracing and
> filtering.
>
> Short description of the problem:
> a process never gets out of mi_switch and remains locked even when init tries
> to shut it down.
>
> % procstat -t 4283
>
>   PID    TID COMM      TDNAME  CPU  PRI STATE  WCHAN
>  4283 100215 kdeinit4  -         0  128 lock   *unp_mtx
> % procstat -k 4283
>
>   PID    TID COMM      TDNAME  KSTACK
>  4283 100215 kdeinit4  -       mi_switch turnstile_wait
> _mtx_lock_sleep uipc_peeraddr kern_getpeername getpeername syscall
> Xint0x80_syscall
> % ps -ww 4283
>   PID  TT  STAT      TIME COMMAND
>  4283  ??  T      0:00.38 kdeinit4: kdeinit4: kio_http http local:/tmp/ksocket-mel/klauncherxJ1635.slave-socket local:/tmp/ksocket-mel/plasmayC1653.slave-socket (kdeinit4)
>
> % ls -l /tmp/ksocket-mel/
>
> total 2
> -rw-rw-r--  1 mel  wheel  62 Jun 14 22:55 KSMserver__0
> srw-------  1 mel  wheel   0 Jun 14 22:55 kdeinit4__0
> srwxrwxr-x  1 mel  wheel   0 Jun 14 22:55 klauncherxJ1635.slave-socket

You can use kgdb and the scripts at www.freebsd.org/~jhb/gdb.  Simply run
'kgdb' as root and do 'lcd /folder/with/scripts' and 'source gdb6'.  You can
then do 'lockchain 4283' to find who holds the lock this thread is blocked on
and what state they are in.
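
As a rough sketch, the kgdb steps just described might be typed as the
session below.  The '#' and '(kgdb)' prompts are assumed for illustration,
/folder/with/scripts stands in for wherever the scripts were saved, and the
output is left out because it depends on the running system:

```text
# kgdb
(kgdb) lcd /folder/with/scripts
(kgdb) source gdb6
(kgdb) lockchain 4283
```
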
You can use 'show lockchain 4283' in DDB for similar info as well.

-- 
John Baldwin