Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 17 Jun 2009 08:15:26 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Mel Flynn <mel.flynn+fbsd.hackers@mailing.thruhere.net>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: How best to debug locking/scheduler problems
Message-ID:  <200906170815.26650.jhb@freebsd.org>
In-Reply-To: <200906161501.45411.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net>
References:  <200906151353.06630.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net> <200906161502.42741.jhb@freebsd.org> <200906161501.45411.mel.flynn%2Bfbsd.hackers@mailing.thruhere.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tuesday 16 June 2009 7:01:45 pm Mel Flynn wrote:
> On Tuesday 16 June 2009 11:02:42 John Baldwin wrote:
> > On Tuesday 16 June 2009 1:52:23 pm Mel Flynn wrote:
> > > Hi John,
> > >
> > > On Tuesday 16 June 2009 04:19:57 John Baldwin wrote:
> > > > On Monday 15 June 2009 5:53:05 pm Mel Flynn wrote:
> > > > >   PID    TID COMM             TDNAME           KSTACK
> > > > >  4283 100215 kdeinit4         -                mi_switch
> > > > > turnstile_wait _mtx_lock_sleep uipc_peeraddr kern_getpeername
> > > > > getpeername syscall Xint0x80_syscall
> > > > > % ps -ww 4283
> > > > >   PID  TT  STAT      TIME COMMAND
> > > > >  4283  ??  T      0:00.38 kdeinit4: kdeinit4: kio_http http
> > > > > local:/tmp/ksocket-mel/klauncherxJ1635.slave-socket
> > > > > local:/tmp/ksocket- mel/plasmayC1653.slave-socket (kdeinit4)
> > > > >
> > > > > %ls -l /tmp/ksocket-mel/
> > > > >
> > > > > total 2
> > > > > -rw-rw-r--  1 mel  wheel  62 Jun 14 22:55 KSMserver__0
> > > > > srw-------  1 mel  wheel   0 Jun 14 22:55 kdeinit4__0
> > > > > srwxrwxr-x  1 mel  wheel   0 Jun 14 22:55
> > > > > klauncherxJ1635.slave-socket
> > > >
> > > > You can use kgdb and the scripts at www.freebsd.org/~jhb/gdb.  Simply
> > > > run 'kgdb' as root and do 'lcd /folder/with/scripts' and 'source gdb6'.
> > > > You can then do 'lockchain 4283' to find who holds the lock this thread
> > > > is blocked on and what state they are in.
> > >
> > > Looks like a deadlock:
> > >
> > > (kgdb) lockchain 4283
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > >  thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > >  DEADLOCK
> > >
> > > Looking through the scripts now to see how I can get more info on the
> > > call chain and hoping I don't panic the machine ;). It is quite random to
> > > reproduce.
> >
> > In kgdb you can simply do 'tid 100122' followed by 'bt' and 'tid 100215'
> > followed by 'bt'.
> 
> Cool, thanks for helping John. Of course it pretty much shows me what procstat 
> -k shows and can't get any info on the userland part, but I can fully inspect 
> the locks and threads.
> 
> Both threads are in TDS_INHIBITED state, and blocked on:
> (kgdb) frame 0
> #0  sched_switch (td=0xc5971240, newtd=0xc4d39900, flags=259)
>     at /usr/src/sys/kern/sched_ule.c:1864
> 1864                    cpuid = PCPU_GET(cpuid);

That doesn't really tell us anything except that it isn't running.  We know
it is actually blocked on a lock, and we need the full stack trace to see
where the two threads were trying to acquire the locks, hence 'bt'.  '
procstat -k' output would be fine, too.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200906170815.26650.jhb>