Date:      Tue, 16 Jun 2009 08:30:29 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        pluknet <pluknet@gmail.com>
Subject:   Re: 6.2 sporadically locks up
Message-ID:  <200906160830.29721.jhb@freebsd.org>
In-Reply-To: <a31046fc0906160323s3e4ec60bxb585bb29f9f3a02a@mail.gmail.com>
References:  <a31046fc0906160323s3e4ec60bxb585bb29f9f3a02a@mail.gmail.com>

On Tuesday 16 June 2009 6:23:47 am pluknet wrote:
> Hi all.
> 
> This is one of livelocks we have on a weekly basis.
> Yes, we do still use ULE scheduler on 6.2 and not moved to 7 yet.
> Any thought?
> 
> db> ps
>  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
> 70304 69700 69670     0  R                           sh
> 70303 70292 93818  3572  RL      CPU 2               chrsh
> 70302 70294 93818  3572  R                           crond
> 70299 93818 93818     0  R       CPU 1               crond
> 70298 93818 93818     0  R                           crond
> 70294 93818 93818  3572  S       piperd   0xd1d8d330 crond
> 70292 93818 93818  3572  R                           crond
> 70284 70279 70040 10229  S       biord    0xdbe2e4e8 perl5.8.8
> 70283 70278 93818 10229  SL      biord    0xdbd70710 exim-4.63-0
> 70279 70040 70040 10229  S       wait     0xc9005860 sh
> 70278 69996 93818 10229  S       wait     0xcaf4ac90 sh
> 70191  4680  4680  9738  S       select   0xc0a12944 httpd
> 70190  4796  4796 10008  R                           httpd
> 70188  5043  5043 30532  RL                          httpd
> 70043 69999 70043  3572  Ss      select   0xc0a12944 wget
> 70042 70000 70042  3572  Ss      select   0xc0a12944 wget
> 70041 70001 70041  3572  Ss      select   0xc0a12944 wget
> 70040 69996 70040 10229  Ss      piperd   0xca35e990 perl5.8.8
> 70039 70002 70039  3572  Ss      select   0xc0a12944 wget

This is not a full listing so one cannot assume it is a deadlock.

> db> show lockchain Giant
> thread -3420549 (pid 434, ) ??? (0xc099cb0c)

For a specific lock variable such as Giant, you would use 'show lock' or 
perhaps 'show turnstile'.  'show lockchain' needs a TID or PID, not a lock.
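
As a rough sketch of what that would look like at the prompt (this assumes 
these commands are present in your 6.2 kernel and that DDB resolves the 
symbol Giant to the lock's address; the PID is just one picked from the 'ps' 
output above for illustration):

```
db> show lock Giant
db> show turnstile Giant
db> show lockchain 70303
```

The first two take the lock itself, while the last one takes a thread, which 
is why passing Giant to 'show lockchain' produced a nonsense thread ID.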

> db> show allpcpu
> cpuid        = 0
> curthread    = 0xc7cfec80: pid 18 "swi4: clock sio"
> 
> cpuid        = 1
> curthread    = 0xc99f9960: pid 70299 "crond"
> 
> cpuid        = 2
> curthread    = 0xc99f9af0: pid 70303 "chrsh"
> 
> cpuid        = 3
> curthread    = 0xd087d320: pid 69700 "sh"
> 
> cpuid        = 4
> curthread    = 0xc98f84b0: pid 69604 "httpd"
> 
> cpuid        = 5
> curthread    = 0xcaebe190: pid 69598 "httpd"
> 
> cpuid        = 6
> curthread    = 0xc7cfe960: pid 27 "irq17: bce1 aacu0"
> 
> cpuid        = 7
> curthread    = 0xc837fe10: pid 69711 "arcconf"

This is far more useful output than the truncated 'ps'.  It shows that all of 
the CPUs are busy (in at least some deadlocks, all the CPUs would be idle 
instead).  I am aware of several deadlocks that have been fixed since 6.2, 
but this doesn't look like any of those.  I'm not sure why you aren't getting 
useful stack traces of running threads.  Perhaps DDB in 6.2 doesn't know to 
look in stoppcbs[].  Hmm, it looks like 6.2 only does that if you are using 
KDB_STOP_NMI.  Are you using that kernel option?  If not, you probably want 
to.
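
If it is not set, enabling it would be a kernel config change along these 
lines (a sketch only; the KDB and DDB lines are assumed to be in your config 
already since you are at a db> prompt, and the only new line is the last 
one):

```
options 	KDB			# assumed already present
options 	DDB			# assumed already present
options 	KDB_STOP_NMI		# stop other CPUs with an NMI
```

You would need to rebuild and install the kernel for this to take effect.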

-- 
John Baldwin


