FreeBSD Mail Archives

Date:      Thu, 09 Sep 2004 12:08:26 -0700
From:      Julian Elischer <julian@elischer.org>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        freebsd-threads@freebsd.org
Subject:   Re: Unkillable KSE threaded proc
Message-ID:  <4140AA2A.90605@elischer.org>
In-Reply-To: <16704.40876.708925.425911@grasshopper.cs.duke.edu>
References:  <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu>


thanks,
I'm flooded with work for a couple of days..

it looks as if one of the threads (0xc1b614b0) has called exit,
which means it is in thread_single()
waiting for all the other threads to suicide, but at least one of them
doesn't want to..

Two of them (0xc1b61320 and 0xc2b6ce10) are refusing to finish up and exit
because they need the proc lock, which is owned by a fourth one (0xc1b617d0).
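
The dependency chain described above can be written down as a small
wait-for graph. This is only an illustrative Python sketch, not kernel
code: the names WAITS_ON and root_blockers are made up here, and the
edges are simply the ones read off the ps output and traces in this
message.

```python
# Wait-for edges taken from the analysis in this message.
# Each key is a thread address; the value lists the threads it is waiting on.
WAITS_ON = {
    # exiting thread sits in thread_single() until the others go away
    0xc1b614b0: [0xc1b61320, 0xc2b6ce10, 0xc1b617d0],
    # both of these are blocked on the process lock held by 0xc1b617d0
    0xc1b61320: [0xc1b617d0],
    0xc2b6ce10: [0xc1b617d0],
    # the lock holder is not waiting on any thread in this picture
    0xc1b617d0: [],
}

def root_blockers(thread, waits_on):
    """Follow wait-for edges from `thread`; return threads that wait on nobody."""
    seen, stack, roots = set(), [thread], set()
    while stack:
        t = stack.pop()
        if t in seen:
            continue  # already visited, also guards against cycles
        seen.add(t)
        blockers = waits_on.get(t, [])
        if not blockers:
            roots.add(t)
        stack.extend(blockers)
    return roots

print([hex(t) for t in root_blockers(0xc1b614b0, WAITS_ON)])
```

Everything funnels into the one thread that holds the lock, which is why
that thread's state is the interesting one.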

the fourth one has just preempted itself in favour of some other thread
(3244003328, which is 0xC15B9000 in hex). do you still have the 'ps' output?
what is thread 0xC15B9000?
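
The traces print their arguments in decimal, so matching them against
the thread addresses in the ps output needs a conversion. A minimal
Python sketch (the helper name ddb_arg_to_hex is made up for
illustration; it assumes 32-bit kernel addresses, as on this i386 box):

```python
def ddb_arg_to_hex(value):
    """Format a decimal argument from a ddb trace as a 32-bit hex address."""
    return "0x%08x" % (value & 0xffffffff)

# First two values printed for sched_switch() in the first trace.
for arg in (3249936336, 3244003328):
    print(arg, "->", ddb_arg_to_hex(arg))
```

The first value comes out as 0xc1b617d0, the thread being switched out,
and the second as 0xc15b9000, the thread it is switching to.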

the thread that holds the lock is the first one below..
[skip below for further comments.]



interestingly
Andrew Gallatin wrote:

>Julian Elischer writes:
> > 
> > I think it is
> > 
> > show thread (address)
>
>FWIW, I think db_trace_thread(thread addr, -1) seems to work better.
>When I enter ddb, curproc is init, so show thread
>seems to show garbage.
>
>
>db> ps                                 
>  pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
>
>   thread 0xc1b617d0 ksegrp 0xc18779a0 [CPU 1]
>   thread 0xc1b614b0 ksegrp 0xc18779a0 [SUSP]
>   thread 0xc1b61320 ksegrp 0xc18779a0 [LOCK process lock c1b13200]
>   thread 0xc2b6ce10 ksegrp 0xc1a270e0 [LOCK process lock c1b13200]
>
>db> call db_trace_thread(0xc1b617d0, -1)
>sched_switch(3249936336,3244003328,3244003328,468695918,1992661338) at sched_switch+216
>mi_switch(2,3244003328,3244003668,3244003328,3867700060) at mi_switch+455
>maybe_preempt(3244003328,252,0,3867700072,3226402603) at maybe_preempt+153
>sched_add(70,3867700092,3226402999,3246881184,3867189248) at sched_add+259
>end() at 3246881184
>0
>  
>

odd that the stack trace stops there?? that in itself is weird..
I don't understand why the thread is marked as currently running on
CPU1. it called sched_switch, which should have saved its state
and put it on the run queue (and marked it as such), so its state should
be RUNQ,
unless it has got into some infinite loop there, either going in or out
of the switchout.
it would be interesting to see the actual instruction pointer.. notice
that preemption is involved...

John may also have an idea..  (CC'd)



>db> call db_trace_thread(0xc1b614b0, -1)
>sched_switch(3249935536,3249936336,0,2929115342,3959095726) at sched_switch+216
>mi_switch(1,3249936336,0,0,0) at mi_switch+455
>thread_single(1,423437840,7706937,1737258498,3243666960) at thread_single+471
>exit1(3249935536,9,3867675836,3867675876,3226344614) at exit1+277
>expand_name(3249935536,9,256,0,0) at expand_name
>postsig(9,3867675976,2,3243701424,0) at postsig+516
>ast(3867675976) at ast+1508
>doreti_ast() at doreti_ast+23
>0
>
>db> call db_trace_thread(0xc1b61320, -1)
>sched_switch(3249935136,0,0,2147060238,4154263705) at sched_switch+216
>mi_switch(1,0,3249936336,3228346184,0) at mi_switch+455
>turnstile_wait(3249615360,3248629164,3249936336,3248629056,3249935136) at turnstile_wait+825
>_mtx_lock_sleep(3248629164,3249935136,0,0,0) at _mtx_lock_sleep+290
>kse_release(3249935136,3867663636,4,3249935136,3867663676) at kse_release+322
>syscall(47,47,47,134562304,0) at syscall+764
>Xint0x80_syscall() at Xint0x80_syscall+31
>--- syscall (383, FreeBSD ELF32, kse_release), eip = 671759695, esp = 135876488, ebp = 135876548 ---
>0
>
>db> call db_trace_thread(0xc2b6ce10, -1)
>sched_switch(3266760208,0,0,2564282502,2143396982) at sched_switch+216
>mi_switch(1,0,3266760208,3244171108,3228328544) at mi_switch+455
>turnstile_wait(3249615360,3248629164,3249936336,3248629056,3266760208) at turnstile_wait+825
>_mtx_lock_sleep(3248629164,3266760208,0,0,0) at _mtx_lock_sleep+290
>kse_release(3266760208,3901611284,4,3266760208,3901611324) at kse_release+322
>syscall(47,47,3215917103,1,129) at syscall+764
>Xint0x80_syscall() at Xint0x80_syscall+31
>--- syscall (383, FreeBSD ELF32, kse_release), eip = 671759695, esp = 3215978288, ebp = 3215978380 ---
>0
>
>
> > but if you can get a coredump it would be best..
> > in ddb do:
> > call doadump
> > 
> > in this case it looks like  thread 0xc1f2aaf0 has called exit() and is 
> > waiting for the others to exit..
> > I wonder if the lock is the answer.. it would be good to follow the link 
> > in the mutex in the proc structure at 0xc1a2d8c0
> > to see which thread OWNS it..
>
>I'm following it from 0xc1a22540 for today's lockup:
>
>(kgdb) p $proc->p_mtx
>$3 = {
>  mtx_object = {
>    lo_class = 0xc069e55c, 
>    lo_name = 0xc067788d "process lock", 
>    lo_type = 0xc067788d "process lock", 
>    lo_flags = 0x430000, 
>    lo_list = {
>      tqe_next = 0x0, 
>      tqe_prev = 0x0
>    }, 
>    lo_witness = 0x0
>  }, 
>  mtx_lock = 0xc1b617d2, 
>  mtx_recurse = 0x0
>}
>
>
>0xc1b617d2 is almost the same as the thread id of the
>first thread (0xc1b617d0)..
>
>I've still got the dump, so if you need more info please let me know.
>
>Drew
>_______________________________________________
>freebsd-threads@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-threads
>To unsubscribe, send any mail to "freebsd-threads-unsubscribe@freebsd.org"
>  
>
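
On why mtx_lock (0xc1b617d2) is "almost" the thread address: the lock
word holds the owning thread pointer with flag bits in the low bits.
The sketch below is Python, not kernel code, and it assumes the flag
values MTX_RECURSED = 0x1 and MTX_CONTESTED = 0x2; those are my
assumption for this kernel version and should be checked against
sys/mutex.h in the tree that produced the dump. decode_mtx_lock is a
made-up helper name.

```python
MTX_RECURSED = 0x1    # assumed value, check sys/mutex.h
MTX_CONTESTED = 0x2   # assumed value, check sys/mutex.h
FLAG_BITS = MTX_RECURSED | MTX_CONTESTED

def decode_mtx_lock(word):
    """Split a 32-bit mtx_lock word into (owner, recursed, contested)."""
    owner = word & ~FLAG_BITS & 0xffffffff
    return owner, bool(word & MTX_RECURSED), bool(word & MTX_CONTESTED)

owner, recursed, contested = decode_mtx_lock(0xc1b617d2)
print(hex(owner), "recursed:", recursed, "contested:", contested)
```

Under those assumptions the owner is 0xc1b617d0 with the contested bit
set, which fits two other threads being blocked on the process lock and
mtx_recurse being 0.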
