Date: Tue, 16 Jun 2009 19:03:34 +0400
From: pluknet <pluknet@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.2 sporadically locks up
Message-ID: <a31046fc0906160803n284604bcs741e6b038079ed12@mail.gmail.com>
In-Reply-To: <200906160830.29721.jhb@freebsd.org>
References: <a31046fc0906160323s3e4ec60bxb585bb29f9f3a02a@mail.gmail.com>
 <200906160830.29721.jhb@freebsd.org>

2009/6/16 John Baldwin <jhb@freebsd.org>:
> On Tuesday 16 June 2009 6:23:47 am pluknet wrote:
>> Hi all.
>>
>> This is one of livelocks we have on a weekly basis.
>> Yes, we do still use ULE scheduler on 6.2 and not moved to 7 yet.
>> Any thought?
>>
>> db> ps
>>   pid  ppid  pgrp   uid  state  wmesg   wchan       cmd
>> 70304 69700 69670     0  R                          sh
>> 70303 70292 93818  3572  RL     CPU 2               chrsh
>> 70302 70294 93818  3572  R                          crond
>> 70299 93818 93818     0  R      CPU 1               crond
>> 70298 93818 93818     0  R                          crond
>> 70294 93818 93818  3572  S      piperd  0xd1d8d330  crond
>> 70292 93818 93818  3572  R                          crond
>> 70284 70279 70040 10229  S      biord   0xdbe2e4e8  perl5.8.8
>> 70283 70278 93818 10229  SL     biord   0xdbd70710  exim-4.63-0
>> 70279 70040 70040 10229  S      wait    0xc9005860  sh
>> 70278 69996 93818 10229  S      wait    0xcaf4ac90  sh
>> 70191  4680  4680  9738  S      select  0xc0a12944  httpd
>> 70190  4796  4796 10008  R                          httpd
>> 70188  5043  5043 30532  RL                         httpd
>> 70043 69999 70043  3572  Ss     select  0xc0a12944  wget
>> 70042 70000 70042  3572  Ss     select  0xc0a12944  wget
>> 70041 70001 70041  3572  Ss     select  0xc0a12944  wget
>> 70040 69996 70040 10229  Ss     piperd  0xca35e990  perl5.8.8
>> 70039 70002 70039  3572  Ss     select  0xc0a12944  wget
>
> This is not a full listing so one cannot assume it is a deadlock.

Ok, usually that listing doesn't show anything interesting in this sort of lockup.
I'll share a full ps output next time (sure, rather soon).

>
>> db> show lockchain Giant
>> thread -3420549 (pid 434, ) ??? (0xc099cb0c)
>
> You would use 'show lock' or perhaps 'show turnstile' with specific lock
> variables.  'show lockchain' needs a TID or PID.

Ok. As for turnstile, it showed nothing at all, hence omitted.

>
>> db> show allpcpu
>> cpuid = 0
>> curthread = 0xc7cfec80: pid 18 "swi4: clock sio"
>>
>> cpuid = 1
>> curthread = 0xc99f9960: pid 70299 "crond"
>>
>> cpuid = 2
>> curthread = 0xc99f9af0: pid 70303 "chrsh"
>>
>> cpuid = 3
>> curthread = 0xd087d320: pid 69700 "sh"
>>
>> cpuid = 4
>> curthread = 0xc98f84b0: pid 69604 "httpd"
>>
>> cpuid = 5
>> curthread = 0xcaebe190: pid 69598 "httpd"
>>
>> cpuid = 6
>> curthread = 0xc7cfe960: pid 27 "irq17: bce1 aacu0"
>>
>> cpuid = 7
>> curthread = 0xc837fe10: pid 69711 "arcconf"
>
> This is far more useful output than the truncated 'ps'.  From this, all of the
> CPUs are busy (in at least some deadlocks, all the CPUs would be idle
> instead).  There are several deadlocks fixed since 6.2 that I am aware of,
> but this doesn't look like any of those.  I'm not sure why you aren't getting
> useful stack traces of running threads.

I'll do next time. I thought it would be similar to bt PID output
and simply didn't include.
As for allpcpu, I often see the picture, when one CPU runs
the "irq17: bce1 aacu0" thread and another one runs arcconf.
I wonder if that might be a source of bad locking or races, or..
The arcconf utility uses ioctl that goes into aac/aacu(4) internals.

> Perhaps DDB in 6.2 doesn't know to
> look in stoppcbs[].  Hmm, looks like 6.2 only does that if you are using
> KDB_STOP_NMI.  Are you using that kernel option?  If not, you probably want
> to.

No, I'm not. Will that add an additional visible overhead
on a running system?

>
> --
> John Baldwin
>

Thank you.

--
wbr,
pluknet
