Date: Sun, 06 Dec 2009 22:55:04 +0200
From: Andriy Gapon <avg@icyb.net.ua>
To: Attilio Rao <attilio@freebsd.org>
Cc: freebsd-current@freebsd.org
Subject: Re: process stuck in stat/../cache_lookup: ktorrent, zfs
Message-ID: <4B1C1A28.6030909@icyb.net.ua>
In-Reply-To: <3bbf2fe10912061104j53ef5be2yb1019699308b0473@mail.gmail.com>
References: <4B1B9600.4080709@icyb.net.ua> <4B1BBEC4.7040906@icyb.net.ua> <3bbf2fe10912061104j53ef5be2yb1019699308b0473@mail.gmail.com>

on 06/12/2009 21:04 Attilio Rao said the following:
> 2009/12/6 Andriy Gapon <avg@icyb.net.ua>:
>> on 06/12/2009 13:31 Andriy Gapon said the following:
>>> System is recent 9-current, amd64.
>>> I see that sometimes ktorrent gets stuck during heavy download (multiple files
>>> in parallel, high speed). It is completely unresponsive and not killable even
>>> with SIGKILL.
>> [snip]
>>> #0 sched_switch (td=0xffffff012a6c5700, newtd=0xffffff0001533380,
>>> flags=Variable "flags" is not available.
>>> ) at /usr/src/sys/kern/sched_ule.c:1865
>>> #1 0xffffffff80374baf in mi_switch (flags=260, newtd=0x0) at
>>> /usr/src/sys/kern/kern_synch.c:449
>>> #2 0xffffffff803a795b in sleepq_switch (wchan=Variable "wchan" is not available.
>>> ) at /usr/src/sys/kern/subr_sleepqueue.c:509
>>> #3 0xffffffff803a8645 in sleepq_wait (wchan=0xffffff0105b457f8, pri=80) at
>>> /usr/src/sys/kern/subr_sleepqueue.c:588
>>> #4 0xffffffff80351184 in __lockmgr_args (lk=0xffffff0105b457f8, flags=2097408,
>>> ilk=0xffffff0105b45820, wmesg=Variable "wmesg" is not available.
>>> ) at /usr/src/sys/kern/kern_lock.c:216
>> So some more data:
>> (kgdb) fr 4
>>
>> #4 0xffffffff80351184 in __lockmgr_args (lk=0xffffff0105b457f8, flags=2097408,
>> ilk=0xffffff0105b45820, wmesg=Variable "wmesg" is not available.
>> ) at /usr/src/sys/kern/kern_lock.c:216
>> 216 sleepq_wait(&lk->lock_object, pri);
>> (kgdb) p *lk
>> $8 = {lock_object = {lo_name = 0xffffffff80ad55b6 "zfs", lo_flags = 91947008,
>> lo_data = 0, lo_witness = 0x0}, lk_lock = 3, lk_timo = 51, lk_pri = 80}
>> (kgdb) p/x flags
>> $9 = 0x200100
>> (kgdb) p/x lk->lock_object.lo_flags
>> $12 = 0x57b0000
>>
>> Apparently sleeplk is inlined into __lockmgr_args.
>>
>> So it looks like this is a LK_SHARED|LK_INTERLOCK lockmgr call which has not
>> taken any easy path and ended up in sleepq_wait, but wakeup never comes for it,
>> perhaps missed?
>
> I think that a 'missed wakeup' is a too fast (and wrong) conclusion.
> here the problem is that the lock is held in shared mode (lk->lk_lock
> = 3) so you would need to know what happened to the owners once they
> got the lock.
> The only way you can do that, though, is with shared acquisitions,
> then you should try to reproduce it with WITNESS on.
> Once you have such datas we could digg further.

Attilio,

no conclusions on my part so far, just guesses.
But what I think we see is that a shared lock operation made it to sleeplk,
and that must mean that the lock was originally held exclusively.
It's hard to see how lk_lock could have ended up with both
LK_SHARE|LK_SHARED_WAITERS set in this scenario.

I will try to reproduce this with a WITNESS kernel, but that will have to wait
until Tuesday or longer. I do hope that it is reproducible with WITNESS.

-- 
Andriy Gapon
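
For reference, the two values printed by kgdb above (lk_lock = 3 and
flags = 0x200100) can be decoded by hand with a small userland helper. This is
only a sketch: the bit values below are assumptions chosen to be consistent
with the numbers and flag names quoted in this thread, not copied from
sys/lockmgr.h, and decode_bits(), decode_lk_lock() and decode_lk_flags() are
hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * ASSUMED bit values, picked so that they match the thread:
 * lk_lock = 3 is read as LK_SHARE|LK_SHARED_WAITERS and
 * flags = 0x200100 is read as LK_SHARED|LK_INTERLOCK.
 * Check them against sys/lockmgr.h of the kernel being debugged.
 */
#define LK_SHARE           0x01u
#define LK_SHARED_WAITERS  0x02u
#define LK_INTERLOCK       0x000100u
#define LK_SHARED          0x200000u

struct flag_name {
	uintptr_t   bit;
	const char *name;
};

/* Render the known bits of 'value' as "A|B"; leftover bits are shown in hex. */
static const char *
decode_bits(uintptr_t value, const struct flag_name *tab, size_t n,
    char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if ((value & tab[i].bit) == 0)
			continue;
		int w = snprintf(buf + used, len - used, "%s%s",
		    used ? "|" : "", tab[i].name);
		if (w < 0 || (size_t)w >= len - used)
			return (buf);	/* output truncated, stop here */
		used += (size_t)w;
		value &= ~tab[i].bit;
	}
	if (value != 0)
		snprintf(buf + used, len - used, "%s0x%lx",
		    used ? "|" : "", (unsigned long)value);
	return (buf);
}

/* Decode the low state bits of the lk_lock word. */
static const char *
decode_lk_lock(uintptr_t lk_lock, char *buf, size_t len)
{
	static const struct flag_name tab[] = {
		{ LK_SHARE, "LK_SHARE" },
		{ LK_SHARED_WAITERS, "LK_SHARED_WAITERS" },
	};

	return (decode_bits(lk_lock, tab, sizeof(tab) / sizeof(tab[0]),
	    buf, len));
}

/* Decode the 'flags' argument passed to __lockmgr_args(). */
static const char *
decode_lk_flags(uintptr_t flags, char *buf, size_t len)
{
	static const struct flag_name tab[] = {
		{ LK_SHARED, "LK_SHARED" },
		{ LK_INTERLOCK, "LK_INTERLOCK" },
	};

	return (decode_bits(flags, tab, sizeof(tab) / sizeof(tab[0]),
	    buf, len));
}
```

Calling decode_lk_lock(3, ...) and decode_lk_flags(0x200100, ...) from a small
main() gives the two readings discussed above. Any bit not in the assumed
tables is printed in hex rather than guessed at.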