From: Alexander Shikoff
To: "Bjoern A. Zeeb"
Cc: freebsd-net@freebsd.org, Mikolaj Golub
Date: Mon, 22 Feb 2010 13:34:54 +0200
Subject: Re: mpd has hung
Message-ID: <20100222113454.GA99461@crete.org.ua>
In-Reply-To: <20100220115850.T27327@maildrop.int.zabbadoz.net>
References: <20100217132632.GA756@crete.org.ua> <4B7D5D95.20007@gmx.com> <86bpflqr5b.fsf@zhuzha.ua1> <20100220112639.L27327@maildrop.int.zabbadoz.net> <20100220115850.T27327@maildrop.int.zabbadoz.net>
List-Id: Networking and TCP/IP with FreeBSD

On Sat, Feb 20, 2010 at 12:04:35PM +0000, Bjoern A. Zeeb wrote:
> On Sat, 20 Feb 2010, Bjoern A. Zeeb wrote:
>
> > On Fri, 19 Feb 2010, Mikolaj Golub wrote:
> >
> >> On Thu, 18 Feb 2010 17:32:37 +0200 Nikos Vassiliadis wrote:
> >>
> >>> On 2/17/2010 3:26 PM, Alexander Shikoff wrote:
> >>>> Hello All,
> >>>>
> >>>> I have mpd 5.3 running on 8.0-RC1 as PPPoE server (now only 5 clients).
> >>>> Today mpd process hung and I cannot kill it with -9 signal, and I cannot
> >>>> access its console via telnet.
> >>>>
> >>>> State of process in `top` output is STOP:
> >>>> 73551 root 2 44 0 29588K 5692K STOP 6 0:32 0.00% mpd5
> >>>>
> >>>> # procstat -kk 73551
> >>>> PID TID COMM TDNAME KSTACK
> >>>> 73551 100233 mpd5 - mi_switch+0x16f
> >>>> sleepq_wait+0x42 _cv_wait+0x111 flowtable_flush+0x51 if_detach+0x2f2
> >>>> ng_iface_shutdown+0x1e ng_rmnode+0x167 ng_apply_item+0xef7
> >>>> ng_snd_item+0x2ce ngc_send+0x1d2 sosend_generic+0x3f6 kern_sendit+0x13d
> >>>> sendit+0xdc sendto+0x4d syscall+0x1da Xfast_syscall+0xe1
> >>>> 73551 100502 mpd5 - mi_switch+0x16f
> >>>> thread_suspend_switch+0xc6 thread_single+0x1b6 exit1+0x72 sigexit+0x7c
> >>>> postsig+0x306 ast+0x279 doreti_ast+0x1f
> >>>>
> >>>> Is there a way to stop a process without rebooting a whole system?
> >>>> Thanks in advance!
> >>>>
> >>>> P.S. I'm ready for experiments with it before tonight, but I cannot
> >>>> force system to crash in order to get crash dump right now.
> >>>>
> >>>
> >>> It's probably too late now, but are you sure that nobody pressed
> >>> Ctrl-Z while in the mpd console???
> >>>
> >>> Ctrl-Z will send SIGSTOP to the process and the process will
> >>> stop. While stopped, all processing stops (including receiving
> >>> SIGKILL, you cannot kill it, and the signals are queued). You
> >>> have to send SIGCONT for the process to continue.
> >>
> >> We were discussing this problem with Alexander in another
> >> (Russian/Ukrainian speaking) mailing list. And it looks like the
> >> problem is the following.
> >>
> >> mpd5 thread was detaching ng interface and when doing flowtable_flush() it
> >> slept in cv_wait waiting for flowclean_cycles variable to be updated.
> >> It should have been awakened by flowcleaner thread but this thread got
> >> stuck in endless loop, supposedly in
> >> flowtable_clean_vnet()/flowtable_free_stale(), I think because of
> >> inconsistent state of some lists (iface?) due to if_detach being in
> >> progress.
> >
> > I have patches that are out for review.
>
> I am not sure if they apply cleanly as they are broken out of the tail
> side of a larger patchset.
>
> If you are not using VIMAGEs you could ignore the ones I marked with (*).
>
> http://people.freebsd.org/~bz/20100216-10-ft-cv.diff
> http://people.freebsd.org/~bz/20100216-11-ft-debugging.diff
> http://people.freebsd.org/~bz/20100216-12-ft-cleanup.diff (*)
> http://people.freebsd.org/~bz/20100216-13-ft-ll-cleanup.diff
> http://people.freebsd.org/~bz/20100216-18-ft-free.diff (*)
>
> If you are still seeing the hang and have DDB support in your kernel,
> then break into the debugger and save the complete output of
> ddb> ps
> before rebooting.

I cannot run tests right now because that box is in production.
I need some time to remove all traffic from it.

--
MINO-RIPE