From: "Bjoern A. Zeeb"
To: Mikolaj Golub
Cc: freebsd-net@freebsd.org
Date: Sat, 20 Feb 2010 12:04:35 +0000 (UTC)
Subject: Re: mpd has hung
Message-ID: <20100220115850.T27327@maildrop.int.zabbadoz.net>
In-Reply-To: <20100220112639.L27327@maildrop.int.zabbadoz.net>
References: <20100217132632.GA756@crete.org.ua> <4B7D5D95.20007@gmx.com>
 <86bpflqr5b.fsf@zhuzha.ua1> <20100220112639.L27327@maildrop.int.zabbadoz.net>

On Sat, 20 Feb 2010, Bjoern A. Zeeb wrote:

> On Fri, 19 Feb 2010, Mikolaj Golub wrote:
>
>> On Thu, 18 Feb 2010 17:32:37 +0200 Nikos Vassiliadis wrote:
>>
>>> On 2/17/2010 3:26 PM, Alexander Shikoff wrote:
>>>> Hello All,
>>>>
>>>> I have mpd 5.3 running on 8.0-RC1 as PPPoE server (now only 5 clients).
>>>> Today mpd process hung and I cannot kill it with -9 signal, and I cannot
>>>> access it's console via telnet.
>>>>
>>>> State of process in `top` output is STOP:
>>>> 73551 root 2 44 0 29588K 5692K STOP 6 0:32 0.00% mpd5
>>>>
>>>> # procstat -kk 73551
>>>> PID TID COMM TDNAME KSTACK
>>>> 73551 100233 mpd5 - mi_switch+0x16f
>>>> sleepq_wait+0x42 _cv_wait+0x111 flowtable_flush+0x51 if_detach+0x2f2
>>>> ng_iface_shutdown+0x1e ng_rmnode+0x167 ng_apply_item+0xef7
>>>> ng_snd_item+0x2ce ngc_send+0x1d2 sosend_generic+0x3f6 kern_sendit+0x13d
>>>> sendit+0xdc sendto+0x4d syscall+0x1da Xfast_syscall+0xe1
>>>> 73551 100502 mpd5 - mi_switch+0x16f
>>>> thread_suspend_switch+0xc6 thread_single+0x1b6 exit1+0x72 sigexit+0x7c
>>>> postsig+0x306 ast+0x279 doreti_ast+0x1f
>>>>
>>>> Is there a way to stop a process without rebooting a whole system?
>>>> Thanks in advance!
>>>>
>>>> P.S. I'm ready for experiments with it before tonight, but I cannot
>>>> force system to crash in order to get crash dump right now.
>>>>
>>>
>>> It's probably too late now, but are you sure that nobody pressed
>>> CTLR-Z while in the mpd console???
>>>
>>> CTLR-Z will send SIGSTOP to the process and the process will
>>> stop. While stopped, all processing stops(including receiving
>>> SIGKILL, you cannot kill it, and the signals are queued). You
>>> have to send SIGCONT for the process to continue.
>>
>> We were discussing this problem with Alexander in another
>> (Russian/Ukrainian speaking) maillist. And it looks like the problem
>> is the following.
>>
>> mpd5 thread was detaching ng interface and when doing flowtable_flush() it
>> slept in cv_wait waiting for flowclean_cycles variable to be updated. It
>> should have been awaken by flowcleaner thread but this thread got stuck in
>> endless loop, supposedly in flowtable_clean_vnet()/flowtable_free_stale(),
>> I think because of inconsistent state of some lists (iface?) due to
>> if_detach being in progress.
>
> I have patches that are out for review.

I am not sure if they apply cleanly as they are broken out of the tail
side of a larger patchset. If you are not using VIMAGEs you could
ignore the ones I marked with (*).

http://people.freebsd.org/~bz/20100216-10-ft-cv.diff
http://people.freebsd.org/~bz/20100216-11-ft-debugging.diff
http://people.freebsd.org/~bz/20100216-12-ft-cleanup.diff (*)
http://people.freebsd.org/~bz/20100216-13-ft-ll-cleanup.diff
http://people.freebsd.org/~bz/20100216-18-ft-free.diff (*)

If you are still seeing the hang and have DDB support in your kernel,
then break into the debugger and save the complete output of

ddb> ps

before rebooting.

Regards,
Bjoern

-- 
Bjoern A. Zeeb         It will not break if you know what you are doing.
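
Illustration of the hang described above: a rough userland sketch, not kernel code and not what the patches do. It mimics the pattern from the stack trace: a flusher sleeps on a condition variable until a cycle counter advances, and a cleaner thread is supposed to bump that counter and wake it. The class CycleWaiter, its methods, and the timeout are hypothetical names introduced only for this example; only flowclean_cycles and the roles of flowtable_flush() and the flowcleaner thread come from the thread.

```python
import threading


class CycleWaiter:
    """Userland analogy (assumption): flusher waits for one more cleaner pass."""

    def __init__(self):
        self._cv = threading.Condition()
        self._cycles = 0  # stands in for the kernel's flowclean_cycles

    def cleaner_pass_done(self):
        # The cleaner calls this after each full pass and wakes all waiters.
        with self._cv:
            self._cycles += 1
            self._cv.notify_all()

    def flush(self, timeout=None):
        # Sleep until the counter moves past the value seen on entry.
        # With timeout=None this blocks forever if the cleaner never
        # completes a pass, which is the hang reported in this thread.
        with self._cv:
            start = self._cycles
            return self._cv.wait_for(lambda: self._cycles != start, timeout)


def demo():
    waiter = CycleWaiter()

    # Case 1: the cleaner is stuck and never signals, so the wait times out.
    stuck_result = waiter.flush(timeout=0.1)

    # Case 2: a healthy cleaner finishes a pass shortly after the wait begins.
    timer = threading.Timer(0.2, waiter.cleaner_pass_done)
    timer.start()
    healthy_result = waiter.flush(timeout=5.0)
    timer.join()

    return stuck_result, healthy_result
```

The timeout exists only so the sketch terminates. An untimed wait, like the cv_wait in the trace, depends entirely on the waker making progress, so a cleaner spinning in a loop leaves the flusher asleep indefinitely.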