Date: Sat, 20 Feb 2010 11:27:21 +0000 (UTC)
From: "Bjoern A. Zeeb"
To: Mikolaj Golub
Cc: freebsd-net@freebsd.org
Subject: Re: mpd has hung
Message-ID: <20100220112639.L27327@maildrop.int.zabbadoz.net>
In-Reply-To: <86bpflqr5b.fsf@zhuzha.ua1>
References: <20100217132632.GA756@crete.org.ua> <4B7D5D95.20007@gmx.com> <86bpflqr5b.fsf@zhuzha.ua1>
List-Id: Networking and TCP/IP with FreeBSD

On Fri, 19 Feb 2010, Mikolaj Golub wrote:

> On Thu, 18 Feb 2010 17:32:37 +0200 Nikos Vassiliadis wrote:
>
>> On 2/17/2010 3:26 PM, Alexander Shikoff wrote:
>>> Hello All,
>>>
>>> I have mpd 5.3 running on 8.0-RC1 as PPPoE server (now only 5 clients).
>>> Today mpd process hung and I cannot kill it with -9 signal, and I cannot
>>> access its console via telnet.
>>>
>>> State of process in `top` output is STOP:
>>> 73551 root 2 44 0 29588K 5692K STOP 6 0:32 0.00% mpd5
>>>
>>> # procstat -kk 73551
>>> PID TID COMM TDNAME KSTACK
>>> 73551 100233 mpd5 - mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 flowtable_flush+0x51 if_detach+0x2f2 ng_iface_shutdown+0x1e ng_rmnode+0x167 ng_apply_item+0xef7 ng_snd_item+0x2ce ngc_send+0x1d2 sosend_generic+0x3f6 kern_sendit+0x13d sendit+0xdc sendto+0x4d syscall+0x1da Xfast_syscall+0xe1
>>> 73551 100502 mpd5 - mi_switch+0x16f thread_suspend_switch+0xc6 thread_single+0x1b6 exit1+0x72 sigexit+0x7c postsig+0x306 ast+0x279 doreti_ast+0x1f
>>>
>>> Is there a way to stop a process without rebooting a whole system?
>>> Thanks in advance!
>>>
>>> P.S. I'm ready for experiments with it before tonight, but I cannot
>>> force system to crash in order to get crash dump right now.
>>>
>>
>> It's probably too late now, but are you sure that nobody pressed
>> CTRL-Z while in the mpd console???
>>
>> CTRL-Z will send SIGSTOP to the process and the process will
>> stop. While stopped, all processing stops (including receiving
>> SIGKILL, you cannot kill it, and the signals are queued). You
>> have to send SIGCONT for the process to continue.
>
> We were discussing this problem with Alexander in another (Russian/Ukrainian
> speaking) mailing list. And it looks like the problem is the following.
>
> mpd5 thread was detaching ng interface and when doing flowtable_flush() it
> slept in cv_wait waiting for flowclean_cycles variable to be updated. It
> should have been awakened by flowcleaner thread but this thread got stuck in
> endless loop, supposedly in flowtable_clean_vnet()/flowtable_free_stale(), I
> think because of inconsistent state of some lists (iface?) due to if_detach
> being in progress.

I have patches that are out for review.

-- 
Bjoern A. Zeeb         It will not break if you know what you are doing.
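
The wait that Mikolaj describes can be sketched in user space. This is only an analogy, not the kernel code: the class `CycleGate` and the functions `finish_pass`, `flush`, `cleaner` and `demo` are hypothetical names, `cycles` stands in for `flowclean_cycles`, and the timeout exists only so the demonstration terminates (the wait described in the quoted analysis has none, which would be consistent with the thread sleeping indefinitely).

```python
import threading


class CycleGate:
    """User-space stand-in for the wait on a cleaner cycle counter (hypothetical)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.cycles = 0  # plays the role of flowclean_cycles

    def finish_pass(self):
        # Cleaner side: a full cleaning pass has completed, so bump the
        # counter and wake every thread waiting for it to change.
        with self.cond:
            self.cycles += 1
            self.cond.notify_all()

    def flush(self, timeout=None):
        # Waiter side: remember the counter, then sleep until it changes.
        # Returns True if a pass completed, False if the timeout expired.
        with self.cond:
            start = self.cycles
            return self.cond.wait_for(lambda: self.cycles != start, timeout)


def cleaner(gate, stop, interval=0.01):
    # Periodic cleaner: completes one pass per interval until asked to stop.
    while not stop.wait(interval):
        gate.finish_pass()


def demo():
    # Healthy case: the cleaner keeps completing passes, so flush() wakes up.
    gate = CycleGate()
    stop = threading.Event()
    worker = threading.Thread(target=cleaner, args=(gate, stop))
    worker.start()
    woke = gate.flush(timeout=2.0)
    stop.set()
    worker.join()

    # Stalled case: nothing ever completes a pass, so flush() only
    # returns because this sketch passes a timeout.
    stalled = CycleGate()
    timed_out = not stalled.flush(timeout=0.2)
    return woke, timed_out


if __name__ == "__main__":
    print(demo())
```

In the stalled case nothing ever calls `finish_pass()`, which corresponds to the cleaner thread never finishing its pass; with `timeout=None` the waiter would block forever.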