From: Zara Kanaeva
To: freebsd-stable@freebsd.org
Date: Tue, 27 Oct 2015 14:42:42 +0100
Subject: Re: Stuck processes in unkillable (STOP) state, listen queue overflow
Message-ID: <20151027144242.Horde.3Xc1_RqzaVMAZ12X6OPXfdN@webmail.uni-tuebingen.de>
In-Reply-To: <562F4D98.9060200@fsn.hu>

Hello,

I have the same experience with apache and mapserver. It happens on a
physical machine and ends with a spontaneous reboot. This machine was
updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps this
machine doesn't have enough RAM (only 8 GB), but I don't think that should
cause a spontaneous reboot. I had no such behavior with the same machine
and FreeBSD 9.0-RELEASE on it (I am not 100% sure, and I have had no
opportunity to test it yet).

Regards, Z. Kanaeva.

Quoting "Nagy, Attila":

> Hi,
>
> Recently I've started to see a lot of cases where the log is full
> with "listen queue overflow" messages and the process behind the
> network socket is unavailable.
> When I open a TCP connection to it, it opens but nothing happens (for
> example, I get no SMTP banner from postfix, nor do I get a log entry
> about the new connection).
>
> I've seen this with Java programs, postfix and redis, basically
> everything which opens a TCP socket and listens on the machine.
>
> For example, I have a redis process which listens on 6381. When I
> telnet into it, the TCP connection opens, but the program doesn't
> respond.
> When I kill it, nothing happens.
> Even kill -9 yields only this state:
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAN
>   776 redis       2  20    0 24112K  2256K STOP    3  16:56  0.00% redis-
>
> When I tcpdrop the connections of the process, tcpdrop reports
> success for the first time and failure for the second (No such
> process), but the connections remain:
> # sockstat -4 | grep 776
> redis    redis-serv 776   6  tcp4   *:6381                *:*
> redis    redis-serv 776   9  tcp4   *:16381               *:*
> redis    redis-serv 776   10 tcp4   127.0.0.1:16381       127.0.0.1:10460
> redis    redis-serv 776   11 tcp4   127.0.0.1:16381       127.0.0.1:35795
> redis    redis-serv 776   13 tcp4   127.0.0.1:30027       127.0.0.1:16379
> redis    redis-serv 776   14 tcp4   127.0.0.1:58802       127.0.0.1:16384
> redis    redis-serv 776   17 tcp4   127.0.0.1:16381       127.0.0.1:24354
> redis    redis-serv 776   18 tcp4   127.0.0.1:16381       127.0.0.1:56999
> redis    redis-serv 776   19 tcp4   127.0.0.1:16381       127.0.0.1:39488
> redis    redis-serv 776   20 tcp4   127.0.0.1:6381        127.0.0.1:39491
> # sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
> tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided, or not known
> tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided, or not known
> tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
> tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
> tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 24354: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 56999: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 39488: No such process
> tcpdrop: 127.0.0.1 6381 127.0.0.1 39491: No such process
> # sockstat -4 | grep 776
> redis    redis-serv 776   6  tcp4   *:6381                *:*
> redis    redis-serv 776   9  tcp4   *:16381               *:*
> redis    redis-serv 776   10 tcp4   127.0.0.1:16381       127.0.0.1:10460
> redis    redis-serv 776   11 tcp4   127.0.0.1:16381       127.0.0.1:35795
> redis    redis-serv 776   13 tcp4   127.0.0.1:30027       127.0.0.1:16379
> redis    redis-serv 776   14 tcp4   127.0.0.1:58802       127.0.0.1:16384
> redis    redis-serv 776   17 tcp4   127.0.0.1:16381       127.0.0.1:24354
> redis    redis-serv 776   18 tcp4   127.0.0.1:16381       127.0.0.1:56999
> redis    redis-serv 776   19 tcp4   127.0.0.1:16381       127.0.0.1:39488
> redis    redis-serv 776   20 tcp4   127.0.0.1:6381        127.0.0.1:39491
>
> $ procstat -k 776
>   PID    TID COMM             TDNAME           KSTACK
>   776 100725 redis-server     -                mi_switch sleepq_timedwait_sig _sleep kern_kevent sys_kevent amd64_syscall Xfast_syscall
>   776 100744 redis-server     -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
>
> I can do nothing to get out of this state; only a reboot helps.
>
> The OS is stable/10@r289313, but I could observe this behaviour with
> earlier releases too.
>
> The dmesg is full of lines like these:
> sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193 already in queue awaiting acceptance (3142 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3068 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3057 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3037 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3015 occurrences)
> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3035 occurrences)
>
> I guess this is the effect of the process freeze, not the cause (the
> listen queue fills up because the app can't handle the incoming
> connections).
>
> I'm not sure it matters, but some of the machines (including the one
> above) run on an ESX hypervisor (as far as I can remember, I could see
> this on physical machines too, but I'm not sure about that).
> Also, so far I could only see this where some "exotic" stuff ran,
> like a Java or Erlang based server (opendj, elasticsearch and
> rabbitmq).
>
> I'm also not sure what triggers this. I've never seen it after
> only some hours of uptime; at least some days or a week must have
> passed before a process gets stuck like the above.
>
> Any ideas about this?
>
> Thanks,

--
Dipl.-Inf. Zara Kanaeva
Heidelberger Akademie der Wissenschaften
Forschungsstelle "The role of culture in early expansions of humans"
an der Universität Tübingen
Geographisches Institut
Universität Tübingen
Ruemelinstr. 19-23
72070 Tuebingen
Tel.: +49-(0)7071-2972132
e-mail: zara.kanaeva@geographie.uni-tuebingen.de
-------
- Theory is when you know something but it doesn't work.
- Practice is when something works but you don't know why.
- Usually we combine theory and practice: Nothing works and we don't know why.
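
A note on the quoted sockstat/tcpdrop pipeline: the two getaddrinfo errors come from the listening sockets, whose addresses contain "*", and a plain "grep 776" would also match any row that merely contains 776 (a port such as 7760, for instance). The sketch below is one way to generate the tcpdrop commands more carefully. The function name gen_tcpdrop is hypothetical, and the column layout (PID in field 3, local and foreign address in fields 6 and 7) is assumed from the sockstat output quoted in the thread. It is not expected to cure the hang itself, since the thread shows tcpdrop failing with "No such process" on the established connections as well.

```shell
#!/bin/sh
# Hypothetical helper, not from the original thread: read `sockstat -4`
# rows on stdin and print one tcpdrop command per connected socket of PID.
# Assumed columns: USER COMMAND PID FD PROTO LOCAL-ADDRESS FOREIGN-ADDRESS
gen_tcpdrop() {
    pid="$1"
    awk -v pid="$pid" '
        # Compare the PID column exactly instead of grepping the whole row.
        $3 != pid { next }
        # Skip listening sockets: "*" is not an address tcpdrop can resolve.
        $6 ~ /\*/ || $7 ~ /\*/ { next }
        # Emit the same addr:port form the pipeline in the thread used.
        { print "tcpdrop " $6 " " $7 }
    '
}

# Dry run on a FreeBSD host (prints the commands, does not execute them):
#   sockstat -4 | gen_tcpdrop 776
```

Printing the commands first and piping them to /bin/sh only after reviewing them keeps the step reversible, which seems prudent on a machine that is already misbehaving.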