Date: Tue, 27 Oct 2015 17:25:01 +0100
From: "Nagy, Attila" <bra@fsn.hu>
To: Zara Kanaeva <zara.kanaeva@ggi.uni-tuebingen.de>, freebsd-stable@freebsd.org
Subject: Re: Stuck processes in unkillable (STOP) state, listen queue overflow
Message-ID: <562FA55D.6050503@fsn.hu>
In-Reply-To: <20151027144242.Horde.3Xc1_RqzaVMAZ12X6OPXfdN@webmail.uni-tuebingen.de>
References: <20151027144242.Horde.3Xc1_RqzaVMAZ12X6OPXfdN@webmail.uni-tuebingen.de>
Hi,
(following the top-posting)
I have seen this with 16 and 32 GiB of RAM, but in any case it
shouldn't matter.
Do you use ZFS? Although it doesn't seem to be stuck on I/O...
On 10/27/15 14:42, Zara Kanaeva wrote:
> Hello,
>
> I have the same experience with apache and mapserver. It happens on
> a physical machine and ends with a spontaneous reboot. This machine
> was updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE.
> Perhaps this machine doesn't have enough RAM (only 8 GB), but I think
> that should not be a reason for a spontaneous reboot.
>
> I had no such behavior with the same machine running FreeBSD
> 9.0-RELEASE (I am not 100% sure; I have not yet been able to test it).
>
> Regards, Z. Kanaeva.
>
> Quoting "Nagy, Attila" <bra@fsn.hu>:
>
>> Hi,
>>
>> Recently I've started to see a lot of cases where the log is full
>> of "listen queue overflow" messages and the process behind the
>> network socket is unavailable.
>> When I open a TCP connection to it, the connection opens but nothing
>> happens (for example, I get no SMTP banner from postfix, nor do I get
>> a log entry about the new connection).
>>
>> I've seen this with Java programs, postfix and redis, basically
>> everything that opens a TCP socket and listens on the machine.
>>
>> For example, I have a redis process that listens on 6381. When I
>> telnet to it, the TCP connection opens, but the program doesn't
>> respond.
>> When I kill it, nothing happens. Even kill -9 yields only this
>> state:
>>   PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAN
>>   776 redis      2  20    0 24112K  2256K STOP   3  16:56  0.00% redis-
>>
>> When I tcpdrop the connections of the process, tcpdrop reports
>> success the first time and failure the second time (No such
>> process), but the connections remain:
>> # sockstat -4 | grep 776
>> redis redis-serv 776 6 tcp4 *:6381 *:*
>> redis redis-serv 776 9 tcp4 *:16381 *:*
>> redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460
>> redis redis-serv 776 11 tcp4 127.0.0.1:16381 127.0.0.1:35795
>> redis redis-serv 776 13 tcp4 127.0.0.1:30027 127.0.0.1:16379
>> redis redis-serv 776 14 tcp4 127.0.0.1:58802 127.0.0.1:16384
>> redis redis-serv 776 17 tcp4 127.0.0.1:16381 127.0.0.1:24354
>> redis redis-serv 776 18 tcp4 127.0.0.1:16381 127.0.0.1:56999
>> redis redis-serv 776 19 tcp4 127.0.0.1:16381 127.0.0.1:39488
>> redis redis-serv 776 20 tcp4 127.0.0.1:6381 127.0.0.1:39491
>> # sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
>> tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided, or not known
>> tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided, or not known
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
>> tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
>> tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 24354: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 56999: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 39488: No such process
>> tcpdrop: 127.0.0.1 6381 127.0.0.1 39491: No such process
>> # sockstat -4 | grep 776
>> redis redis-serv 776 6 tcp4 *:6381 *:*
>> redis redis-serv 776 9 tcp4 *:16381 *:*
>> redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460
>> redis redis-serv 776 11 tcp4 127.0.0.1:16381 127.0.0.1:35795
>> redis redis-serv 776 13 tcp4 127.0.0.1:30027 127.0.0.1:16379
>> redis redis-serv 776 14 tcp4 127.0.0.1:58802 127.0.0.1:16384
>> redis redis-serv 776 17 tcp4 127.0.0.1:16381 127.0.0.1:24354
>> redis redis-serv 776 18 tcp4 127.0.0.1:16381 127.0.0.1:56999
>> redis redis-serv 776 19 tcp4 127.0.0.1:16381 127.0.0.1:39488
>> redis redis-serv 776 20 tcp4 127.0.0.1:6381 127.0.0.1:39491
>>
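A side note on the pipeline above: the two getaddrinfo errors come from
the listening sockets, whose wildcard addresses tcpdrop cannot resolve,
and "grep 776" would also match any line that merely contains 776 (for
example in a port number). A minimal sketch of a stricter filter follows,
assuming the sockstat -4 column layout shown above (PID in column 3, local
and foreign address in columns 6 and 7). gen_tcpdrop is a hypothetical
helper name, and it only prints the commands; it does nothing about the
stuck process itself.

```shell
#!/bin/sh
# Sketch: turn `sockstat -4` lines for one PID into tcpdrop commands,
# skipping listening sockets (wildcard "*:port" / "*:*" endpoints).
# gen_tcpdrop is a hypothetical helper, not an existing tool.
gen_tcpdrop() {
    pid="$1"
    awk -v pid="$pid" '
        # column 3 is the PID; columns 6 and 7 are local and foreign address
        $3 == pid && $6 !~ /^\*/ && $7 !~ /^\*/ {
            print "tcpdrop " $6 " " $7
        }'
}

# Usage on the affected host (prints only; pipe to /bin/sh to execute):
#   sockstat -4 | gen_tcpdrop 776
```

Matching on the PID column instead of grepping the whole line keeps
unrelated sockets out of the list.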
>> $ procstat -k 776
>>   PID    TID COMM             TDNAME           KSTACK
>>   776 100725 redis-server     -                mi_switch sleepq_timedwait_sig _sleep kern_kevent sys_kevent amd64_syscall Xfast_syscall
>>   776 100744 redis-server     -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
>>
>> I can do nothing to get out of this state; only a reboot helps.
>>
>> The OS is stable/10@r289313, but I could observe this behaviour with
>> earlier releases too.
>>
>> The dmesg is full of lines like these:
>> sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193 already in queue awaiting acceptance (3142 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3068 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3057 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3037 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3015 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193 already in queue awaiting acceptance (3035 occurrences)
>>
>> I guess this is the effect of the process freeze, not the cause (the
>> listen queue fills up because the app can't handle the incoming
>> connections).
>>
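For reference, the 193 in these messages is consistent with a listen
backlog of 128, assuming the kernel's sonewconn() check is
so_qlen > 3 * so_qlimit / 2. The backlog value is an assumption (it is
not shown anywhere in this report), and overflow_threshold is just an
illustrative name. A small sketch of the arithmetic:

```shell
#!/bin/sh
# Sketch of the overflow threshold, assuming the sonewconn() check
# so_qlen > 3 * so_qlimit / 2. The backlog of 128 is an assumption;
# it is not stated in the report.
overflow_threshold() {
    qlimit="$1"
    # largest queue length tolerated before an overflow is logged
    echo $(( 3 * qlimit / 2 ))
}

overflow_threshold 128
```

With a backlog of 128 the tolerated maximum is 192, so the first length
that gets logged is 193, matching the dmesg lines. On FreeBSD,
netstat -Lan lists the current and maximum listen queue lengths per
socket, which should show the actual backlog.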
>> I'm not sure it matters, but some of the machines (including the one
>> above) run on an ESX hypervisor (as far as I can remember, I have seen
>> this on physical machines too, but I'm not sure about that).
>> Also, so far I have only seen this where some "exotic" stuff ran,
>> like a Java or Erlang based server (opendj, elasticsearch and rabbitmq).
>>
>> I'm also not sure what triggers this. I've never seen it after only
>> a few hours of uptime; at least some days or a week must have
>> passed before a process gets stuck like the above.
>>
>> Any ideas about this?
>>
>> Thanks,
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to
>> "freebsd-stable-unsubscribe@freebsd.org"
>
>
>
