Date: Tue, 2 Sep 2003 18:26:53 -0500
From: "Burton M. Strauss III" <Burton@ntopsupport.com>
To: <FreeBSD-gnats-submit@FreeBSD.org>
Subject: kern/56339: select() call (poll() too) hangs, yet call works perfectly (no hang) under gdb
Message-ID: <JIEPJGFPFMFIGBNCPKGGOEMBEEAA.Burton@ntopsupport.com>
Resent-Message-ID: <200309022330.h82NUIZZ064022@freefall.freebsd.org>

>Number:         56339
>Category:       kern
>Synopsis:       select() call (poll() too) hangs, yet call works perfectly
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Sep 02 16:30:17 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Burton M. Strauss III
>Release:        FreeBSD 4.8-RELEASE i386
>Organization:
private citizen
>Environment:
System: FreeBSD owl.gateway.2wire.net 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Thu Apr 3 10:53:38 GMT 2003 root@freebsd-stable.sentex.ca:/usr/obj/usr/src/sys/GENERIC i386

>Description:

A normal (multi-threaded) user program hangs on a select() call. Changing
the call to poll() still hangs. Under gdb, the program works perfectly.

Here is the code:

    while(myGlobals.capturePackets != FLAG_NTOPSTATE_TERM) {
        traceEvent(CONST_TRACE_INFO, "DEBUG: Select(ing) %d....", topSock);
        memcpy(&mask, &mask_copy, sizeof(fd_set));
        rc = select(topSock+1, &mask, 0, 0, NULL /* Infinite */);
        traceEvent(CONST_TRACE_INFO, "DEBUG: select returned: %d", rc);
        if(rc > 0) {
            handleSingleWebConnection(&mask);
        }
    }

(traceEvent becomes a call to syslog.)

The log shows the message logged before select(), but the call never
returns. This is true even if I change the timeout from infinite to,
say, 10s.

The same behavior is seen on FreeBSD 5.1.

The behavior is also the same if select() is converted to poll()
(example below): the call never returns.

    while(myGlobals.capturePackets != FLAG_NTOPSTATE_TERM) {
        for (i=0; i<pollFdsCount; i++) {
            pollFds[i].revents = 0;
        }
        traceEvent(CONST_TRACE_INFO, "DEBUG: poll(0x%X, %d, 10000)", &pollFds[0], pollFdsCount);
        rc = poll(&pollFds[0], pollFdsCount, 10000);
        traceEvent(CONST_TRACE_INFO, "DEBUG: poll returned: %d", rc);
        ...
    }

# netstat -a

Shows the socket is in LISTEN state:

Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  *.3000                 *.*                    LISTEN

The application IS running:

12:36 owl [FreeBSD 4.8] user=root pwd=~
# ps -U ntop
  PID  TT  STAT      TIME COMMAND
61359  ??  Ss     0:02.06 /usr/bin/ntop -i sis0 @/etc/ntop.conf -d --use-syslog=local3 -t 5
12:36 owl [FreeBSD 4.8] user=root pwd=~
# ps -U ntop
  PID  TT  STAT      TIME COMMAND
61359  ??  Ss     0:02.32 /usr/bin/ntop -i sis0 @/etc/ntop.conf -d --use-syslog=local3 -t 5

If you connect to the running program and check the various threads:

(gdb) info thread
  8 process 60269, thread 8  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  7 process 60269, thread 7  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  6 process 60269, thread 6  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  5 process 60269, thread 5  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  4 process 60269, thread 4  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  3 process 60269, thread 3  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
  2 process 60269, thread 2  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
* 1 process 60269, thread 1  0x2830358c in __sys_read () from /usr/lib/libc_r.so.4

gives:

(gdb) thread 1        --- this is the libpcap thread
(gdb) info stack
#0  0x2830358c in __sys_read () from /usr/lib/libc_r.so.4
#1  0x282ff9c8 in _read () from /usr/lib/libc_r.so.4
#2  0x282ffa22 in read () from /usr/lib/libc_r.so.4
#3  0x28549de2 in pcap_read () from /usr/lib/libpcap.so.2
#4  0x2854997f in pcap_dispatch () from /usr/lib/libpcap.so.2
#5  0x280fbb19 in pcapDispatch (_i=0x0) at ntop.c:81
#6  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#7  0x0 in ?? ()

(gdb) thread 2        -- this is the hung thread
[Switching to thread 2 (process 60269, thread 2)]
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1  0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
#2  0x282c5050 in _poll () from /usr/lib/libc_r.so.4
#3  0x282c50ae in poll () from /usr/lib/libc_r.so.4
#4  0x280ccb8b in handleWebConnections (notUsed=0x0) at webInterface.c:5351
#5  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#6  0x0 in ?? ()

-- threads 3, 4, 5 and 7 periodically wake up, do their processing and sleep

(gdb) thread 3
[Switching to thread 3 (process 60269, thread 3)]
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1  0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
#2  0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
#3  0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
#4  0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
#5  0x28110251 in ntop_sleep (secs=13) at util.c:2950
#6  0x2877469f in rrdMainLoop (notUsed=0x0) at rrdPlugin.c:1412
#7  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#8  0x0 in ?? ()

(gdb) thread 4
[Switching to thread 4 (process 60269, thread 4)]
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1  0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
#2  0x28304325 in pthread_cond_wait () from /usr/lib/libc_r.so.4
#3  0x2810db62 in waitCondvar (condvarId=0x2811d4e4) at util.c:1415
#4  0x280f0587 in dequeueAddress (notUsed=0x0) at address.c:546
#5  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#6  0x0 in ?? ()

(gdb) thread 5
[Switching to thread 5 (process 60269, thread 5)]
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1  0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
#2  0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
#3  0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
#4  0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
#5  0x28110251 in ntop_sleep (secs=60) at util.c:2950
#6  0x280fcb50 in scanIdleLoop (notUsed=0x0) at ntop.c:592
#7  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#8  0x0 in ?? ()

(gdb) thread 6
[Switching to thread 6 (process 60269, thread 6)]
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1  0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
#2  0x28304585 in pthread_cond_timedwait () from /usr/lib/libc_r.so.4
#3  0x282ebee1 in _thread_gc () from /usr/lib/libc_r.so.4
#4  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#5  0x0 in ?? ()

(gdb) thread 7
[Switching to thread 7 (process 60269, thread 7)]
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1  0x283026aa in _thread_kern_sched_state_unlock () from /usr/lib/libc_r.so.4
#2  0x28304325 in pthread_cond_wait () from /usr/lib/libc_r.so.4
#3  0x2810db62 in waitCondvar (condvarId=0x2811d4d8) at util.c:1415
#4  0x28101ce9 in dequeuePacket (notUsed=0x0) at pbuf.c:1694
#5  0x282c60a8 in _thread_start () from /usr/lib/libc_r.so.4
#6  0x0 in ?? ()

(gdb) thread 8        -- this is the main() - it wakes up, checks if the children are busy and goes back to sleep.
[Switching to thread 8 (process 60269, thread 8)]
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
(gdb) info stack
#0  0x28301e7b in _thread_kern_sched () from /usr/lib/libc_r.so.4
#1  0x2830263d in _thread_kern_sched_state () from /usr/lib/libc_r.so.4
#2  0x2831fa98 in _nanosleep () from /usr/lib/libc_r.so.4
#3  0x282fcd46 in __sleep () from /usr/lib/libc_r.so.4
#4  0x282c39f1 in sleep () from /usr/lib/libc_r.so.4
#5  0x28110251 in ntop_sleep (secs=10) at util.c:2950
#6  0x804ce6a in main (argc=8, argv=0xbfbffb34) at main.c:1186
#7  0x804abf2 in _start ()
(gdb)

>How-To-Repeat:

No specifics - it just happens in our code every time.

>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
 X-send-pr-version: 3.113
 X-GNATS-Notify:
 (no hang) under gdb
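
Since the report has no standalone reproduction, a reduced test case would probably start by isolating the timed select() call described above. The following is only a sketch under stated assumptions: wait_readable() is a hypothetical helper, not part of ntop or of the original report, and it has not been shown to reproduce the hang. It waits on a single descriptor with a finite timeout, which is the case the report says never returns.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption, not ntop code): wait until fd is
 * readable or timeout_sec seconds have passed.
 * Returns select()'s result: 1 if fd is readable, 0 on timeout,
 * -1 on error with errno set.
 */
int wait_readable(int fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    /* fd_set can only describe descriptors below FD_SETSIZE. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;
        return -1;
    }

    do {
        /* select() may modify both the set and the timeout, so both
         * are rebuilt before every call. A retry after EINTR therefore
         * restarts the full timeout. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);

    return rc;
}
```

A test program could call wait_readable(listen_fd, 10) from a thread created with pthread_create() and log the return value. With a 10 second timeout the call is expected to return 0 after at most about 10 seconds when nothing connects, so a call that stays blocked well past that would match the behavior described in this report.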