From owner-freebsd-emulation@FreeBSD.ORG Wed Oct 13 09:23:28 2004 Return-Path: Delivered-To: freebsd-emulation@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A36C016A4CE for ; Wed, 13 Oct 2004 09:23:28 +0000 (GMT) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2495043D4C for ; Wed, 13 Oct 2004 09:23:26 +0000 (GMT) (envelope-from avg@icyb.net.ua) Received: from [212.40.38.87] (oddity.topspin.kiev.ua [212.40.38.87]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA09076 for ; Wed, 13 Oct 2004 12:23:22 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <416CF40A.3070008@icyb.net.ua> Date: Wed, 13 Oct 2004 12:23:22 +0300 From: Andriy Gapon User-Agent: Mozilla Thunderbird 0.7.3 (X11/20040831) X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-emulation@freebsd.org Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: linux threaded application hang X-BeenThere: freebsd-emulation@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Development of Emulators of other operating systems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Oct 2004 09:23:28 -0000 This is probably not a problem of our linux emulation, but I think this is most likely place for me to get help. Here's what I have: linux_base-7.1_7 linux_devtools-7.1_3 I have installed IBM WebSphere MQ: MQSeriesJava-5.3.0-1 MQSeriesSDK-5.3.0-1 MQSeriesRuntime-5.3.0-1 MQSeriesServer-5.3.0-1 MQSeriesClient-5.3.0-1 MQSeriesJava-U486878-5.3.0-4 MQSeriesRuntime-U486878-5.3.0-4 MQSeriesSDK-U486878-5.3.0-4 MQSeriesServer-U486878-5.3.0-4 MQSeriesClient-U486878-5.3.0-4 I am trying to execute `setmqcap -1` command and it hangs, seemingly because of inter-thread communication problem. Here's ktrace | linux_kdump of the ineteresting part: #main thread creates a new thread/process with pid 5716 5715 setmqcap CALL linux_clone(0xf00,0x805b3a0) 5715 setmqcap RET linux_clone 5716/0x1654 5716 setmqcap RET linux_fork 0 #writes/reads are made to a pipe and probably are some sort of #inter-thread communication in old linux threads #fd 4 in 5715 is writing end and fd 3 in 5716 is reading end of the pipe 5715 setmqcap CALL write(0x4,0xbfbfd900,0x94) 5715 setmqcap RET write 148/0x94 5716 setmqcap CALL linux_rt_sigprocmask(0x2,0x805b304,0,0x8) 5716 setmqcap RET linux_rt_sigprocmask 0 5715 setmqcap CALL linux_rt_sigprocmask(0x2,0,0xbfbfd9d4,0x8) 5715 setmqcap RET linux_rt_sigprocmask 0 5716 setmqcap CALL read(0x3,0x805b264,0x94) 5716 setmqcap GIO fd 3 read 148 bytes 5716 setmqcap RET read 148/0x94 5715 setmqcap CALL write(0x4,0xbfbfd9c0,0x94) 5715 setmqcap GIO fd 4 wrote 148 bytes 5715 setmqcap RET write 148/0x94 #child threas performs timed poll on a pipe before reading from it #0x7d0 == 2000ms == 2s 5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0) 5716 setmqcap RET poll 1 5715 setmqcap CALL linux_rt_sigprocmask(0x2,0,0xbfbfd910,0x8) 5715 setmqcap RET linux_rt_sigprocmask 0 5716 setmqcap CALL getppid 5716 setmqcap RET getppid 5715/0x1653 #it seems that main thread starts to wait on some condition #until it is awaken by child thread by a signal 5715 setmqcap CALL linux_rt_sigsuspend(0xbfbfd910,0x8) 5716 setmqcap CALL read(0x3,0x805b264,0x94) 5716 setmqcap GIO fd 3 read 148 bytes 5716 setmqcap RET read 148/0x94 5716 setmqcap CALL linux_mmap(0x805b194) 5716 setmqcap RET linux_mmap -1086394368/0xbf3ef000 5716 setmqcap CALL mprotect(0xbf3ef000,0x1000,0) 5716 setmqcap RET mprotect 0 #child thread creates another thread, that's probably not important 5716 setmqcap CALL linux_clone(0xf21,0xbf3ffbf8) 5716 setmqcap RET linux_clone 5717/0x1655 5717 setmqcap RET linux_fork 0 #child thread wants to wake up main thread (pid == 5715 == 0x1653) #by sending signal 0x20 == 32 (== linux SIGRTMIN ?) 5716 setmqcap CALL linux_kill(0x1653,0x20) #but it seems to fail.. or not ? #I am not sure how to interpret "RESTART" here 5716 setmqcap RET linux_kill RESTART 5717 setmqcap CALL linux_getpid 5717 setmqcap RET linux_getpid 5717/0x1655 5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0) 5717 setmqcap CALL linux_rt_sigprocmask(0x2,0xbf3ffca4,0,0x8) 5717 setmqcap RET linux_rt_sigprocmask 0 5717 setmqcap CALL linux_sched_setscheduler(0x1655,0,0xbf3ffd28) 5717 setmqcap RET linux_sched_setscheduler 0 5717 setmqcap CALL linux_getpid 5717 setmqcap RET linux_getpid 5717/0x1655 5717 setmqcap CALL linux_getpid 5717 setmqcap RET linux_getpid 5717/0x1655 5717 setmqcap CALL linux_sched_setscheduler(0x1655,0,0xbf3ff7a8) 5717 setmqcap RET linux_sched_setscheduler 0 5717 setmqcap CALL linux_rt_sigprocmask(0,0xbf3feecc,0xbf3ff5ac,0x8) 5717 setmqcap RET linux_rt_sigprocmask 0 5717 setmqcap CALL linux_rt_sigaction(0x1c,0xbf3fefcc,0xbf3fef3c,0x8) 5717 setmqcap RET linux_rt_sigaction 0 5717 setmqcap CALL linux_rt_sigprocmask(0x2,0,0xbf3feb3c,0x8) 5717 setmqcap RET linux_rt_sigprocmask 0 #second child thread starts to wait for a signal 5717 setmqcap CALL linux_rt_sigsuspend(0xbf3feb3c,0x8) #first child thread continues polling for read communication pipe #but there is nothing there because 5715 is still suspended 5716 setmqcap RET poll 0 5716 setmqcap CALL getppid 5716 setmqcap RET getppid 5715/0x1653 5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0) 5716 setmqcap RET poll 0 5716 setmqcap CALL getppid 5716 setmqcap RET getppid 5715/0x1653 5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0) 5716 setmqcap RET poll 0 5716 setmqcap CALL getppid 5716 setmqcap RET getppid 5715/0x1653 5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0) 5716 setmqcap RET poll 0 5716 setmqcap CALL getppid 5716 setmqcap RET getppid 5715/0x1653 5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0) #this keeps going on forever (until program is interrupted) If my understanding of how old linux threads work is correct, then it seems that the problem is that thread 5715 is never waken up from linux_rt_sigsuspend. I am not sure if this is because that signal 32 was never delivered or something more fishy is going on there: I have attached with gdb to main thread process and here's some data: (gdb) bt #0 0x481c69b6 in __sigsuspend (set=0xbfbfd8b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45 #1 0x48186d45 in __pthread_wait_for_restart_signal (self=0x481905a0) at pthread.c:978 #2 0x48186e1d in __pthread_create_2_1 (thread=0xbfbfdaf4, attr=0xbfbfdf80, start_routine=0x480b6b80 , arg=0x8058438) at restart.h:34 #3 0x480b5683 in xcsCreateThreadWithHandle () at eval.c:41 #4 0x480861ed in xehStartAsySignalMonitor () at eval.c:41 #5 0x480824d0 in xehInitialize () at eval.c:41 #6 0x480928b5 in InitPrivateServices () at eval.c:41 #7 0x48094531 in xcsInitialize () at eval.c:41 #8 0x08049a01 in main () at eval.c:41 #9 0x481b4336 in __libc_start_main (main=0x80496d0
, argc=1, ubp_av=0xbfbfe744, init=0x8049108 <_init>, fini=0x80540a0 <_fini>, rtld_fini=0x4806325c <_dl_fini>, stack_end=0xbfbfe73c) at ../sysdeps/generic/libc-start.c:129 (gdb) p set $1 = (sigset_t *) 0x482c87f0 (gdb) p *set $2 = {__val = {1250168, 1208405312, 1208363760, 1209738430, 1209738446, 1210126092, 1209738478, 1209738494, 1210131344, 1209738526, 1209738542, 1209738558, 1209738574, 1209738590, 1209554352, 1209738622, 1209738638, 1209738654, 1209738670, 1209738686, 1209738702, 1209738718, 1209738734, 1209738750, 1209738766, 1209738782, 1209738798, 1210090948, 1209738830, 1209738846, 1209738862, 1209738878}} if I understand correctly linux sigset_t then mask for signal 32 should be bit 31 (highest) of __val[0]: 1250168 == 100110001001101111000 e.g. it seems that the signal was blocked. Unfortunately I was not able to attach to other threads, but this looks insane, especially considering this: (gdb) fr 1 #1 0x48186d45 in __pthread_wait_for_restart_signal (self=0x481905a0) at pthread.c:978 978 pthread.c: No such file or directory. in pthread.c (gdb) p mask $3 = {__val = {0, 0, 0, 0, 0, 1208347125, 0, 3217021232, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 16777216, 0, 1209525676, 102, 1209536638, 1209529244, 1209524044, 1208404144, 7, 1210900512, 3, 1210877936}} (gdb) p __pthread_sig_restart $4 = 32 (gdb) p &mask $5 = (sigset_t *) 0xbfbfd890 How can &mask in frame 1 be different from set in frame 0 ? P.S. I tried setting LD_ASSUME_KERNEL to various values from 2.2.5 to 2.4.2, but it didn't change anything. -- Andriy Gapon