Date: 21 Oct 2002 22:31:48 +0200
From: Linus Kendall <linus@angliaab.se>
To: Peter Pentchev <roam@ringlet.net>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: PThreads problem
Message-ID: <1035232308.24315.37.camel@bilbo>
In-Reply-To: <20021021194453.GB377@straylight.oblivion.bg>
References: <1035200159.24315.13.camel@bilbo> <20021021124520.GS389@straylight.oblivion.bg> <1035206648.24315.20.camel@bilbo> <20021021134834.GA41198@straylight.oblivion.bg> <20021021135045.GB41198@straylight.oblivion.bg> <1035218026.24330.33.camel@bilbo> <20021021194453.GB377@straylight.oblivion.bg>
On Mon, 2002-10-21 at 21:44, Peter Pentchev wrote:
> On Mon, Oct 21, 2002 at 06:33:46PM +0200, Linus Kendall wrote:
> > Answer inline below.
> >
> > On Mon, 2002-10-21 at 15:50, Peter Pentchev wrote:
> > > On Mon, Oct 21, 2002 at 04:48:34PM +0300, Peter Pentchev wrote:
> > > > On Mon, Oct 21, 2002 at 03:24:08PM +0200, Linus Kendall wrote:
> > > > > On Mon, 2002-10-21 at 14:45, Peter Pentchev wrote:
> > > > > > On Mon, Oct 21, 2002 at 01:35:59PM +0200, Linus Kendall wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm trying to port a heavily threaded application from Linux (Debian
> > > > > > > 3.0, 2.4.19) to FreeBSD (4.6-RELEASE). The program compiles
> > > > > > > successfully using gcc with -pthreads. But, when I try to run the
> > > > > > > application I get the following error after a while (after spawning
> > > > > > > 11 threads):
> > > > > > >
> > > > > > > Fatal error 'siglongjmp()ing between thread contexts is undefined by
> > > > > > > POSIX 1003.1' at line ? in file
> > > > > > > /usr/src/lib/libc_r/uthread/uthread_jmp.c (errno = ?)
> > > > > > > Abort trap - core dumped
> > > > > > >
> [snip]
> > > > This is interesting; can you produce a simple testcase? If not, I will
> > > > be able to take a look at it some time later today or tomorrow, but not
> > > > right now :(
> >
> > I'm not sure if I've really got time to produce a testcase. As I've
> > understood the main cause of the crash was that in *BSD the signals
> > are sent to each thread but in Linux they're sent to the process.
>
> Okay, I can see what the problem is; however, I have absolutely no idea
> how it is to be solved :(
>
> The DNS resolution routines of libcurl use alarm() as a timeout
> mechanism for the system DNS resolving functions. To enforce the
> timeout even when the resolver functions are automatically restarted
> after the SIGALRM signal, libcurl attempts to set a jump buffer in the
> thread doing the DNS lookup, and to siglongjmp() to it from the SIGALRM
> handler.
>
> This works just fine on Linux, where each thread executes as a separate
> process; the signal is correctly delivered to the thread which invoked
> alarm(), and, consequently, exactly the one that set the jump buffer in
> the first place.
>
> On FreeBSD, however, the signal is delivered merely to the currently
> executing thread; if the resolver routines are currently in the process
> of sending or receiving data on a network socket, the currently
> executing thread may very well not be the one that has requested the
> resolving, and so siglongjmp() may be called from a thread which is NOT
> the one the jump buffer has been set in. As the abort error message
> states, this is behavior not covered by any standards, and, I dare say,
> not very easy to implement at all, so it is currently unimplemented in
> FreeBSD. For a standards reference, the SUSv2 siglongjmp() manpage at
> http://www.opengroup.org/onlinepubs/007908799/xsh/siglongjmp.html
> explicitly states at the end of the DESCRIPTION section:
>
>   The effect of a call to siglongjmp() where initialisation of the jmp_buf
>   structure was not performed in the calling thread is undefined.
>
> > Blocking all signals resulted in an application which executed but
> > still I got problems with slow responses from libcurl
>
> As I understand it, the only reason for SIGALRM to make a difference
> would be a situation where a DNS query times out, at least by libcurl's
> standards. Is your application trying to do such lookups?
>
> If anybody is interested, I am attaching a short proof-of-concept
> program which starts up two threads, then waits for a signal handler to
> hit. If the longjmp() call is commented out, it displays the thread ID
> of the thread which received the signal - almost always the main thread,
> the one listed as 'me' in the list output at the program start, and most
> definitely not the last thread to call setjmp(), as that would be 't2'.
> If the longjmp() call is uncommented, the signal handler executing in
> the 'me' thread will longjmp() to a buffer initialized in the 't2'
> thread, and the program will abort with your error message with a 100%
> failure (or would that be success in proving the concept?) rate.
>
> People knowledgeable about threads: would there be a way to fix that
> problem? I don't know.. something like examining the jump buffer, then
> activating the thread that is stored there, and resuming the currently
> executing thread at the point where it was interrupted by the signal?
> Without looking at the code, I can guess that most probably the answer
> would be a short burst of hysterical laughter :) Still.. one may hope..
> :)

That was very thorough, thanks! Now I at least have an idea of what is
going on. Since this is somewhat urgent, I guess hacking the libcurl
source code to remove its SIGALRM-based timeouts would do the trick (in
my case). In the general case, though, there seems to be a rather big
problem here, as libcurl's SIGALRM/siglongjmp() timeout cannot really
work together with the FreeBSD implementation of threads.

/Linus.

> G'luck,
> Peter
>
> --
> Peter Pentchev roam@ringlet.net roam@FreeBSD.org
> PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
> Key fingerprint FDBA FD79 C26F 3C51 C95E DF9E ED18 B68D 1619 4553
> Hey, out there - is it *you* reading me, or is it someone else?
>
> #include <sys/types.h>
>
> #include <pthread.h>
> #include <setjmp.h>
> #include <signal.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
>
> pthread_mutex_t mtxQ;		/* protects q, tq and qcnt */
> int q[16];			/* signal number seen by each handler run */
> pthread_t tq[16];		/* thread the handler ran in each time */
> size_t qcnt;
> sigjmp_buf jmpbuf;		/* shared; last set by whichever thread called sigsetjmp() last */
>
> static void
> sigalarm(int f)
> {
>
> 	/* Record which thread the signal was delivered to.  Locking a mutex
> 	 * is not async-signal-safe; acceptable only in this demo. */
> 	pthread_mutex_lock(&mtxQ);
> 	q[qcnt] = f;
> 	tq[qcnt] = pthread_self();
> 	qcnt++;
> 	pthread_mutex_unlock(&mtxQ);
> 	/* Uncomment to jump to a buffer set in another thread (libc_r aborts). */
> 	// siglongjmp(jmpbuf, 5);
> }
>
> static void *
> thr(void *arg)
> {
>
> 	sigsetjmp(jmpbuf, 0);
> 	sleep((unsigned int)(intptr_t)arg);
> 	return (NULL);
> }
>
> int
> main(void)
> {
> 	pthread_t t1, t2;
> 	size_t i;
> 	struct sigaction sa;
>
> 	sigsetjmp(jmpbuf, 0);
> 	pthread_mutex_init(&mtxQ, NULL);
> 	printf("me = %ld\n", (long)pthread_self());
> 	pthread_create(&t1, NULL, thr, (void *)(intptr_t)4);
> 	printf("t1 = %ld\n", (long)t1);
> 	pthread_create(&t2, NULL, thr, (void *)(intptr_t)5);
> 	printf("t2 = %ld\n", (long)t2);
> 	memset(&sa, 0, sizeof(sa));
> 	sa.sa_handler = sigalarm;
> 	sigemptyset(&sa.sa_mask);
> 	sigaddset(&sa.sa_mask, SIGALRM);
> 	sigaction(SIGALRM, &sa, NULL);
> 	alarm(1);
> 	printf("qcnt = %u\n", (unsigned)qcnt);
> 	sleep(3);
> 	printf("qcnt = %u\n", (unsigned)qcnt);
> 	sleep(3);
> 	printf("qcnt = %u\n", (unsigned)qcnt);
> 	sleep(3);
> 	printf("qcnt = %u\n", (unsigned)qcnt);
> 	for (i = 0; i < qcnt; i++)
> 		printf("%2u\t%d\t%ld\n", (unsigned)i, q[i], (long)tq[i]);
> 	return (0);
> }