Date: Mon, 20 Feb 2006 11:13:54 GMT
From: Ivan Getta <ivan@wicomtechnologies.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: threads/93592: Loss of wakeup in pthread_cond_timedwait(), libpthread (KSE), Intel Hyper-Threading Technology (HTT), FreeBSD Production Release 6.0
Message-ID: <200602201113.k1KBDs9b028040@www.freebsd.org>
Resent-Message-ID: <200602201120.k1KBK3Wb087921@freefall.freebsd.org>
>Number:         93592
>Category:       threads
>Synopsis:       Loss of wakeup in pthread_cond_timedwait(), libpthread (KSE), Intel Hyper-Threading Technology (HTT), FreeBSD Production Release 6.0
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-threads
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Feb 20 11:20:03 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Ivan Getta
>Release:        FreeBSD 6.0-RELEASE
>Organization:   Wicom Technologies
>Environment:
FreeBSD rabbit.wicom.kiev.ua 6.0-RELEASE-p1 FreeBSD 6.0-RELEASE-p1 #2: Mon Jan 9 20:29:47 EET 2006 rus_s@rabbit.wicom.kiev.ua:/usr/src/sys/i386/compile/RABBIT_A i386
hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz

FreeBSD d33.wicomtechnologies.com 5.3-RELEASE-p15 FreeBSD 5.3-RELEASE-p15 #2: Mon May 16 20:08:23 EEST 2005 rus_s@d33.wicomtechnologies.com:/usr/src/sys/i386/compile/D33_POLL i386
hw.model: Intel(R) Pentium(R) 4 CPU 3.00GHz

>Description:
The problem was detected and reproduced with a test program (see the source code below). The test program runs 40 threads that wait in a loop on the same condition variable with a timeout. The condition variable is never signaled, so each thread wakes up on timeout and waits again. The problem is that a thread may fail to wake up on timeout and instead sleep indefinitely after a call to pthread_cond_timedwait().

The problem was detected with Hyper-Threading (HT) turned on (machdep.hyperthreading_allowed=1) and occurs both on a Dell PE-750 (rabbit.wicom.kiev.ua, 6.0-RELEASE-p1) and on another server with an Intel SE7210TP1-E motherboard (d33.wicomtechnologies.com, 5.3-RELEASE-p15). Threads do not fall into the indefinite sleep when HT is off.

The problem was detected when the test program is linked with libpthread (KSE). When it is linked with the other threading libraries (libc_r, libthr), the test program does not detect any problems and works fine.

>How-To-Repeat:
Compile the following C source code and run it on an Intel machine with HT enabled.
Wait for about 1 hour (or less) until some threads fall into the indefinite sleep. When the program detects such a thread, it prints a message like:

    *** ERROR: Thread 7 is dead! (chkpoint=3)

C source code:

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/time.h>
#include <sys/select.h>   /* select() */
#include <time.h>         /* clock_gettime(), time(), ctime() */

#define NUM_THREADS 40
#define MAX_WAIT    10

pthread_cond_t cv;
pthread_mutex_t mut;
int thrcount = 0;

typedef struct _thrarg_t {
    int n;
    pthread_t tid;
    struct timespec ts_exit;
    int chkpoint;
    int dead;
} thrarg_t;

thrarg_t args[NUM_THREADS];

void delay(int ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;              // seconds
    tv.tv_usec = (ms % 1000) * 1000;    // microseconds
    select(0, NULL, NULL, NULL, &tv);
}

void* thread_proc(void* arg)
{
    int rc;
    thrarg_t* parg = (thrarg_t*)arg;
    unsigned long msec = 100 + (parg->n * 10);
    unsigned long sec = (msec / 1000);
    unsigned long nsec = ((msec % 1000) * 1000000);
    struct timespec ts;

    printf("Thread %d started (wait=%lu)\n", parg->n, msec);

    for ( ; ; ) {
        parg->chkpoint = 1;
        //
        // Lock the mutex
        //
        rc = pthread_mutex_lock(&mut);
        if (rc != 0) {
            printf("pthread_mutex_lock() returned %d\n", rc);
            break;
        }
        parg->chkpoint = 2;
        //
        // Prepare timeout value
        //
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_sec += sec;
        ts.tv_nsec += nsec;
        if (ts.tv_nsec >= 1000000000) {
            ts.tv_sec++;
            ts.tv_nsec -= 1000000000;
        }
        parg->ts_exit.tv_sec = ts.tv_sec;
        parg->ts_exit.tv_nsec = ts.tv_nsec;
        parg->chkpoint = 3;
        rc = pthread_cond_timedwait(&cv, &mut, &ts);
        if ((rc != 0) && (rc != ETIMEDOUT)) {
            printf("pthread_cond_timedwait() returned %d\n", rc);
            pthread_mutex_unlock(&mut);
            break;
        } else {
            if (parg->dead == 1) {
                printf("Thread %d alive!\n", parg->n);
                parg->dead = 0;
                thrcount++;
            }
        }
        parg->chkpoint = 4;
        //
        // Unlock the mutex
        //
        rc = pthread_mutex_unlock(&mut);
        if (rc != 0) {
            printf("pthread_mutex_unlock() returned %d\n", rc);
            break;
        }
    }

    printf("Thread %d terminated.\n", parg->n);
    parg->chkpoint = 5;
    thrcount--;
    //
    // Terminate the thread
    //
    pthread_exit(NULL);
}

int main(int argc, char* argv[])
{
    int rc, j;
    time_t t0, t1;

    rc = pthread_cond_init(&cv, NULL);
    if (rc != 0) {
        printf("pthread_cond_init() returned %d\n", rc);
        exit(1);
    }
    rc = pthread_mutex_init(&mut, NULL);
    if (rc != 0) {
        printf("pthread_mutex_init() returned %d\n", rc);
        exit(1);
    }

    for (j = 0; j < NUM_THREADS; j++) {
        args[j].n = j;
        args[j].ts_exit.tv_sec = time(NULL);
        args[j].ts_exit.tv_nsec = 0;
        args[j].chkpoint = 0;
        args[j].dead = 0;
        rc = pthread_create(&args[j].tid, NULL, thread_proc, (void*)&args[j]);
        if (rc != 0) {
            printf("pthread_create() returned %d\n", rc);
            exit(1);
        }
        delay(50);
        thrcount++;
    }

    t0 = 0;
    while (thrcount > 0) {
        struct timespec ts_now;

        t1 = time(NULL);
        if ((t1 - t0) >= (60 * 5)) {
            printf("Running threads: %d | %s", thrcount, ctime(&t1));
            t0 = t1;
        }
        clock_gettime(CLOCK_REALTIME, &ts_now);
        for (j = 0; j < NUM_THREADS; j++) {
            time_t t_max;

            if (args[j].dead == 0) {
                pthread_mutex_lock(&mut);
                t_max = args[j].ts_exit.tv_sec + MAX_WAIT;
                if (ts_now.tv_sec > t_max) {
                    printf("*** ERROR: Thread %d is dead! (chkpoint=%d)\n",
                           j, args[j].chkpoint);
                    printf("ARG: tv_sec=%lu tv_nsec=%lu\n",
                           (unsigned long)args[j].ts_exit.tv_sec,
                           (unsigned long)args[j].ts_exit.tv_nsec);
                    printf("NOW: tv_sec=%lu tv_nsec=%lu\n\n",
                           (unsigned long)ts_now.tv_sec,
                           (unsigned long)ts_now.tv_nsec);
                    args[j].dead = 1;
                    thrcount--;
                }
                pthread_mutex_unlock(&mut);
            }
        }
        delay(1000);
    }

    pthread_cond_destroy(&cv);
    pthread_mutex_destroy(&mut);
    return 0;
}

>Fix:
Probably a KSE bug related to Hyper-Threading.

>Release-Note:
>Audit-Trail:
>Unformatted: