Date:      Mon, 20 Feb 2006 11:13:54 GMT
From:      Ivan Getta <ivan@wicomtechnologies.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   threads/93592: Loss of wakeup in pthread_cond_timedwait(), libpthread (KSE), Intel Hyper-Threading Technology (HTT), FreeBSD Production Release 6.0
Message-ID:  <200602201113.k1KBDs9b028040@www.freebsd.org>
Resent-Message-ID: <200602201120.k1KBK3Wb087921@freefall.freebsd.org>


>Number:         93592
>Category:       threads
>Synopsis:       Loss of wakeup in pthread_cond_timedwait(), libpthread (KSE),  Intel Hyper-Threading Technology (HTT), FreeBSD Production Release 6.0
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-threads
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Feb 20 11:20:03 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Ivan Getta
>Release:        FreeBSD 6.0-RELEASE
>Organization:
Wicom Technologies
>Environment:
FreeBSD rabbit.wicom.kiev.ua 6.0-RELEASE-p1 FreeBSD 6.0-RELEASE-p1 #2: Mon Jan  9 20:29:47 EET 2006     
rus_s@rabbit.wicom.kiev.ua:/usr/src/sys/i386/compile/RABBIT_A  i386              

hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz


FreeBSD d33.wicomtechnologies.com 5.3-RELEASE-p15 FreeBSD 5.3-RELEASE-p15 #2: Mon May 16 20:08:23 EEST 2005     
rus_s@d33.wicomtechnologies.com:/usr/src/sys/i386/compile/D33_POLL  i386

hw.model: Intel(R) Pentium(R) 4 CPU 3.00GHz
>Description:
The problem was detected and reproduced with a test program (see the source
code below). The test program runs 40 threads that wait in a loop on the same condition variable with a timeout. The condition is never signaled, so each thread wakes up on timeout and then waits again.
The problem is that a thread sometimes does not wake up on timeout and sleeps indefinitely after calling pthread_cond_timedwait().

The problem was detected with hyper-threading (HT) turned on (machdep.hyperthreading_allowed=1) and occurs both on a Dell PE-750 (rabbit.wicom.kiev.ua 6.0-RELEASE-p1) and on another server with an Intel SE7210TP1-E motherboard (d33.wicomtechnologies.com 5.3-RELEASE-p15).
Threads do not fall into the infinite sleep when HT is off.

The problem was detected when the test program uses libpthread (KSE). When linked
with another threading library (libc_r, libthr), the test program does not detect
any problems and works fine.
>How-To-Repeat:
Compile the following C source code and run it on an Intel machine with HT enabled.

Wait about 1 hour (or less) until some threads fall into the infinite sleep.

When the program detects a thread that is sleeping indefinitely, it prints a message like:
"*** ERROR: Thread 7 is dead! (chkpoint=3)".
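
The check behind this message can be sketched on its own. This is a simplified illustration of the comparison done in main() of the test program; the helper name is_overdue() is hypothetical. A thread is reported as dead when the current time is more than MAX_WAIT whole seconds past the deadline it last recorded. Only the tv_sec fields are compared, as in the test program.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_WAIT 10  /* grace period in seconds, same value as in the test program */

/* Hypothetical helper: returns nonzero when `now` is more than MAX_WAIT
 * whole seconds past `deadline` (nanoseconds are ignored). */
static int is_overdue(const struct timespec *deadline, const struct timespec *now)
{
    time_t t_max;

    assert(deadline != NULL && now != NULL);

    t_max = deadline->tv_sec + MAX_WAIT;
    return now->tv_sec > t_max;
}
```

Because the longest timeout used by the threads is well under one second, a deadline that is more than 10 seconds in the past means the thread never returned from pthread_cond_timedwait() (chkpoint=3).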

C source code:

#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>           // clock_gettime(), time(), ctime()
#include <pthread.h>
#include <sys/time.h>
#include <sys/select.h>     // select()

#define NUM_THREADS 40
#define MAX_WAIT    10

pthread_cond_t cv;
pthread_mutex_t mut;

int thrcount = 0;

typedef struct _thrarg_t
{
    int n;
    pthread_t tid;
    struct timespec ts_exit;
    int chkpoint;
    int dead;

} thrarg_t;

thrarg_t args[NUM_THREADS];

void delay(int ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;              // seconds
    tv.tv_usec = (ms % 1000) * 1000;    // microseconds

    select(0, NULL, NULL, NULL, &tv);
}

void* thread_proc(void* arg)
{
    int rc;    
    thrarg_t* parg = (thrarg_t*)arg;
    
    unsigned long msec = 100 + (parg->n * 10);
    unsigned long sec = (msec / 1000);
    unsigned long nsec = ((msec % 1000) * 1000000);

    struct timespec ts;

    printf("Thread %d started (wait=%lu)\n",
        parg->n, msec);

    for ( ; ; )
    {
        parg->chkpoint = 1;

        //
        // Lock the mutex
        //
        rc = pthread_mutex_lock(&mut);

        if (rc != 0)
        {
            printf("pthread_mutex_lock() returned %d\n", rc);
            break;
        }

        parg->chkpoint = 2;

        //
        // Prepare timeout value
        //
        clock_gettime(CLOCK_REALTIME, &ts);

        ts.tv_sec += sec;
        ts.tv_nsec += nsec;

        if (ts.tv_nsec >= 1000000000)
        {
            ts.tv_sec++;
            ts.tv_nsec -= 1000000000;
        }

        parg->ts_exit.tv_sec = ts.tv_sec;
        parg->ts_exit.tv_nsec = ts.tv_nsec;

        parg->chkpoint = 3;

        rc = pthread_cond_timedwait(&cv, &mut, &ts);

        if ((rc != 0) && (rc != ETIMEDOUT))
        {
            printf("pthread_cond_timedwait() returned %d\n", rc);
            pthread_mutex_unlock(&mut);
            break;
        }
        else
        {
            if (parg->dead == 1)
            {
                printf("Thread %d alive!\n", parg->n);
                
                parg->dead = 0;
                thrcount++;
            }
        }

        parg->chkpoint = 4;

        //
        // Unlock the mutex
        //
        rc = pthread_mutex_unlock(&mut);

        if (rc != 0)
        {
            printf("pthread_mutex_unlock() returned %d\n", rc);
            break;
        }
    }

    printf("Thread %d terminated.\n", parg->n);

    parg->chkpoint = 5;

    thrcount--;

    //
    // Terminate the thread
    //
    pthread_exit(NULL);
}

int main(int argc, char* argv[])
{
    int rc, j;
    time_t t0, t1;

    rc = pthread_cond_init(&cv, NULL);

    if (rc != 0)
    {
        printf("pthread_cond_init() returned %d\n", rc);
        exit(1);
    }

    rc = pthread_mutex_init(&mut, NULL);
    
    if (rc != 0)
    {
        printf("pthread_mutex_init() returned %d\n", rc);
        exit(1);
    }

    for (j = 0; j < NUM_THREADS; j++)
    {
        args[j].n = j;
        args[j].ts_exit.tv_sec = time(NULL);
        args[j].ts_exit.tv_nsec = 0;
        args[j].chkpoint = 0;
        args[j].dead = 0;

        rc = pthread_create(&args[j].tid, NULL,
            thread_proc, (void*)&args[j]);

        if (rc != 0)
        {
            printf("pthread_create() returned %d\n", rc);
            exit(1);
        }

        delay(50);

        thrcount++;
    }

    t0 = 0;

    while (thrcount > 0)
    {
        struct timespec ts_now;

        t1 = time(NULL);

        if ((t1 - t0) >= (60 * 5))
        {
            printf("Running threads: %d | %s",
                thrcount, ctime(&t1));

            t0 = t1;
        }

        clock_gettime(CLOCK_REALTIME, &ts_now);

        for (j = 0; j < NUM_THREADS; j++)
        {
            time_t t_max;

            if (args[j].dead == 0)
            {
                pthread_mutex_lock(&mut);
            
                t_max = args[j].ts_exit.tv_sec + MAX_WAIT;

                if (ts_now.tv_sec > t_max)
                {
                    printf("*** ERROR: Thread %d is dead! (chkpoint=%d)\n",
                        j, args[j].chkpoint);
                    
                    printf("ARG: tv_sec=%lu tv_nsec=%lu\n",
                        (unsigned long)args[j].ts_exit.tv_sec,
                        (unsigned long)args[j].ts_exit.tv_nsec);

                    printf("NOW: tv_sec=%lu tv_nsec=%lu\n\n",
                        (unsigned long)ts_now.tv_sec,
                        (unsigned long)ts_now.tv_nsec);

                    args[j].dead = 1;
                    thrcount--;
                }

                pthread_mutex_unlock(&mut);
            }
        }

        delay(1000);
    }

    pthread_cond_destroy(&cv);
    pthread_mutex_destroy(&mut);

    return 0;
}
              
>Fix:
Probably a KSE bug related to hyper-threading.
>Release-Note:
>Audit-Trail:
>Unformatted:


