Date:      Sun, 7 Feb 2010 14:06:18 -0500
From:      Ryan Stone <rysto32@gmail.com>
To:        freebsd-ports@freebsd.org
Cc:        stas@FreeBSD.org
Subject:   TLS(and by extension all threading) completely broken in Valgrind on i386/amd64
Message-ID:  <bc2d971002071106s53356f7p30696c9abc5f2795@mail.gmail.com>


I've been trying out valgrind on some threaded FreeBSD applications
but they've been deadlocking at startup.  I've identified that the
root cause is that FreeBSD's thread local storage is not being
emulated properly by valgrind.  The problem on amd64 is obvious:
valgrind gives an invalid opcode error when the program tries to
execute any instruction that accesses the gs register.  On i386 the
problem is much more subtle.

I've attached two test applications that demonstrate the problem.  In
pthread_self.c, I create one thread which periodically prints
pthread_self(), and then 10 seconds later I create a second thread.
After the second thread is created, the first thread believes that it
is the second thread.  Here's an example invocation:


==883== Memcheck, a memory error detector
==883== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==883== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==883== Command: ./pthread_self
==883==
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
1st: 0x18c180
2nd: 0x18d390
0x18d390
        0x18d390
0x18d390
        0x18d390
0x18d390
        0x18d390
0x18d390
        0x18d390
0x18d390
        0x18d390

Note that the first thread correctly prints that its pthread_t is
0x18c180 before the second thread is created, but after the second
thread is created both threads report that they are 0x18d390!  As far
as I can tell, all threads use the thread local storage of the last
thread created.  This completely breaks libthr's mutexes, as mutex.c
demonstrates.  In that test app, the main thread acquires a mutex,
creates a new thread, and then tries to unlock the mutex.  The unlock
fails with EPERM, which pthread_mutex_unlock returns when a thread
tries to unlock a mutex that it does not own.  This behaviour is
likely the cause of all of the "false positives" from helgrind.
Helgrind is correctly noting that the libthr internals are using the
same memory in different threads, because each thread thinks that it
is touching its own thread-local memory.
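
The attachments are not reproduced in this archive.  As a rough
illustration of the mutex scenario described above, a minimal C sketch
could look like the following.  The function name lock_create_unlock,
the idle thread, and the explicit error-checking mutex type are
assumptions made for this sketch and are not taken from the original
mutex.c.

```c
/* Hypothetical reconstruction of the mutex scenario; not the original mutex.c. */
#define _XOPEN_SOURCE 700   /* assumption: exposes PTHREAD_MUTEX_ERRORCHECK on strict toolchains */
#include <errno.h>
#include <pthread.h>

/* The second thread does nothing; merely creating it is the trigger. */
static void *idle_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Lock a mutex, create a thread, then unlock the mutex from the same
 * (locking) thread.  Returns the result of pthread_mutex_unlock:
 * 0 on success, EPERM if the library believes the caller is not the
 * owner, or -1 if the helper thread could not be created.
 */
int lock_create_unlock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    pthread_t tid;
    int rc;

    /* Assumption: request an error-checking mutex so that the
     * ownership check is guaranteed by POSIX rather than left to
     * the platform default. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);

    if (pthread_create(&tid, NULL, idle_thread, NULL) != 0) {
        pthread_mutex_unlock(&mutex);
        pthread_mutex_destroy(&mutex);
        return -1;
    }

    /* The calling thread still owns the mutex, so this should succeed. */
    rc = pthread_mutex_unlock(&mutex);

    pthread_join(tid, NULL);
    if (rc == 0)
        pthread_mutex_destroy(&mutex);
    return rc;
}
```

Called from main(), this should return 0 when thread identity is
tracked correctly.  A return of EPERM would match the failure reported
above, where the unlocking thread no longer appears to be the owner.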

I've found the point in the thr_new syscall wrapper where valgrind
notes the TLS area, but I can't figure out how it uses that
information, so I'm stuck on working out why valgrind is getting this
wrong.  Does anyone have any ideas?  I'm not subscribed to this list,
so please CC me on any replies.
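
Since pthread_self.c is not reproduced in this archive, here is a
shortened, hedged sketch of the same kind of check.  Instead of
printing for 10 seconds, a first thread samples pthread_self() before
and after a second thread is created and the two values are compared.
The names check_self_is_stable, first_thread, second_thread and the
stage flag are assumptions for this sketch, not the original test.

```c
/* Hypothetical reconstruction of the pthread_self check; not the original test. */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* 0: start, 1: first thread has sampled, 2: second thread exists */
static atomic_int stage;
/* Written only by the first thread, read by the caller after joining it. */
static pthread_t before, after;

static void *first_thread(void *arg)
{
    (void)arg;
    before = pthread_self();          /* identity before the second thread exists */
    atomic_store(&stage, 1);
    while (atomic_load(&stage) != 2)  /* wait until the second thread is created */
        sched_yield();
    after = pthread_self();           /* identity afterwards */
    return NULL;
}

static void *second_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Returns 1 if the first thread sees the same pthread_self() value
 * before and after the second thread is created (and that value is
 * the id pthread_create reported), 0 if its identity changed, and
 * -1 if a thread could not be created.
 */
int check_self_is_stable(void)
{
    pthread_t t1, t2;

    atomic_store(&stage, 0);
    if (pthread_create(&t1, NULL, first_thread, NULL) != 0)
        return -1;
    while (atomic_load(&stage) != 1)
        sched_yield();

    if (pthread_create(&t2, NULL, second_thread, NULL) != 0) {
        atomic_store(&stage, 2);      /* release the first thread before bailing out */
        pthread_join(t1, NULL);
        return -1;
    }
    atomic_store(&stage, 2);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return pthread_equal(before, after) != 0 &&
           pthread_equal(before, t1) != 0;
}
```

On a system where thread-local storage works, this should return 1.
A result of 0 would correspond to the behaviour shown in the output
above, where the first thread starts reporting the second thread's id.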



