Date:      Fri, 20 Sep 2019 19:52:20 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        freebsd-threads@freebsd.org
Subject:   assertion when destroying a process shared mutex
Message-ID:  <6f6a16a3-8eca-ceb0-4ca3-aadf2d926f81@FreeBSD.org>

Fatal error 'mutex 0x800661000 own 0x80000010 is on list 0x8006591a0 0x0' at
line 153 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 0)

This happens with a mutex initialized with PTHREAD_PROCESS_SHARED,
PTHREAD_MUTEX_ROBUST and PTHREAD_MUTEX_ERRORCHECK.
The situation that leads to the abort seems to be this:
- one process takes the lock and then crashes without releasing the lock
- some time later another process does a cleanup and attempts to destroy the mutex
That's where the assertion happens.

Specifically, it seems that the assert is tripped if there are no other
operations on the lock between the crash of one process and the destroy in the
other process.
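
For readers who want the sequence spelled out, here is a minimal sketch of it.
It is not the test program linked below. The helper names
init_shared_robust_mutex() and destroy_after_owner_death() are hypothetical,
the mutex is assumed to live in anonymous shared memory, and _exit() with the
lock held is only a stand-in for a crash. Whether it trips the assertion on a
given system is not something the sketch guarantees.

```c
#define _DEFAULT_SOURCE	/* assumption: only needed for strict glibc builds */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: apply the three attributes named in this report. */
static int
init_shared_robust_mutex(pthread_mutex_t *mtx)
{
	pthread_mutexattr_t attr;
	int error;

	error = pthread_mutexattr_init(&attr);
	if (error != 0)
		return (error);
	error = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	if (error == 0)
		error = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	if (error == 0)
		error = pthread_mutexattr_settype(&attr,
		    PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(mtx, &attr);
	pthread_mutexattr_destroy(&attr);
	return (error);
}

/*
 * Hypothetical driver: the child locks the mutex and exits while holding it,
 * then the parent destroys the mutex with no other operation in between.
 * Returns the result of pthread_mutex_destroy(), or -1 if the setup failed.
 */
static int
destroy_after_owner_death(void)
{
	pthread_mutex_t *mtx;
	pid_t pid;
	int error, status;

	mtx = mmap(NULL, sizeof(*mtx), PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANON, -1, 0);
	if (mtx == MAP_FAILED)
		return (-1);
	if (init_shared_robust_mutex(mtx) != 0)
		return (-1);

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Child: take the lock and die without releasing it. */
		_exit(pthread_mutex_lock(mtx) == 0 ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) != pid)
		return (-1);
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return (-1);

	/* Parent: clean up directly, with no lock/consistent/unlock first. */
	error = pthread_mutex_destroy(mtx);
	munmap(mtx, sizeof(*mtx));
	return (error);
}
```

Calling destroy_after_owner_death() from main() and printing its return value
is enough to exercise the sequence. A variant that locks the mutex in the
parent (expecting EOWNERDEAD) before the destroy would cover the case with an
intervening operation.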

I wrote a small test program to demo the issue:
https://people.freebsd.org/~avg/shared_mtx.c

The state of the mutex in a crash dump is this:
(gdb) p/x *(struct pthread_mutex *)0x800661000
$6 = {m_lock = {m_owner = 0x80000010, m_flags = 0x11, m_ceilings = {0x0, 0x0},
    m_rb_lnk = 0x0, m_spare = {0x0, 0x0}}, m_flags = 0x1, m_count = 0x0,
  m_spinloops = 0x0, m_yieldloops = 0x0, m_ps = 0x2, m_qe = {tqe_next = 0x0,
    tqe_prev = 0x8006591a0}, m_pqe = {tqe_next = 0x0, tqe_prev = 0x0},
  m_rb_prev = 0x0}

So, it's m_qe.tqe_prev != NULL that leads to the assert.
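
To illustrate what that linkage state means, here is a small sketch using the
<sys/queue.h> TAILQ macros, which is what the tqe_next/tqe_prev field names
suggest m_qe is built from. struct demo_mutex and both functions are
hypothetical stand-ins, not libthr code. The sketch shows that an element
which is the only entry on a tail queue has tqe_next == NULL and a tqe_prev
that points at the list head, the same pattern as in the dump above. Reading
the dump as "the mutex is still linked on some list whose head is at
0x8006591a0" is an interpretation, not something the dump alone proves.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct pthread_mutex: only the list linkage. */
struct demo_mutex {
	TAILQ_ENTRY(demo_mutex) m_qe;
};

TAILQ_HEAD(demo_mutex_list, demo_mutex);

/*
 * Hypothetical check in the spirit of the condition described above:
 * treat the mutex as linked if either linkage pointer is non-NULL.
 */
static int
demo_is_on_list(const struct demo_mutex *m)
{
	return (m->m_qe.tqe_prev != NULL || m->m_qe.tqe_next != NULL);
}

/*
 * Link m as the only element on head. Returns 1 if the resulting linkage
 * has tqe_next == NULL and tqe_prev pointing at the head's first-element
 * slot, 0 otherwise.
 */
static int
demo_sole_element_pattern(struct demo_mutex_list *head, struct demo_mutex *m)
{
	TAILQ_INIT(head);
	TAILQ_INSERT_TAIL(head, m, m_qe);
	return (m->m_qe.tqe_next == NULL &&
	    m->m_qe.tqe_prev == &head->tqh_first);
}
```

A zero-initialized struct demo_mutex is reported as not on a list. After
demo_sole_element_pattern() links it, demo_is_on_list() reports it as linked
even though tqe_next is still NULL.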

-- 
Andriy Gapon


