Date: Mon, 23 Sep 2019 15:09:16 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-threads@FreeBSD.org
Subject: Re: assertion when destroying a process shared mutex
Message-ID: <a117f59a-6b36-bda7-b400-fae4280fce98@FreeBSD.org>
In-Reply-To: <20190920173854.GJ2559@kib.kiev.ua>
References: <6f6a16a3-8eca-ceb0-4ca3-aadf2d926f81@FreeBSD.org> <20190920173854.GJ2559@kib.kiev.ua>
On 20/09/2019 20:38, Konstantin Belousov wrote:
> On Fri, Sep 20, 2019 at 07:52:20PM +0300, Andriy Gapon wrote:
>>
>> Fatal error 'mutex 0x800661000 own 0x80000010 is on list 0x8006591a0 0x0' at
>> line 153 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 0)
>>
>> This happens with a mutex initialized with PTHREAD_PROCESS_SHARED,
>> PTHREAD_MUTEX_ROBUST and PTHREAD_MUTEX_ERRORCHECK.
>> The situation that leads to the abort seems to be this:
>> - one process takes the lock and then crashes without releasing the lock
>> - some time later another process does a cleanup and attempts to destroy the mutex
>> That's where the assertion happens.
>>
>> Specifically, it seems that the assert is tripped if there are no other
>> operations on the lock between the crash of one process and the destroy in the
>> other process.
>>
>> I wrote a small test program to demo the issue:
>> https://people.freebsd.org/~avg/shared_mtx.c
>>
>> The state of the mutex in a crash dump is this:
>> (gdb) p/x *(struct pthread_mutex *)0x800661000
>> $6 = {m_lock = {m_owner = 0x80000010, m_flags = 0x11, m_ceilings = {0x0, 0x0},
>> m_rb_lnk = 0x0, m_spare = {0x0, 0x0}}, m_flags = 0x1, m_count = 0x0, m_spinloops
>> = 0x0, m_yieldloops = 0x0, m_ps = 0x2, m_qe = {tqe_next = 0x0,
>> tqe_prev = 0x8006591a0}, m_pqe = {tqe_next = 0x0, tqe_prev = 0x0}, m_rb_prev
>> = 0x0}
>>
>> So, it's m_qe.tqe_prev != NULL that leads to the assert.
>
> This is only relevant for robust mutexes, otherwise the behavior is
> undefined if the owner terminates without unlocking it. I believe that
> in case of the kernel-assisted UMUTEX_RB_OWNERDEAD state, we should skip
> mutex_assert_not_owned(), same as in enqueue_mutex().

Thank you very much! The patch does help.
I think that there's probably no good way to clean up m_qe.
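For comparison, the POSIX recovery sequence for a robust mutex whose owner died is: lock it, receive EOWNERDEAD, repair the protected state, call pthread_mutex_consistent(), unlock, and only then destroy it. The sketch below walks through that sequence with a robust, process-shared, error-checking mutex placed in an anonymous MAP_SHARED mapping, using a fork()ed child as the process that dies while holding the lock. It is an illustration under those assumptions, not the shared_mtx.c test program linked above, and the function name robust_owner_dead_demo() is hypothetical. Because the surviving process locks the mutex before destroying it, this is not the case from the report, where the assertion was only seen when nothing touched the lock between the crash and the destroy.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS and POSIX.1-2008 on glibc */
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration: a child locks a robust,
 * process-shared mutex and exits without unlocking it; the parent then
 * recovers the mutex and destroys it.  Returns 0 if EOWNERDEAD was seen
 * and pthread_mutex_consistent() succeeded, -1 otherwise.
 */
int
robust_owner_dead_demo(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t *m;
	pid_t pid;
	int status, rc, ret = -1;

	/* Shared anonymous mapping: parent and child see the same mutex. */
	m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED)
		return (-1);

	/* Same attribute combination as in the report. */
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	rc = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	if (rc != 0)
		goto out_unmap;

	pid = fork();
	if (pid == 0) {
		/* Child: take the lock and exit while still holding it. */
		pthread_mutex_lock(m);
		_exit(0);
	}
	if (pid == -1 || waitpid(pid, &status, 0) != pid)
		goto out_destroy;

	/* The owner is gone: a robust mutex is acquired with EOWNERDEAD. */
	rc = pthread_mutex_lock(m);
	if (rc == EOWNERDEAD) {
		/* Shared state would be repaired here before marking it usable. */
		if (pthread_mutex_consistent(m) == 0)
			ret = 0;
		pthread_mutex_unlock(m);
	} else if (rc == 0) {
		/* Unexpected: owner death was not reported; report failure. */
		pthread_mutex_unlock(m);
	}

out_destroy:
	/* The mutex is unlocked (or the lock failed) before it is destroyed. */
	pthread_mutex_destroy(m);
out_unmap:
	munmap(m, sizeof(*m));
	return (ret);
}
```

Calling it from main() and checking for a 0 return is enough to exercise it; build with -pthread. If pthread_mutex_consistent() were skipped, the unlock would leave the mutex permanently unrecoverable (ENOTRECOVERABLE on later lock attempts), which is why the recovery step comes before the unlock.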
> diff --git a/lib/libthr/thread/thr_mutex.c b/lib/libthr/thread/thr_mutex.c
> index dc09f539add..57984ef6d0e 100644
> --- a/lib/libthr/thread/thr_mutex.c
> +++ b/lib/libthr/thread/thr_mutex.c
> @@ -474,7 +474,11 @@ _thr_mutex_destroy(pthread_mutex_t *mutex)
>  		if (m == THR_PSHARED_PTR) {
>  			m1 = __thr_pshared_offpage(mutex, 0);
>  			if (m1 != NULL) {
> -				mutex_assert_not_owned(_get_curthread(), m1);
> +				if ((uint32_t)m1->m_lock.m_owner !=
> +				    UMUTEX_RB_OWNERDEAD) {
> +					mutex_assert_not_owned(
> +					    _get_curthread(), m1);
> +				}
>  				__thr_pshared_destroy(mutex);
>  			}
>  			*mutex = THR_MUTEX_DESTROYED;

-- 
Andriy Gapon