FreeBSD Mail Archives

Date:      Mon, 23 Sep 2019 15:48:10 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        freebsd-threads@FreeBSD.org
Subject:   Re: assertion when destroying a process shared mutex
Message-ID:  <20190923124810.GP2559@kib.kiev.ua>
In-Reply-To: <a117f59a-6b36-bda7-b400-fae4280fce98@FreeBSD.org>
References:  <6f6a16a3-8eca-ceb0-4ca3-aadf2d926f81@FreeBSD.org> <20190920173854.GJ2559@kib.kiev.ua> <a117f59a-6b36-bda7-b400-fae4280fce98@FreeBSD.org>


On Mon, Sep 23, 2019 at 03:09:16PM +0300, Andriy Gapon wrote:
> On 20/09/2019 20:38, Konstantin Belousov wrote:
> > On Fri, Sep 20, 2019 at 07:52:20PM +0300, Andriy Gapon wrote:
> >>
> >> Fatal error 'mutex 0x800661000 own 0x80000010 is on list 0x8006591a0 0x0' at
> >> line 153 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 0)
> >>
> >> This happens with a mutex initialized with PTHREAD_PROCESS_SHARED,
> >> PTHREAD_MUTEX_ROBUST and PTHREAD_MUTEX_ERRORCHECK.
> >> The situation that leads to the abort seems to be this:
> >> - one process takes the lock and then crashes without releasing the lock
> >> - some time later another process does a cleanup and attempts to destroy the mutex
> >> That's where the assertion happens.
> >>
> >> Specifically, it seems that the assert is tripped if there are no other
> >> operations on the lock between the crash of one process and the destroy in
> >> the other process.
> >>
> >> I wrote a small test program to demo the issue:
> >> https://people.freebsd.org/~avg/shared_mtx.c
> >>
> >> The state of the mutex in a crash dump is this:
> >> (gdb) p/x *(struct pthread_mutex *)0x800661000
> >> $6 = {m_lock = {m_owner = 0x80000010, m_flags = 0x11, m_ceilings = {0x0, 0x0},
> >>     m_rb_lnk = 0x0, m_spare = {0x0, 0x0}}, m_flags = 0x1, m_count = 0x0,
> >>     m_spinloops = 0x0, m_yieldloops = 0x0, m_ps = 0x2,
> >>     m_qe = {tqe_next = 0x0, tqe_prev = 0x8006591a0},
> >>     m_pqe = {tqe_next = 0x0, tqe_prev = 0x0}, m_rb_prev = 0x0}
> >>
> >> So, it's m_qe.tqe_prev != NULL that leads to the assert.
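The two umutex words in this dump can be decoded by hand. What follows is a minimal, hypothetical helper for doing so. The constant values are assumed from our reading of FreeBSD's sys/sys/umtx.h and should be checked against the tree being debugged. The names umutex_owner_state(), umutex_is_robust_pshared() and check_reported_dump() are made up for illustration. Under those assumptions, m_owner = 0x80000010 is UMUTEX_RB_OWNERDEAD, the state the kernel leaves behind when a robust owner dies. m_lock.m_flags = 0x11 is UMUTEX_ROBUST | USYNC_PROCESS_SHARED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMED values, from our reading of FreeBSD's sys/sys/umtx.h;
 * verify them against the source tree you are debugging.
 */
#define UMUTEX_UNOWNED        0x0U
#define UMUTEX_CONTESTED      0x80000000U
#define UMUTEX_RB_OWNERDEAD   (UMUTEX_CONTESTED | 0x10U)
#define UMUTEX_RB_NOTRECOV    (UMUTEX_CONTESTED | 0x11U)
#define USYNC_PROCESS_SHARED  0x0001U
#define UMUTEX_ROBUST         0x0010U

enum owner_state {
	OWNER_NONE,         /* unowned, no waiters */
	OWNER_NONE_WAITERS, /* unowned, contested bit set */
	OWNER_DEAD,         /* robust: owner died, kernel marked it */
	OWNER_NOTRECOV,     /* robust: marked not recoverable */
	OWNER_LIVE          /* owned by a live thread (tid in low bits) */
};

/* Hypothetical helper: classify the m_owner word of a struct umutex. */
static enum owner_state
umutex_owner_state(uint32_t owner)
{
	/* Test the robust values first: they also carry the contested bit. */
	if (owner == UMUTEX_RB_OWNERDEAD)
		return (OWNER_DEAD);
	if (owner == UMUTEX_RB_NOTRECOV)
		return (OWNER_NOTRECOV);
	if (owner == UMUTEX_UNOWNED)
		return (OWNER_NONE);
	if (owner == UMUTEX_CONTESTED)
		return (OWNER_NONE_WAITERS);
	return (OWNER_LIVE);
}

/* Hypothetical helper: true for a robust, process-shared umutex. */
static bool
umutex_is_robust_pshared(uint32_t flags)
{
	return ((flags & UMUTEX_ROBUST) != 0 &&
	    (flags & USYNC_PROCESS_SHARED) != 0);
}

/* Apply the helpers to the m_lock words copied from the gdb dump. */
static void
check_reported_dump(void)
{
	assert(umutex_owner_state(0x80000010U) == OWNER_DEAD);
	assert(umutex_is_robust_pshared(0x11U));
}
```

Read this way, the dump shows a lock the kernel has already flagged as owner-dead, while the libthr-private m_qe linkage still points at the dead owner's list.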
> > 
> > This is only relevant for robust mutexes; otherwise the behavior is
> > undefined if the owner terminates without unlocking it.  I believe that
> > in the case of the kernel-assisted UMUTEX_RB_OWNERDEAD state, we should
> > skip mutex_assert_not_owned(), as is already done in enqueue_mutex().
> 
> Thank you very much!
> The patch does help.
> I think that there's probably no good way to clean up m_qe.

The state of robust mutexes is mostly recovered by the kernel, but the kernel
only knows about the umutex part of struct pthread_mutex.  In fact, other
parts of libthr do the same as the patch: they ignore the mutex linkage if
it is a robust mutex recovered after the owner's death.
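From the application's side, this recovery is visible through the POSIX robust-mutex API, namely pthread_mutexattr_setrobust() and pthread_mutex_consistent(). A minimal sketch follows. It is hypothetical and is not the test program from the report: robust_recover_demo() is a made-up name, and the child calling _exit() while holding the lock stands in for the crash. It uses the same attributes as the report. Unlike the reported sequence, the parent locks the mutex, receives EOWNERDEAD, and marks it consistent before unlocking and destroying it.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc; harmless elsewhere */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: a child locks a robust, process-shared mutex and
 * exits without unlocking it.  The parent then locks it, expects
 * EOWNERDEAD, marks it consistent, unlocks and destroys it.
 * Returns 0 on success, -1 if a setup step fails.  The lock and destroy
 * results are stored through the pointers for the caller to check.
 */
static int
robust_recover_demo(int *lock_rc, int *destroy_rc)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t *m;
	pid_t pid;
	int status, rc;

	m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED)
		return (-1);

	/* Same attributes as in the report. */
	if (pthread_mutexattr_init(&attr) != 0) {
		munmap(m, sizeof(*m));
		return (-1);
	}
	rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	if (rc == 0)
		rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	if (rc == 0)
		rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (rc == 0)
		rc = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	if (rc != 0) {
		munmap(m, sizeof(*m));
		return (-1);
	}

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Child: take the lock and "crash" while still holding it. */
		pthread_mutex_lock(m);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) != pid)
		return (-1);

	/* The owner's death was recorded; the next locker is told about it. */
	*lock_rc = pthread_mutex_lock(m);
	if (*lock_rc == EOWNERDEAD) {
		/* Repair of the data protected by the lock would go here. */
		pthread_mutex_consistent(m);
	}
	if (*lock_rc == 0 || *lock_rc == EOWNERDEAD)
		pthread_mutex_unlock(m);
	*destroy_rc = pthread_mutex_destroy(m);
	munmap(m, sizeof(*m));
	return (0);
}
```

Taking the lock first means there is an operation on the mutex between the owner's death and the destroy. Per the report, that is the ordering that did not trip the assertion.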
