Date:      Tue, 7 Jun 2016 07:29:56 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mark Johnston <markj@FreeBSD.org>
Cc:        freebsd-current@FreeBSD.org, cem@FreeBSD.org
Subject:   Re: thread suspension when dumping core
Message-ID:  <20160607042956.GM38613@kib.kiev.ua>
In-Reply-To: <20160607041741.GA29017@wkstn-mjohnston.west.isilon.com>
References:  <20160604022347.GA1096@wkstn-mjohnston.west.isilon.com> <20160604093236.GA38613@kib.kiev.ua> <20160606171311.GC10101@wkstn-mjohnston.west.isilon.com> <20160607024610.GI38613@kib.kiev.ua> <20160607041741.GA29017@wkstn-mjohnston.west.isilon.com>

On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote:
> Sure, see below. For reference:
> 
> td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK
> td_pflags = 0
> td_inhibitors = 0x2 = SLEEPING
> td_locks = 0
> 
> stack:
> mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d ...
> 
> p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS
> p_flag2 = 0
> 
> The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the
> SLEEPABORT flag is not, so the thread can not have been sleeping when
> thread_single() was called - else sleepq_abort() would have been
> invoked and set SLEEPABORT. We are at the second sleepq_switch() call in
> sleepq_catch_signals(), and no signal was pending, so we called
> thread_suspend_check(), which returned 0 because of SBDRY. So we went to
> sleep.
> 
> I note that this couldn't have happened prior to r283320. That change
> was apparently motivated by a similar hang, but in that case the thread
> was suspended (with a vnode lock held) rather than asleep. It looks like
> our internal fix also added a change to set TDF_SBDRY around
> filesystem-specific syscalls, which often sleep interruptibly while
> holding vnode locks. But I don't think that's the problem here, as you
> noted with lf_advlock().
> 
> With r283320 reverted, P_STOPPED_SIG would not have been set, so
> thread_suspend_check() would have suspended us, allowing the core dump
> to proceed. I had thought that using SINGLE_BOUNDARY before coredumping
> would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so
> this is probably the wrong place to be solving the problem.

This suggests that, in the TDF_SBDRY case, thread_suspend_check() should
not ignore suspension requests completely, but should instead return
either EINTR or ERESTART (most likely ERESTART). Note that the goal of
TDF_SBDRY is to avoid suspending in the protected region, not to give
the impression that no suspension is being requested at all.
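
To make the suggestion concrete, here is a rough user-space model of the
behaviour I have in mind. It is only a sketch, not a patch against
kern_thread.c: struct model_thread, model_suspend_check(), the MODEL_*
flag values and MODEL_ERESTART are all hypothetical stand-ins, and the
real function has to deal with locking, single-threading modes and
exiting threads that are ignored here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not copied from sys/proc.h or sys/errno.h. */
#define MODEL_TDF_SBDRY		0x2000	/* stop only at the user boundary */
#define MODEL_TDF_NEEDSUSPCHK	0x8000	/* a suspension request is pending */
#define MODEL_ERESTART		(-1)	/* "restart the syscall" result */

struct model_thread {
	int	flags;		/* MODEL_TDF_* bits */
	bool	suspended;	/* set when the model suspends the thread */
};

/*
 * Model of the proposed check, called before an interruptible sleep.
 * Returns 0 when the caller may go to sleep, or MODEL_ERESTART when a
 * suspension is pending but must be deferred to the user boundary.
 */
static int
model_suspend_check(struct model_thread *td)
{
	/* No suspension requested: nothing to do, the sleep may proceed. */
	if ((td->flags & MODEL_TDF_NEEDSUSPCHK) == 0)
		return (0);

	/*
	 * Protected region: do not suspend here, but do not pretend that
	 * nothing happened either. Returning an error makes the caller
	 * unwind instead of sleeping with the request still pending.
	 */
	if ((td->flags & MODEL_TDF_SBDRY) != 0)
		return (MODEL_ERESTART);

	/* Unprotected: suspend in place, as before. */
	td->suspended = true;
	return (0);
}
```

With this shape, a thread in the state you describe (SBDRY and
NEEDSUSPCHK both set) gets an error back instead of 0, so it would
unwind towards the user boundary rather than going to sleep.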


