Date: Tue, 7 Jun 2016 07:29:56 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Mark Johnston <markj@FreeBSD.org>
Cc: freebsd-current@FreeBSD.org, cem@FreeBSD.org
Subject: Re: thread suspension when dumping core
Message-ID: <20160607042956.GM38613@kib.kiev.ua>
In-Reply-To: <20160607041741.GA29017@wkstn-mjohnston.west.isilon.com>
References: <20160604022347.GA1096@wkstn-mjohnston.west.isilon.com> <20160604093236.GA38613@kib.kiev.ua> <20160606171311.GC10101@wkstn-mjohnston.west.isilon.com> <20160607024610.GI38613@kib.kiev.ua> <20160607041741.GA29017@wkstn-mjohnston.west.isilon.com>
On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote:
> Sure, see below. For reference:
>
> td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK
> td_pflags = 0
> td_inhibitors = 0x2 = SLEEPING
> td_locks = 0
>
> stack:
> mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d ...
>
> p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS
> p_flag2 = 0
>
> The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the
> SLEEPABORT flag is not, so the thread cannot have been sleeping when
> thread_single() was called - else sleepq_abort() would have been
> invoked and set SLEEPABORT. We are at the second sleepq_switch() call in
> sleepq_catch_signals(), and no signal was pending, so we called
> thread_suspend_check(), which returned 0 because of SBDRY. So we went to
> sleep.
>
> I note that this couldn't have happened prior to r283320. That change
> was apparently motivated by a similar hang, but in that case the thread
> was suspended (with a vnode lock held) rather than asleep. It looks like
> our internal fix also added a change to set TDF_SBDRY around
> filesystem-specific syscalls, which often sleep interruptibly while
> holding vnode locks. But I don't think that's the problem here, as you
> noted with lf_advlock().
>
> With r283320 reverted, P_STOPPED_SIG would not have been set, so
> thread_suspend_check() would have suspended us, allowing the core dump
> to proceed. I had thought that using SINGLE_BOUNDARY before coredumping
> would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so
> this is probably the wrong place to be solving the problem.

It looks as if we should not completely ignore suspension requests in
thread_suspend_check() in the TDF_SBDRY case, but instead return either
EINTR or ERESTART (most likely ERESTART). Note that the goal of TDF_SBDRY
is to avoid suspending inside the protected region, not to give the
impression that no suspension request occurred at all.
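To make the suggestion concrete, here is a minimal userland sketch of the
two behaviours. It is only a model, not the kernel code and not a patch:
every name with a model_ or MODEL_ prefix is hypothetical, the flag values
are assumptions chosen to be consistent with the decoded td_flags above, and
MODEL_ERESTART stands in for the kernel-internal ERESTART.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag values for this model, consistent with td_flags = 0xa84c. */
#define MODEL_TDF_SBDRY       0x2000  /* do not suspend inside this region */
#define MODEL_TDF_NEEDSUSPCHK 0x8000  /* a suspension request is pending */

/* Stand-in for the kernel-internal ERESTART (assumption for this model). */
#define MODEL_ERESTART (-1)

/* Hypothetical, heavily reduced thread state. */
struct model_thread {
    int  td_flags;
    bool suspended;   /* set when the model "suspends" the thread */
};

/*
 * Behaviour described in the quoted analysis: with SBDRY set, a pending
 * suspension request is ignored and 0 is returned, so the caller goes
 * to sleep and the single-threading requester waits forever.
 */
int
model_suspend_check_current(struct model_thread *td)
{
    assert(td != NULL);
    if ((td->td_flags & MODEL_TDF_NEEDSUSPCHK) == 0)
        return (0);               /* nothing requested */
    if ((td->td_flags & MODEL_TDF_SBDRY) != 0)
        return (0);               /* request silently dropped */
    td->suspended = true;         /* suspend right here */
    return (0);
}

/*
 * Behaviour suggested in this reply: with SBDRY set, still do not
 * suspend in place, but report the request so the interruptible sleep
 * is abandoned and the thread unwinds out of the protected region.
 */
int
model_suspend_check_proposed(struct model_thread *td)
{
    assert(td != NULL);
    if ((td->td_flags & MODEL_TDF_NEEDSUSPCHK) == 0)
        return (0);
    if ((td->td_flags & MODEL_TDF_SBDRY) != 0)
        return (MODEL_ERESTART);  /* caller must not go to sleep */
    td->suspended = true;
    return (0);
}
```

With td_flags containing both SBDRY and NEEDSUSPCHK, as in the dump, the
first function returns 0 and the thread would sleep, while the second
returns the restart code without suspending in place.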