Date: Tue, 7 Jun 2016 07:29:56 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Mark Johnston <markj@FreeBSD.org>
Cc: freebsd-current@FreeBSD.org, cem@FreeBSD.org
Subject: Re: thread suspension when dumping core
Message-ID: <20160607042956.GM38613@kib.kiev.ua>
In-Reply-To: <20160607041741.GA29017@wkstn-mjohnston.west.isilon.com>
References: <20160604022347.GA1096@wkstn-mjohnston.west.isilon.com>
 <20160604093236.GA38613@kib.kiev.ua>
 <20160606171311.GC10101@wkstn-mjohnston.west.isilon.com>
 <20160607024610.GI38613@kib.kiev.ua>
 <20160607041741.GA29017@wkstn-mjohnston.west.isilon.com>

On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote:
> Sure, see below. For reference:
>
> td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK
> td_pflags = 0
> td_inhibitors = 0x2 = SLEEPING
> td_locks = 0
>
> stack:
> mi_switch+0x21e
> sleepq_catch_signals+0x377
> sleepq_wait_sig+0xb
> _sleep+0x29d
> ...
>
> p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS
> p_flag2 = 0
>
> The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the
> SLEEPABORT flag is not, so the thread cannot have been sleeping when
> thread_single() was called - else sleepq_abort() would have been
> invoked and set SLEEPABORT. We are at the second sleepq_switch() call in
> sleepq_catch_signals(), and no signal was pending, so we called
> thread_suspend_check(), which returned 0 because of SBDRY. So we went to
> sleep.
>
> I note that this couldn't have happened prior to r283320. That change
> was apparently motivated by a similar hang, but in that case the thread
> was suspended (with a vnode lock held) rather than asleep. It looks like
> our internal fix also added a change to set TDF_SBDRY around
> filesystem-specific syscalls, which often sleep interruptibly while
> holding vnode locks. But I don't think that's the problem here, as you
> noted with lf_advlock().
>
> With r283320 reverted, P_STOPPED_SIG would not have been set, so
> thread_suspend_check() would have suspended us, allowing the core dump
> to proceed. I had thought that using SINGLE_BOUNDARY before coredumping
> would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so
> this is probably the wrong place to be solving the problem.

It looks as if thread_suspend_check() should not ignore suspension
requests completely in the TDF_SBDRY case, but should instead return
either EINTR or ERESTART (most likely ERESTART).

Note that the goal of TDF_SBDRY is to avoid suspending inside the
protected region, not to give the impression that the suspension
request does not exist at all.
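
To make the suggestion concrete, here is a minimal userland sketch of the
two behaviours. It is only a model under stated assumptions, not kernel
code and not a patch: struct model_thread, the model_* functions and
MODEL_ERESTART are hypothetical names invented for illustration, and the
two flag values are the bits read off the td_flags decode quoted above.

```c
#include <stdbool.h>

/* Bit values read off the td_flags decode quoted above (0xa84c). */
#define TDF_SBDRY        0x2000
#define TDF_NEEDSUSPCHK  0x8000

/* Hypothetical stand-in for the kernel-internal ERESTART. */
#define MODEL_ERESTART   (-1)

/* Hypothetical, heavily reduced stand-in for struct thread. */
struct model_thread {
	int  td_flags;
	bool suspended;  /* parked by the suspension check */
	bool asleep;     /* entered the interruptible sleep */
};

/*
 * Behaviour described in the report: with TDF_SBDRY set, a pending
 * suspension request is dropped and 0 is returned.
 */
static int
model_suspend_check_reported(struct model_thread *td)
{
	if ((td->td_flags & TDF_NEEDSUSPCHK) == 0)
		return (0);
	if ((td->td_flags & TDF_SBDRY) != 0)
		return (0);
	td->suspended = true;
	return (0);
}

/*
 * Suggested behaviour: with TDF_SBDRY set, do not suspend here, but
 * report the pending request so the caller unwinds instead of sleeping.
 */
static int
model_suspend_check_suggested(struct model_thread *td)
{
	if ((td->td_flags & TDF_NEEDSUSPCHK) == 0)
		return (0);
	if ((td->td_flags & TDF_SBDRY) != 0)
		return (MODEL_ERESTART);
	td->suspended = true;
	return (0);
}

typedef int (*model_check_fn)(struct model_thread *);

/*
 * Simplified model of the step before an interruptible sleep: a
 * non-zero result from the check aborts the sleep. What happens after
 * a suspended thread is resumed is not modelled.
 */
static int
model_sleep_attempt(struct model_thread *td, model_check_fn check)
{
	int error;

	error = check(td);
	if (error != 0)
		return (error);
	if (!td->suspended)
		td->asleep = true;
	return (0);
}
```

With td_flags containing both TDF_SBDRY and TDF_NEEDSUSPCHK, the reported
variant lets the model thread fall asleep with the request still pending,
which corresponds to the hang described above. The suggested variant
returns the error, so the thread neither sleeps nor suspends inside the
protected region and can honour the request once it has unwound.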
