From owner-freebsd-current@freebsd.org Tue Jun 7 04:30:02 2016 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 44EDAB6D922 for ; Tue, 7 Jun 2016 04:30:02 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D3BD216DF; Tue, 7 Jun 2016 04:30:01 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u574TuDL040362 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 7 Jun 2016 07:29:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u574TuDL040362 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u574TuQc040361; Tue, 7 Jun 2016 07:29:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 7 Jun 2016 07:29:56 +0300 From: Konstantin Belousov To: Mark Johnston Cc: freebsd-current@FreeBSD.org, cem@FreeBSD.org Subject: Re: thread suspension when dumping core Message-ID: <20160607042956.GM38613@kib.kiev.ua> References: <20160604022347.GA1096@wkstn-mjohnston.west.isilon.com> <20160604093236.GA38613@kib.kiev.ua> <20160606171311.GC10101@wkstn-mjohnston.west.isilon.com> <20160607024610.GI38613@kib.kiev.ua> <20160607041741.GA29017@wkstn-mjohnston.west.isilon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160607041741.GA29017@wkstn-mjohnston.west.isilon.com> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jun 2016 04:30:02 -0000 On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote: > Sure, see below. For reference: > > td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK > td_pflags = 0 > td_inhibitors = 0x2 = SLEEPING > td_locks = 0 > > stack: > mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d ... > > p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS > p_flag2 = 0 > > The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the > SLEEPABORT flag is not, so the thread can not have been sleeping when > thread_single() was called - else sleepq_abort() would have been > invoked and set SLEEPABORT. We are at the second sleepq_switch() call in > sleepq_catch_signals(), and no signal was pending, so we called > thread_suspend_check(), which returned 0 because of SBDRY. So we went to > sleep. > > I note that this couldn't have happened prior to r283320. That change > was apparently motivated by a similar hang, but in that case the thread > was suspended (with a vnode lock held) rather than asleep. It looks like > our internal fix also added a change to set TDF_SBDRY around > filesystem-specific syscalls, which often sleep interruptibly while > holding vnode locks. But I don't think that's the problem here, as you > noted with lf_advlock(). > > With r283320 reverted, P_STOPPED_SIG would not have been set, so > thread_suspend_check() would have suspended us, allowing the core dump > to proceed. I had thought that using SINGLE_BOUNDRY beforing coredumping > would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so > this is probably the wrong place to be solving the problem. This looks as if we should not ignore suspension requests in thread_suspend_check() completely in TDF_SBDRY case, but return either EINTR or ERESTART (most likely ERESTART). Note that the goal of TDF_SBDRY is to avoid suspending in the protected region, not to make an impression that the suspension does not occur at all.