Date: Wed, 8 Jun 2016 23:17:44 +0200
From: Jilles Tjoelker
To: Konstantin Belousov
Cc: Mark Johnston, freebsd-current@FreeBSD.org, cem@FreeBSD.org
Subject: Re: thread suspension when dumping core
Message-ID: <20160608211744.GB56821@stack.nl>
In-Reply-To: <20160608135635.GY38613@kib.kiev.ua>
List-Id: Discussions about the use of FreeBSD-current

On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > I also wonder whether we may be overengineering things here. Perhaps
> > > > the advlock sleep can simply turn off TDF_SBDRY.

> > > Well, this was the very first patch suggested. I would be fine with that,
> > > but again, out-of-tree code seems to be not quite fine with that local
> > > solution.

> > In our particular case, we could possibly use a similar approach. In
> > general, it seems incorrect to clear TDF_SBDRY if the thread calling
> > sx_sleep() has any locks held. It is easy to verify that all callers of
> > lf_advlock() are safe in this respect, but this kind of auditing is
> > generally hard. In fact, I believe the sx_sleep that led to the problem
> > described in D2612 is the same as the one in my case. That is, the
> > sleeping thread may or may not hold a vnode lock depending on context.

> I do not think that in-tree code sleeps with a vnode lock held in
> the lf_advlock(). Otherwise, system would hang in lock cascade by
> an attempt to obtain an advisory lock. I think we can even assert
> this with witness.

> There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> called from vgone(). This sleep indeed occurs under the vnode lock, and
> as such must be non-suspendable. The sleep waits until other threads
> leave the lf_advlock() for the reclaimed vnode, and they should leave in
> deterministic time due to issued wakeups. So this sleep is exempt from
> the considerations, and TDF_SBDRY there is correct.

> I am fine with either the braces around sx_sleep() in lf_advlock() to
> clear TDF_SBDRY (sigdeferstsop()), or with the latest patch I sent,
> which adds temporal override for TDF_SBDRY with TDF_SRESTART. My
> understanding is that you prefer the later. If I do not mis-represent
> your position, I understand why you do prefer that.

The TDF_SRESTART change does fix some more problems such as umount -f
getting stuck in lf_purgelocks().

However, it introduces some subtle issues that may not necessarily be a
sufficient objection.

Firstly, adding this closes the door on fixing signal handling for
fcntl(F_SETLKW). Per POSIX, any caught signal interrupts
fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux
man page documents the same. Our man page has documented that SA_RESTART
behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This
could normally be fixed via

	if (error == ERESTART)
		error = EINTR;

but that is no longer possible if there are [ERESTART] errors that
should still restart.

Secondly, fcntl(F_SETLKW) restarting after a stop may actually be
observable, contrary to what I wrote before. This is due to the fair
queuing. Suppose thread A has locked byte 1 a while ago and thread B is
trying to lock bytes 1 and 2 right now. Then thread C will be able to
lock byte 2 iff thread B has not blocked yet. If thread C is not allowed
to lock byte 2 and blocks on it, the TDF_SRESTART change will cause
thread C to be awakened if thread B is stopped. When thread B resumes,
the region to be locked will be recomputed. This scenario unambiguously
violates the POSIX requirement but I don't know how bad it is.

Note that all these threads must be in separate processes because of
fcntl locks' strange semantics.

-- 
Jilles Tjoelker
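
To make the first concern concrete, here is a minimal userland sketch of
the error mapping. It is only a model, not kernel code: the SK_*
constants are hypothetical stand-ins for the kernel's error values, and
the from_stop argument is an assumed extra input that nothing in this
thread says the sleep actually reports. The point is that the simple
mapping cannot tell a restart requested because of a caught signal from
one requested because of a stop.

```c
#include <assert.h>

/*
 * Hypothetical stand-in values so the sketch builds in userland; in the
 * FreeBSD kernel ERESTART is an internal pseudo-errno.
 */
enum { SK_OK = 0, SK_EINTR = 4, SK_ERESTART = -1 };

/*
 * POSIX-style mapping: any caught signal interrupts F_SETLKW, so a
 * restart request coming back from the sleep is turned into EINTR.
 */
static int
setlkw_map_posix(int sleep_error)
{
	if (sleep_error == SK_ERESTART)
		return (SK_EINTR);
	return (sleep_error);
}

/*
 * Once a stop can also produce ERESTART, the mapping needs to know the
 * origin of the restart. 'from_stop' is an assumption made for this
 * sketch only.
 */
static int
setlkw_map_with_srestart(int sleep_error, int from_stop)
{
	if (sleep_error == SK_ERESTART && !from_stop)
		return (SK_EINTR);
	return (sleep_error);
}

/* Small self-check showing where the two mappings differ. */
static void
setlkw_map_selfcheck(void)
{
	/* Caught signal: both mappings report EINTR. */
	assert(setlkw_map_posix(SK_ERESTART) == SK_EINTR);
	assert(setlkw_map_with_srestart(SK_ERESTART, 0) == SK_EINTR);
	/* Stop: only the second mapping keeps the restart. */
	assert(setlkw_map_with_srestart(SK_ERESTART, 1) == SK_ERESTART);
}
```

Without some equivalent of from_stop, the first function would also turn
a stop-induced restart into an [EINTR] visible to the application.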