Date: Wed, 15 Jun 2016 13:50:00 +0200
From: Peter Holm
To: Konstantin Belousov
Cc: Eric Badger, FreeBSD Current
Subject: Re: Kqueue races causing crashes
Message-ID: <20160615115000.GA23198@x2.osted.lan>
References: <34035bf6-8b3c-d15c-765b-94bcc919ea2e@badgerio.us> <20160615081143.GS38613@kib.kiev.ua>
In-Reply-To: <20160615081143.GS38613@kib.kiev.ua>

On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > I believe they all have more or less the same cause. The crashes occur
> > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we
> > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has
> > been cleared by another thread. Thus we are unable to unlock the
> > previously acquired lock and hold it until something causes us to crash
> > (such as the witness code noticing that we're returning to userland with
> > the lock still held).
> ...
> > I believe there's also a small window where the KN_LIST_LOCK macro
> > checks kn->kn_knlist and finds it to be non-NULL, but by the time it
> > actually dereferences it, it has become NULL. This would produce the
> > "page fault while in kernel mode" crash.
> >
> > If someone familiar with this code sees an obvious fix, I'll be happy to
> > test it. Otherwise, I'd appreciate any advice on fixing this. My first
> > thought is that a "struct knote" ought to have its own mutex for
> > controlling access to the flag fields and ideally the "kn_knlist" field.
> > I.e., you would first acquire a knote's lock and then the knlist lock,
> > thus ensuring that no one could clear the kn_knlist variable while you
> > hold the knlist lock.
> > The knlist lock, however, usually comes from whichever event-producing
> > entity the knote tracks, so getting lock ordering right between the
> > per-knote mutex and this other lock seems potentially hard. (Sometimes
> > we call into functions in kern_event.c with the knlist lock already
> > held, having been acquired in code outside of kern_event.c. Consider,
> > for example, calling KNOTE_LOCKED from kern_exit.c; the PROC_LOCK macro
> > has already been used to acquire the process lock, also serving as the
> > knlist lock).
> This sounds like a good and correct analysis. I tried your test program
> for around an hour on an 8-thread machine, but was not able to trigger
> the issue. Maybe Peter will have better luck reproducing them. Still, I
> think that the problem is there.
>
> IMO we should simply avoid clearing kn_knlist in knlist_remove(). The
> member is only used to get the locking function pointers; otherwise the
> code relies on the KN_DETACHED flag to detect the on-knlist condition.
> See the patch below.
>
> > Apropos of the knlist lock and its provenance: why is a lock from the
> > event-producing entity used to control access to the knlist and knote?
> > Is it generally desirable to, for example, hold the process lock while
> > operating on a knlist attached to that process? It's not obvious to me
> > that this is required or even desirable. This might suggest that a
> > knlist should have its own lock rather than using a lock from the
> > event-producing entity, which might make addressing this problem more
> > straightforward.
>
> Consider the purpose of a knlist. It serves as a container for all the
> knotes registered on a given subsystem object (all the knotes of a
> socket, a process, etc.) which must be fired on an event. See the
> knote() code. The consequence is that the subsystem which fires knote()
> typically already holds a lock protecting its own state. As a result,
> it is natural to protect the list of knotes to activate on a subsystem
> event with the subsystem lock.
>
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index 0614903..3f45dca 100644
> --- a/sys/kern/kern_event.c

There is not much gdb info here; I'll try to rebuild kgdb.

https://people.freebsd.org/~pho/stress/log/kostik900.txt

The number of CPUs seems important to this test. Four works for me.

- Peter
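
To make the race Eric describes easier to see, here is a minimal userland
model of the pattern: the unlock path looks the lock up again through a
pointer that another thread may have cleared in the meantime. The names
(knote_model, knlist_model, LIST_LOCK, LIST_UNLOCK) are invented for the
sketch and only approximate the shape of the real KN_LIST_LOCK and
KN_LIST_UNLOCK macros in sys/kern/kern_event.c; this is not the kernel
code itself.

#include <pthread.h>
#include <stdio.h>

struct knlist_model {
	pthread_mutex_t kl_mtx;		/* stands in for the subsystem lock */
};

struct knote_model {
	struct knlist_model *kn_knlist;	/* cleared by knlist_remove() in the real code */
};

/* Roughly the shape of KN_LIST_LOCK: the lock is found through kn_knlist. */
#define	LIST_LOCK(kn) do {						\
	if ((kn)->kn_knlist != NULL)					\
		pthread_mutex_lock(&(kn)->kn_knlist->kl_mtx);		\
} while (0)

/* Roughly the shape of KN_LIST_UNLOCK: the lock is looked up again. */
#define	LIST_UNLOCK(kn) do {						\
	if ((kn)->kn_knlist != NULL)					\
		pthread_mutex_unlock(&(kn)->kn_knlist->kl_mtx);		\
} while (0)

int
main(void)
{
	pthread_mutexattr_t attr;
	struct knlist_model kl;
	struct knote_model kn = { .kn_knlist = &kl };

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&kl.kl_mtx, &attr);

	LIST_LOCK(&kn);		/* lock found via kn_knlist and acquired */
	kn.kn_knlist = NULL;	/* another thread runs knlist_remove() here */
	LIST_UNLOCK(&kn);	/* kn_knlist is NULL: the mutex is never released */

	/* EBUSY here shows the lock was leaked; in the kernel, WITNESS
	 * notices the lock still held on return to userland. */
	printf("trylock after \"unlock\": %d\n",
	    pthread_mutex_trylock(&kl.kl_mtx));
	return (0);
}

The same structure also shows Eric's second, narrower window: if the
pointer goes NULL between the NULL check and the dereference inside the
lock macro, the result is the "page fault while in kernel mode" crash
rather than a leaked lock.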
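
The patch Konstantin refers to is truncated above. Purely as an
illustration of the direction he describes (stop clearing kn_knlist in
knlist_remove() so the unlock path can always find the locking function
pointers, and let the KN_DETACHED status flag track list membership),
here is the same toy model with that change applied. model_knlist_remove,
KNM_DETACHED and the other names are invented for the sketch; this is not
the actual patch.

#include <pthread.h>
#include <stdio.h>

#define	KNM_DETACHED	0x0001		/* stands in for KN_DETACHED */

struct knlist_model {
	pthread_mutex_t kl_mtx;
};

struct knote_model {
	struct knlist_model *kn_knlist;	/* kept valid: only used to find the lock */
	int kn_status;			/* list membership is read from the flag */
};

#define	LIST_LOCK(kn) do {						\
	if ((kn)->kn_knlist != NULL)					\
		pthread_mutex_lock(&(kn)->kn_knlist->kl_mtx);		\
} while (0)

#define	LIST_UNLOCK(kn) do {						\
	if ((kn)->kn_knlist != NULL)					\
		pthread_mutex_unlock(&(kn)->kn_knlist->kl_mtx);		\
} while (0)

/* knlist_remove() analogue: mark the knote detached, keep the lock pointer. */
static void
model_knlist_remove(struct knote_model *kn)
{
	kn->kn_status |= KNM_DETACHED;
	/* kn->kn_knlist is intentionally not cleared. */
}

int
main(void)
{
	struct knlist_model kl;
	struct knote_model kn = { .kn_knlist = &kl, .kn_status = 0 };

	pthread_mutex_init(&kl.kl_mtx, NULL);

	LIST_LOCK(&kn);
	model_knlist_remove(&kn);	/* removal no longer hides the lock */
	LIST_UNLOCK(&kn);		/* the same mutex is found and released */

	/* 0 here shows the mutex was released as expected. */
	printf("trylock after unlock: %d\n", pthread_mutex_trylock(&kl.kl_mtx));
	pthread_mutex_unlock(&kl.kl_mtx);
	return (0);
}

In this variant the lock and unlock macros always agree on which mutex
they operate on, because nothing clears kn_knlist behind their back; only
the status flag changes.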