From owner-freebsd-current@freebsd.org  Wed Jun 15 17:39:53 2016
Return-Path: <owner-freebsd-current@freebsd.org>
Delivered-To: freebsd-current@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 08719A4775D
 for <freebsd-current@mailman.ysv.freebsd.org>;
 Wed, 15 Jun 2016 17:39:53 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id EC2CD1149
 for <freebsd-current@freebsd.org>; Wed, 15 Jun 2016 17:39:52 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1466012382593652.1472860069562;
 Wed, 15 Jun 2016 10:39:42 -0700 (PDT)
Date: Wed, 15 Jun 2016 10:39:42 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Peter Holm" <peter@holm.cc>
Cc: "Konstantin Belousov" <kostikbel@gmail.com>, 
 "Eric Badger" <eric@badgerio.us>, 
 "freebsd-current" <freebsd-current@freebsd.org>
Message-ID: <1555525b518.c9c704c026886.2375886287356557279@nextbsd.org>
In-Reply-To: <20160615115000.GA23198@x2.osted.lan>
References: <34035bf6-8b3c-d15c-765b-94bcc919ea2e@badgerio.us>
 <20160615081143.GS38613@kib.kiev.ua> <20160615115000.GA23198@x2.osted.lan>
Subject: Re: Kqueue races causing crashes
MIME-Version: 1.0
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current/>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Jun 2016 17:39:53 -0000


You can use dwarf4 if you use GDB from ports.

---- On Wed, 15 Jun 2016 04:50:00 -0700 Peter Holm <peter@holm.cc> wrote ----

On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > I believe they all have more or less the same cause. The crashes occur
> > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we
> > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has
> > been cleared by another thread. Thus we are unable to unlock the
> > previously acquired lock and hold it until something causes us to crash
> > (such as the witness code noticing that we're returning to userland with
> > the lock still held).
> ...
> > I believe there's also a small window where the KN_LIST_LOCK macro
> > checks kn->kn_knlist and finds it to be non-NULL, but by the time it
> > actually dereferences it, it has become NULL. This would produce the
> > "page fault while in kernel mode" crash.
> >
> > If someone familiar with this code sees an obvious fix, I'll be happy to
> > test it. Otherwise, I'd appreciate any advice on fixing this. My first
> > thought is that a "struct knote" ought to have its own mutex for
> > controlling access to the flag fields and ideally the "kn_knlist" field.
> > I.e., you would first acquire a knote's lock and then the knlist lock,
> > thus ensuring that no one could clear the kn_knlist variable while you
> > hold the knlist lock. The knlist lock, however, usually comes from
> > whichever event producing entity the knote tracks, so getting lock
> > ordering right between the per-knote mutex and this other lock seems
> > potentially hard. (Sometimes we call into functions in kern_event.c with
> > the knlist lock already held, having been acquired in code outside of
> > kern_event.c. Consider, for example, calling KNOTE_LOCKED from
> > kern_exit.c; the PROC_LOCK macro has already been used to acquire the
> > process lock, also serving as the knlist lock).
> This sounds like a good and correct analysis. I tried your test program
> for around an hour on an 8-thread machine, but was not able to trigger the
> issue. Perhaps Peter will have better luck reproducing them. Still, I think
> that the problem is there.
>
> IMO we should simply avoid clearing kn_knlist in knlist_remove().  The
> member is only used to get the locking function pointers, otherwise the
> code relies on the KN_DETACHED flag to detect the on-knlist condition.  See
> the patch below.
>
> >
> > Apropos of the knlist lock and its provenance: why is a lock from the
> > event producing entity used to control access to the knlist and knote?
> > Is it generally desirable to, for example, hold the process lock while
> > operating on a knlist attached to that process? It's not obvious to me
> > that this is required or even desirable. This might suggest that a
> > knlist should have its own lock rather than using a lock from the event
> > producing entity, which might make addressing this problem more
> > straightforward.
>
> Consider the purpose of knlist. It serves as a container for all knotes
> registered on the given subsystem object, like all knotes of the socket,
> process, etc., which must be fired on an event. See the knote() code. The
> consequence is that the subsystem which fires knote() typically already
> holds a lock protecting its own state. As a result, it is natural to
> protect the list of knotes to activate on a subsystem event with the
> subsystem lock.
>
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index 0614903..3f45dca 100644
> --- a/sys/kern/kern_event.c

There is not much gdb info here; I'll try to rebuild kgdb.

https://people.freebsd.org/~pho/stress/log/kostik900.txt

The number of CPUs seems important to this test. Four works for me.

- Peter
_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
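
For readers following the thread, the lock leak described in the quoted analysis can be replayed in a small userland model. This is only a sketch under stated assumptions: the structures, the integer "lock", and the helper names (knl_lock, kn_list_lock, remove_clearing, remove_keeping, replay) are hypothetical stand-ins and not the definitions in kern_event.c. The interleaving is replayed sequentially in one thread so that the outcome is deterministic; it does not exercise a real race.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins, NOT the kernel's real definitions. */
#define KN_DETACHED 0x01

struct knlist {
    int locked;                 /* 1 while the modelled lock is held */
};

struct knote {
    struct knlist *kn_knlist;   /* back-pointer used to find the lock */
    int kn_status;              /* KN_DETACHED once removed from the list */
};

void knl_lock(struct knlist *knl)   { knl->locked = 1; }
void knl_unlock(struct knlist *knl) { knl->locked = 0; }

/* Lock and unlock each re-read kn->kn_knlist, as the thread describes. */
void kn_list_lock(struct knote *kn)
{
    if (kn->kn_knlist != NULL)
        knl_lock(kn->kn_knlist);
}

void kn_list_unlock(struct knote *kn)
{
    if (kn->kn_knlist != NULL)
        knl_unlock(kn->kn_knlist);
}

/* On-list state is taken from the flag, not from the pointer. */
int kn_on_list(const struct knote *kn)
{
    return (kn->kn_status & KN_DETACHED) == 0;
}

/* Removal that clears the back-pointer (the behaviour blamed for the leak). */
void remove_clearing(struct knote *kn)
{
    kn->kn_status |= KN_DETACHED;
    kn->kn_knlist = NULL;
}

/* Removal that keeps the back-pointer, in the spirit of the suggestion. */
void remove_keeping(struct knote *kn)
{
    kn->kn_status |= KN_DETACHED;
}

/*
 * Replay: "thread A" locks, "thread B" removes the knote, "thread A"
 * unlocks. Returns 1 if the lock is still held afterwards (leaked).
 */
int replay(void (*remove_fn)(struct knote *))
{
    struct knlist knl = { .locked = 0 };
    struct knote kn = { .kn_knlist = &knl, .kn_status = 0 };

    kn_list_lock(&kn);
    assert(knl.locked == 1);
    remove_fn(&kn);             /* the other thread runs here */
    assert(!kn_on_list(&kn));   /* the flag alone records the removal */
    kn_list_unlock(&kn);
    return knl.locked;
}
```

In this model replay(remove_clearing) returns 1, because the unlock path finds a NULL kn_knlist and skips the unlock, while replay(remove_keeping) returns 0. The model says nothing about the lifetime of the knlist itself, which the real code would still have to guarantee.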