From owner-freebsd-current@freebsd.org  Wed Jun 15 19:35:15 2016
Date: Wed, 15 Jun 2016 12:34:59 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Konstantin Belousov" <kostikbel@gmail.com>
Cc: "Peter Holm" <peter@holm.cc>, "Eric Badger" <eric@badgerio.us>, 
 "freebsd-current" <freebsd-current@freebsd.org>
Message-ID: <155558f403d.1142f02ed53991.7543987576640729131@nextbsd.org>
In-Reply-To: <20160615174524.GF38613@kib.kiev.ua>
References: <34035bf6-8b3c-d15c-765b-94bcc919ea2e@badgerio.us>
 <20160615081143.GS38613@kib.kiev.ua>
 <20160615115000.GA23198@x2.osted.lan>
 <1555525b518.c9c704c026886.2375886287356557279@nextbsd.org>
 <20160615174524.GF38613@kib.kiev.ua>
Subject: Re: Kqueue races causing crashes
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>


 ---- On Wed, 15 Jun 2016 10:45:24 -0700 Konstantin Belousov <kostikbel@gmail.com> wrote ----
 > On Wed, Jun 15, 2016 at 10:39:42AM -0700, Matthew Macy wrote:
 > > You can use dwarf4 if you use GDB from ports
 > How would it help?

To a native speaker, the following statement implies that GDB is the problem: "There is not much gdb info here; I'll try to rebuild kgdb."

If %rip has in fact been smashed, that's a bit like saying "the light doesn't show anything on the table, I'll replace the light bulb" when there isn't anything on the table to begin with.

 > The problem for kgdb is that %rip is zero, due to a function pointer being
 > set to NULL in a destroyed knlist.  Neither version of kgdb would find
 > code or unwind annotations for address zero.
 >
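
To make the quoted point concrete, here is a minimal userspace sketch in C, not the kernel code. The fake_knlist type and its helpers are hypothetical stand-ins; the only assumption is that destroying the list clears its lock callback, as described above. An unguarded call through the cleared pointer would jump to address 0, which is why %rip ends up zero and no debugger can map it to code or unwind information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a knlist: only the lock callback matters here. */
struct fake_knlist {
	void (*kl_lock)(void *);	/* called to take the list's lock */
	void *kl_lockarg;		/* argument handed to the callback */
};

static int fake_lock_calls;	/* counts successful callback invocations */

static void
fake_lock(void *arg)
{
	(void)arg;
	fake_lock_calls++;
}

static void
fake_knlist_init(struct fake_knlist *kl)
{
	assert(kl != NULL);
	kl->kl_lock = fake_lock;
	kl->kl_lockarg = NULL;
}

/* Assumption taken from the thread: destroy clears the function pointer. */
static void
fake_knlist_destroy(struct fake_knlist *kl)
{
	assert(kl != NULL);
	kl->kl_lock = NULL;
	kl->kl_lockarg = NULL;
}

/*
 * Returns 0 if the callback ran, -1 if the list was already destroyed.
 * Without the NULL check, the indirect call would transfer control to
 * address 0, leaving the instruction pointer with nothing to symbolize.
 */
static int
fake_knlist_lock(struct fake_knlist *kl)
{
	if (kl->kl_lock == NULL)
		return (-1);
	kl->kl_lock(kl->kl_lockarg);
	return (0);
}
```

The guard only makes the failure visible in the sketch; it does not address the underlying race between the destroyer and the caller.
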
 > But the issue is understood and

Yes. Since the initial e-mail.


 > we are working on a version of the fix.

I'm glad you're on it.

-M


 >
 >  ---- On Wed, 15 Jun 2016 04:50:00 -0700 Peter Holm <peter@holm.cc> wrote ----
 > On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
 > > On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
 > > > I believe they all have more or less the same cause. The crashes occur
 > > > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we
 > > > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has
 > > > been cleared by another thread. Thus we are unable to unlock the
 > > > previously acquired lock and hold it until something causes us to crash
 > > > (such as the witness code noticing that we're returning to userland with
 > > > the lock still held).
 > > ...
 > > > I believe there's also a small window where the KN_LIST_LOCK macro
 > > > checks kn->kn_knlist and finds it to be non-NULL, but by the time it
 > > > actually dereferences it, it has become NULL. This would produce the
 > > > "page fault while in kernel mode" crash.
 > > >
 > > > If someone familiar with this code sees an obvious fix, I'll be happy to
 > > > test it. Otherwise, I'd appreciate any advice on fixing this. My first
 > > > thought is that a "struct knote" ought to have its own mutex for
 > > > controlling access to the flag fields and ideally the "kn_knlist" field.
 > > > I.e., you would first acquire a knote's lock and then the knlist lock,
 > > > thus ensuring that no one could clear the kn_knlist variable while you
 > > > hold the knlist lock. The knlist lock, however, usually comes from
 > > > whichever event producing entity the knote tracks, so getting lock
 > > > ordering right between the per-knote mutex and this other lock seems
 > > > potentially hard. (Sometimes we call into functions in kern_event.c with
 > > > the knlist lock already held, having been acquired in code outside of
 > > > kern_event.c. Consider, for example, calling KNOTE_LOCKED from
 > > > kern_exit.c; the PROC_LOCK macro has already been used to acquire the
 > > > process lock, also serving
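
The per-knote mutex idea quoted above can be sketched in userspace C with pthreads. This is only an illustration under stated assumptions: the sk_knote and sk_knlist types and the sk_ functions are hypothetical, not the kern_event.c structures, and the sketch ignores the hard part raised in the quoted text, namely callers that already hold the knlist lock before entering kern_event.c. It shows two things: kn_knlist is read and cleared only under the knote's own lock, and the unlock goes through a saved pointer instead of re-reading kn->kn_knlist.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical event list with its own lock. */
struct sk_knlist {
	pthread_mutex_t kl_mtx;
	int kl_events;			/* data protected by kl_mtx */
};

/* Hypothetical knote with a private mutex guarding kn_knlist. */
struct sk_knote {
	pthread_mutex_t kn_mtx;
	struct sk_knlist *kn_knlist;	/* NULL once detached */
};

/*
 * Post one event to the list the knote is attached to.
 * Lock order is always knote first, then knlist.
 * Returns 0 on success, -1 if the knote was already detached.
 */
static int
sk_knote_post(struct sk_knote *kn)
{
	struct sk_knlist *kl;

	assert(kn != NULL);
	pthread_mutex_lock(&kn->kn_mtx);
	kl = kn->kn_knlist;		/* read once, under the knote lock */
	if (kl == NULL) {
		pthread_mutex_unlock(&kn->kn_mtx);
		return (-1);
	}
	pthread_mutex_lock(&kl->kl_mtx);
	kl->kl_events++;
	pthread_mutex_unlock(&kl->kl_mtx);	/* unlock via the saved pointer */
	pthread_mutex_unlock(&kn->kn_mtx);
	return (0);
}

/* Detach the knote; it cannot race with sk_knote_post, which holds kn_mtx. */
static void
sk_knote_detach(struct sk_knote *kn)
{
	assert(kn != NULL);
	pthread_mutex_lock(&kn->kn_mtx);
	kn->kn_knlist = NULL;
	pthread_mutex_unlock(&kn->kn_mtx);
}
```

Because sk_knote_post holds the knote lock for the whole critical section, the pointer cannot be cleared between the lock and the unlock, which is the window described in the quoted report. Whether this ordering can coexist with locks taken outside kern_event.c is exactly the open question in the thread.
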