Date: Wed, 15 Jun 2016 20:45:24 +0300
From: Konstantin Belousov
To: Matthew Macy
Cc: Peter Holm, Eric Badger, freebsd-current
Subject: Re: Kqueue races causing crashes
Message-ID: <20160615174524.GF38613@kib.kiev.ua>
References: <34035bf6-8b3c-d15c-765b-94bcc919ea2e@badgerio.us>
 <20160615081143.GS38613@kib.kiev.ua>
 <20160615115000.GA23198@x2.osted.lan>
 <1555525b518.c9c704c026886.2375886287356557279@nextbsd.org>
In-Reply-To: <1555525b518.c9c704c026886.2375886287356557279@nextbsd.org>
List-Id: Discussions about the use of FreeBSD-current

On Wed, Jun 15, 2016 at 10:39:42AM -0700, Matthew Macy wrote:
> You can use dwarf4 if you use GDB from ports

How would it help?  The problem for kgdb is that %rip is zero, due to
a function pointer being set to NULL in a destroyed knlist.  Neither
version of kgdb would find code or unwind annotations for the zero
address.

But the issue is understood and we are working on a version of the fix.

---- On Wed, 15 Jun 2016 04:50:00 -0700 Peter Holm wrote ----
On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > I believe they all have more or less the same cause. The crashes
> > occur because we acquire a knlist lock via the KN_LIST_LOCK macro,
> > but when we call KN_LIST_UNLOCK, the knote's knlist reference
> > (kn->kn_knlist) has been cleared by another thread. Thus we are
> > unable to unlock the previously acquired lock and hold it until
> > something causes us to crash (such as the witness code noticing
> > that we're returning to userland with the lock still held).
> ...
> > I believe there's also a small window where the KN_LIST_LOCK macro
> > checks kn->kn_knlist and finds it to be non-NULL, but by the time
> > it actually dereferences it, it has become NULL. This would produce
> > the "page fault while in kernel mode" crash.
> >
> > If someone familiar with this code sees an obvious fix, I'll be
> > happy to test it.
> > Otherwise, I'd appreciate any advice on fixing this. My first
> > thought is that a "struct knote" ought to have its own mutex for
> > controlling access to the flag fields and ideally the "kn_knlist"
> > field. I.e., you would first acquire a knote's lock and then the
> > knlist lock, thus ensuring that no one could clear the kn_knlist
> > variable while you hold the knlist lock. The knlist lock, however,
> > usually comes from whichever event producing entity the knote
> > tracks, so getting lock ordering right between the per-knote mutex
> > and this other lock seems potentially hard. (Sometimes we call into
> > functions in kern_event.c with the knlist lock already held, having
> > been acquired in code outside of kern_event.c. Consider, for
> > example, calling KNOTE_LOCKED from kern_exit.c; the PROC_LOCK macro
> > has already been used to acquire the process lock, also serving