From: Alan Somers
Date: Mon, 27 Jun 2016 15:00:02 -0600
Subject: Re: Kqueue races causing crashes
To: Matthew Macy
Cc: Konstantin Belousov, Peter Holm, Eric Badger, freebsd-current

I opened PR210641 to track this after I hit it on i386 during the
sys/kqueue/kqueue_test:main ATF test. I hit the panic two times in 9
tries.

-Alan

On Wed, Jun 15, 2016 at 1:34 PM, Matthew Macy wrote:
>
> ---- On Wed, 15 Jun 2016 10:45:24 -0700 Konstantin Belousov wrote ----
> > On Wed, Jun 15, 2016 at 10:39:42AM -0700, Matthew Macy wrote:
> > >
> > > You can use dwarf4 if you use GDB from ports
> > How would it help?
>
> The following statement to a native speaker would imply that GDB is the
> problem: "There is not much gdb info here; I'll try to rebuild kgdb."
>
> If in fact %rip has been smashed that's a bit like saying "the light
> doesn't show anything on the table, I'll replace the light bulb" - when
> in fact there isn't anything on the table.
>
> > Problem for kgdb is that %rip is zero, due to function pointer being set
> > to NULL in a destroyed knlist. Either version of kgdb would not find
> > neither code nor unwind annotations for zero address.
> >
> > But the issue is understood and
>
> Yes. Since the initial e-mail.
>
> > we are working on the version of fix.
>
> I'm glad you're on it.
>
> -M
>
> ---- On Wed, 15 Jun 2016 04:50:00 -0700 Peter Holm wrote ----
> On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> > On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > > I believe they all have more or less the same cause. The crashes occur
> > > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we
> > > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has
> > > been cleared by another thread. Thus we are unable to unlock the
> > > previously acquired lock and hold it until something causes us to crash
> > > (such as the witness code noticing that we're returning to userland with
> > > the lock still held).
> > ...
> > > I believe there's also a small window where the KN_LIST_LOCK macro
> > > checks kn->kn_knlist and finds it to be non-NULL, but by the time it
> > > actually dereferences it, it has become NULL. This would produce the
> > > "page fault while in kernel mode" crash.
> > >
> > > If someone familiar with this code sees an obvious fix, I'll be happy to
> > > test it. Otherwise, I'd appreciate any advice on fixing this. My first
> > > thought is that a "struct knote" ought to have its own mutex for
> > > controlling access to the flag fields and ideally the "kn_knlist" field.
> > > I.e., you would first acquire a knote's lock and then the knlist lock,
> > > thus ensuring that no one could clear the kn_knlist variable while you
> > > hold the knlist lock. The knlist lock, however, usually comes from
> > > whichever event producing entity the knote tracks, so getting lock
> > > ordering right between the per-knote mutex and this other lock seems
> > > potentially hard. (Sometimes we call into functions in kern_event.c with
> > > the knlist lock already held, having been acquired in code outside of
> > > kern_event.c. Consider, for example, calling KNOTE_LOCKED from
> > > kern_exit.c; the PROC_LOCK macro has already been used to acquire the
> > > process lock, also serving
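
For readers following the quoted analysis, here is a minimal userland
sketch of the failure mode it describes (the unlock side re-reads
kn->kn_knlist after another thread has cleared it) next to one possible
mitigation, which is to read the pointer once and unlock exactly what was
locked. This is not the kernel code, and it is not claimed to be the fix
being worked on in this thread. struct knote_sketch, struct knlist_sketch,
and every function name below are hypothetical stand-ins, pthread mutexes
replace the kernel locks, and the other thread's clearing of the field is
simulated by a plain assignment in the same thread so that the outcome is
deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; not the kernel's definitions. */
struct knlist_sketch {
    pthread_mutex_t kl_lock;
};

struct knote_sketch {
    struct knlist_sketch *kn_knlist;  /* another thread may set this to NULL */
};

/* Racy pattern: the unlock side re-reads kn->kn_knlist. */
static void racy_lock(struct knote_sketch *kn)
{
    if (kn->kn_knlist != NULL)
        pthread_mutex_lock(&kn->kn_knlist->kl_lock);
}

static void racy_unlock(struct knote_sketch *kn)
{
    if (kn->kn_knlist != NULL)  /* already NULL here, so the unlock is skipped */
        pthread_mutex_unlock(&kn->kn_knlist->kl_lock);
}

/* Snapshot pattern: read the pointer once and unlock what was locked. */
static struct knlist_sketch *snapshot_lock(struct knote_sketch *kn)
{
    struct knlist_sketch *knl = kn->kn_knlist;

    if (knl != NULL)
        pthread_mutex_lock(&knl->kl_lock);
    return knl;
}

static void snapshot_unlock(struct knlist_sketch *knl)
{
    if (knl != NULL)
        pthread_mutex_unlock(&knl->kl_lock);
}

/* Returns 1 if the list lock is still held, 0 if it is free. */
static int lock_is_held(struct knlist_sketch *knl)
{
    int rc = pthread_mutex_trylock(&knl->kl_lock);

    if (rc == 0) {
        pthread_mutex_unlock(&knl->kl_lock);
        return 0;
    }
    return rc == EBUSY;
}

/* Runs one lock/clear/unlock sequence; returns 1 if the lock leaked. */
int run_sequence(int use_snapshot)
{
    struct knlist_sketch list;
    struct knote_sketch kn = { .kn_knlist = &list };
    struct knlist_sketch *knl = NULL;
    int rc, leaked;

    rc = pthread_mutex_init(&list.kl_lock, NULL);
    assert(rc == 0);
    (void)rc;

    if (use_snapshot)
        knl = snapshot_lock(&kn);
    else
        racy_lock(&kn);

    kn.kn_knlist = NULL;  /* simulates the other thread clearing the field */

    if (use_snapshot)
        snapshot_unlock(knl);
    else
        racy_unlock(&kn);

    leaked = lock_is_held(&list);
    if (leaked)
        pthread_mutex_unlock(&list.kl_lock);  /* release the leaked lock */
    pthread_mutex_destroy(&list.kl_lock);
    return leaked;
}
```

Calling run_sequence(0) exercises the racy pair and run_sequence(1) the
snapshot pair; the return value reports whether the list lock was left
held. The snapshot variant only guarantees a matching unlock. It does
nothing about the lifetime of the knlist itself, so a list destroyed while
a stale pointer is still in use would remain a problem, and a real
multi-threaded version would also need a properly synchronized read of the
field.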