Date:      Mon, 27 Jun 2016 15:00:02 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Matthew Macy <mmacy@nextbsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, Peter Holm <peter@holm.cc>, Eric Badger <eric@badgerio.us>,  freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: Kqueue races causing crashes
Message-ID:  <CAOtMX2i-BLMZizfshEF%2BM7BiuS8UWWgxBYog6gHmviZExSfy6Q@mail.gmail.com>
In-Reply-To: <155558f403d.1142f02ed53991.7543987576640729131@nextbsd.org>
References:  <34035bf6-8b3c-d15c-765b-94bcc919ea2e@badgerio.us> <20160615081143.GS38613@kib.kiev.ua> <20160615115000.GA23198@x2.osted.lan> <1555525b518.c9c704c026886.2375886287356557279@nextbsd.org> <20160615174524.GF38613@kib.kiev.ua> <155558f403d.1142f02ed53991.7543987576640729131@nextbsd.org>

I opened PR210641 to track this after I hit it on i386 during the
sys/kqueue/kqueue_test:main ATF test.  I hit the panic two times in 9
tries.
-Alan

On Wed, Jun 15, 2016 at 1:34 PM, Matthew Macy <mmacy@nextbsd.org> wrote:
>
>
>
>  ---- On Wed, 15 Jun 2016 10:45:24 -0700 Konstantin Belousov <kostikbel@gmail.com> wrote ----
>  > On Wed, Jun 15, 2016 at 10:39:42AM -0700, Matthew Macy wrote:
>  > >
>  > >
>  > >
>  > >
>  > >             You can use dwarf4 if you use GDB from ports
>  > How would it help ?
>
> The following statement to a native speaker would imply that GDB is the
> problem: "There is not much gdb info here; I'll try to rebuild kgdb."
>
> If in fact %rip has been smashed that's a bit like saying "the light
> doesn't show anything on the table, I'll replace the light bulb" - when
> in fact there isn't anything on the table.
>
>  > The problem for kgdb is that %rip is zero, due to a function pointer
>  > being set to NULL in a destroyed knlist.  Neither version of kgdb
>  > would find code or unwind annotations for a zero address.
>  >
>  > But the issue is understood and
>
> Yes. Since the initial e-mail.
>
>
>  > we are working on a version of the fix.
>
> I'm glad you're on it.
>
> -M
>
>
>
>  >
>  >  ---- On Wed, 15 Jun 2016 04:50:00 -0700  Peter Holm <peter@holm.cc> wrote ----
>  > On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
>  > > On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
>  > > > I believe they all have more or less the same cause. The crashes occur
>  > > > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we
>  > > > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has
>  > > > been cleared by another thread. Thus we are unable to unlock the
>  > > > previously acquired lock and hold it until something causes us to crash
>  > > > (such as the witness code noticing that we're returning to userland with
>  > > > the lock still held).
>  > > ...
>  > > > I believe there's also a small window where the KN_LIST_LOCK macro
>  > > > checks kn->kn_knlist and finds it to be non-NULL, but by the time it
>  > > > actually dereferences it, it has become NULL. This would produce the
>  > > > "page fault while in kernel mode" crash.
>  > > >
>  > > > If someone familiar with this code sees an obvious fix, I'll be happy to
>  > > > test it. Otherwise, I'd appreciate any advice on fixing this. My first
>  > > > thought is that a "struct knote" ought to have its own mutex for
>  > > > controlling access to the flag fields and ideally the "kn_knlist" field.
>  > > > I.e., you would first acquire a knote's lock and then the knlist lock,
>  > > > thus ensuring that no one could clear the kn_knlist variable while you
>  > > > hold the knlist lock. The knlist lock, however, usually comes from
>  > > > whichever event producing entity the knote tracks, so getting lock
>  > > > ordering right between the per-knote mutex and this other lock seems
>  > > > potentially hard. (Sometimes we call into functions in kern_event.c with
>  > > > the knlist lock already held, having been acquired in code outside of
>  > > > kern_event.c. Consider, for example, calling KNOTE_LOCKED from
>  > > > kern_exit.c; the PROC_LOCK macro has already been used to acquire the
>  > > > process lock, also serving
>  > >
>  > >
>  > >
>  > >
>  > >
>  > >
>  >
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
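
For readers skimming the quoted text, the lost-unlock case Eric describes can be replayed in userland. The following is only a sketch under stated assumptions: the struct layouts, the kl_held flag, and the helper names kn_list_lock(), kn_list_unlock() and simulate_unlock() are hypothetical stand-ins modeled on the description in the thread, not the definitions in kern_event.c. The "other thread" is replayed inline in a single thread so that the outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins, not the kernel's definitions. */
struct knlist {
	int kl_held;			/* 1 while the list lock is held */
};

struct knote {
	struct knlist *kn_knlist;	/* another thread may clear this */
};

/* Both helpers re-read kn->kn_knlist, as described in the thread. */
static void
kn_list_lock(struct knote *kn)
{
	if (kn->kn_knlist != NULL) {
		assert(kn->kn_knlist->kl_held == 0);
		kn->kn_knlist->kl_held = 1;
	}
}

static void
kn_list_unlock(struct knote *kn)
{
	/* If the pointer was cleared in the meantime, nothing is released. */
	if (kn->kn_knlist != NULL)
		kn->kn_knlist->kl_held = 0;
}

/*
 * Lock, optionally clear kn_knlist (standing in for the other thread),
 * then unlock.  Returns 1 if the list lock is still held afterwards.
 */
int
simulate_unlock(int clear_between)
{
	struct knlist list = { .kl_held = 0 };
	struct knote kn = { .kn_knlist = &list };

	kn_list_lock(&kn);
	if (clear_between)
		kn.kn_knlist = NULL;
	kn_list_unlock(&kn);
	return (list.kl_held);
}
```

With clear_between set, the unlock helper sees NULL and releases nothing, so the lock stays held; that matches the "returning to userland with the lock still held" symptom. The sketch does not model the second window (the pointer becoming NULL between the check and the dereference), and it says nothing about what the eventual fix looks like.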


