Date:      Thu, 1 Jan 2015 00:56:13 +0200
From:      Ivan Klymenko <fidaj@ukr.net>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: [RFC] kern/kern_timeout.c rewrite in progress
Message-ID:  <20150101005613.4f788b0c@nonamehost.local>
In-Reply-To: <54A1B38C.1000709@selasky.org>
References:  <54A1B38C.1000709@selasky.org>

On Mon, 29 Dec 2014 21:03:24 +0100
Hans Petter Selasky <hps@selasky.org> wrote:

> Hi,
>
> I recently came across a class of errors which led me to investigate
> "kern/kern_timeout.c" and its subsystem. From what I can see, new
> features like SMP awareness have been "added" instead of fully
> "integrated". When going into the corner cases I've uncovered that
> the internal callout state machine can sometimes report wrong values
> via its callout_active() and callout_pending() bits to its clients,
> which in turn can make the clients behave badly. I further
> investigated how the safety of callout migration between CPUs is
> maintained. When I looked into the code and found stuff like
> "volatile" and "while()" loops to figure out which CPU a callout
> belongs to, I understood that such logic completely undermines the
> cleverness found in the turnstiles of mutexes and decided to go
> through all of the logic inside "kern_timeout.c". Also, static code
> analysis is harder when we don't use the basic mutexes and condition
> variables available in the kernel.
>
> First of all we need to set some ground rules for everyone:
>
> 1) A new feature called direct callbacks, which execute the timer
> callbacks from the fast interrupt handler, was added. All these
> callbacks _must_ be associated with a regular spinlock to maintain
> a safe callout_drain(). Otherwise they should only be executed on CPU0.
>
> 2) All Giant-locked callbacks should only execute on CPU0 to avoid
> congestion.
>
> 3) Callbacks using read-only locks should also only execute on CPU0
> to avoid multiple instances pending for completion on multiple CPUs,
> because read-only locks can be entered multiple times. From what I
> can see, there are currently no consumers of this feature in the
> kernel.
>
...

panic: spin lock held too long
http://paste.org.ru/?acf7io


