Date:      Tue, 10 Jan 2017 22:14:44 +0100
From:      Hans Petter Selasky <hps@selasky.org>
To:        Mark Johnston <markj@FreeBSD.org>, freebsd-hackers@FreeBSD.org
Subject:   Re: draining high-frequency callouts
Message-ID:  <68472d52-f4a0-a4cc-ec66-d39ee2f065ee@selasky.org>
In-Reply-To: <20170110205711.GA86449@wkstn-mjohnston.west.isilon.com>
References:  <20170110205711.GA86449@wkstn-mjohnston.west.isilon.com>

On 01/10/17 21:57, Mark Johnston wrote:
> I'm occasionally seeing an assertion failure in softclock_call_cc() when
> running DTrace tests on a system with hz=10000. The assertion
> (c->c_flags & CALLOUT_ACTIVE) != 0 is failing while a thread is
> concurrently draining the callout, which runs at a high frequency. At
> the time of the panic, that thread is spinning on the per-CPU callout
> lock after having been awoken from "codrain", and CALLOUT_PENDING is
> set on the callout. The callout is direct, i.e., it is executed in hard
> interrupt context.
>
> I think this is what's happening:
> - callout_drain() is called while the callout is executing but after the
>   callout has rescheduled itself, and goes to sleep after having cleared
>   CALLOUT_ACTIVE.
> - softclock_call_cc() wakes up the callout_drain() caller, but the
>   callout fires again before the caller is scheduled.
> - the second softclock_call_cc() call sees that CALLOUT_ACTIVE is
>   cleared and panics.
>
> Is there anything that prevents this scenario? Is it really correct to
> leave CALLOUT_ACTIVE cleared when the per-CPU callout lock must be
> dropped in order to acquire a sleepqueue lock?
>

Hi Mark,

First of all, thank you for reporting this bug. I believe it is not
reproducible with the hps_head project branch, which has the exact
same KASSERT() in softclock_call_cc() but uses different callout
drain logic:

https://svnweb.freebsd.org/base/projects/hps_head/sys/kern/kern_timeout.c?revision=309221&view=markup#l1261

From my understanding of kern/kern_timeout.c, you are right: the
"goto again", which exists because of a LOR between the sleepqueue
spinlock and the callout subsystem spinlocks, leaves a race open. I've
long thought that the callout_drain() and callout_reset() functions in
-head have become obfuscated after the introduction of per-CPU callouts
and deserve a rewrite.
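
To make the interleaving from the report concrete, here is a minimal
single-threaded sketch in C. It is not the code in kern_timeout.c: the
SIM_* flag values, struct sim_callout and all sim_* functions are made
up for illustration, and locking, sleepqueues and per-CPU state are left
out. It only replays the order of flag updates described in the report,
assuming the drain path clears the ACTIVE flag before it sleeps and that
the firing path checks that flag on entry.

```c
#include <stdbool.h>

/* Hypothetical flag values; not the kernel's definitions. */
#define SIM_ACTIVE	0x1
#define SIM_PENDING	0x2

/* Hypothetical stand-in for a callout; only the flags are modelled. */
struct sim_callout {
	int flags;
};

/* What a (re)schedule does to the flags in this model. */
static void
sim_schedule(struct sim_callout *c)
{
	c->flags |= SIM_ACTIVE | SIM_PENDING;
}

/*
 * Model one firing of the callout. Returns false if the ACTIVE check
 * on entry would fail, which corresponds to the reported assertion.
 * Otherwise the handler runs and re-arms itself, as a periodic
 * high-frequency callout would.
 */
static bool
sim_fire(struct sim_callout *c)
{
	if ((c->flags & SIM_ACTIVE) == 0)
		return (false);
	c->flags &= ~SIM_PENDING;
	sim_schedule(c);
	return (true);
}

/* Model the drain path up to the point where the caller sleeps. */
static void
sim_drain_until_sleep(struct sim_callout *c)
{
	c->flags &= ~SIM_ACTIVE;
}

/*
 * Replay the reported ordering. Returns true if the second firing
 * trips the ACTIVE check.
 */
static bool
sim_replay_race(void)
{
	struct sim_callout c = { 0 };

	sim_schedule(&c);
	if (!sim_fire(&c))		/* first firing, handler re-arms */
		return (false);
	sim_drain_until_sleep(&c);	/* drainer clears ACTIVE and sleeps */
	/* The drainer is woken but not yet running; the callout fires. */
	return (!sim_fire(&c));
}
```

In this model sim_replay_race() returns true because nothing sets
SIM_ACTIVE again between the drainer going to sleep and the next firing,
while SIM_PENDING is still set. Two back-to-back sim_fire() calls
without the drain step both pass the check.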

https://svnweb.freebsd.org/base/projects/hps_head

--HPS


