From owner-freebsd-current@freebsd.org  Thu Sep  3 20:39:37 2015
Return-Path: <owner-freebsd-current@freebsd.org>
Delivered-To: freebsd-current@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 35CAC9CA322
 for <freebsd-current@mailman.ysv.freebsd.org>;
 Thu,  3 Sep 2015 20:39:37 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 52E7DC80;
 Thu,  3 Sep 2015 20:39:35 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA04869;
 Thu, 03 Sep 2015 23:39:33 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1ZXbID-0008eO-Bx; Thu, 03 Sep 2015 23:39:33 +0300
Subject: Re: panic: Stray timeout
To: Hans Petter Selasky <hps@selasky.org>,
 Konstantin Belousov <kostikbel@gmail.com>
References: <55E2DA91.6080206@FreeBSD.org> <20150830161603.GI2072@kib.kiev.ua>
 <55E354DA.9030302@FreeBSD.org> <55E484F3.8090303@FreeBSD.org>
 <55E4A692.9010602@selasky.org>
Cc: FreeBSD Current <freebsd-current@FreeBSD.org>, rrs@FreeBSD.org
From: Andriy Gapon <avg@FreeBSD.org>
Message-ID: <55E8AFB4.1080101@FreeBSD.org>
Date: Thu, 3 Sep 2015 23:38:12 +0300
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101
 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <55E4A692.9010602@selasky.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current/>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Sep 2015 20:39:37 -0000

On 31/08/2015 22:10, Hans Petter Selasky wrote:
> Hi Andriy,
> 
> On 08/31/15 18:46, Andriy Gapon wrote:
>> On 30/08/2015 22:09, Andriy Gapon wrote:
>>> On 30/08/2015 19:16, Konstantin Belousov wrote:
>>>> This is strange, I do not think that could be a right explanation of this
>>>> issue.  The taskqueue callout is initialized with the mutex, which means
>>>> that the callout_stop() caller
>>>> - must own the mutex;
>>>> - is synchronous with the callout.
>>>> In other words, callout cannot be running when taskqueue_cancel_timeout()
>>>> calls callout_stop(), it only can be dequeued but the callout function
>>>> is not yet called.  If callout_stop() is performed meantime, between
>>>> dropping callout_cpu lock, and locking the mutex, it must be not run.
>>>
>>> Thank you for the explanation.  I am not familiar with the code and I
>>> misinterpreted the manual page and thought that callout_stop() might be unable
>>> to stop the callout even if it was initialized with the mutex.  I see my mistake
>>> now.
>>
>> I had to look at the code, of course.
>> Could the following logic actually be buggy?
>>
>> int
>> callout_reset_sbt_on(struct callout *c, sbintime_t sbt, sbintime_t precision,
>>      void (*ftn)(void *), void *arg, int cpu, int flags)
>> {
>> ...
>>          if (cc_exec_curr(cc, direct) == c) {
>>                  /*
>>                   * We're being asked to reschedule a callout which is
>>                   * currently in progress.  If there is a lock then we
>>                   * can cancel the callout if it has not really started.
>>                   */
>>                  if (c->c_lock != NULL && cc_exec_cancel(cc, direct))
>>                          cancelled = cc_exec_cancel(cc, direct) = true;
>>
>> I have a suspicion that the condition should check for !cc_exec_cancel:
>> - it seems that cc_exec_cancel is set to true when the callback execution starts
>> - logically, it wouldn't make sense to check if X is true and then set it to true
>>
>> The code is quite hard to understand in a short time, so I could be
>> misinterpreting what's going on, but I'd like to check my suspicion.
> 
> While touching this topic, is the following code more readable or not (see the
> callout_restart_async function):
> 
> https://svnweb.freebsd.org/base/projects/hps_head/sys/kern/kern_timeout.c?revision=287248&view=markup

That code is a bit easier to read indeed.  But it's quite similar -
unsurprisingly, I guess.

>> If this is a bug indeed, then it might make the following scenario possible:
>>
>> - softclock() is being executed in thread T1 and the control flow is in
>> softclock_call_cc() after CC_UNLOCK() and before lc_lock(c_lock)
>> - thread T2 is executing callout_reset_sbt_on on the same callout as T1
>> - so, I think, T2 will see cc_exec_cancel == false and thus, because of the
>> suspected bug, cc_exec_cancel will remain false
>> - T2 will proceed to callout_cc_add() and will happily (re-)schedule the callout
>> - after T2 releases the callout lock, T1 will continue in softclock_call_cc()
>> and because cc_exec_cancel is false the callout's callback function will get
>> called
> 
> Correct.
> 
>>
>> In effect, there will be an extra call of the callout.
>> If that's so that could explain the triggered assertion in
>> taskqueue_timeout_func().
>>
>> P.S. I am going to add an assertion in callout_cc_add that CALLOUT_ACTIVE is not
>> set and I will see if I can get anything interesting from it.
>>
>> P.P.S. On a further look, it seems that there is a bug indeed and it's probably
>> just a typo that got introduced in r278469:
>> -               if (c->c_lock != NULL && !cc->cc_exec_entity[direct].cc_cancel)
>> -                       cancelled = cc->cc_exec_entity[direct].cc_cancel = true;
>> -               if (cc->cc_exec_entity[direct].cc_waiting) {
>> +               if (c->c_lock != NULL && cc_exec_cancel(cc, direct))
>> +                       cancelled = cc_exec_cancel(cc, direct) = true;
>> +               if (cc_exec_waiting(cc, direct)) {
>>
> 
> Looks like a bug to me trying to make the code more readable removed a not in
> there!

Yes, indeed.  Thank you for reviewing!

-- 
Andriy Gapon