Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 7 Jan 2021 10:02:56 +0100
From:      Hans Petter Selasky <hps@selasky.org>
To:        Ryan Stone <rysto32@gmail.com>, Alexander Motin <mav@freebsd.org>
Cc:        src-committers <src-committers@freebsd.org>, svn-src-all <svn-src-all@freebsd.org>, svn-src-head <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r322272 - head/sys/compat/linuxkpi/common/src
Message-ID:  <f785d26a-e907-7473-0777-7b65ae99b7c2@selasky.org>
In-Reply-To: <CAFMmRNxNZEEWQ%2BnaVtjq%2Bk6OXzszCT6sDDrMW3GLWYB5VFx6uw@mail.gmail.com>
References:  <201708081936.v78JaY4m074051@repo.freebsd.org> <CAFMmRNxNZEEWQ%2BnaVtjq%2Bk6OXzszCT6sDDrMW3GLWYB5VFx6uw@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On 1/6/21 9:44 PM, Ryan Stone wrote:
> On Tue, Aug 8, 2017 at 3:36 PM Alexander Motin <mav@freebsd.org> wrote:
>>
>> Author: mav
>> Date: Tue Aug  8 19:36:34 2017
>> New Revision: 322272
>> URL: https://svnweb.freebsd.org/changeset/base/322272
>>
>> Log:
>>    Fix few issues of LinuxKPI workqueue.
>>
>>    LinuxKPI workqueue wrappers reported "successful" cancellation for works
>>    already completed in normal way.  This change brings reported status and
>>    real cancellation fact into sync.  This required for drm-next operation.
>>
>>    Reviewed by:  hselasky (earlier version)
>>    Sponsored by: iXsystems, Inc.
>>    Differential Revision:        https://reviews.freebsd.org/D11904
>>
> 
>> @@ -266,13 +267,14 @@ linux_delayed_work_timer_fn(void *arg)
>>                  [WORK_ST_IDLE] = WORK_ST_IDLE,          /* NOP */
>>                  [WORK_ST_TIMER] = WORK_ST_TASK,         /* start queueing task */
>>                  [WORK_ST_TASK] = WORK_ST_TASK,          /* NOP */
>> -               [WORK_ST_EXEC] = WORK_ST_TASK,          /* queue task another time */
>> -               [WORK_ST_CANCEL] = WORK_ST_IDLE,        /* complete cancel */
>> +               [WORK_ST_EXEC] = WORK_ST_EXEC,          /* NOP */
>> +               [WORK_ST_CANCEL] = WORK_ST_TASK,        /* failed to cancel */
>>          };
>>          struct delayed_work *dwork = arg;
>>
>>          switch (linux_update_state(&dwork->work.state, states)) {
>>          case WORK_ST_TIMER:
>> +       case WORK_ST_CANCEL:
>>                  linux_delayed_work_enqueue(dwork);
>>                  break;
>>          default:
> 
> I believe that this hunk introduced a regression into workqueue.
> Consider the following scenario:
> 
> 1. A delayed_work struct in the WORK_ST_TIMER state.
> 2. Thread A calls mod_delayed_work()
> 3. Thread B (a callout thread) simultaneously calls
> linux_delayed_work_timer_fn()
> 
> The following sequence of events is possible:
> 
> A: Call linux_cancel_delayed_work()
> A: Change state from TIMER TO CANCEL
> B: Change state from CANCEL to TASK
> B: taskqueue_enqueue() the task
> A: taskqueue_cancel() the task
> A: Call linux_queue_delayed_work_on().  This is a no-op because the
> state is WORK_ST_TASK.
> 
> As a result, the delayed_work struct will never be invoked.  This is
> causing address resolution in ib_addr.c to stop permanently, as it
> never tries to reschedule a task that it thinks is already scheduled.
> 
> Do you have a recommendation?  Should we unconditionally
> taskqueue_enqueue() when in the WORK_ST_TASK state and
> linux_queue_delayed_work_on() is called?  That is harmless for a
> pending task but will break the deadlock if the race is lost.
> 

Hi Ryan,

Do you have a patch to fix this?

--HPS



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?f785d26a-e907-7473-0777-7b65ae99b7c2>