Date: Wed, 6 Jan 2021 15:44:51 -0500 From: Ryan Stone <rysto32@gmail.com> To: Alexander Motin <mav@freebsd.org> Cc: src-committers <src-committers@freebsd.org>, svn-src-all <svn-src-all@freebsd.org>, svn-src-head <svn-src-head@freebsd.org> Subject: Re: svn commit: r322272 - head/sys/compat/linuxkpi/common/src Message-ID: <CAFMmRNxNZEEWQ%2BnaVtjq%2Bk6OXzszCT6sDDrMW3GLWYB5VFx6uw@mail.gmail.com> In-Reply-To: <201708081936.v78JaY4m074051@repo.freebsd.org> References: <201708081936.v78JaY4m074051@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Aug 8, 2017 at 3:36 PM Alexander Motin <mav@freebsd.org> wrote: > > Author: mav > Date: Tue Aug 8 19:36:34 2017 > New Revision: 322272 > URL: https://svnweb.freebsd.org/changeset/base/322272 > > Log: > Fix few issues of LinuxKPI workqueue. > > LinuxKPI workqueue wrappers reported "successful" cancellation for works > already completed in normal way. This change brings reported status and > real cancellation fact into sync. This required for drm-next operation. > > Reviewed by: hselasky (earlier version) > Sponsored by: iXsystems, Inc. > Differential Revision: https://reviews.freebsd.org/D11904 > > @@ -266,13 +267,14 @@ linux_delayed_work_timer_fn(void *arg) > [WORK_ST_IDLE] = WORK_ST_IDLE, /* NOP */ > [WORK_ST_TIMER] = WORK_ST_TASK, /* start queueing task */ > [WORK_ST_TASK] = WORK_ST_TASK, /* NOP */ > - [WORK_ST_EXEC] = WORK_ST_TASK, /* queue task another time */ > - [WORK_ST_CANCEL] = WORK_ST_IDLE, /* complete cancel */ > + [WORK_ST_EXEC] = WORK_ST_EXEC, /* NOP */ > + [WORK_ST_CANCEL] = WORK_ST_TASK, /* failed to cancel */ > }; > struct delayed_work *dwork = arg; > > switch (linux_update_state(&dwork->work.state, states)) { > case WORK_ST_TIMER: > + case WORK_ST_CANCEL: > linux_delayed_work_enqueue(dwork); > break; > default: I believe that this hunk introduced a regression into workqueue. Consider the following scenario: 1. A delayed_work struct in the WORK_ST_TIMER state. 2. Thread A calls mod_delayed_work() 3. Thread B (a callout thread) simultaneously calls linux_delayed_work_timer_fn() The following sequence of events is possible: A: Call linux_cancel_delayed_work() A: Change state from TIMER TO CANCEL B: Change state from CANCEL to TASK B: taskqueue_enqueue() the task A: taskqueue_cancel() the task A: Call linux_queue_delayed_work_on(). This is a no-op because the state is WORK_ST_TASK. As a result, the delayed_work struct will never be invoked. This is causing address resolution in ib_addr.c to stop permanently, as it never tries to reschedule a task that it thinks is already scheduled. Do you have a recommendation? Should we unconditionally taskqueue_enqueue() when in the WORK_ST_TASK state and linux_queue_delayed_work_on() is called? That is harmless for a pending task but will break the deadlock if the race is lost.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFMmRNxNZEEWQ%2BnaVtjq%2Bk6OXzszCT6sDDrMW3GLWYB5VFx6uw>