Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 9 Jun 2017 16:56:03 -0700
From:      Jonathan Looney <jonlooney@gmail.com>
To:        John Baldwin <jhb@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>
Cc:        "Jonathan T. Looney" <jtl@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r319720 - head/sys/dev/vt
Message-ID:  <CADrOrmvH9vL6X7yZ2-djDAV92%2Be4W84z21-y2O4RFfxei8oT%2BQ@mail.gmail.com>
In-Reply-To: <7306919.ixyIA96xWQ@ralph.baldwin.cx>
References:  <201706082047.v58KlI51079003@repo.freebsd.org> <7306919.ixyIA96xWQ@ralph.baldwin.cx>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi John, Konstantin,

This crash occurs during system startup when we are trying to switch from
having each write to the vt device do an immediate flush to using a
callout-based asynchronous flushing mechanism.

It appears the crash was caused by having the VDF_ASYNC flag set while the
vd_timer_armed flag was 0. The fix is to make sure that vd_timer_armed is 1
before the VDF_ASYNC flag is set. It is my understanding that the acquire
semantics in the atomic_add_acq_int() call should ensure that the write to
vd_timer_armed occurs before the load, bitwise-or, and store associated
with `vd->vd_flags |= VDF_ASYNC`. Ensuring that ordering (or, at least the
store ordering) is all that is really necessary to stop the crash from
occurring.

(A more thorough analysis is available in the PR [217408], which I forgot
to include in the commit metadata.)

To answer Konstantin's question, the VDF_ASYNC and vd_timer_armed flags are
different. The VDF_ASYNC flag indicates that we want to use async flushing.
The vd_timer_armed flag indicates that the callout is actually armed to
flush at some point soon, so a thread that writes to the vt device doesn't
need to worry about scheduling the callout.

I'm not claiming that this fixes all bugs in this area. (In fact, I
specifically disclaim this.) But, it does stop the crash from occurring.

If you still feel there are better mechanisms to achieve the desired
ordering, please let me know and I'll be happy to fix and/or improve this.

Jonathan

On Thu, Jun 8, 2017 at 2:49 PM, John Baldwin <jhb@freebsd.org> wrote:

> On Thursday, June 08, 2017 08:47:18 PM Jonathan T. Looney wrote:
> > Author: jtl
> > Date: Thu Jun  8 20:47:18 2017
> > New Revision: 319720
> > URL: https://svnweb.freebsd.org/changeset/base/319720
> >
> > Log:
> >   With EARLY_AP_STARTUP enabled, we are seeing crashes in
> softclock_call_cc()
> >   during bootup. Debugging information shows that softclock_call_cc() is
> >   trying to execute the vt_consdev.vd_timer callout, and the callout
> >   structure contains a NULL c_func.
> >
> >   This appears to be due to a race between vt_upgrade() running
> >   callout_reset() and vt_resume_flush_timer() calling callout_schedule().
> >
> >   Fix the race by ensuring that vd_timer_armed is always set before
> >   attempting to (re)schedule the callout.
> >
> >   Discussed with:     emaste
> >   MFC after:  2 weeks
> >   Sponsored by:       Netflix
> >   Differential Revision:      https://reviews.freebsd.org/D9828
>
> This should probably be using atomic_thread_fence_foo() in conjunction with
> a simple 'vd->vd_timer_armed = 1' assignment instead of abusing
> atomic_add_acq_int().  Unfortunately atomic_thread_fence_*() aren't yet
> documented in atomic(9). :(  The commit message that added them is below
> though:
>
> ------------------------------------------------------------------------
> r285283 | kib | 2015-07-08 11:12:24 -0700 (Wed, 08 Jul 2015) | 22 lines
>
> Add the atomic_thread_fence() family of functions with intent to
> provide a semantic defined by the C11 fences with corresponding
> memory_order.
>
> atomic_thread_fence_acq() gives r | r, w, where r and w are read and
> write accesses, and | denotes the fence itself.
>
> atomic_thread_fence_rel() is r, w | w.
>
> atomic_thread_fence_acq_rel() is the combination of the acquire and
> release in single operation.  Note that reads after the acq+rel fence
> could be made visible before writes preceeding the fence.
>
> atomic_thread_fence_seq_cst() orders all accesses before/after the
> fence, and the fence itself is globally ordered against other
> sequentially consistent atomic operations.
>
> Reviewed by:    alc
> Discussed with: bde
> Sponsored by:   The FreeBSD Foundation
> MFC after:      3 weeks
>
> ------------------------------------------------------------------------
>
> That said, it is hard to see how a bare acquire barrier is really
> sufficient for anything.  Acquire barriers generally must be paired with
> a release barrier in order to provide sychronization.
>
> --
> John Baldwin
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CADrOrmvH9vL6X7yZ2-djDAV92%2Be4W84z21-y2O4RFfxei8oT%2BQ>