From owner-svn-src-all@freebsd.org Fri Jun 9 23:56:06 2017 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E7304BFF7C6; Fri, 9 Jun 2017 23:56:06 +0000 (UTC) (envelope-from jonlooney@gmail.com) Received: from mail-wr0-x22b.google.com (mail-wr0-x22b.google.com [IPv6:2a00:1450:400c:c0c::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7097580793; Fri, 9 Jun 2017 23:56:06 +0000 (UTC) (envelope-from jonlooney@gmail.com) Received: by mail-wr0-x22b.google.com with SMTP id q97so47178110wrb.2; Fri, 09 Jun 2017 16:56:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=U+jSKBS/qREezkmPGx7XrLb/p9r5uYveUyH6qWU9dv0=; b=A+P6iNwAho/jz984lNjCifiuxXq192wbgn4iciFZv5TxVq7G102/HsHgfL52FSmMGV l3YeJlJ5SeAoYMkjhXgiZypfP6aKIt0E14+tMNTR8U50skQRhRds/FAVia22xy8ZxUKh IRdIztHvKGNAv2MlgmG/HpmIjnlsY3CX7gggLdflTe7hkga3z0qRKmn1zpZOOuaH0mj8 5JdK26vh6xZg7ZoS+glQiwKiF82GfHf8SxFDdG/v2r8t913jlot00WKvj7l2JrCTyt6c IJulRwkJaZifXErO2PGXPkiZu9HGGy5Csrln0kQZ40pb9zQr/xQvtzu/thrSU3nZEMp+ PY8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=U+jSKBS/qREezkmPGx7XrLb/p9r5uYveUyH6qWU9dv0=; b=ra79s6iINh5zXHZmdhAHFG+MQWbXWVgkNYYeqw4giUQnpHzp78oVBgqVN6xFY+bZO7 9ixbMRkNrNjrlv7g0cm1FuTZms+nWRmeicGSYgA/vUTrplTArMY4DdPy317EJE/NxKTW jWt5EV979Qkcx4zSf9xzJIYuh+KJR54uLfTI8h4st03hdS0kQEfYaKgmWvJ3D6GOc4+D DlTiqINIdeu0Lury8TvxUs7iG0rfCuFGsmXunh2m4cvH3GJ1j2oivJF5IFKpl0Zi8dN9 Rtof1edjps3YWhAGYFhnkDc3u/qlpL3btnUwp2H/IJIly3Otrmcp1KGfMnooFlxgWNw7 axPQ== X-Gm-Message-State: AODbwcApd5ewfoP4lb+26EtH8v5OrRpVpG9LIi1VnGGbX5T4fVGvuDGm xOBpoFZBOj6x+Bjbd+US28W8kcN9bK1EdF4= X-Received: by 10.28.96.134 with SMTP id u128mr1382935wmb.85.1497052564395; Fri, 09 Jun 2017 16:56:04 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.151.132 with HTTP; Fri, 9 Jun 2017 16:56:03 -0700 (PDT) In-Reply-To: <7306919.ixyIA96xWQ@ralph.baldwin.cx> References: <201706082047.v58KlI51079003@repo.freebsd.org> <7306919.ixyIA96xWQ@ralph.baldwin.cx> From: Jonathan Looney Date: Fri, 9 Jun 2017 16:56:03 -0700 Message-ID: Subject: Re: svn commit: r319720 - head/sys/dev/vt To: John Baldwin , Konstantin Belousov Cc: "Jonathan T. Looney" , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Jun 2017 23:56:07 -0000 Hi John, Konstantin, This crash occurs during system startup when we are trying to switch from having each write to the vt device do an immediate flush to using a callout-based asynchronous flushing mechanism. It appears the crash was caused by having the VDF_ASYNC flag set while the vd_timer_armed flag was 0. The fix is to make sure that vd_timer_armed is 1 before the VDF_ASYNC flag is set. It is my understanding that the acquire semantics in the atomic_add_acq_int() call should ensure that the write to vd_timer_armed occurs before the load, bitwise-or, and store associated with `vd->vd_flags |= VDF_ASYNC`. Ensuring that ordering (or, at least the store ordering) is all that is really necessary to stop the crash from occurring. (A more thorough analysis is available in the PR [217408], which I forgot to include in the commit metadata.) To answer Konstantin's question, the VDF_ASYNC and vd_timer_armed flags are different. The VDF_ASYNC flag indicates that we want to use async flushing. The vd_timer_armed flag indicates that the callout is actually armed to flush at some point soon, so a thread that writes to the vt device doesn't need to worry about scheduling the callout. I'm not claiming that this fixes all bugs in this area. (In fact, I specifically disclaim this.) But, it does stop the crash from occurring. If you still feel there are better mechanisms to achieve the desired ordering, please let me know and I'll be happy to fix and/or improve this. Jonathan On Thu, Jun 8, 2017 at 2:49 PM, John Baldwin wrote: > On Thursday, June 08, 2017 08:47:18 PM Jonathan T. Looney wrote: > > Author: jtl > > Date: Thu Jun 8 20:47:18 2017 > > New Revision: 319720 > > URL: https://svnweb.freebsd.org/changeset/base/319720 > > > > Log: > > With EARLY_AP_STARTUP enabled, we are seeing crashes in > softclock_call_cc() > > during bootup. Debugging information shows that softclock_call_cc() is > > trying to execute the vt_consdev.vd_timer callout, and the callout > > structure contains a NULL c_func. > > > > This appears to be due to a race between vt_upgrade() running > > callout_reset() and vt_resume_flush_timer() calling callout_schedule(). > > > > Fix the race by ensuring that vd_timer_armed is always set before > > attempting to (re)schedule the callout. > > > > Discussed with: emaste > > MFC after: 2 weeks > > Sponsored by: Netflix > > Differential Revision: https://reviews.freebsd.org/D9828 > > This should probably be using atomic_thread_fence_foo() in conjunction with > a simple 'vd->vd_timer_armed = 1' assignment instead of abusing > atomic_add_acq_int(). Unfortunately atomic_thread_fence_*() aren't yet > documented in atomic(9). :( The commit message that added them is below > though: > > ------------------------------------------------------------------------ > r285283 | kib | 2015-07-08 11:12:24 -0700 (Wed, 08 Jul 2015) | 22 lines > > Add the atomic_thread_fence() family of functions with intent to > provide a semantic defined by the C11 fences with corresponding > memory_order. > > atomic_thread_fence_acq() gives r | r, w, where r and w are read and > write accesses, and | denotes the fence itself. > > atomic_thread_fence_rel() is r, w | w. > > atomic_thread_fence_acq_rel() is the combination of the acquire and > release in single operation. Note that reads after the acq+rel fence > could be made visible before writes preceeding the fence. > > atomic_thread_fence_seq_cst() orders all accesses before/after the > fence, and the fence itself is globally ordered against other > sequentially consistent atomic operations. > > Reviewed by: alc > Discussed with: bde > Sponsored by: The FreeBSD Foundation > MFC after: 3 weeks > > ------------------------------------------------------------------------ > > That said, it is hard to see how a bare acquire barrier is really > sufficient for anything. Acquire barriers generally must be paired with > a release barrier in order to provide sychronization. > > -- > John Baldwin >