From owner-svn-src-head@freebsd.org Wed Dec 20 22:31:10 2017 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8C103E835C9 for ; Wed, 20 Dec 2017 22:31:10 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-io0-x230.google.com (mail-io0-x230.google.com [IPv6:2607:f8b0:4001:c06::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49C876593D for ; Wed, 20 Dec 2017 22:31:10 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-io0-x230.google.com with SMTP id w127so18980656iow.11 for ; Wed, 20 Dec 2017 14:31:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=DUWqV47Dshl74hqBwqRAVHfE0OjCvQR0gnARX2ZDOxk=; b=J33yQscwqVAmPz1t70m3PyawxRMT2bI09Or4sh5lQNEUu1LpQLrMA5hzCXq6447/Gg qsc0edMh3eTUBHoN+/JBLQ7MWbrSJEAn6+oZ1d09ebwqekxVGD+vmfQbJAx6E4xkkvpn vPekEyk5ab9dRDeMfZQbj5Primj1U7pogOEmThhnqclu+zaJnlbY3HazsE9T9cJ+KPZS 431YE6Ck1Zm6vH+SN+EdC9La1+2UsXBxQBEDZ4zL8Lc/54M1O4pz6mHx7UGP2UgjG7i/ ExhV2mF71FNgcJmLD55DsbqtfIiQ4ugVfN3q+c1Rmv1YdUc89HzvnLr2G846U05uIjxS N7eg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=DUWqV47Dshl74hqBwqRAVHfE0OjCvQR0gnARX2ZDOxk=; b=cXJx7He6D5l0y85flKH2BFGihIvoAmhlKGNvnIGystgvOVc264YbY14jcJjO8u524h ox5MydvVRsTcdea+qOg79Kn9YiAgQ6qWfnSs6wz/naN/WFHRgl0sQwLxDrXfaVX2h37M MdzkQxd+igHKtqjFJ0w8UL048TjFPKHusf5VYcKXVqFJIa439rO/MVa3llBcSS+CRUYR m2gnXQAMrzG/eysZm0RGVgzZfq8NaRLpMLSPXHhuV7+usXpCNULB9qZxgJSbSjZKOXvD 1RSZOd2lVtj4JYGbeSWVj1Uzv4gdldq+51Tlfv7DRlqqXEa6i6f7kauEDjPapI990ZuM 42CA== X-Gm-Message-State: AKGB3mJmzOr7TMBAkcePYyw9z2N60FNi1zv6a9sVC+Lam8VEDdxg/lpf 9qBOiEA07RY7VSchEVS6kXA65IMKf3quJ4cAolCAjg== X-Google-Smtp-Source: ACJfBovPFTN7s7ziQkZyJAaqQkjKdqK1JBOjjR5WVGoHOeVsRBQI4yEgbQ6gWXGhB4t2DODka+1eiKrMCiXs9AGPTpI= X-Received: by 10.107.149.5 with SMTP id x5mr7796099iod.136.1513809069049; Wed, 20 Dec 2017 14:31:09 -0800 (PST) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.108.204 with HTTP; Wed, 20 Dec 2017 14:31:08 -0800 (PST) X-Originating-IP: [2603:300b:6:5100:1052:acc7:f9de:2b6d] In-Reply-To: <20171220203248.GB11032@raichu> References: <201712091544.vB9FiVUI096790@repo.freebsd.org> <20171210031450.GB15275@raichu> <20171220203248.GB11032@raichu> From: Warner Losh Date: Wed, 20 Dec 2017 15:31:08 -0700 X-Google-Sender-Auth: GcbEfCWlIx677E20_TvUAO0ejao Message-ID: Subject: Re: svn commit: r326731 - head/sys/ufs/ffs To: Mark Johnston Cc: src-committers , "svn-src-all@freebsd.org" , "svn-src-head@freebsd.org" Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.25 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Dec 2017 22:31:10 -0000 On Wed, Dec 20, 2017 at 1:32 PM, Mark Johnston wrote: > On Sat, Dec 09, 2017 at 10:14:50PM -0500, Mark Johnston wrote: > > On Sat, Dec 09, 2017 at 07:36:59PM -0700, Warner Losh wrote: > > > On Sat, Dec 9, 2017 at 11:03 AM, Andriy Gapon wrote: > > > > > > > On 09/12/2017 17:44, Mark Johnston wrote: > > > > > Some GEOMs do not appear to handle BIO_ORDERED correctly, meaning > that > > > > the > > > > > barrier write may not work as intended. > > > > > > > > > There's a few places we send down a BIO_ORDERED BIO_FLUSH command > > > (see softdep_synchronize for one). Will those matter? > > > > Some classes have separate handling for BIO_FLUSH, so it depends. I > > think gmirror's handling is buggy independent of BIO_ORDERED: > > g_mirror_start() sends BIO_FLUSH commands directly to the mirrors, > > while reads and writes are queued for handling by the gmirror worker > > thread. So as far as I can tell, a BIO_WRITE which arrives at the > > gmirror provider before a BIO_FLUSH might be sent to the mirrors > > after that BIO_FLUSH. I would expect BIO_FLUSH to implicitly have > > something like release semantics, i.e., a BIO_FLUSH shouldn't be > > reordered with a BIO_WRITE that preceded it. But I might be > > misunderstanding. > > > > > As I've noted elsewhere: I'd really like to kill BIO_ORDERED since it > has > > > too many icky effects (and BIO_FLUSH + BIO_ORDERED isn't guaranteed to > do, > > > well, anything since it can turn into a NOP in a number of places. Plus > > > many of the implementations of BIO_ORDERED assume the drive is like > SCSI > > > and you just set the right tag to make it 'ordered'. For ATA we issue > a non > > > NCQ command, which is a full drain of outstanding commands, send this > > > command, then start them again which really shuts down the parallelism > we > > > implemented NCQ for :(. We do similar for NVME which is even worse. > There > > > we have multiple submission queues in the hardware. To simulated it, > we do > > > a similar drain, but that's going to get in the way as we move to NUMA > and > > > systems where we try to do the I/O entirely on one CPU (both > submission and > > > completion) and ordered I/O is guaranteed lock contention. > > > > Independent of this, it doesn't really look like we have any way of > > handling write errors when dependencies are enforced using BIO_ORDERED. > > In the case of the babarrierwrite() consumer in FFS, what happens if the > > inode block write fails due to a transient error, but the following CG > > update succeeds? > > Looking at this some more, I think iosched is ok since bioq_disksort() > does in fact handle BIO_ORDERED. Except when it doesn't. BIO_READs are posted to the device completely independent of BIO_ORDERED commands. I believe this is OK, but since BIO_ORDERED is so ill defined, I can't be sure. > I'm concerned though that there are > places in the tree where we should be using BIO_ORDERED but aren't. > I maintain that there are no such places :) If you want to do ORDERED I/O, you have to order it yourself by flushing all the other outstanding I/O, then issuing the I/O you want to follow the completion of all the others, then issue new I/O. With tagged queueing, otherwise, you get non-determinate write order to the disk. > gmirror metadata writes for instance are performed synchronously but > should not be reordered with preceding BIO_WRITEs. > Right, gmirror should freeze the software bioq, drain the hardware queue, issue the right, then unfreeze its bioq in these cases. BIO_ORDERED and BIO_FLUSH isn't going to help much. It assumes there's only one 'flow' of these to the device from the upper layers, when in fact there can be several for multiple partitions. If gmirror needs strong ordering semantics, it needs to actually enforce the exact semantics it wants. Otherwise, tagged queueing can get in the way (though right now, to its credit, the ATA driver schedules the ordered I/O as non-tagged-queue so that it causes this mini-freeze, SCSI just tags it and hopes for the best, giving slightly different semantics, and NVMe I think drains all queue, then schedules the I/O then starts again). NVMe is an even poorer match to whatever BIO_ORDERED is supposed to mean than even ATA since it can do multiple streams to the drive at once. In that environment, BIO_ORDERED makes as much sense as SPLHIGH does on a MP machine. > Anyway, I've posted a review for some gmirror I/O ordering fixes here if > anyone is interested: https://reviews.freebsd.org/D13559 > I'll take a look at it, thanks. Warner