Date:      Fri, 28 Jan 2022 22:32:02 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        peterj@freebsd.org, FreeBSD FS <freebsd-fs@freebsd.org>,  "freebsd-geom@FreeBSD.org" <freebsd-geom@freebsd.org>
Subject:   Re: bio re-ordering
Message-ID:  <CANCZdfoqQ3Ze%2BcMTsk_ho9x8hsSM9=fTavSao%2BUtwc2nSAEJpQ@mail.gmail.com>
In-Reply-To: <YfTEj1KLhQhoR3xP@kib.kiev.ua>
References:  <YfTCs7j3TPZFcFCD@server.rulingia.com> <YfTEj1KLhQhoR3xP@kib.kiev.ua>


On Fri, Jan 28, 2022 at 9:38 PM Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Sat, Jan 29, 2022 at 03:29:39PM +1100, peterj@freebsd.org wrote:
> > I'm working on a GEOM Gate network client to better handle high-latency
> > connections and have some questions regarding bio ordering assumptions
> > (alternatively, how much should I be able to re-order bio requests without
> > breaking things).  Within geom_gate, an incoming bio request is retrieved
> > from the kernel using a G_GATE_CMD_START ioctl, processed in userland
> > (typically by forwarding it to a remote system) and then returned via a
> > G_GATE_CMD_DONE ioctl.  My GEOM Gate client can reorder requests quite
> > aggressively and I suspect it's breaking some kernel assumptions regarding
> > bio behaviour.  The following questions assume that BIO_READ, BIO_WRITE and
> > BIO_FLUSH are valid but BIO_DELETE isn't supported.
> >
> > a) In the absence of BIO_FLUSH operations, what (if any) are the limits on
> >    reordering operations?  Given a block that initially contains A, followed
> >    by a write B, read and write C, is there any constraint on which content
> >    the read returns?
> There are no limits.  Either other software entities, or hardware itself,
> can process requests in arbitrary order.  This is why things are typically
> done in the completion handler, and part of the reason why the complexity
> of UFS SU exists.
>
> >
> > b) Are individual BIO_READ and BIO_WRITE operations expected to be atomic
> >    with respect to other BIO_WRITE operations?  Given 2 adjacent blocks that
> >    initially contain AB, and successive write CD, read and write EF
> >    operations to those blocks, is it expected that the read would return CD
> >    (or maybe AD or EF, assuming that's valid from the previous question) or
> >    could the write operations partially complete in different orders,
> >    resulting in something like AD, CF, EB etc?
> No, they are not atomic.  At the very least, underlying entities can split a
> request into several, each of which is ordered individually.  Typically, it
> is higher-level code that ensures that there are no concurrent modifications
> of the same block.  For instance, we exclusively lock vnodes and buffers
> around metadata updates.  Similarly, we lock buffers until the data is
> written to the device.
>
> >
> > c) I assume that a BIO_FLUSH should not return DONE until all preceding
> >    write operations have completed.  Is it required that write
> >    operations issued after the BIO_FLUSH must not complete before the
> >    BIO_FLUSH completes?
> UFS SU relies on BIO_FLUSH being the full barrier.
>
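The first two answers can be made concrete with a toy model. The Python sketch below is not kernel code and uses no GEOM interfaces; possible_reads() and possible_final_states() are hypothetical helpers. It assumes each operation takes effect at the instant it completes, lets the completion order vary freely ("there are no limits"), and enumerates every outcome. For question b) it additionally assumes that a lower layer has split each two-block write into one sub-request per block.

```python
from itertools import permutations

# Toy model only: no GEOM or bio(9) code is involved. Each operation takes
# effect at the instant it "completes"; only the completion order varies.


def possible_reads():
    """Question a): block holds 'A'; issue write B, read, write C."""
    ops = [("write", "B"), ("read", None), ("write", "C")]
    seen = set()
    for order in permutations(ops):
        block = "A"
        for kind, value in order:
            if kind == "write":
                block = value
            else:
                seen.add(block)  # what the read observes in this ordering
    return seen


def possible_final_states():
    """Question b): blocks hold 'AB'; issue write CD, then write EF.

    Assumes each two-block write was split into one sub-request per block
    and that the four sub-requests may complete in any order.
    """
    subreqs = [(0, "C"), (1, "D"),   # write "CD", split per block
               (0, "E"), (1, "F")]   # write "EF", split per block
    finals = set()
    for order in permutations(subreqs):
        disk = ["A", "B"]
        for blk, value in order:
            disk[blk] = value
        finals.add("".join(disk))
    return finals


print(sorted(possible_reads()))         # ['A', 'B', 'C']
print(sorted(possible_final_states()))  # ['CD', 'CF', 'ED', 'EF']
```

In this model the read can legitimately return A, B or C, and the final contents can be torn (CF or ED), with neither write visible as a whole. Only serialization above the device, such as the vnode and buffer locking described in the answer to b), rules those outcomes out.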

I think that UFS relies on two ordering primitives, BIO_FLUSH and barrier
writes, both marked with BIO_ORDERED today.  That's what most of the drivers
key off of.  We always set BIO_ORDERED on all BIO_FLUSH events, as far as I
can tell.

Also, anything that sets B_BARRIER at the upper layers also gets BIO_ORDERED
added to it.  b*barrierwrite() sets this, and that's used in the ffs_alloc
code.

Warner
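Taken together, the replies suggest a conservative rule for a reordering GEOM Gate client: treat every request marked BIO_ORDERED (every BIO_FLUSH, plus writes issued through the b*barrierwrite() path) as a full barrier, and reorder freely only between barriers. The Python model below is a sketch under that assumption. Req, its `ordered` field, dispatch_batches() and respects_barriers() are hypothetical names, not part of geom_gate or bio(9). Whether the ordered flag is actually visible to userland through G_GATE_CMD_START is not established in this thread and would need checking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Req:
    name: str
    ordered: bool = False  # hypothetical stand-in for BIO_ORDERED


def dispatch_batches(issued):
    """Group requests into batches whose members may complete in any order.

    Every ordered request forms a batch of its own; the client is assumed
    to wait for one batch to finish before starting the next.
    """
    batches, current = [], []
    for req in issued:
        if req.ordered:
            if current:
                batches.append(current)
                current = []
            batches.append([req])
        else:
            current.append(req)
    if current:
        batches.append(current)
    return batches


def respects_barriers(issued, completed):
    """Return True if `completed` treats every ordered request as a full barrier."""
    pos = {req: i for i, req in enumerate(completed)}
    for i, req in enumerate(issued):
        if not req.ordered:
            continue
        # Everything issued before the barrier must complete before it...
        if any(pos[r] > pos[req] for r in issued[:i]):
            return False
        # ...and nothing issued after it may complete before it.
        if any(pos[r] < pos[req] for r in issued[i + 1:]):
            return False
    return True


w1, w2, w3 = Req("write1"), Req("write2"), Req("write3")
fl = Req("flush", ordered=True)
issued = [w1, w2, fl, w3]

print([[r.name for r in b] for b in dispatch_batches(issued)])
# [['write1', 'write2'], ['flush'], ['write3']]
print(respects_barriers(issued, [w2, w1, fl, w3]))  # True
print(respects_barriers(issued, [w1, w3, fl, w2]))  # False
```

Completing write2 before write1 is fine; completing write3 (or write2) on the wrong side of the flush is not. Draining each batch before starting the next is the simplest way to honour the barrier, at the cost of a full round trip per barrier on a high-latency link.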
