Date:      Wed, 2 Feb 2022 02:14:40 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        Peter Jeremy <peterj@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>,  FreeBSD FS <freebsd-fs@freebsd.org>,  "freebsd-geom@FreeBSD.org" <freebsd-geom@freebsd.org>
Subject:   Re: bio re-ordering
Message-ID:  <CANCZdfp=0rbBkr4SoXhvn7hrQniPQzTeZra2HGBwXDGsJjN8XQ@mail.gmail.com>
In-Reply-To: <b75872f4-521b-5eab-68d0-4b1c04a10add@FreeBSD.org>
References:  <YfTCs7j3TPZFcFCD@server.rulingia.com> <YfTEj1KLhQhoR3xP@kib.kiev.ua> <CANCZdfoqQ3Ze%2BcMTsk_ho9x8hsSM9=fTavSao%2BUtwc2nSAEJpQ@mail.gmail.com> <Yfo3i9Yy/uCUpss1@server.rulingia.com> <CANCZdfqBQOvzMCrJxWq9GzqCKyK_AubBE1CxAW5FULnE7D_jrg@mail.gmail.com> <b75872f4-521b-5eab-68d0-4b1c04a10add@FreeBSD.org>


On Wed, Feb 2, 2022 at 2:05 AM Andriy Gapon <avg@freebsd.org> wrote:

> On 02/02/2022 09:58, Warner Losh wrote:
> >
> >
> > On Wed, Feb 2, 2022, 12:49 AM Peter Jeremy <peterj@freebsd.org> wrote:
> >
> >     Thanks all for the very prompt responses.
> >
> >     On 2022-Jan-28 22:32:02 -0700, Warner Losh <imp@bsdimp.com> wrote:
> >      >I think that ufs relies on two ordering primitives, both marked with
> >      >BIO_ORDERED today.
> >      >That's what most of the drivers key off of. We always set BIO_ORDERED
> >      >on all the BIO_FLUSH events as far as I can tell.
> >
> >     Thanks for that warning.  I don't think geom_gate understands either
> >     B_BARRIER or BIO_ORDERED.  I shall have a closer look.
> >
> >
> > It needs to understand BIO_ORDERED.
> >
> >
> >      >to it. b*barrierwrite() sets this, and that's used in the ffs_alloc code.
> >
> >     In my case, I'm interested in ZFS rather than UFS, and it doesn't seem
> >     to set B_BARRIER or BIO_ORDERED, directly or indirectly.
> >
> >
> > I went hunting through ZFS for this years ago, and in the pre-OpenZFS code
> > they were used, but there were three layers of indirection that obscured
> > it. ZFS doesn't use the buffer cache, so B_BARRIER isn't relevant. I'll see
> > if I can find it with the new code.
> >
> > But if it never sets BIO_ORDERED, drivers are already reordering things.
> > That's all any other driver in the tree worries about...
>
> Hmm... it looks like both the old and new (Open)ZFS use BIO_FLUSH command
> without BIO_ORDERED flag.  Not sure if it happens to do the right thing
> anyway or not.
>

It's an unordered flush then. The flush will happen whenever; nothing orders
it relative to other requests. I have a vague memory that ZFS only issues this
command when there's no other I/O pending. That would be the only way for it
to be reliable with nvme, since BIO_FLUSH isn't ordered without the
BIO_ORDERED flag. So ggate needn't do anything special for BIO_FLUSH, just for
BIO_ORDERED. Otherwise, it's free to reorder as it sees fit.
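
To make that concrete, here is a toy userland sketch in C of what "honor
BIO_ORDERED, otherwise reorder freely" could look like as a dispatch rule.
It is not ggate code: struct toy_req, enum toy_cmd, and toy_can_dispatch()
are hypothetical names made up for illustration, and the `ordered` field
merely stands in for the BIO_ORDERED flag. The assumed semantics are that
nothing queued later may pass an ordered request, and an ordered request
waits for everything queued before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy request model; hypothetical names, not the kernel's struct bio. */
enum toy_cmd { TOY_READ, TOY_WRITE, TOY_FLUSH };

struct toy_req {
	enum toy_cmd cmd;
	bool ordered;	/* stands in for BIO_ORDERED (assumption) */
	bool done;	/* request has completed */
};

/*
 * May request q[i] be dispatched now?  q[] is in arrival order.
 *  - Nothing may pass an earlier ordered request that is not yet done.
 *  - An ordered request must wait for every earlier request to finish.
 *  - The command type alone (including a flush) imposes no ordering.
 */
static bool
toy_can_dispatch(const struct toy_req *q, size_t n, size_t i)
{
	assert(q != NULL && i < n);
	if (q[i].done)
		return (false);
	for (size_t j = 0; j < i; j++) {
		if (q[j].done)
			continue;
		if (q[j].ordered || q[i].ordered)
			return (false);
	}
	return (true);
}
```

With a queue of write, unordered flush, ordered write, read (in that
arrival order), the first two may go out in either order, the ordered write
waits for both of them, and the read waits for the ordered write.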

The CAM I/O scheduler takes a little bit of liberty here, btw. It interprets
BIO_ORDERED as applying only to BIO_WRITE and BIO_FLUSH, because if you
schedule both a read and a write, the results are undefined. nvd takes a
stricter approach to the ordering.
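
The difference between the two readings can be sketched as a pair of
predicates. This is only a model of the distinction described above, not the
actual CAM iosched or nvd logic; enum sk_cmd, struct sk_req, and both
functions are hypothetical names, and `ordered` again stands in for
BIO_ORDERED. Each predicate answers: does a still-pending earlier request
hold back a request queued after it?

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; not taken from any kernel header. */
enum sk_cmd { SK_READ, SK_WRITE, SK_FLUSH };

struct sk_req {
	enum sk_cmd cmd;
	bool ordered;	/* stands in for BIO_ORDERED (assumption) */
};

/* Strict reading: a pending ordered request holds back everything after it. */
static bool
sk_held_strict(const struct sk_req *barrier, const struct sk_req *later)
{
	assert(barrier != NULL && later != NULL);
	return (barrier->ordered);
}

/* Relaxed reading: only later writes and flushes are held; reads may pass. */
static bool
sk_held_relaxed(const struct sk_req *barrier, const struct sk_req *later)
{
	assert(barrier != NULL && later != NULL);
	if (!barrier->ordered)
		return (false);
	return (later->cmd == SK_WRITE || later->cmd == SK_FLUSH);
}
```

Under the relaxed predicate a read queued behind an ordered write is not
held back, while a later write or flush still is; under the strict one all
three wait.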

Warner




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfp=0rbBkr4SoXhvn7hrQniPQzTeZra2HGBwXDGsJjN8XQ>