Date: Wed, 2 Feb 2022 02:14:40 -0700
From: Warner Losh <imp@bsdimp.com>
To: Andriy Gapon <avg@freebsd.org>
Cc: Peter Jeremy <peterj@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>, FreeBSD FS <freebsd-fs@freebsd.org>, "freebsd-geom@FreeBSD.org" <freebsd-geom@freebsd.org>
Subject: Re: bio re-ordering
Message-ID: <CANCZdfp=0rbBkr4SoXhvn7hrQniPQzTeZra2HGBwXDGsJjN8XQ@mail.gmail.com>
In-Reply-To: <b75872f4-521b-5eab-68d0-4b1c04a10add@FreeBSD.org>
References: <YfTCs7j3TPZFcFCD@server.rulingia.com> <YfTEj1KLhQhoR3xP@kib.kiev.ua> <CANCZdfoqQ3Ze%2BcMTsk_ho9x8hsSM9=fTavSao%2BUtwc2nSAEJpQ@mail.gmail.com> <Yfo3i9Yy/uCUpss1@server.rulingia.com> <CANCZdfqBQOvzMCrJxWq9GzqCKyK_AubBE1CxAW5FULnE7D_jrg@mail.gmail.com> <b75872f4-521b-5eab-68d0-4b1c04a10add@FreeBSD.org>

On Wed, Feb 2, 2022 at 2:05 AM Andriy Gapon <avg@freebsd.org> wrote:

> On 02/02/2022 09:58, Warner Losh wrote:
> > On Wed, Feb 2, 2022, 12:49 AM Peter Jeremy <peterj@freebsd.org> wrote:
> >
> >     Thanks all for the very prompt responses.
> >
> >     On 2022-Jan-28 22:32:02 -0700, Warner Losh <imp@bsdimp.com> wrote:
> >      >I think that ufs relies on two ordering primitives, both marked with
> >      >BIO_ORDERED today.
> >      >That's what most of the drivers key off of. We always set BIO_ORDERED on
> >      >all the BIO_FLUSH
> >      >events as far as I can tell.
> >
> >     Thanks for that warning.  I don't think geom_gate understands either
> >     B_BARRIER or BIO_ORDERED.  I shall have a closer look.
> >
> > It needs to understand BIO_ORDERED.
> >
> >      >to it. b*barrierwrite() sets this, and that's used in the ffs_alloc code.
> >
> >     In my case, I'm interested in ZFS, rather than UFS and it doesn't seem
> >     to set B_BARRIER or BIO_ORDERED or indirectly.
> >
> > I went hunting ZFS for this years ago and in the pre OpenZFS code they were
> > used, but there were three layers of indirection that obscured it. ZFS doesn't
> > use the buffer cache, so B_BARRIER isn't relevant. I'll see if I can find it
> > with the new code.
> >
> > But if it never sets BIO_ORDERED, drivers are already reordering things. That's
> > all any other driver in the tree worries about...
>
> Hmm... it looks like both the old and new (Open)ZFS use BIO_FLUSH command
> without BIO_ORDERED flag.  Not sure if it happens to do the right thing anyway
> or not.

It's an unordered flush then. The flush will happen whenever. I have a vague
memory that ZFS will only issue this command in cases where there's no other
I/O pending. That would be the only way for it to be reliable with nvme, since
the BIO_FLUSH command isn't ordered without the BIO_ORDERED flag.
So ggate needn't do anything special for BIO_FLUSH, just BIO_ORDERED.
Otherwise, it's free to reorder as it sees fit.

The CAM I/O scheduler takes a little bit of liberty here, btw. It interprets
BIO_ORDERED as applying only with respect to BIO_WRITE and BIO_FLUSH, because
if you schedule both a read and a write, the results are undefined. nvd takes
a stricter approach to honoring the ordering.

Warner
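
For readers following the thread, the rules described in this message can be
modelled roughly as below. This is only a userland sketch, not kernel code and
not what ggate, CAM, or nvd actually implement: the names model_bio,
MODEL_ORDERED, POLICY_STRICT, POLICY_RELAXED, and model_may_pass are all
hypothetical, and the constants do not match <sys/bio.h>. The "relaxed" policy
is one simplified reading of the CAM behaviour mentioned above, where ordering
is enforced only between writes and flushes; the "strict" policy treats an
ordered request as a barrier against everything, as described for nvd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's bio commands and ordered flag.
 * The values are arbitrary and do not match <sys/bio.h>. */
enum model_cmd { MODEL_READ, MODEL_WRITE, MODEL_FLUSH };
#define MODEL_ORDERED 0x1u

struct model_bio {
    enum model_cmd cmd;
    unsigned flags;
};

enum model_policy {
    POLICY_STRICT,  /* an ordered request is a barrier for every command */
    POLICY_RELAXED  /* a barrier only constrains writes and flushes */
};

/* Writes and flushes are the "write side"; reads are not. */
static bool
is_write_side(const struct model_bio *bp)
{
    return bp->cmd == MODEL_WRITE || bp->cmd == MODEL_FLUSH;
}

/* Return true if `later` may be dispatched ahead of `earlier`. */
bool
model_may_pass(const struct model_bio *earlier,
    const struct model_bio *later, enum model_policy policy)
{
    assert(earlier != NULL && later != NULL);

    /* Only the ordered flag creates a constraint; a flush without it
     * is just another reorderable request in this model. */
    bool barrier = ((earlier->flags | later->flags) & MODEL_ORDERED) != 0;
    if (!barrier)
        return true;

    if (policy == POLICY_STRICT)
        return false;

    /* Relaxed: keep order only when both requests are write-side. */
    return !(is_write_side(earlier) && is_write_side(later));
}
```

In this model a flush without the ordered flag may pass, or be passed by, any
other request, which matches the "unordered flush" described above. A write
queued behind an ordered flush must wait under both policies, while a read
queued behind it must wait only under the strict one.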