Date:      Thu, 17 Feb 2022 17:48:14 -0800
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Peter Jeremy <peterj@freebsd.org>
Cc:        FreeBSD FS <freebsd-fs@freebsd.org>, "freebsd-geom@FreeBSD.org" <freebsd-geom@freebsd.org>
Subject:   Re: bio re-ordering
Message-ID:  <20220218014814.GJ97875@funkthat.com>
In-Reply-To: <Yf5IUCWW/tgI/Cse@server.rulingia.com>
References:  <YfTCs7j3TPZFcFCD@server.rulingia.com> <YfTEj1KLhQhoR3xP@kib.kiev.ua> <CANCZdfoqQ3Ze%2BcMTsk_ho9x8hsSM9=fTavSao%2BUtwc2nSAEJpQ@mail.gmail.com> <Yfo3i9Yy/uCUpss1@server.rulingia.com> <CANCZdfqBQOvzMCrJxWq9GzqCKyK_AubBE1CxAW5FULnE7D_jrg@mail.gmail.com> <b75872f4-521b-5eab-68d0-4b1c04a10add@FreeBSD.org> <CANCZdfp=0rbBkr4SoXhvn7hrQniPQzTeZra2HGBwXDGsJjN8XQ@mail.gmail.com> <9848cde6-5c12-cdd4-e722-42fe26fa0349@FreeBSD.org> <Yf5IUCWW/tgI/Cse@server.rulingia.com>

Peter Jeremy wrote this message on Sat, Feb 05, 2022 at 20:50 +1100:
> On 2022-Feb-02 11:49:44 +0200, Andriy Gapon <avg@freebsd.org> wrote:
> >On 02/02/2022 11:14, Warner Losh wrote:
> >> On Wed, Feb 2, 2022 at 2:05 AM Andriy Gapon <avg@freebsd.org
> >> <mailto:avg@freebsd.org>> wrote:
> >>     Hmm... it looks like both the old and new (Open)ZFS use BIO_FLUSH command
> >>     without BIO_ORDERED flag.  Not sure if it happens to do the right thing anyway
> >>     or not.
> >>
> >>
> >> It's an unordered flush then. The flush will happen whenever. I have a vague
> >> memory that ZFS will only issue this command in cases where there's no other I/O
> >> pending.
> >
> >I think that there is still a potential problem that an earlier write request
> >might get re-ordered after the flush.
> >I think that we should add BIO_ORDERED for correctness.
>
> I've raised https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261731 to
> make geom_gate support BIO_ORDERED.  Exposing the BIO_ORDERED flag to
> userland is quite easy (once a decision is made as to how to do that).
> Enhancing the geom_gate clients to correctly implement BIO_ORDERED is
> somewhat harder.

The clients are single threaded wrt IOs, so I don't think updating them
is required.

I do have patches that make ggated multithreaded to improve IOPS, so
adding _ORDERED support would allow those patches to be useful.

I do have a question though: what are the exact semantics of _ORDERED?

Do all the previous IOs have to be ack'd/received by the kernel before
the _ORDERED bio is executed, OR, once ggated (for example) has received
notification that the writes before an _ORDERED bio have completed, can
it then execute the _ORDERED command w/o the other side having received
those acks?
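
To make the server side of that question concrete, here is a minimal
sketch of how a multithreaded ggated might group requests, ASSUMING
_ORDERED acts as a full barrier (everything queued before it finishes
first, nothing queued after it starts until it is done).  That
assumption is exactly what I'm asking about, and the function name and
the (name, ordered) tuples are hypothetical, not anything in ggated.
It says nothing about when the acks reach the kernel.

```python
def batches(requests):
    """Split (name, ordered) requests into groups that may run concurrently.

    Assumed semantics (not confirmed): an ordered request waits for every
    request queued before it and holds back every request queued after it.
    """
    groups, current = [], []
    for name, ordered in requests:
        if ordered:
            if current:
                groups.append(current)   # drain everything issued earlier
            groups.append([name])        # the ordered request runs alone
            current = []
        else:
            current.append(name)         # unordered requests may overlap
    if current:
        groups.append(current)
    return groups


queue = [("w1", False), ("w2", False), ("flush", True), ("w3", False)]
plan = batches(queue)
print(plan)
```

Each inner list could be handed to worker threads in parallel; the next
list is only started once the previous one has fully completed.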

The reason I ask is that if the connection is broken before the kernel
ack's the pre-_ORDERED bios, but after the _ORDERED bio has been written,
what are the implications?

I can think of an issue where the pre-_ORDERED and _ORDERED bios overlap,
which might cause problems.  Here is the scenario that I'm thinking of.

_WRITE 16 sectors at offset 0
_WRITE _ORDERED 16 sectors at offset 8
connection is now broken
ggate reconnects
kernel reissues both IOs.
_WRITE 16 sectors at offset 0
kernel crashes before the second _WRITE happens and needs to read the
data.

We now have a situation where sectors 16-24 have "new" data, while
sectors 8-16 have "old" data on them, which may not match what a FS
expects to find on disk.
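
The scenario can be replayed with a toy model of the backing store.
This is only a sketch: the 32-sector "disk", the write() helper and the
"A"/"B" tags are made up for illustration, and the ranges above are
half-open (so 8-16 is sectors 8..15 and 16-24 is sectors 16..23).

```python
SECTORS = 32


def write(disk, offset, count, tag):
    """Overwrite `count` sectors starting at `offset` with `tag`."""
    for sector in range(offset, offset + count):
        disk[sector] = tag


disk = ["-"] * SECTORS           # "-" marks sectors never written

# First pass: both writes reach the backing store, in order.
write(disk, 0, 16, "A")          # _WRITE 16 sectors at offset 0
write(disk, 8, 16, "B")          # _WRITE _ORDERED 16 sectors at offset 8
after_first_pass = "".join(disk)

# The connection breaks before the kernel sees the acks.  After the
# reconnect the kernel reissues both bios, but only the first replayed
# write lands before the crash.
write(disk, 0, 16, "A")
after_crash = "".join(disk)

print(after_first_pass)
print(after_crash)
```

After the replayed first write, sectors 8..15 are back to the first
write's data while sectors 16..23 still hold the second write's data, a
state that was never visible during the original, in-order run.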

And right now, the ggate protocol (from what I remember) doesn't have
a way to know when the remote kernel has received notification that an
IO is complete.

I guess this situation isn't any worse than it is right now w/o passing
the _ORDERED flag down though.

> I've done some experiments and OpenZFS doesn't generate BIO_ORDERED
> operations so I've also raised https://github.com/openzfs/zfs/issues/13065
> I haven't looked into how difficult that would be to fix.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
