From nobody Wed Feb 2 09:14:40 2022
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Sender: owner-freebsd-fs@freebsd.org
MIME-Version: 1.0
From: Warner Losh
Date: Wed, 2 Feb 2022 02:14:40 -0700
Subject: Re: bio re-ordering
To: Andriy Gapon
Cc: Peter Jeremy, Konstantin Belousov, FreeBSD FS, "freebsd-geom@FreeBSD.org"
Content-Type: multipart/alternative; boundary="00000000000034caef05d70573b8"

--00000000000034caef05d70573b8
Content-Type: text/plain; charset="UTF-8"

On Wed, Feb 2, 2022 at 2:05 AM Andriy Gapon wrote:

> On 02/02/2022 09:58, Warner Losh wrote:
> >
> > On Wed, Feb 2, 2022, 12:49 AM Peter Jeremy wrote:
> >
> >     Thanks all for the very prompt responses.
> >
> >     On 2022-Jan-28 22:32:02 -0700, Warner Losh wrote:
> >      >I think that ufs relies on two ordering primitives, both marked with
> >      >BIO_ORDERED today.
> >      >That's what most of the drivers key off of. We always set BIO_ORDERED on
> >      >all the BIO_FLUSH
> >      >events as far as I can tell.
> >
> >     Thanks for that warning.  I don't think geom_gate understands either
> >     B_BARRIER or BIO_ORDERED.  I shall have a closer look.
> >
> > It needs to understand BIO_ORDERED.
> >
> >      >to it. b*barrierwrite() sets this, and that's used in the ffs_alloc code.
> >
> >     In my case, I'm interested in ZFS, rather than UFS and it doesn't seem
> >     to set B_BARRIER or BIO_ORDERED, directly or indirectly.
> >
> > I went hunting ZFS for this years ago and in the pre-OpenZFS code they were
> > used, but there were three layers of indirection that obscured it. ZFS doesn't
> > use the buffer cache, so B_BARRIER isn't relevant. I'll see if I can find it
> > with the new code.
> >
> > But if it never sets BIO_ORDERED, drivers are already reordering things. That's
> > all any other driver in the tree worries about...
>
> Hmm... it looks like both the old and new (Open)ZFS use BIO_FLUSH command
> without BIO_ORDERED flag.  Not sure if it happens to do the right thing anyway
> or not.
>

It's an unordered flush, then: it can be executed at any point relative
to the other I/O. I have a vague memory that ZFS only issues this
command when there's no other I/O pending.
That would be the only way for it to be reliable with nvme, since a
BIO_FLUSH command isn't ordered without the BIO_ORDERED flag. So ggate
needn't do anything special for BIO_FLUSH, only for BIO_ORDERED.
Otherwise, it's free to reorder as it sees fit.

The CAM I/O scheduler takes a little bit of liberty here, btw. It
interprets BIO_ORDERED as applying only to BIO_WRITE and BIO_FLUSH,
because if you schedule both a read and a write, the results are
undefined. nvd is stricter about honoring the ordering.

Warner

--00000000000034caef05d70573b8
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


--00000000000034caef05d70573b8--
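
The ordering rule discussed in this message can be sketched as a toy
model. This is Python for illustration only, not FreeBSD code: Bio,
reorder_windows and the reads_exempt switch are hypothetical names, and
the read exemption is an assumed, simplified reading of the relaxed
policy attributed to the CAM I/O scheduler. The point it shows is that
only a request carrying the ordered flag acts as a barrier; a flush
without the flag is reordered like anything else.

```python
from dataclasses import dataclass

# Toy model only: the strings mirror the command names used in the thread,
# but nothing here is taken from the FreeBSD kernel sources.
READ, WRITE, FLUSH = "BIO_READ", "BIO_WRITE", "BIO_FLUSH"


@dataclass(frozen=True)
class Bio:
    cmd: str
    ordered: bool = False  # stands in for the BIO_ORDERED flag
    tag: str = ""          # label used only to make the output readable


def reorder_windows(bios, reads_exempt=False):
    """Split a submission stream into reordering windows.

    Requests inside one window may complete in any order; the windows
    themselves must complete in sequence. An ordered request gets a
    window of its own, so nothing can move across it.

    With reads_exempt=True (an assumption made for this sketch), reads
    are returned separately and are not held back by any barrier.
    """
    windows, current, free_reads = [], [], []
    for bio in bios:
        if reads_exempt and bio.cmd == READ:
            free_reads.append(bio)
        elif bio.ordered:
            if current:
                windows.append(current)
            windows.append([bio])  # the barrier stands alone
            current = []
        else:
            current.append(bio)    # includes unordered flushes
    if current:
        windows.append(current)
    return windows, free_reads
```

For a stream of write, read, ordered flush, write, unordered flush, the
strict variant yields three windows (the two requests before the
barrier, the barrier itself, and the two requests after it), while the
relaxed variant additionally pulls the read out of the first window.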