Date:      Tue, 5 Apr 2022 18:24:40 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Alan Somers <asomers@freebsd.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Hour-long sleeps in the ZFS write throttle: fix for 13.1 ?
Message-ID:  <CANCZdfpnE2S2uAdy81KL4mmJLAu_b2gjn59Eh%2BesOZswM8eX8A@mail.gmail.com>
In-Reply-To: <CAOtMX2j9_saonWpyUERdkKj-cPdWzsyWNGQSUcEDOa8nBF3r=w@mail.gmail.com>
References:  <CAOtMX2j9_saonWpyUERdkKj-cPdWzsyWNGQSUcEDOa8nBF3r=w@mail.gmail.com>

On Tue, Apr 5, 2022 at 3:06 PM Alan Somers <asomers@freebsd.org> wrote:

> All year long I've occasionally seen my ZFS processes get blocked in
> dmu_tx_wait.  They stay blocked for more than an hour but eventually
> recover.  I finally found the cause: an integer overflow bug in
> ustosbt.  The fix is simple enough, but my question is: should we try
> to commit this in time for 13.1-RELEASE?  It's a very disruptive bug,
> but also very hard to trigger.  It takes a pretty highly congested ZFS
> system to trigger it.  In theory the bug could affect other
> subsystems, too.
>
> https://github.com/openzfs/zfs/issues/13289
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263073


These routines were originally not meant for large times (> 1s). However,
that limit was poorly documented, so I fixed it, but did so incorrectly.
If you look at the bug, I've posted what I think is the fix (it also matches
Alan's description).
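
To illustrate the kind of overflow being discussed, here is a minimal sketch
in C. It is not the FreeBSD code and not the patch attached to the bug. The
names sbt_t, SBT_ONE_SECOND, us_to_sbt_naive and us_to_sbt_split are
hypothetical stand-ins, and the sketch assumes a signed 64-bit value in 32.32
fixed point. It contrasts a single-multiply conversion, which wraps once the
input passes about one second, with one that converts whole seconds and the
sub-second remainder separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32.32 fixed-point time value. */
typedef int64_t sbt_t;
#define SBT_ONE_SECOND ((sbt_t)1 << 32)
#define US_PER_SECOND  1000000

/*
 * Naive conversion: multiply by floor(2^64 / 10^6) and keep the top bits.
 * The 64-bit product wraps once us exceeds roughly one second.
 */
static sbt_t
us_to_sbt_naive(int64_t us)
{
	return ((sbt_t)(((uint64_t)us * 18446744073709ULL) >> 32));
}

/*
 * Split conversion: whole seconds are scaled exactly, and only the
 * remainder (always below 10^6) goes through the fractional step,
 * so the intermediate value stays well inside 64 bits.
 */
static sbt_t
us_to_sbt_split(int64_t us)
{
	sbt_t sb = 0;

	assert(us >= 0);	/* the sketch only handles non-negative times */
	if (us >= US_PER_SECOND) {
		sb = (us / US_PER_SECOND) * SBT_ONE_SECOND;
		us %= US_PER_SECOND;
	}
	/* us < 10^6 here, so (us << 32) < 2^52 and cannot overflow. */
	sb += (sbt_t)(((uint64_t)us << 32) / US_PER_SECOND);
	return (sb);
}
```

With these definitions, us_to_sbt_split(2000000) yields exactly two seconds in
fixed point, while us_to_sbt_naive(2000000) wraps and comes out just under one
second. The key detail in the split version is that the threshold is compared
in microseconds (10^6), the same unit as the input.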

Warner



