Date:      Fri, 16 Apr 2021 09:27:31 +0200
From:      Felix Palmen <felix@palmen-it.de>
To:        freebsd-stable@freebsd.org
Subject:   Re: Frequent disk I/O stalls while building (poudriere), processes in "zfs tear" state
Message-ID:  <20210416072731.yla76q7sbxmmapc6@nexus.home.palmen-it.de>
In-Reply-To: <c40d1470-4147-557b-3272-ecb79cd0cf1e@heuristicsystems.com.au>
References:  <20210412094411.j3s7us5ru2d7dzcz@nexus.home.palmen-it.de> <20210415162940.hoattch77lmoulih@nexus.home.palmen-it.de> <c40d1470-4147-557b-3272-ecb79cd0cf1e@heuristicsystems.com.au>

* Dewayne Geraghty <dewayne@heuristicsystems.com.au> [20210416 06:26]:
> On 16/04/2021 2:29 am, Felix Palmen wrote:
> > Right now, I'm running a test with idprio 0 instead, which still seems
> > to have the desired effect, and so far, I didn't have any of these
> > stalls. If this persists, the problem is solved for me!
> >
> > I'd still be curious about what might be the cause, and, what this state
> > "zfs tear" actually means. But that's kind of an "academic interest"
> > now.
>
> Most likely your other processes are pre-empting your build, which is
> what you want :).

Yes, that's exactly the plan.

> Use /usr/bin/top to see the priority of the processes (ie under the  PRI
> column).  Using an idprio 22, means (on my 12.2Stable) a PRI of 146.  If
> your kern.sched.preempt_thresh is using the default (of 80) then
> processes with a PRI of <80 will preempt (for IO).

I was doing that a lot; that's how I found that those "global" I/O
stalls happened whenever some processes were in that "zfs tear" state
(shown in top only as "zfs te").

> Even with an idprio 0, the PRI is 124. So I suspect that was more a
> matter of timing (ie good luck).

That seems kind of unlikely because the behavior is pretty reproducible.
Having observed builds on idprio 0 (yes, this results in a priority of
124) for a while, I still see processes getting "stuck" for a few
seconds from time to time, mostly ccache processes, but now in state
"zfsvfs", and the rest of the system is not affected: I/O still works.
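
To make the priority numbers in this exchange concrete, here is a minimal
sketch of the arithmetic. It assumes a linear mapping from the idprio level
to the PRI value shown by top, inferred only from the two data points
mentioned in this thread (0 -> 124, 22 -> 146), and it applies the
preemption rule as stated in the quoted text (PRI below
kern.sched.preempt_thresh preempts). The function and constant names are
hypothetical; nothing here is taken from the kernel sources.

```python
# Sketch only: all numbers come from this thread, not from kernel sources.
IDPRIO_BASE_PRI = 124         # PRI shown by top for idprio 0, per the thread
PREEMPT_THRESH_DEFAULT = 80   # default kern.sched.preempt_thresh, per the thread


def displayed_pri(idprio_level: int) -> int:
    """Assumed linear mapping inferred from 0 -> 124 and 22 -> 146."""
    if not 0 <= idprio_level <= 31:
        raise ValueError("idprio levels range from 0 to 31")
    return IDPRIO_BASE_PRI + idprio_level


def may_preempt(pri: int, thresh: int = PREEMPT_THRESH_DEFAULT) -> bool:
    """Rule as stated in the quoted text: a PRI below the threshold preempts."""
    return pri < thresh


if __name__ == "__main__":
    for level in (0, 22):
        pri = displayed_pri(level)
        print(f"idprio {level}: PRI {pri}, preempts: {may_preempt(pri)}")
```

Under these assumptions, neither idprio 0 nor idprio 22 falls below the
default threshold, so the difference in behavior between the two levels
would have to come from somewhere other than this simple comparison.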

So, something did change with ZFS and priorities between 12.2 and 13.0.
Running the whole builds on idprio 22 worked fine on 12.2.

> You could increase your pre-emption threshold for the duration of the
> build, to include your nice value. But... (not really a good idea).

That would clearly defeat the purpose, yes ;)

> Re zfs - sorry, I'm peculiar and don't use it ;)

I suspect the relevant change is in exactly that context. Still, thanks
for answering :) Now that I have a working solution, it isn't an
important issue for me any more. Curiosity remains…

-- 
 Dipl.-Inform. Felix Palmen  <felix@palmen-it.de>   ,.//..........
 {web}  http://palmen-it.de  {jabber} [see email]   ,//palmen-it.de
 {pgp public key}     http://palmen-it.de/pub.txt   //   """""""""""
 {pgp fingerprint} A891 3D55 5F2E 3A74 3965 B997 3EF2 8B0A BC02 DA2A

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20210416072731.yla76q7sbxmmapc6>