Date: Fri, 16 Apr 2021 09:27:31 +0200
From: Felix Palmen <felix@palmen-it.de>
To: freebsd-stable@freebsd.org
Subject: Re: Frequent disk I/O stalls while building (poudriere), processes in "zfs tear" state
Message-ID: <20210416072731.yla76q7sbxmmapc6@nexus.home.palmen-it.de>
In-Reply-To: <c40d1470-4147-557b-3272-ecb79cd0cf1e@heuristicsystems.com.au>
References: <20210412094411.j3s7us5ru2d7dzcz@nexus.home.palmen-it.de> <20210415162940.hoattch77lmoulih@nexus.home.palmen-it.de> <c40d1470-4147-557b-3272-ecb79cd0cf1e@heuristicsystems.com.au>
* Dewayne Geraghty <dewayne@heuristicsystems.com.au> [20210416 06:26]:
> On 16/04/2021 2:29 am, Felix Palmen wrote:
> > Right now, I'm running a test with idprio 0 instead, which still seems
> > to have the desired effect, and so far, I didn't have any of these
> > stalls. If this persists, the problem is solved for me!
> >
> > I'd still be curious about what might be the cause, and, what this state
> > "zfs tear" actually means. But that's kind of an "academic interest"
> > now.
>
> Most likely your other processes are pre-empting your build, which is
> what you want :).

Yes, that's exactly the plan.

> Use /usr/bin/top to see the priority of the processes (ie under the PRI
> column). Using an idprio 22, means (on my 12.2Stable) a PRI of 146. If
> your kern.sched.preempt_thresh is using the default (of 80) then
> processes with a PRI of <80 will preempt (for IO).

I was doing that a lot, that's how I found those "global" I/O stalls
were happening when some processes were in that "zfs tear" state (shown
in top only as "zfs te").

> Even with an idprio 0, the PRI is 124. So I suspect that was more a
> matter of timing (ie good luck).

That seems kind of unlikely because the behavior is pretty reproducible.

Having observed builds on idprio 0 (yes, this results in a priority of
124) for a while, I still see from time to time processes getting
"stuck" for a few seconds, mostly ccache processes, but now in state
"zfsvfs" and the rest of the system is not affected, I/O still works.

So, something did change with ZFS and priorities between 12.2 and 13.0.
Running the whole builds on idprio 22 worked fine on 12.2.

> You could increase your pre-emption threshold for the duration of the
> build, to include your nice value. But... (not really a good idea).

That would clearly defeat the purpose, yes ;)

> Re zfs - sorry, I'm peculiar and don't use it ;)

I suspect the relevant change to be exactly in that context, still
thanks for answering :)

Now that I have a working solution, it isn't an important issue for me
any more. Curiosity remains…

--
 Dipl.-Inform. Felix Palmen <felix@palmen-it.de>     ,.//..........
 {web}  http://palmen-it.de  {jabber} [see email]   ,//palmen-it.de
 {pgp public key}     http://palmen-it.de/pub.txt   //   """""""""""
 {pgp fingerprint} A891 3D55 5F2E 3A74 3965 B997 3EF2 8B0A BC02 DA2A
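
The priority figures quoted in this message can be checked with a little arithmetic. The sketch below is only an illustration of the numbers reported in the thread (idprio 0 showing as PRI 124 and idprio 22 as PRI 146 in top on 12.2-STABLE, with a default kern.sched.preempt_thresh of 80). The constant 124 is an assumption inferred from those two data points, the function names are hypothetical, and the "PRI below the threshold may preempt" rule is taken as stated in the quoted reply rather than from the scheduler source.

```python
# Illustration only: reproduces the numbers reported in the thread.
# IDLE_PRI_BASE is an assumption inferred from "idprio 0 -> PRI 124" and
# "idprio 22 -> PRI 146"; it may differ on other FreeBSD versions.
IDLE_PRI_BASE = 124

# Default kern.sched.preempt_thresh as quoted in the thread.
DEFAULT_PREEMPT_THRESH = 80


def top_pri_for_idprio(idprio: int) -> int:
    """Return the PRI value top would show for a given idprio level (hypothetical helper)."""
    if not 0 <= idprio <= 31:
        raise ValueError("idprio levels range from 0 to 31")
    return IDLE_PRI_BASE + idprio


def may_preempt(pri: int, thresh: int = DEFAULT_PREEMPT_THRESH) -> bool:
    """Rule of thumb from the quoted reply: a PRI below the threshold may preempt."""
    return pri < thresh


if __name__ == "__main__":
    for level in (0, 22):
        pri = top_pri_for_idprio(level)
        print(f"idprio {level:2d}: PRI {pri}, may preempt: {may_preempt(pri)}")
```

Under these assumptions both idprio 0 and idprio 22 sit far above the threshold, so neither level would preempt other work; any difference in behavior between them would have to come from somewhere other than this threshold.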