Date:      Mon, 12 Apr 2021 11:44:11 +0200
From:      Felix Palmen <felix@palmen-it.de>
To:        freebsd-stable@freebsd.org
Subject:   Frequent disk I/O stalls while building (poudriere), processes in "zfs tear" state
Message-ID:  <20210412094411.j3s7us5ru2d7dzcz@nexus.home.palmen-it.de>

Hello all,

Since I started following the releng/13.0 branch, I have been seeing
stalled disk I/O quite often (roughly once per minute) while building
packages with poudriere.

When this happens, the CPU goes almost idle and several processes show
up in `top` in state "zfs te" (procstat shows "zfs tear" for them). For
up to several seconds, no disk I/O completes (even starting a new
process is impossible), then the system recovers. Only twice have I seen
it go into a deadlock instead, printing messages similar to this on the
serial console:

  swap_pager: indefinite wait buffer ...

I have seen this behavior since -RC3 (I have followed releng/13.0 up to
-RELEASE). Before that, I had the vnlru-related problem that was fixed
by commit faa41af1fed350327cc542cb240ca2c6e1e8ba0c.

Some details:

* CPU: Intel(R) Xeon(R) CPU E3-1240L v5 @ 2.10GHz
* RAM: 64GB (ECC)
* Four HDDs (Seagate NAS models), 4TB each
* Swap 16GB, striped over the 4 disks
* Pool: 12TB raid-z on GELI-encrypted partitions. NOT upgraded yet, so I
  have a way back to 12.2.
* Two bhyve VMs running with 1GB and 8GB RAM, both wired
* Several jails running services like samba, an MTA, nginx...
* Several NFS shares mounted by other machines
* Poudriere running with idprio 22 and 8 parallel build jobs

Reducing the number of parallel jobs in poudriere also reduces the
frequency of the problem, but it does not seem to go away completely.
I also have the impression that these stalls are more likely when many
compilation jobs can be satisfied from ccache.

Thanks for any ideas and insight (e.g. what this "zfs tear" status
means).

Best regards,
Felix Palmen

-- 
 Dipl.-Inform. Felix Palmen  <felix@palmen-it.de>   ,.//..........
 {web}  http://palmen-it.de  {jabber} [see email]   ,//palmen-it.de
 {pgp public key}     http://palmen-it.de/pub.txt   //   """""""""""
 {pgp fingerprint} A891 3D55 5F2E 3A74 3965 B997 3EF2 8B0A BC02 DA2A
