From owner-freebsd-stable@freebsd.org Fri Apr 16 07:27:36 2021
Date: Fri, 16 Apr 2021 09:27:31 +0200
From: Felix Palmen
To: freebsd-stable@freebsd.org
Subject: Re: Frequent disk I/O stalls while building (poudriere), processes in "zfs tear" state
Message-ID: <20210416072731.yla76q7sbxmmapc6@nexus.home.palmen-it.de>
References: <20210412094411.j3s7us5ru2d7dzcz@nexus.home.palmen-it.de> <20210415162940.hoattch77lmoulih@nexus.home.palmen-it.de>

* Dewayne Geraghty [20210416 06:26]:
> On 16/04/2021 2:29 am, Felix Palmen wrote:
> > Right now, I'm running a test with idprio 0 instead, which still seems
> > to have the desired effect, and so far, I didn't have any of these
> > stalls. If this persists, the problem is solved for me!
> >
> > I'd still be curious about what might be the cause, and, what this state
> > "zfs tear" actually means. But that's kind of an "academic interest"
> > now.
>
> Most likely your other processes are pre-empting your build, which is
> what you want :).

Yes, that's exactly the plan.

> Use /usr/bin/top to see the priority of the processes (ie under the PRI
> column). Using an idprio 22, means (on my 12.2Stable) a PRI of 146. If
> your kern.sched.preempt_thresh is using the default (of 80) then
> processes with a PRI of <80 will preempt (for IO).

I was doing that a lot; that's how I found that those "global" I/O stalls
were happening when some processes were in that "zfs tear" state (shown
in top only as "zfs te").

> Even with an idprio 0, the PRI is 124. So I suspect that was more a
> matter of timing (ie good luck).

That seems kind of unlikely because the behavior is pretty reproducible.
Having observed builds on idprio 0 (yes, this results in a priority of
124) for a while, I still see processes getting "stuck" for a few
seconds from time to time, mostly ccache processes, but now in state
"zfsvfs", and the rest of the system is not affected; I/O still works.

So, something did change with ZFS and priorities between 12.2 and 13.0.
Running the whole builds on idprio 22 worked fine on 12.2.

> You could increase your pre-emption threshold for the duration of the
> build, to include your nice value. But... (not really a good idea).

That would clearly defeat the purpose, yes ;)

> Re zfs - sorry, I'm peculiar and don't use it ;)

I suspect the relevant change to be exactly in that context. Still,
thanks for answering :)

Now that I have a working solution, it isn't an important issue for me
any more. Curiosity remains…

-- 
 Dipl.-Inform. Felix Palmen  ,.//..........
 {web}  http://palmen-it.de  {jabber} [see email]   ,//palmen-it.de
 {pgp public key}     http://palmen-it.de/pub.txt   //   """""""""""
 {pgp fingerprint} A891 3D55 5F2E 3A74 3965  B997 3EF2 8B0A BC02 DA2A
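
Note on the PRI numbers quoted in this message: the two pairs mentioned (idprio 0 giving PRI 124, idprio 22 giving PRI 146) are consistent with a simple offset. The sketch below is only an illustration of that arithmetic, not something taken from the thread. It assumes that the idle scheduling class starts at kernel priority 224 (PRI_MIN_IDLE) and that top subtracts 100 (PZERO) before printing the PRI column; both constants are assumptions here and should be checked against sys/priority.h and sys/param.h on the release in use. The function name is made up for this example.

```python
# Assumed constants, not stated anywhere in this thread. They are
# believed to match sys/priority.h and sys/param.h on FreeBSD 12.x/13.x,
# and they reproduce the two values quoted above.
PRI_MIN_IDLE = 224  # assumed first kernel priority of the idle class
PRI_MAX_IDLE = 255  # assumed last kernel priority of the idle class
PZERO = 100         # assumed offset subtracted by top(1) for display


def top_pri_for_idprio(idprio: int) -> int:
    """Return the PRI value top would show for a given idprio level.

    Hypothetical helper: it only encodes the offset arithmetic
    PRI = PRI_MIN_IDLE + idprio - PZERO under the assumptions above.
    """
    if not 0 <= idprio <= PRI_MAX_IDLE - PRI_MIN_IDLE:
        raise ValueError("idprio level out of the assumed 0..31 range")
    return PRI_MIN_IDLE + idprio - PZERO


if __name__ == "__main__":
    for level in (0, 22):
        print(f"idprio {level:2d} -> PRI {top_pri_for_idprio(level)}")
```

Under these assumptions, every idprio level lands well above the time-share range in the PRI column, so moving a build from idprio 22 to idprio 0 changes its rank only within the idle class.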