From owner-freebsd-stable@freebsd.org Thu Apr 15 20:29:26 2021
Subject: Re: Frequent disk I/O stalls while building (poudriere), processes in "zfs tear" state
To: freebsd-stable@freebsd.org
References: <20210412094411.j3s7us5ru2d7dzcz@nexus.home.palmen-it.de> <20210415162940.hoattch77lmoulih@nexus.home.palmen-it.de>
From: Dewayne Geraghty
Date: Fri,
 16 Apr 2021 06:26:24 +1000
In-Reply-To: <20210415162940.hoattch77lmoulih@nexus.home.palmen-it.de>

On 16/04/2021 2:29 am, Felix Palmen wrote:
> After more experimentation, I finally found what's causing these
> problems for me on 13:
>
> * Felix Palmen [20210412 11:44]:
>> * Poudriere running on idprio 22 with 8 parallel build jobs
>
> Running poudriere with normal priority works perfectly fine. Now, I've
> had poudriere running on idprio because there are several other services
> on that machine that shouldn't be slowed down by running a heavy build
> and I still want to use all the CPU resources available for building.
>
> Right now, I'm running a test with idprio 0 instead, which still seems
> to have the desired effect, and so far, I didn't have any of these
> stalls. If this persists, the problem is solved for me!
>
> I'd still be curious about what might be the cause, and, what this state
> "zfs tear" actually means. But that's kind of an "academic interest"
> now.
>

Most likely your other processes are pre-empting your build, which is
what you want :).

Use /usr/bin/top to see the priority of each process (i.e. the PRI
column). Running under idprio 22 gives (on my 12.2-STABLE) a PRI of 146.
If your kern.sched.preempt_thresh is at the default (80), then processes
with a PRI below 80 will preempt (for I/O). Even with idprio 0 the PRI
is 124, still well above the threshold, so I suspect your stall-free run
was more a matter of timing (i.e. good luck).

You could raise your pre-emption threshold for the duration of the
build so that it covers the priority your build runs at. But... (not
really a good idea). Better to run your build using nice (PRI of 76),
which should avoid the stalls, but it would also have some impact on
your more important services.

Re zfs - sorry, I'm peculiar and don't use it ;)
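
To make the arithmetic above easier to follow, here is a rough sketch in Python. It only replays the numbers quoted in this message (PRI 124 for idprio 0, PRI 146 for idprio 22, PRI 76 under nice, threshold 80). The linear mapping in idprio_to_pri() is an assumption inferred from those two data points on one 12.2-STABLE machine, and both function names are made up for illustration; check top and sysctl on your own system rather than trusting the constants.

```python
# Figures quoted in the message, observed on one 12.2-STABLE machine.
IDPRIO_BASE_PRI = 124   # PRI that top shows for idprio 0
PREEMPT_THRESH = 80     # stated default of kern.sched.preempt_thresh
NICE_BUILD_PRI = 76     # PRI quoted for a build run under nice


def idprio_to_pri(idprio: int) -> int:
    """Assumed mapping: PRI = 124 + idprio (fits 0 -> 124 and 22 -> 146)."""
    if not 0 <= idprio <= 31:
        raise ValueError("idprio takes a priority from 0 to 31")
    return IDPRIO_BASE_PRI + idprio


def below_threshold(pri: int, thresh: int = PREEMPT_THRESH) -> bool:
    """Rule as stated in the message: a PRI below the threshold preempts."""
    return pri < thresh


if __name__ == "__main__":
    cases = [
        ("idprio 22", idprio_to_pri(22)),
        ("idprio 0", idprio_to_pri(0)),
        ("nice", NICE_BUILD_PRI),
    ]
    for label, pri in cases:
        side = "below" if below_threshold(pri) else "at or above"
        print(f"{label}: PRI {pri} is {side} the threshold of {PREEMPT_THRESH}")
```

Both idprio cases land on the same side of the threshold, which is why the change from idprio 22 to idprio 0 looks more like luck than a fix; only the nice case crosses it.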