Date:      Fri, 16 Apr 2021 06:26:24 +1000
From:      Dewayne Geraghty <dewayne@heuristicsystems.com.au>
To:        freebsd-stable@freebsd.org
Subject:   Re: Frequent disk I/O stalls while building (poudriere), processes in "zfs tear" state
Message-ID:  <c40d1470-4147-557b-3272-ecb79cd0cf1e@heuristicsystems.com.au>
In-Reply-To: <20210415162940.hoattch77lmoulih@nexus.home.palmen-it.de>
References:  <20210412094411.j3s7us5ru2d7dzcz@nexus.home.palmen-it.de> <20210415162940.hoattch77lmoulih@nexus.home.palmen-it.de>

On 16/04/2021 2:29 am, Felix Palmen wrote:
> After more experimentation, I finally found what's causing these
> problems for me on 13:
> 
> * Felix Palmen <felix@palmen-it.de> [20210412 11:44]:
>> * Poudriere running on idprio 22 with 8 parallel build jobs
> 
> Running poudriere with normal priority works perfectly fine. Now, I've
> had poudriere running on idprio because there are several other services
> on that machine that shouldn't be slowed down by running a heavy build
> and I still want to use all the CPU resources available for building.
> 
> Right now, I'm running a test with idprio 0 instead, which still seems
> to have the desired effect, and so far, I didn't have any of these
> stalls. If this persists, the problem is solved for me!
> 
> I'd still be curious about what might be the cause, and, what this state
> "zfs tear" actually means. But that's kind of an "academic interest"
> now.
> 

Most likely your other processes are pre-empting your build, which is
what you want :).

Use /usr/bin/top to see the priority of each process (i.e. the PRI
column). Running with idprio 22 gives (on my 12.2-STABLE) a PRI of 146. If
your kern.sched.preempt_thresh is at the default (of 80), then
processes with a PRI below 80 will preempt (for IO).

Even with idprio 0, the PRI is 124, so I suspect the absence of stalls
in your test was more a matter of timing (i.e. good luck).

You could raise your preemption threshold for the duration of the
build so that it covers the build's priority, but that's not really a
good idea. Better to run your build using nice (PRI of 76), which should
avoid the stalls, but is also likely to affect your more important
services.
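
To make the comparison concrete, here is a small Python sketch that checks
the PRI figures quoted above against the threshold. The numbers are only
the ones observed on my 12.2 system and the stated default of 80; the
function and variable names are made up for illustration, and the plain
"PRI below threshold" test is a simplification of what the scheduler
really does, not a description of it.

```python
# PRI values as quoted in this message (observed on one 12.2 system;
# other versions or configurations may report different numbers).
PREEMPT_THRESH = 80  # stated default of kern.sched.preempt_thresh

observed_pri = {
    "idprio 22": 146,
    "idprio 0": 124,
    "nice": 76,
}


def below_threshold(pri, thresh=PREEMPT_THRESH):
    """Simplified rule from the message: a PRI under the threshold preempts."""
    return pri < thresh


def summarize(pri_by_mode, thresh=PREEMPT_THRESH):
    """Map each run mode to whether its PRI falls under the threshold."""
    return {mode: below_threshold(pri, thresh) for mode, pri in pri_by_mode.items()}


if __name__ == "__main__":
    for mode, ok in summarize(observed_pri).items():
        status = "below" if ok else "not below"
        print(f"{mode}: PRI {observed_pri[mode]} is {status} threshold {PREEMPT_THRESH}")
```

Under this simplified rule, only the nice case falls below the threshold,
and both idprio settings sit well above it, which is why I doubt that
moving from idprio 22 to idprio 0 changes anything fundamental.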

Re zfs - sorry, I'm peculiar and don't use it ;)


