From owner-freebsd-hackers@freebsd.org Thu Jun  4 12:37:27 2020
Date: Thu, 4 Jun 2020 14:37:20 +0200
From: Gordon Bergling <gbergling@gmail.com>
To: Allan Jude
Cc: freebsd-hackers@freebsd.org
Subject: Re: Constant load of 1 on a recent 12-STABLE
Message-ID: <20200604123720.GA63595@lion.0xfce3.net>
In-Reply-To: <8b1498ea-e343-506e-79c7-c25b594808f0@freebsd.org>
References: <20200603101607.GA80381@lion.0xfce3.net>
 <20200603202929.GA65032@lion.0xfce3.net>
 <8b1498ea-e343-506e-79c7-c25b594808f0@freebsd.org>
X-Operating-System: FreeBSD 12.1-STABLE amd64
X-Host-Uptime: 2:17PM up 1 day, 7 hrs, 4 users, load averages: 4.72, 4.87, 4.74

Hi Allan,

On Wed, Jun 03, 2020 at 05:33:37PM -0400, Allan Jude wrote:
> On 2020-06-03 16:29, Gordon Bergling wrote:
> > Hi Allan,
> >
> > On Wed, Jun 03, 2020 at 03:13:47PM -0400, Allan Jude wrote:
> >> On 2020-06-03 06:16, Gordon Bergling via freebsd-hackers wrote:
> >>> For a while now I have been seeing a constant load of 1.00 on
> >>> 12-STABLE, although all CPUs are shown as 100% idle in top.
> >>>
> >>> Does anyone have an idea what could cause this?
> >>>
> >>> The load seems to be somewhat real, since the build times for
> >>> -CURRENT on this machine increased from about 2 hours to 3 hours.
> >>>
> >>> This is a virtualized system running on Hyper-V, if that matters.
> >>>
> >>> Any hints are more than appreciated.
> >>>
> >> Try running 'top -SP' and see if that shows a specific CPU being
> >> busy, or a specific process using CPU time.
> >
> > Below is the output of 'top -SP'. The only relevant process/thread
> > that consumes CPU time fairly constantly seems to be 'zfskern'.
> >
> > -----------------------------------------------------------------------------
> > last pid: 68549;  load averages: 1.10, 1.19, 1.16  up 0+14:59:45  22:17:24
> > 67 processes:  2 running, 64 sleeping, 1 waiting
> > CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > CPU 2:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
> > CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > Mem: 108M Active, 4160M Inact, 33M Laundry, 3196M Wired, 444M Free
> > ARC: 1858M Total, 855M MFU, 138M MRU, 96K Anon, 24M Header, 840M Other
> >      461M Compressed, 1039M Uncompressed, 2.25:1 Ratio
> > Swap: 2048M Total, 2048M Free
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> >    11 root        4 155 ki31     0B    64K RUN     0  47.3H 386.10% idle
> >     8 root       65  -8    -     0B  1040K t->zth  0 115:39  12.61% zfskern
> > -------------------------------------------------------------------------------
> >
> > The only key performance indicator that looks relatively high to me
> > for a non-busy system is the number of context switches that vmstat
> > reports.
> >
> > -------------------------------------------------------------------------------
> > procs  memory      page                     disks     faults          cpu
> > r b w  avm   fre   flt  re  pi  po   fr   sr da0 da1   in    sy    cs us sy id
> > 0 0 0  514G  444M 7877   2   7   0 9595  171   0   0    0  4347 43322 17  2 81
> > 0 0 0  514G  444M    1   0   0   0    0   44   0   0    0   121 40876  0  0 100
> > 0 0 0  514G  444M    0   0   0   0    0   40   0   0    0   133 42520  0  0 100
> > 0 0 0  514G  444M    0   0   0   0    0   40   0   0    0   120 43830  0  0 100
> > 0 0 0  514G  444M    0   0   0   0    0   40   0   0    0   132 42917  0  0 100
> > --------------------------------------------------------------------------------
> >
> > Any other ideas what could generate that load?
>
> I agree that the load average looks out of place here when you look at
> the CPU idle percentages, but I wonder if it is caused by a lot of
> short-lived processes or threads.
>
> How quickly is the 'last pid' number going up?
>
> You might also look at `zpool iostat 1` or `gstat -p` to see how busy
> your disks are.

When the system is idle, the 'last pid' doesn't change for at least 60
seconds. During a 'buildworld' it naturally moves much faster, but a
"-j 4" build results in a load average of about 5.0, so the underlying
problem still persists.
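In case anyone wants to reproduce the measurement: a crude way to watch
the pid rate is a shell loop around the kern.lastpid sysctl. This is
just a sketch, and it only counts forked processes, not threads:

  # Print the number of newly allocated pids per second.
  # kern.lastpid advances by one for every successful fork()
  # (ignoring the wrap-around at PID_MAX).
  prev=$(sysctl -n kern.lastpid)
  while sleep 1; do
      cur=$(sysctl -n kern.lastpid)
      echo "$((cur - prev)) new pids/s"
      prev=$cur
  done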
'zpool iostat 1' and 'gstat -p' don't show anything suspicious either.

A private mail made me aware of PR 173541, where this problem is
documented. I'll add my hardware information and performance
measurements to it when I find some time.

I am currently thinking about how to measure the number of threads
spawned per second. Do you have an idea how to do that?
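For what it is worth, the closest I have come so far is a DTrace
one-liner. This is an untested sketch and assumes the proc provider's
lwp-create probe is available on this kernel:

  # Count thread creations per second, grouped by process name.
  # tick-1s prints and then empties the aggregation once a second.
  dtrace -n 'proc:::lwp-create { @t[execname] = count(); }
      tick-1s { printa(@t); trunc(@t); }'

If that shows a steady stream of short-lived threads, it would at least
line up with the ~40k context switches per second that vmstat reports
above.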
Best regards,

Gordon

--
Gordon Bergling
Mobile: +49 170 23 10 948
Web: https://www.gordons-perspective.com/
Mail: gbergling@gmail.com

Think before you print!