From: Andriy Gapon
To: Stefan Esser, FreeBSD CURRENT
Subject: Re: Strange behavior after running under high load
Date: Sun, 28 Mar 2021 18:44:31 +0300

On 28/03/2021 17:39, Stefan Esser wrote:
> After a period of high load, my now idle system needs 4 to 10 seconds to
> run any trivial command - even after 20 minutes of no load ...
>
>
> I have run some Monte-Carlo simulations for a few hours, with initially 35
> processes running in parallel for some 10 seconds each.

I saw somewhat similar symptoms with 13-CURRENT some time ago.
To me it looked like even small kernel memory allocations took a very long time.
But it was hard to properly diagnose that as my favorite tool, dtrace, was also
affected by the same problem.

> The load decreased over time since some parameter sets were faster to process.
> All in all 63000 processes ran within some 3 hours.
>
> When the system became idle, interactive performance was very bad. Running
> any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I have
> to have this system working, I plan to reboot it later today, but will keep
> it in this state for some more time to see whether this state persists or
> whether the system recovers from it.
>
> Any ideas what might cause such a system state???
>
>
> The system has a Ryzen 5 3600 CPU (6 core/12 threads) and 32 GB of RAM.
>
> The following are a few commands that I have tried on this now practically
> idle system:
>
> $ time vmstat -n 1
>   procs    memory    page                      disks faults       cpu
>   r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
>   2  0  0  26G 922M 1.2K   1   4   0 1.4K  239   0  482 7.2K  934 11  1 88
>
> real    0m9,357s
> user    0m0,001s
> sys    0m0,018s
>
> ---- wait 1 minute ----
>
> $ time vmstat -n 1
>   procs    memory    page                      disks faults       cpu
>   r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
>   1  0  0  26G 925M 1.2K   1   4   0 1.4K  239   0  482 7.2K  933 11  1 88
>
> real    0m9,821s
> user    0m0,003s
> sys    0m0,389s
>
> $ systat -vm
>
>      4 users    Load  0.10  0.72  3.57                  Mar 28 16:15
>     Mem usage:  97%Phy 55%Kmem                           VN PAGER   SWAP PAGER
> Mem:      REAL           VIRTUAL                         in   out     in  out
>         Tot   Share     Tot    Share     Free   count
> Act  2387M    460K  26481M     460K     923M   pages
> All  2605M    218M  27105M     572M                        ioflt  Interrupts
> Proc:                                                      cow     132 total
>    r   p   d    s   w   Csw  Trp  Sys  Int  Sof  Flt    52 zfod     96 hpet0:t0
>               316       356   39  225  132   21   53       ozfod nvme0:admi
>                                                           %ozfod nvme0:io0
>   0.1%Sys   0.0%Intr  0.0%User  0.0%Nice 99.9%Idle         daefr nvme0:io1
> |    |    |    |    |    |    |    |    |    |    |        prcfr nvme0:io2
>                                                            totfr nvme0:io3
>                                             dtbuf          react nvme0:io4
> Namei      Name-cache   Dir-cache    620370 maxvn          pdwak nvme0:io5
>     Calls    hits   %    hits   %    627486 numvn      168 pdpgs    27 xhci0 66
>        18      14  78                    65 frevn          intrn ahci0 67
>                                                     17539M wire xhci1 68
> Disks  nvd0  ada0  ada1  ada2  ada3  ada4   cd0       430M act       9 re0 69
> KB/t   0.00  0.00  0.00  0.00  0.00  0.00  0.00     12696M inact hdac0 76
> tps       0     0     0     0     0     0     0     54276K laund vgapci0 78
> MB/s   0.00  0.00  0.00  0.00  0.00  0.00  0.00       923M free
> %busy     0     0     0     0     0     0     0          0 buf
>
> ---- 5 minutes later ----
>
> $ time vmstat -n 1
>  procs    memory    page                      disks faults       cpu
>  r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
>  1  0  0  26G 922M 1.2K   1   4   0 1.4K  239   0  481 7.2K  931 11  1 88
>
> real    0m4,270s
> user    0m0,000s
> sys    0m0,019s
>
> $ time uptime
> 16:20  up 23:23, 4 users, load averages: 0,17 0,39 2,68
>
> real    0m10,840s
> user    0m0,001s
> sys    0m0,374s
>
> $ time uptime
> 16:37  up 23:40, 4 users, load averages: 0,29 0,27 0,96
>
> real    0m9,273s
> user    0m0,000s
> sys    0m0,020s
>

-- 
Andriy Gapon