Date:      Fri, 23 Feb 2024 19:25:52 +0000
From:      bugzilla-noreply@freebsd.org
To:        fs@FreeBSD.org
Subject:   [Bug 275594] High CPU usage by arc_prune; analysis and fix
Message-ID:  <bug-275594-3630-zkQq1DdwjP@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-275594-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-275594-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #67 from Peter Much <pmc@citylink.dinoex.sub.org> ---
So, I have now read all the material here. Great work!

I had upgraded my deploy engine from 13.2-RELEASE to 13.3-BETA, and found
(among some spurious messages from git) that it can no longer build gcc12.

There is apparently no problem with rust or llvm15, but trying to build gcc12
reproducibly crashes (10 cores, 16081M RAM). Apparently the crash happens
when gcc fully powers up its LTO for the first time:

last pid: 37369;  load averages:  9.35,  9.93,  9.27    up 0+03:15:25  07:21:42
417 threads:   14 running, 379 sleeping, 24 waiting
CPU: 55.4% user,  0.0% nice, 35.6% system,  0.1% interrupt,  8.8% idle
Mem: 7047M Active, 6121M Inact, 2392M Wired, 984M Buf, 60M Free
ARC: 518M Total, 45M MFU, 451M MRU, 128K Anon, 3990K Header, 17M Other
     467M Compressed, 997M Uncompressed, 2.14:1 Ratio
Swap: 15G Total, 15G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root         -8    -     0B  2432K CPU4     4   3:14  99.79% kernel{arc_p
    7 root        -16    -     0B    48K CPU6     6   2:45  99.79% pagedaemon{d
   15 root         52    -     0B    16K CPU0     0   3:00  99.70% vnlru
37334 root         52    0   891M   789M pfault   1   0:37  89.24% lto1
37270 root         52    0  1017M   915M pfault   3   0:43  88.63% lto1
37324 root         52    0   831M   770M pfault   8   0:39  88.59% lto1
37338 root         52    0   843M   785M pfault   2   0:36  88.50% lto1
37333 root         52    0   889M   788M pfault   7   0:37  82.76% lto1
37269 root         52    0  1001M   882M pfault   5   0:42  82.09% lto1
37274 root         52    0  1004M   885M pfault   9   0:42  80.24% lto1
    5 root         20    -     0B  1568K t->zth   9   0:02   1.02% zfskern{arc_
37360 root         20    0    14M  4940K CPU9     9   0:00   0.87% top

This is the last output; at this point the system becomes unresponsive and,
when allowed neither to oom-kill nor panic, continues to consume 300% compute.
Apparently these are the visible three apocalyptic riders (arc_prune,
pagedaemon, vnlru) entertaining themselves. :/
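
As a rough check of the 300% figure, the WCPU values of the three kernel
threads in the top output above can simply be added up. A minimal Python
sketch, with the numbers copied by hand from that output (the thread names
are truncated exactly as top printed them):

```python
# WCPU percentages copied by hand from the pre-patch top output above.
# Keys are the (truncated) command names as top displayed them.
wcpu = {
    "kernel{arc_p": 99.79,
    "pagedaemon{d": 99.79,
    "vnlru": 99.70,
}

# Each thread saturates roughly one core, so the sum is close to 300%.
total = sum(wcpu.values())
print(f"combined WCPU of the three threads: {total:.2f}%")
```

This only restates what the snapshot shows; it is not a measurement tool.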

Implementing the patch (i.e. five new git commits from the GitHub repo) solves
the issue, and afterwards it looks like this:

last pid: 11944;  load averages:  7.13,  5.29,  5.77    up 0+03:48:45  16:12:46
424 threads:   19 running, 381 sleeping, 24 waiting
CPU: 67.9% user,  0.0% nice,  5.1% system,  0.0% interrupt, 27.0% idle
Mem: 9308M Active, 2285M Inact, 20M Laundry, 3643M Wired, 865M Buf, 336M Free
ARC: 1638M Total, 855M MFU, 575M MRU, 128K Anon, 11M Header, 198M Other
     1305M Compressed, 2980M Uncompressed, 2.28:1 Ratio
Swap: 15G Total, 15G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
11579 root        103    0  1269M  1066M CPU6     6   4:09 100.00% lto1
11605 root        103    0  1263M  1052M CPU3     3   4:08  99.87% lto1
11589 root        103    0  1295M  1091M CPU8     8   4:09  99.87% lto1
11599 root        103    0  1259M  1027M CPU9     9   4:08  99.87% lto1
11588 root        103    0  1263M  1035M CPU7     7   4:09  99.87% lto1
11590 root        103    0  1287M  1058M CPU5     5   4:08  99.87% lto1
11598 root        103    0  1311M  1082M CPU1     1   4:08  99.74% lto1
    0 root         -8    -     0B  2448K -        6   0:03   6.83% kernel{arc_p
    5 root         -8    -     0B  1568K RUN      9   0:03   5.80% zfskern{arc_
    7 root        -16    -     0B    48K psleep   2   0:37   3.11% pagedaemon{d

I'm a bit worried the thing is still reluctant to page out, but otherwise this
looks good.

-- 
You are receiving this mail because:
You are the assignee for the bug.


