Date:      Sat, 15 Jun 2024 02:23:53 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 279742] 14.1-RELEASE hangs compiling pspp requiring reboot
Message-ID:  <bug-279742-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279742

            Bug ID: 279742
           Summary: 14.1-RELEASE hangs compiling pspp requiring reboot
           Product: Base System
           Version: 14.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: dgilbert@eicat.ca

Created attachment 251458
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=251458&action=edit
core.txt of crash.

I clicked on 14.0-STABLE because 14.1-RELEASE was not yet a choice.

I upgraded my poudriere box to 14.1, created a new jail for 14.1, and launched
into a "-a" build pretty much immediately after returning from BSDCan.  The
build machine is a Threadripper 1900X with 128G of RAM and 140TB of disk in
RAID-Z2.  It has run poudriere builds stably and almost constantly since I
upgraded it to its current state, about 3 years ago.

After the first poudriere hang, I instrumented things like temperatures.  None
of these spiked, but the hang happened again and again.  After a while, it was
clear that compiling pspp was the trigger.  Note that pspp would have compiled
under 14.0 less than a week before (i.e., just before BSDCan).

I had to get debugging into my kernel and learn how to cause it to debug.
That took a couple of tries, all the while repeatedly crashing while pspp was
building.  Top was up in the window I keep open, and this was the last top
on display.

last pid: 31372;  load averages: 21.72, 32.46, 41.6670
                        up 0+04:34:59  20:36:48
220 processes: 12 running, 192 sleeping, 2 zombie, 14 waiting
CPU: 21.7% user,  0.0% nice, 40.4% system,  0.0% interrupt, 37.8% idle
Mem: 32M Active, 264K Inact, 124G Wired, 604M Free
ARC: 16G Total, 230M MFU, 334M MRU, 22M Anon, 15G Header, 191M Other
     107M Compressed, 460M Uncompressed, 4.28:1 Ratio
Swap: 256G Total, 98G Used, 158G Free, 38% Inuse, 2868K In, 3612K Out

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
61759 root          2 166  i10    60M  2616K vofflo   3  40:43  75.20% pspp-output
 6367 root          2 166  i10    88M  2604K vofflo  15  36:06  73.63% pspp-output
15409 root          2 166  i10    92M  2608K vofflo   2  33:53  72.64% pspp-output
81893 root          2 166  i10    86M  2600K CPU12   12  34:05  72.04% pspp-output
78622 root          2 166  i10    57M  2588K CPU11   11  28:42  69.19% pspp-output
25531 root          2 166  i10    95M  2616K CPU5     5  27:00  68.84% pspp-output
81789 root          2 166  i10    42M  2584K CPU6     6  23:16  65.11% pspp-output
87988 root          2 166  i10   102M  2596K CPU7     7  20:57  64.28% pspp-output
11364 root          2 166  i10    57M  2612K CPU10   10  19:50  64.14% pspp-output
23538 root          2 166  i10    66M  2604K CPU11   11  21:09  63.94% pspp-output
61379 root          2 166  i10    93M  2624K tmpfs    4  21:10  63.46% pspp-output
85836 root          2 166  i10    74M  2608K CPU14   14  19:19  62.69% pspp-output
58400 root          2 166  i10    76M  2440K RUN      5  13:26  56.27% pspp-output
58294 root          2 166  i10    72M  2444K CPU1     1  14:44  56.15% pspp-output
70050 root          2 166  i10    48M  2440K RUN      1  12:46  56.10% pspp-output
 2561 root          1  20    0   303M  1728K select  12   1:09   0.40% smbd
 2502 postgres      1  20    0   173M  1012K select   4   0:13   0.16% postgres
65067 root          1  20    0    17M  1452K CPU9     9   0:21   0.14% top
 2577 root          1  20    0    17M  1216K select   9   0:40   0.13% tmux
72517 root          6 166  i10  2310M   452K uwait    1   9:40   0.07% ghc-9.6.4
 8903 root         45 166  i10    34G  4716K uwait    0  12:30   0.06% java
 2503 postgres      1  20    0    31M   684K select   7   0:08   0.05% postgres
37351 root          1  20    0    22M   328K select   9   0:03   0.05% sshd
 2190 root          1  20    0    14M   172K select   6   0:00   0.03% syslogd
72294 root         11 166  i10   345M  1664K kqread   4   0:02   0.01% node
 2294 root          1  20    0   280M   228K select  11   0:00   0.01% httpd
 1192 root          1  20    0    18M   340K select   0   0:27   0.01% mountd
 1162 ntpd          1  20    0    23M   520K select  12   0:01   0.01% ntpd
95259 root          1  20    0    12M   328K ttyin    4   0:03   0.01% cu
 1749 uwsgi         1  20    0    57M   412K kqread  12   0:01   0.00% uwsgi-3.8
36420 root          1  20    0    19M   544K select   5   0:01   0.00% minicom
 1307 root          1  20    0   164M   460K kqread   8   0:00   0.00% php-fpm
 1253 root        128  68    0    12M  2316K rpcsvc  11   0:13   0.00% nfsd
91926 root          2 166  i10    74M  2908K pfault  15 123:07   0.00% pspp-output
72530 root         11 166  i10  7498M   836K pfault   5  99:32   0.00% node
46100 root         18 166  i10   261G   932K uwait    4  18:33   0.00% dotnet
73028 root          1 166  i10   165M  4096B WAIT    11   3:56   0.00% <pkg-static>
 2955 root          1 166  i10    15M  4096B wait    13   3:24   0.00% <sh>
93083 root          1 166  i10   195M  4096B WAIT    13   3:14   0.00% <pkg-static>
22537 root          6 166  i10   298M    17M uwait   10   1:22   0.00% ld.lld
 2588 root          1 166  i10    22M   224K select   6   1:02   0.00% sh
24257 root          1 166  i10   145M  4096B WAIT    12   0:42   0.00% <pkg-static>
 1301 www           1  20    0    27M  4096B WAIT     5   0:32   0.00% <nginx>
90301 root         14 166  i10   261G   260K uwait    4   0:24   0.00% dotnet
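
For anyone triaging, the memory figures in the header of the top output above
can be cross-checked with a little arithmetic.  The Python sketch below is not
part of the original report; it only parses the "Mem:" and "ARC:" lines copied
from that output.  The helper names (parse_size, parse_summary) are
hypothetical, and it assumes the ZFS ARC is counted inside "Wired", so the
difference is only a rough estimate of wired memory held outside the ARC (top
rounds every figure it prints).

```python
import re

# Size suffixes as top(1) prints them (powers of 1024).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
GIB = 1024**3


def parse_size(text):
    """Convert a top-style size such as '124G' or '264K' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([BKMGT])", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])


def parse_summary(line):
    """Parse a line like 'Mem: 32M Active, 264K Inact' into {label: bytes}."""
    _, _, fields = line.partition(":")
    result = {}
    for field in fields.split(","):
        size, label = field.split(None, 1)
        result[label.strip()] = parse_size(size)
    return result


# Header lines copied verbatim from the top output in this report.
mem = parse_summary("Mem: 32M Active, 264K Inact, 124G Wired, 604M Free")
arc = parse_summary(
    "ARC: 16G Total, 230M MFU, 334M MRU, 22M Anon, 15G Header, 191M Other"
)

# Assumption: the ARC is accounted for within Wired.
wired_not_arc = mem["Wired"] - arc["Total"]

print(f"Wired:             {mem['Wired'] / GIB:.0f} GiB")
print(f"ARC total:         {arc['Total'] / GIB:.0f} GiB")
print(f"Wired outside ARC: {wired_not_arc / GIB:.0f} GiB")
print(f"ARC header share:  {arc['Header'] / arc['Total']:.0%}")
```

Under that assumption, roughly 108G of the 124G wired is not explained by the
ARC total, and most of the ARC itself is reported as "Header".  Whether that
points at the cause is for the core.txt analysis to decide.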

It's worth noting here that the virtual terminal switch (alt - F<n>) works
after this happens, but no other input is recognized (can't hit return in a
window and shells going through the machine to others don't continue their
output).

When it happened this time, I dropped to KDB and dumped.  core.txt attached.

NOTE: this is repeatable.  I have been through the cycle 6 times so far.

-- 
You are receiving this mail because:
You are the assignee for the bug.


