Date:      Wed, 19 Jan 2022 16:15:23 +0000
From:      bugzilla-noreply@freebsd.org
To:        threads@FreeBSD.org
Subject:   [Bug 261338] [PATCH] kernel panic "bad pte" on heavy CPU load on 12.2 and 12.3 (i386)
Message-ID:  <bug-261338-13406@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261338

            Bug ID: 261338
           Summary: [PATCH] kernel panic "bad pte" on heavy CPU load on
                    12.2 and 12.3 (i386)
           Product: Base System
           Version: 12.3-RELEASE
          Hardware: i386
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: threads
          Assignee: threads@FreeBSD.org
          Reporter: thedix@yandex.ru

Created attachment 231160
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=231160&action=edit
Panic screenshot

After updating to 12.2p12 and 12.3p1, I noticed kernel panics under heavy
multi-core CPU load.
An example of such load is building the kernel in multi-threaded mode.

Affected systems:
- 12.2p12 i386
- 12.3p1 i386

12.X amd64 is not affected, 13.0 is not affected at all.

Tested hardware:
- Virtual machine 8 vCPU 4 GB vRAM under VMWare ESXi 6.7
- HP MicroServer Gen8 Intel Xeon E3-1265Lv2 16 GB RAM
- PC Intel Core i5-4690 16 GB RAM

Steps to reproduce:
# cd /usr/src
# make -s -j`sysctl -n hw.ncpu` KERNCONF=GENERIC buildkernel

After some time the system hangs with a panic like:
TPTE at 0x2857f14  IS ZERO @ VA 247c5000
panic: bad pte
cpuid = 7
time = 1642334372
KDB: stack backtrace:
#0 0x10438ee at kdb_backtrace+0x4e
#1 0xffdb68 at vpanic+0x118
#2 0xffda44 at panic+0x14
#3 0x155b6d5 at pmap_remove_pages+0x5a5
#4 0x12fceb4 at vmspace_exit+0x94
#5 0xfbe0f3 at exit1+0x593
#6 0xfbdb52 at sys_sys_exit+0x12
#7 0x1561b79 at syscall+0x3e9
#8 0xffc033e7 at PTDpde+0x43ef

Additional stack info:
#0  0x00ffd9f6 in doadump () at /usr/src/sys/kern/kern_shutdown.c:370
370             savectx(&dumppcb);
(kgdb) #0  0x00ffd9f6 in doadump () at /usr/src/sys/kern/kern_shutdown.c:370
#1  0x00ffd831 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0x00ffdbbf in vpanic (fmt=0x15d448a "bad pte", ap=0x1ff80a10 "")
    at /usr/src/sys/kern/kern_shutdown.c:881
#3  0x00ffda44 in panic (fmt=0x15d448a "bad pte")
    at /usr/src/sys/kern/kern_shutdown.c:808
#4  0x0155b6d5 in pmap_remove_pages (pmap=0x22a0354c)
    at /usr/src/sys/i386/i386/pmap.c:4845
#5  0x012fceb4 in vmspace_exit (td=0x1bb57380) at /usr/src/sys/vm/vm_map.c:411
#6  0x00fbe0f3 in exit1 (td=0x1bb57380, rval=0, signo=0)
    at /usr/src/sys/kern/kern_exit.c:399
#7  0x00fbdb52 in sys_sys_exit (td=0x1bb57380, uap=0x1bb57604)
    at /usr/src/sys/kern/kern_exit.c:176
#8  0x01561b79 in syscall (frame=0x1ff80ba8)
    at src/sys/i386/i386/../../kern/subr_syscall.c:144
#9  0xffc033e7 in ?? ()
#10 0x00000033 in ?? ()

I did some research on the kernel code and found that the problem was
introduced by the recent changes to SMP processing in mp_x86.c:
https://github.com/freebsd/freebsd-src/commit/1820ca2154611d6f27ce5a5fdd561a16ac54fdd8#diff-b34ee41e14f87fb2b18fdf77337237f336830ae88aac2a02e1c32aa45e43b4de
https://reviews.freebsd.org/D33413

The problem is in the function smp_targeted_tlb_shootdown():
-       sched_pin();
+       KASSERT(curthread->td_pinned > 0, ("curthread not pinned"));
Under some circumstances the calling thread is not pinned when this function
runs, which later causes the "bad pte" panic.
I recompiled the GENERIC kernel with the INVARIANTS option, added the function
name to the assertion text for additional info, and got an immediate panic
during boot (see attached image panic_not_pinned.png).

So the fix is to revert this line:
-       KASSERT(curthread->td_pinned > 0, ("curthread not pinned"));
+       sched_pin();
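
The difference between the two lines can be sketched in userland C. This is
only a toy model, not kernel code: struct toy_thread, toy_sched_pin(),
toy_sched_unpin() and both shootdown_* functions are hypothetical names made up
for illustration, and the real smp_targeted_tlb_shootdown() does far more. It
only shows that a callee which pins itself is safe regardless of the caller,
while a callee which merely checks depends on every caller having pinned
first. Note also that KASSERT is only compiled in with INVARIANTS, so on a
kernel without that option the check is not performed at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct thread: only the pin count. */
struct toy_thread {
    int td_pinned;              /* > 0: thread must stay on its current CPU */
};

/* Toy versions of sched_pin()/sched_unpin(): a nestable counter. */
void toy_sched_pin(struct toy_thread *td)
{
    td->td_pinned++;
}

void toy_sched_unpin(struct toy_thread *td)
{
    assert(td->td_pinned > 0);  /* an unbalanced unpin is a bug in the toy too */
    td->td_pinned--;
}

/*
 * Variant A: the callee pins itself, as with the restored sched_pin() line.
 * Returns whether the thread was pinned while the work ran.
 */
bool shootdown_pins_itself(struct toy_thread *td)
{
    toy_sched_pin(td);
    bool pinned_during_work = td->td_pinned > 0;
    /* ... CPU-specific work would happen here ... */
    toy_sched_unpin(td);        /* assumption: the toy unpins for symmetry */
    return pinned_during_work;
}

/*
 * Variant B: the callee only checks, as with the KASSERT line. The check is
 * returned as a value here so a failure can be observed without aborting.
 */
bool shootdown_expects_pinned(const struct toy_thread *td)
{
    return td->td_pinned > 0;
}
```

With a zero-initialised struct toy_thread, shootdown_pins_itself() reports true
and leaves the pin count at 0, whereas shootdown_expects_pinned() reports false
until the caller has called toy_sched_pin() itself.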

I attached the patch mp_x86.c.patch to fix the problem.
After recompiling the kernel with this patch, I no longer see panics on either
12.2 or 12.3 during further kernel builds.

-- 
You are receiving this mail because:
You are the assignee for the bug.


