Date:      Tue, 08 May 2018 00:52:47 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 228056] powerpc64: MCE on POWER9 machine (AC922)
Message-ID:  <bug-228056-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228056

            Bug ID: 228056
           Summary: powerpc64: MCE on POWER9 machine (AC922)
           Product: Base System
           Version: CURRENT
          Hardware: powerpc
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: breno.leitao@gmail.com

I am creating this bug to track my progress on investigating the bootstrap of
FreeBSD on an AC922 (POWER9) machine.

When I boot HEAD, I hit the following MCE (machine check exception):

 KDB: debugger backends: ddb
 KDB: current backend: ddb
 Copyright (c) 1992-2018 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
 FreeBSD is a registered trademark of The FreeBSD Foundation.
 FreeBSD 12.0-CURRENT #152 66f063557f2(master)-dirty: Tue May  8 01:17:52 CET 2018
     root@free8:/usr/obj/root/kernel/freebsd/powerpc.powerpc64/sys/BRENO powerpc
 gcc version 4.2.1 20070831 patched [FreeBSD]
 WARNING: WITNESS option enabled, expect reduced performance.
 WARNING: DIAGNOSTIC option enabled, expect reduced performance.
 Entering uma_startup with 44 boot pages configured
 startup_alloc from "UMA Kegs", 41 boot pages left
 startup_alloc from "UMA Zones", 40 boot pages left
 startup_alloc from "UMA Zones", 38 boot pages left
 startup_alloc from "UMA Zones", 36 boot pages left
 start at c000000001e30100
 KERNEL BASE at 100100
 sum is  c000000001d30000

 fatal kernel trap:

   exception       = 0x200 (machine check)
   srr0            = 0xc00000000255d284 (0x82d284)
   srr1            = 0x9000000000201032
   current msr     = 0x9000000000000032
   lr              = 0xc00000000255d278 (0x82d278)
   curthread       = 0xc000000002e2bbc0
          pid = 0, comm =

 [ thread pid 0 tid 0 ]
 Stopped at      0xc00000000255d284
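
The values in parentheses in the trap dump look like link-time offsets into the kernel image. Subtracting them from the run-time addresses gives the same constant that the boot log prints as "sum" (and that "start" minus "KERNEL BASE" gives). A minimal C sketch of that arithmetic, using only numbers from the log above; the names reloc_delta and check_log_values are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: run-time address minus link-time offset. */
uint64_t
reloc_delta(uint64_t runtime_addr, uint64_t link_offset)
{
	return (runtime_addr - link_offset);
}

/* Check that every address pair from the log yields the same delta. */
void
check_log_values(void)
{
	const uint64_t sum = 0xc000000001d30000ULL;	/* "sum is" line */

	/* start at c000000001e30100, KERNEL BASE at 100100 */
	assert(reloc_delta(0xc000000001e30100ULL, 0x100100ULL) == sum);
	/* srr0 = 0xc00000000255d284 (0x82d284) */
	assert(reloc_delta(0xc00000000255d284ULL, 0x82d284ULL) == sum);
	/* lr = 0xc00000000255d278 (0x82d278) */
	assert(reloc_delta(0xc00000000255d278ULL, 0x82d278ULL) == sum);
}
```

If the deltas agree, the offset 0x82d284 can be looked up directly in a disassembly of the kernel image.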


 Digging further, this is the instruction that faults:

     82d264:       7f c3 f3 78     mr      r3,r30
     82d268:       7e e4 bb 78     mr      r4,r23
     82d26c:       7f 65 db 78     mr      r5,r27
     82d270:       7f 86 e3 78     mr      r6,r28
     82d274:       4b ff f8 49     bl      82cabc <.keg_alloc_slab>
     82d278:       7c 7d 1b 79     mr.     r29,r3
     82d27c:       41 a2 00 94     beq+    82d310 <.keg_fetch_slab+0x2cc>
     82d280:       7f bc eb 78     mr      r28,r29
->>  82d284:       e8 1d 00 00     ld      r0,0(r29)
     82d288:       7f a0 f0 00     cmpd    cr7,r0,r30
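
As a cross-check on the disassembly, the faulting word e8 1d 00 00 can be decoded by hand. The sketch below assumes the usual Power ISA DS-form layout (6-bit primary opcode, RT, RA, 14-bit DS, 2-bit XO, with primary opcode 58 and XO 0 meaning ld); the struct and function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a DS-form instruction word (assumed layout, see above). */
struct ds_form {
	unsigned opcode;	/* primary opcode, top 6 bits */
	unsigned rt;		/* target register */
	unsigned ra;		/* base register */
	int	 displacement;	/* DS field with the low 2 bits cleared, sign-extended */
	unsigned xo;		/* extended opcode, low 2 bits */
};

struct ds_form
decode_ds(uint32_t insn)
{
	struct ds_form f;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.displacement = (int16_t)(insn & 0xfffc);
	f.xo = insn & 0x3;
	return (f);
}

/* The word at 82d284 should decode as ld r0,0(r29). */
void
check_faulting_insn(void)
{
	struct ds_form f = decode_ds(0xe81d0000u);

	assert(f.opcode == 58 && f.xo == 0);	/* ld */
	assert(f.rt == 0 && f.ra == 29);	/* r0, base r29 */
	assert(f.displacement == 0);		/* 0(r29) */
}
```

So the faulting access is a plain 8-byte load from the address held in r29, with no displacement.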


At this point, r29 contains:

  db> print $r29
  c00003fffffddf90


Looking at that code, I think we are here:

               slab = keg_alloc_slab(keg, zone, domain, allocflags);
                /*
                 * If we got a slab here it's safe to mark it partially used
                 * and return.  We assume that the caller is going to remove
                 * at least one item.
                 */
                if (slab) {
       ->>               MPASS(slab->us_keg == keg);

where 'slab' is in r29 and 'us_keg' should be the very first field (offset 0),
which matches the 0(r29) displacement in the faulting load. 'keg' should be in
r30:

  > print $r30
  c00003fffffd7000
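
To illustrate why the assertion turns into a single load at offset 0, here is a minimal sketch with stand-in types. The struct names are hypothetical and much simpler than the real UMA definitions; the only property carried over from the report is that us_keg is the first field:

```c
#include <assert.h>
#include <stddef.h>

struct keg_sketch;	/* opaque stand-in for the keg type */

/* Simplified, hypothetical slab header: only the first field matters. */
struct slab_sketch {
	struct keg_sketch *us_keg;	/* assumed to be the first field */
	void		  *rest;	/* placeholder for everything else */
};

/* With us_keg at offset 0, slab->us_keg reads 8 bytes at the slab address. */
static_assert(offsetof(struct slab_sketch, us_keg) == 0,
    "us_keg is expected at offset 0");

/* What MPASS(slab->us_keg == keg) has to evaluate. */
int
slab_belongs_to(const struct slab_sketch *slab, const struct keg_sketch *keg)
{
	return (slab->us_keg == keg);
}
```

Under that assumption, the comparison against r30 never happens here: the load of slab->us_keg itself is what raises the machine check.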

The problem seems to be the dereference of slab (r29), which appears to be
what triggers the MCE.

This is the content of the memory r30 points to:

  db> x $r30
  0xc00003fffffd7000:     c0000000
  db>
  0xc00003fffffd7004:     2af8bb8
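
The two words above can be joined into one 64-bit value. This sketch assumes that ddb's x prints 32-bit words in hex and that the kernel is big-endian, so the word at the lower address is the high half; the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Join two consecutive 32-bit words of a big-endian dump into 64 bits. */
uint64_t
join_be_words(uint32_t first, uint32_t second)
{
	return (((uint64_t)first << 32) | second);
}

/* Words read at 0xc00003fffffd7000 and 0xc00003fffffd7004. */
void
check_keg_first_doubleword(void)
{
	assert(join_be_words(0xc0000000u, 0x02af8bb8u) ==
	    0xc000000002af8bb8ULL);
}
```

Under those assumptions the first doubleword at r30 is 0xc000000002af8bb8, which falls in the same address range as the kernel addresses in the trap dump, so the memory behind r30 is at least readable and looks sane.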

But I am not able to dereference r29:

  db> x $r29
  0xc00003fffffddf90: (machine halts)

I am wondering why accessing this page is causing this problem.
