Date:      Tue, 11 Apr 2023 14:28:35 +0000
From:      Lee MATTHEWS <Lee.MATTHEWS.external@stormshield.eu>
To:        Eugene Grosbein <eugen@grosbein.net>, "freebsd-hackers@FreeBSD.org" <freebsd-hackers@FreeBSD.org>
Subject:   Re: BINIT and BERR signals in MCA
Message-ID:  <fed099e1e6c2448e96883ebb6496347d@stormshield.eu>
In-Reply-To: <24a51bf0-71de-f596-ef8b-785da4a27fd7@grosbein.net>
References:  <4bd3e1017a104598ab92e658f25b5367@stormshield.eu>,<24a51bf0-71de-f596-ef8b-785da4a27fd7@grosbein.net>

Thanks for getting back to me, Eugene.


The two cores that I've received both seem to die at the same point:


#4  0xffffffff8049a9e3 in panic (fmt=<unavailable>) at ../../../kern/kern_shutdown.c:714
#5  0xffffffff80780a2b in mca_intr () at ../../../x86/x86/mca.c:1193
#6  <signal handler called>
#7  smp_rendezvous_action () at ../../../kern/subr_smp.c:417
#8  0xffffffff804e5f79 in smp_rendezvous_cpus (map=...,
    setup_func=0xffffffff804e5e40 <smp_no_rendezvous_barrier>,
    action_func=0xffffffff80496730 <rm_cleanIPI>,
    teardown_func=0xffffffff804e5e40 <smp_no_rendezvous_barrier>, arg=0xffffffff80cb5048 <g_conf_lock>)
    at ../../../kern/subr_smp.c:554
#9  0xffffffff80496639 in _rm_wlock (rm=0xffffffff80cb5048 <g_conf_lock>)
    at ../../../kern/kern_rmlock.c:551


Do you think the temperature could still be an issue? If it were temperature-related, wouldn't one expect the MCA interrupt to occur during other function calls as well?


I've asked for a log of the CPU temperatures; I'll write back when I get it.


Lee

________________________________
From: Eugene Grosbein <eugen@grosbein.net>
Sent: 11 April 2023 13:59:08
To: Lee MATTHEWS; freebsd-hackers@FreeBSD.org
Subject: Re: BINIT and BERR signals in MCA

11.04.2023 18:45, Lee MATTHEWS wrote:

> Hello,
>
> One of our clients is experiencing problems using one of our products. It runs FreeBSD 11.3 on an Intel Atom Apollo Lake E3930 two-core SoC processor.
>
> Occasionally, under very light load, the kernel will panic. I've managed to get a couple of vmcores and I notice via the backtrace that the MCA interrupt is called.
>
> I've managed to recover two vmcores and I notice in both of them that the Inter-Processor Interrupts are not being transferred from one CPU to the other. I've also noticed that the structure mca_internal contains information concerning the state of the MCA status register (value: 0x9000000020000003) for bank 0.
>
> From Intel's software architecture document, the MCA Error Code is 0x0003 "The BINIT# from another processor caused this processor to enter machine check." and the Model Specific Error Code is 0x2000 "1 if BERR is driven."
>
> The Intel document is not clear; could anyone please explain what the BINIT and BERR signals mean? They appear to be related to a bus, but I'm not sure which one. A bus external to the Atom SoC or one of the internal buses within the Atom SoC?
>
> Do you have any ideas of what could generate this type of error? Is it likely a hardware or a software issue?
>
> Thanks in advance.
>
> Best wishes,
> Lee Matthews

I believe this is some hardware issue, probably over-heating. Did you check for thermal sensor values?
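
For reference, the bank 0 status value quoted above (0x9000000020000003) can be split into its fields by hand or with a few lines of code. The following is only a minimal sketch, assuming the architectural IA32_MCi_STATUS layout (flag bits 63 down to 57, Model Specific Error Code in bits 31:16, MCA Error Code in bits 15:0). The names FLAGS and decode_mc_status are made up for illustration and are not part of FreeBSD.

```python
# Assumed flag bit positions of the architectural IA32_MCi_STATUS register.
FLAGS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds valid data
    "PCC": 57,    # processor context corrupt
}


def decode_mc_status(status):
    """Split a 64-bit machine-check status value into named fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    fields["mca_error"] = status & 0xFFFF            # bits 15:0
    fields["model_error"] = (status >> 16) & 0xFFFF  # bits 31:16
    return fields


decoded = decode_mc_status(0x9000000020000003)
for name in FLAGS:
    print(f"{name:<6}{int(decoded[name])}")
print(f"MCA error code:            {decoded['mca_error']:#06x}")
print(f"Model specific error code: {decoded['model_error']:#06x}")
```

Under those assumptions the two error-code fields come out as 0x0003 and 0x2000, matching the values given in the original message, and only the VAL and EN flags are set.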


