Date: Tue, 11 Apr 2023 14:28:35 +0000
From: Lee MATTHEWS <Lee.MATTHEWS.external@stormshield.eu>
To: Eugene Grosbein <eugen@grosbein.net>, "freebsd-hackers@FreeBSD.org" <freebsd-hackers@FreeBSD.org>
Subject: Re: BINIT and BERR signals in MCA
Message-ID: <fed099e1e6c2448e96883ebb6496347d@stormshield.eu>
In-Reply-To: <24a51bf0-71de-f596-ef8b-785da4a27fd7@grosbein.net>
References: <4bd3e1017a104598ab92e658f25b5367@stormshield.eu>, <24a51bf0-71de-f596-ef8b-785da4a27fd7@grosbein.net>
Thanks for getting back to me, Eugene.

In the two vmcores I've received, the kernel seems to die at the same point:

#4 0xffffffff8049a9e3 in panic (fmt=<unavailable>) at ../../../kern/kern_shutdown.c:714
#5 0xffffffff80780a2b in mca_intr () at ../../../x86/x86/mca.c:1193
#6 <signal handler called>
#7 smp_rendezvous_action () at ../../../kern/subr_smp.c:417
#8 0xffffffff804e5f79 in smp_rendezvous_cpus (map=...,
    setup_func=0xffffffff804e5e40 <smp_no_rendezvous_barrier>,
    action_func=0xffffffff80496730 <rm_cleanIPI>,
    teardown_func=0xffffffff804e5e40 <smp_no_rendezvous_barrier>,
    arg=0xffffffff80cb5048 <g_conf_lock>) at ../../../kern/subr_smp.c:554
#9 0xffffffff80496639 in _rm_wlock (rm=0xffffffff80cb5048 <g_conf_lock>)
    at ../../../kern/kern_rmlock.c:551

Do you think temperature could still be the issue? If it were temperature-related, wouldn't one expect the MCA interrupt to occur during other function calls as well?

I've asked for a log of the CPU temperatures; I'll write back when I get them.

Lee

________________________________
From: Eugene Grosbein <eugen@grosbein.net>
Sent: 11 April 2023 13:59:08
To: Lee MATTHEWS; freebsd-hackers@FreeBSD.org
Subject: Re: BINIT and BERR signals in MCA

11.04.2023 18:45, Lee MATTHEWS wrote:
> Hello,
>
> One of our clients is experiencing problems with one of our products. It runs FreeBSD 11.3 on an Intel Atom E3930 (Apollo Lake), a dual-core SoC.
>
> Occasionally, under very light load, the kernel panics. I've managed to get a couple of vmcores, and the backtraces show that the MCA interrupt handler is called.
>
> In both vmcores, I notice that Inter-Processor Interrupts are not being transferred from one CPU to the other. I've also noticed that the structure mca_internal contains information about the state of the MCA status register for bank 0 (value: 0x9000000020000003).
>
> From Intel's software architecture document, the MCA error code is 0x0003, "The BINIT# from another processor caused this processor to enter machine check.", and the model-specific error code is 0x2000, "1 if BERR is driven."
>
> The Intel document is not clear; could anyone please explain what the BINIT and BERR signals mean? They appear to be related to a bus, but I'm not sure which one: a bus external to the Atom SoC, or one of the internal buses within it?
>
> Do you have any idea what could generate this type of error? Is it more likely a hardware or a software issue?
>
> Thanks in advance.
>
> Best wishes,
> Lee Matthews

I believe this is a hardware issue, probably overheating. Did you check the thermal sensor values?