Date: Tue, 21 Jun 2022 09:52:50 -0700
From: Ultima <ultima1252@gmail.com>
To: "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Cc: Larry Rosenman <ler@lerctr.org>, Freebsd current <freebsd-current@freebsd.org>
Subject: Re: MCE: Does this look possibly like a slot issue?
Message-ID: <CANJ8om4_xEjd_Dun%2BOSAGrJMO4sO%2BqO0dChfy4TF=su5nct5Vw@mail.gmail.com>
In-Reply-To: <202206211606.25LG6Out053747@gndrsh.dnsmgr.net>
References: <c70d0b6f3ef344786f282d7c6500a390@lerctr.org> <202206211606.25LG6Out053747@gndrsh.dnsmgr.net>
--000000000000b43f7505e1f80db6
Content-Type: text/plain; charset="UTF-8"

Completely agree with you, Rodney. The LGA socket pins on the motherboard
can bend very easily when the CPU is moved, so I wanted to recommend this
last.

Larry, as Rodney mentioned, it's more or less your last option. This
is likely the CPU and not the module itself. There is still a small chance
that it is motherboard/slot related; a way you can determine this is
by swapping the CPUs between sockets (socket 0 <----> socket 1) and seeing
if the error moves. As I mentioned though, be very cautious. I don't want
you to be in a worse-off state.

I would reseat the CPU in the problem socket before swapping the CPUs.

Best regards,
Richard Gallamore

On Tue, Jun 21, 2022 at 9:06 AM Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:

> > Swapped 2 DIMMS, now we wait for the ZFS ARC to fill and start using all
> > the memory.
>
> Depending on the results of that, one thing that is often overlooked
> when trying to troubleshoot memory systems in modern Intel systems
> is the fact that the DIMM now talks directly to the CPU chip that
> has the memory controller built into it.  THUS these "slot" related
> ECC/Parity/blowup errors can actually be the CPU and/or the CPU
> socket and/or the seating of the CPU in the socket.
>
> So if the error sticks with the DIMM slot and not the DIMM
> module, the next thing I would try would be a CPU chip reseat,
> including a good inspection of the socket for a damaged
> pin.  Also look at the lands on the CPU chip itself, and you
> can even try swapping CPU chips to see if it follows the
> CPU or the socket, much as you do with a DIMM.
>
> > On 06/20/2022 7:59 pm, Larry Rosenman wrote:
> >
> > > SuperMicro X8DTN+
> > >
> > > 2 Processors, 6-core/12-Thread.
> > > CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.20-MHz K8-class CPU)
> > >
> > > I'll bring it down and swap DIMMS around
> > >
> > > On 06/20/2022 7:57 pm, Ultima wrote:
> > >
> > > Hey Larry,
> > >
> > > One red flag I am seeing is that the error is being produced on
> > > the same CPU/bank with each error you have provided so far.
> > >
> > > Can you try and follow my original recommendation and swap a
> > > currently installed DIMM with the one in the problem DIMM slot
> > > and see if anything changes?
> > >
> > > Can you also provide the motherboard model? Also, do you
> > > have multiple CPUs installed in this system?
> > >
> > > Best regards,
> > > Richard Gallamore
> > >
> > > On Mon, Jun 20, 2022 at 5:41 PM Larry Rosenman <ler@lerctr.org> wrote:
> > >
> > > Yes and Yes.
> > >
> > > On 06/20/2022 7:37 pm, Ultima wrote:
> > >
> > > Are you sure that the module you replaced it with was good?
> > > Are you sure you replaced the correct module?
> > >
> > > Best regards,
> > > Richard Gallamore
> > >
> > > On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman <ler@lerctr.org> wrote:
> > >
> > > I'm seeing them constantly:
> > >
> > > root@freenas[~]# mcelog --dmi
> > > Hardware event. This is not a software error.
> > > MCE 0
> > > CPU 22 BANK 8 TSC 20aab486464a
> > > MISC ac29890200046444 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 44
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 1
> > > CPU 22 BANK 8 TSC 296dfcc82582
> > > MISC ac29890200041381 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 81
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 2
> > > CPU 22 BANK 8 TSC 2a5604a6a070
> > > MISC ac29890200044281
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory ECC error occurred during scrub
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 81
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 88000040000200cf MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > Hardware event. This is not a software error.
> > > MCE 3
> > > CPU 22 BANK 8 TSC 31e141418eb8
> > > MISC ac29890200046a4a ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 4a
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 4
> > > CPU 22 BANK 8 TSC 3a014afee106
> > > MISC ac29890200046646 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 46
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 5
> > > CPU 22 BANK 8 TSC 41d1dbef1a6a
> > > MISC ac29890200046141 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 41
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 6
> > > CPU 22 BANK 8 TSC 4a1b1ecef446
> > > MISC ac29890200046a4a ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 4a
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 7
> > > CPU 22 BANK 8 TSC 527bc27db776
> > > MISC ac29890200040386 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 86
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 8
> > > CPU 22 BANK 8 TSC 5aa4ecdd795a
> > > MISC ac29890200046646 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 46
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > root@freenas[~]#
> > >
> > > and I replaced the DIMM yesterday :(
> > >
> > > On 06/20/2022 7:19 pm, Ultima wrote:
> > >
> > > Hey Larry,
> > >
> > > It is possible it's the motherboard itself, but it's rare. The way I
> > > would determine this is to swap the DIMM module with another
> > > populated slot on the motherboard and see if the error migrated
> > > to the new slot or not.
> > > Also, this error doesn't necessarily mean
> > > there is a problem that needs to be addressed. If you have been
> > > running the system for many months and you see ECC errors a
> > > handful of times, it can probably be safely ignored.
> > >
> > > Best regards,
> > > Richard Gallamore
> > >
> > > On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman <ler@lerctr.org> wrote:
> > > I've gotten a BUNCH of these on my TrueNAS server.  I've replaced this
> > > DIMM a couple of times, and still the MCEs continue.
> > > Is it possible it's a motherboard slot issue?
> > >
> > > Hardware event. This is not a software error.
> > > MCE 8
> > > CPU 22 BANK 8 TSC 5aa4ecdd795a
> > > MISC ac29890200046646 ADDR ee2f6e800
> > > TIME 1655762472 Mon Jun 20 17:01:12 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 46
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > >
> > > --
> > > Larry Rosenman                     http://www.lerctr.org/~ler
> > > Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
> > > US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>
> --
> Rod Grimes                                                 rgrimes@freebsd.org

--000000000000b43f7505e1f80db6--
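A minimal Python sketch (not part of the thread) of how the two distinct STATUS values in the logs above, 8c0000400001009f for the read errors and 88000040000200cf for the scrub error, break down into the architectural IA32_MCi_STATUS fields described in the machine-check chapter of Intel's SDM. The names `decode_status`, `bit` and `MEMORY_TRANSACTIONS` are hypothetical helpers, and only memory-controller compound error codes are interpreted; anything else is left as a raw code.

```python
# Sketch: split a raw IA32_MCi_STATUS value (the "STATUS" field mcelog
# prints) into its architectural fields, following the Intel SDM
# machine-check chapter. Only memory-controller compound codes
# (0000 0000 1MMM CCCC) get a transaction/channel interpretation.

# MMM sub-field of a memory-controller compound error code.
MEMORY_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read",
    0b010: "memory write",
    0b011: "address/command",
    0b100: "memory scrubbing",
}


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status: int) -> dict:
    mca_code = status & 0xFFFF                       # bits 15:0
    fields = {
        "VAL": bit(status, 63),      # register contains a valid error
        "OVER": bit(status, 62),     # an earlier error was overwritten
        "UC": bit(status, 61),       # uncorrected (False = corrected)
        "EN": bit(status, 60),       # error was enabled in MCi_CTL
        "MISCV": bit(status, 59),    # MCi_MISC holds additional info
        "ADDRV": bit(status, 58),    # MCi_ADDR holds the error address
        "PCC": bit(status, 57),      # processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38 (CMCI)
        "model_specific": (status >> 16) & 0xFFFF,   # bits 31:16
        "mca_code": mca_code,
    }
    # Memory-controller errors: bits 15:8 are zero and bit 7 is set.
    if (mca_code & 0xFF80) == 0x0080:
        mmm = (mca_code >> 4) & 0b111
        channel = mca_code & 0xF
        fields["transaction"] = MEMORY_TRANSACTIONS.get(mmm, "reserved")
        fields["channel"] = "unspecified" if channel == 0xF else channel
    return fields


if __name__ == "__main__":
    # The two STATUS values that appear in the mcelog output above.
    for raw in (0x8C0000400001009F, 0x88000040000200CF):
        print(f"{raw:016x}", decode_status(raw))
```

Both values decode to VAL set, UC clear (corrected) and a corrected count of 1. The read/scrub difference is the MMM field (001 versus 100), the channel nibble is 1111 (unspecified, so the channel mcelog reports must come from elsewhere), and ADDRV is clear for the scrub record, which matches the missing ADDR on MCE 2.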
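The RTId, DIMM, channel and syndrome lines that mcelog prints appear to be carved out of the raw MISC value. The bit positions in this sketch are an assumption for this Westmere-EP part (Family 6 Model 44): they were picked because they reproduce every MISC line in Larry's log, but MISC is model-specific, so they should not be reused on other CPUs. `decode_misc` and `MISC_VALUES` are hypothetical names.

```python
# Sketch: extract the memory-error fields mcelog reported from the raw MISC
# value. ASSUMPTION: bit positions are model-specific and were chosen to
# match the mcelog output in this thread (Family 6 Model 44) only.


def decode_misc(misc: int) -> dict:
    return {
        "rtid": misc & 0xFF,            # Memory transaction Tracker ID
        "dimm": (misc >> 16) & 0x3,     # Memory DIMM ID of error
        "channel": (misc >> 18) & 0x3,  # Memory channel ID of error
        "syndrome": misc >> 32,         # Memory ECC syndrome
    }


# MISC values copied from the MCE 0..8 records in the log above.
MISC_VALUES = [
    0xAC29890200046444, 0xAC29890200041381, 0xAC29890200044281,
    0xAC29890200046A4A, 0xAC29890200046646, 0xAC29890200046141,
    0xAC29890200046A4A, 0xAC29890200040386, 0xAC29890200046646,
]

if __name__ == "__main__":
    # Collapse every record to (channel, DIMM, syndrome); a single entry
    # means all errors hit the same place with the same bit pattern.
    locations = {(d["channel"], d["dimm"], d["syndrome"])
                 for d in map(decode_misc, MISC_VALUES)}
    for channel, dimm, syndrome in sorted(locations):
        print(f"channel {channel} DIMM {dimm} syndrome {syndrome:08x}")
```

For the nine records above the set collapses to one entry (channel 1, DIMM 0, syndrome ac298902), and every record that carries an ADDR shows the same ee2f6e800. That is another way of seeing Ultima's observation that the errors keep coming from the same place.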