Date: Tue, 21 Jun 2022 11:13:28 -0500
From: Larry Rosenman <ler@lerctr.org>
To: "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Cc: Ultima <ultima1252@gmail.com>, Freebsd current <freebsd-current@freebsd.org>
Subject: Re: MCE: Does this look possibly like a slot issue?
Message-ID: <56938c90ef717a0d29566f81353c1295@lerctr.org>
In-Reply-To: <202206211606.25LG6Out053747@gndrsh.dnsmgr.net>
References: <202206211606.25LG6Out053747@gndrsh.dnsmgr.net>
Looks like it might be just that, Rodney:

root@freenas[~]# mcelog
Hardware event. This is not a software error.
MCE 0
CPU 14 BANK 8 TSC 525efc019bb6
MISC ac29890200040083 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 83
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 1
CPU 14 BANK 8 TSC 52a513d27f2c
MISC ac29890200041083 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 83
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 2
CPU 14 BANK 8 TSC 53d8cf2ceb4a
MISC ac29890200040582 ADDR ee2f6e800
TIME 1655827944 Tue Jun 21 11:12:24 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 82
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
Hardware event. This is not a software error.
MCE 3 CPU 14 BANK 8 TSC 5e4dae622cb6 MISC ac29890200041181 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 4 CPU 14 BANK 8 TSC 5eea68fdad4e MISC ac29890200041784 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 84 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 5 CPU 14 BANK 8 TSC 5eea6e0bbce0 MISC ac29890200044000 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 0 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 6 CPU 12 BANK 8 TSC 5f6cbe9ef2bc MISC ac29890200041181 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 20 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. 
MCE 7 CPU 14 BANK 8 TSC 64ba63c66e52 MISC ac29890200041181 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 8 CPU 14 BANK 8 TSC 659878c17622 MISC ac29890200040282 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 82 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 9 CPU 14 BANK 8 TSC 66b71c1dccf6 MISC ac29890200040183 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 83 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 10 CPU 14 BANK 8 TSC 6be0988610ce MISC ac29890200040682 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 82 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. 
MCE 11 CPU 14 BANK 8 TSC 6be0995926f8 MISC ac29890200044000 ADDR ee2f6e800 TIME 1655827944 Tue Jun 21 11:12:24 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 0 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 root@freenas[~]# mcelog --dmi Hardware event. This is not a software error. MCE 0 CPU 14 BANK 8 TSC 525efc019bb6 MISC ac29890200040083 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 83 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 WARNING: SMBIOS data is often unreliable. Take with a grain of salt! DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 1 CPU 14 BANK 8 TSC 52a513d27f2c MISC ac29890200041083 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 83 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. 
MCE 2 CPU 14 BANK 8 TSC 53d8cf2ceb4a MISC ac29890200040582 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 82 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 3 CPU 14 BANK 8 TSC 5e4dae622cb6 MISC ac29890200041181 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 4 CPU 14 BANK 8 TSC 5eea68fdad4e MISC ac29890200041784 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 84 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. 
This is not a software error. MCE 5 CPU 14 BANK 8 TSC 5eea6e0bbce0 MISC ac29890200044000 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 0 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 6 CPU 12 BANK 8 TSC 5f6cbe9ef2bc MISC ac29890200041181 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 20 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. 
MCE 7 CPU 14 BANK 8 TSC 64ba63c66e52 MISC ac29890200041181 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 8 CPU 14 BANK 8 TSC 659878c17622 MISC ac29890200040282 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 82 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 9 CPU 14 BANK 8 TSC 66b71c1dccf6 MISC ac29890200040183 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 83 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c0000400001009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. 
This is not a software error.
MCE 10
CPU 14 BANK 8 TSC 6be0988610ce
MISC ac29890200040682 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 82
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
Hardware event. This is not a software error.
MCE 11
CPU 14 BANK 8 TSC 6be0995926f8
MISC ac29890200044000 ADDR ee2f6e800
TIME 1655827951 Tue Jun 21 11:12:31 2022
MCG status:
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 1
Memory transaction Tracker ID (RTId): 0
Memory DIMM ID of error: 0
Memory channel ID of error: 1
Memory ECC syndrome: ac298902
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 22 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44 Step 2
DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
Device Locator: P2-DIMM2C
Bank Locator: BANK14
Manufacturer: Nanya
Serial Number: 642264CD
Asset Tag:
Part Number: NT4GC72B4NA1NL-BE
root@freenas[~]#

On 06/21/2022 11:06 am, Rodney W. Grimes wrote:
>>
>>
>> Swapped 2 DIMMs, now we wait for the ZFS ARC to fill and start using
>> all the memory.
>
> Depending on the results of that, one thing that is often overlooked
> when trying to troubleshoot memory systems in modern Intel systems
> is the fact that the DIMM now talks directly to the CPU chip that
> has the memory controller built into it. Thus these "slot"-related
> ECC/parity/blowup errors can actually be the CPU and/or the CPU
> socket and/or the seating of the CPU in the socket.
>
> So if the error sticks with the DIMM slot and not the DIMM
> module, the next thing I would try would be a CPU chip reseat,
> including a good inspection of the socket for a damaged
> pin. Also look at the lands on the CPU chip itself, and you
> can even try swapping CPU chips to see if it follows the
> CPU or the socket, much as you do with a DIMM.
>
>
>>
>> On 06/20/2022 7:59 pm, Larry Rosenman wrote:
>>
>> > SuperMicro X8DTN+
>> >
>> > 2 Processors, 6-core/12-Thread. CPU: Intel(R) Xeon(R) CPU
>> > E5645 @ 2.40GHz (2400.20-MHz K8-class CPU)
>> >
>> > I'll bring it down and swap DIMMs around
>> >
>> > On 06/20/2022 7:57 pm, Ultima wrote:
>> >
>> > Hey Larry,
>> >
>> > One red flag I am seeing is that the error is being produced on
>> > the same CPU/bank with each error you have provided so far.
>> >
>> > Can you try to follow my original recommendation and swap a
>> > currently installed DIMM into the problem DIMM slot and see
>> > if anything changes?
>> >
>> > Can you also provide the motherboard model? Also, do you
>> > have multiple CPUs installed in this system?
>> >
>> > Best regards,
>> > Richard Gallamore
>> >
>> > On Mon, Jun 20, 2022 at 5:41 PM Larry Rosenman <ler@lerctr.org> wrote:
>> >
>> > Yes and Yes.
>> >
>> > On 06/20/2022 7:37 pm, Ultima wrote:
>> >
>> > Are you sure that the module you replaced it with was good?
>> > Are you sure you replaced the correct module?
>> >
>> > Best regards,
>> > Richard Gallamore
>> >
>> > On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman <ler@lerctr.org> wrote:
>> >
>> > I'm seeing them constantly:
>> >
>> > root@freenas[~]# mcelog --dmi
>> > Hardware event. This is not a software error.
>> > MCE 0 >> > CPU 22 BANK 8 TSC 20aab486464a >> > MISC ac29890200046444 ADDR ee2f6e800 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory read ECC error >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 44 >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 8c0000400001009f MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > WARNING: SMBIOS data is often unreliable. Take with a grain of salt! >> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB >> > Device Locator: P2-DIMM2C >> > Bank Locator: BANK14 >> > Manufacturer: Hyundai >> > Serial Number: 40F3C20F >> > Asset Tag: >> > Part Number: HMT151R7BFR4C-H9 >> > Hardware event. This is not a software error. >> > MCE 1 >> > CPU 22 BANK 8 TSC 296dfcc82582 >> > MISC ac29890200041381 ADDR ee2f6e800 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory read ECC error >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 81 >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 8c0000400001009f MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB >> > Device Locator: P2-DIMM2C >> > Bank Locator: BANK14 >> > Manufacturer: Hyundai >> > Serial Number: 40F3C20F >> > Asset Tag: >> > Part Number: HMT151R7BFR4C-H9 >> > Hardware event. This is not a software error. 
>> > MCE 2 >> > CPU 22 BANK 8 TSC 2a5604a6a070 >> > MISC ac29890200044281 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory ECC error occurred during scrub >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 81 >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 88000040000200cf MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > Hardware event. This is not a software error. >> > MCE 3 >> > CPU 22 BANK 8 TSC 31e141418eb8 >> > MISC ac29890200046a4a ADDR ee2f6e800 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory read ECC error >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 4a >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 8c0000400001009f MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB >> > Device Locator: P2-DIMM2C >> > Bank Locator: BANK14 >> > Manufacturer: Hyundai >> > Serial Number: 40F3C20F >> > Asset Tag: >> > Part Number: HMT151R7BFR4C-H9 >> > Hardware event. This is not a software error. 
>> > MCE 4 >> > CPU 22 BANK 8 TSC 3a014afee106 >> > MISC ac29890200046646 ADDR ee2f6e800 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory read ECC error >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 46 >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 8c0000400001009f MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB >> > Device Locator: P2-DIMM2C >> > Bank Locator: BANK14 >> > Manufacturer: Hyundai >> > Serial Number: 40F3C20F >> > Asset Tag: >> > Part Number: HMT151R7BFR4C-H9 >> > Hardware event. This is not a software error. >> > MCE 5 >> > CPU 22 BANK 8 TSC 41d1dbef1a6a >> > MISC ac29890200046141 ADDR ee2f6e800 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory read ECC error >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 41 >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 8c0000400001009f MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB >> > Device Locator: P2-DIMM2C >> > Bank Locator: BANK14 >> > Manufacturer: Hyundai >> > Serial Number: 40F3C20F >> > Asset Tag: >> > Part Number: HMT151R7BFR4C-H9 >> > Hardware event. This is not a software error. 
>> > MCE 6 >> > CPU 22 BANK 8 TSC 4a1b1ecef446 >> > MISC ac29890200046a4a ADDR ee2f6e800 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory read ECC error >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 4a >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 8c0000400001009f MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB >> > Device Locator: P2-DIMM2C >> > Bank Locator: BANK14 >> > Manufacturer: Hyundai >> > Serial Number: 40F3C20F >> > Asset Tag: >> > Part Number: HMT151R7BFR4C-H9 >> > Hardware event. This is not a software error. >> > MCE 7 >> > CPU 22 BANK 8 TSC 527bc27db776 >> > MISC ac29890200040386 ADDR ee2f6e800 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory read ECC error >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 86 >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 8c0000400001009f MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB >> > Device Locator: P2-DIMM2C >> > Bank Locator: BANK14 >> > Manufacturer: Hyundai >> > Serial Number: 40F3C20F >> > Asset Tag: >> > Part Number: HMT151R7BFR4C-H9 >> > Hardware event. This is not a software error. 
>> > MCE 8 >> > CPU 22 BANK 8 TSC 5aa4ecdd795a >> > MISC ac29890200046646 ADDR ee2f6e800 >> > TIME 1655770989 Mon Jun 20 19:23:09 2022 >> > MCG status: >> > Memory read ECC error >> > Memory corrected error count (CORE_ERR_CNT): 1 >> > Memory transaction Tracker ID (RTId): 46 >> > Memory DIMM ID of error: 0 >> > Memory channel ID of error: 1 >> > Memory ECC syndrome: ac298902 >> > STATUS 8c0000400001009f MCGSTATUS 0 >> > MCGCAP 1c09 APICID 34 SOCKETID 0 >> > CPUID Vendor Intel Family 6 Model 44 Step 2 >> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB >> > Device Locator: P2-DIMM2C >> > Bank Locator: BANK14 >> > Manufacturer: Hyundai >> > Serial Number: 40F3C20F >> > Asset Tag: >> > Part Number: HMT151R7BFR4C-H9 >> > root@freenas[~]# >> > >> > and I replaced the DIMM yesterday :( >> > >> > On 06/20/2022 7:19 pm, Ultima wrote: >> > >> > Hey Larry, >> > >> > It is possible it's the motherboard itself, but it's rare. The way I >> > would determine this is to swap the DIMM module with another >> > populated slot on the motherboard and see if the error migrated >> > to the new slot or not. Also, this error doesn't necessarily mean >> > there is a problem that needs to be addressed. If you have been >> > running the system for many months and you see ECC errors a >> > handful of times, it can probably be safely ignored. >> > >> > Best regards, >> > Richard Gallamore >> > >> > On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman <ler@lerctr.org> wrote: >> > I've gotten a BUNCH of these on my TrueNAS server. I've replaced this >> > DIMM a couple of times, and still the MCE's continue. >> > Is it possible it's Motherboard slot issue? >> > >> > Hardware event. This is not a software error. 
>> > MCE 8
>> > CPU 22 BANK 8 TSC 5aa4ecdd795a
>> > MISC ac29890200046646 ADDR ee2f6e800
>> > TIME 1655762472 Mon Jun 20 17:01:12 2022
>> > MCG status:
>> > Memory read ECC error
>> > Memory corrected error count (CORE_ERR_CNT): 1
>> > Memory transaction Tracker ID (RTId): 46
>> > Memory DIMM ID of error: 0
>> > Memory channel ID of error: 1
>> > Memory ECC syndrome: ac298902
>> > STATUS 8c0000400001009f MCGSTATUS 0
>> > MCGCAP 1c09 APICID 34 SOCKETID 0
>> > CPUID Vendor Intel Family 6 Model 44 Step 2
>> > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
>> > Device Locator: P2-DIMM2C
>> > Bank Locator: BANK14
>> > Manufacturer: Hyundai
>> > Serial Number: 40F3C20F
>> > Asset Tag:
>> > Part Number: HMT151R7BFR4C-H9
>> >
>> > --
>> > Larry Rosenman http://www.lerctr.org/~ler
>> > Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>> > US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>>
>> --
>> Larry Rosenman http://www.lerctr.org/~ler
>> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>>
>> --
>> Larry Rosenman http://www.lerctr.org/~ler
>> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>>
>> --
>> Larry Rosenman http://www.lerctr.org/~ler
>> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>>
>> --
>> Larry Rosenman http://www.lerctr.org/~ler
>> Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
>> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
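
For anyone reading the STATUS words in the mcelog output in this thread, here is a rough sketch of how the architectural bits of an IA32_MCi_STATUS value break down. The bit positions are taken from the machine-check chapter of the Intel SDM as I understand it. The names decode_mci_status and MEM_OPS are made up for illustration and are not part of mcelog, and the model-specific bits (31:16) are only extracted, not interpreted.

```python
# Sketch only: decode the architectural fields of an IA32_MCi_STATUS value.
# decode_mci_status and MEM_OPS are hypothetical names for this example.

# MMM field of a memory-controller error code (0000 0000 1MMM CCCC).
MEM_OPS = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit STATUS word into its architectural fields."""
    mca_code = status & 0xFFFF  # bits 15:0, architectural MCA error code
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC
        "enabled": bool((status >> 60) & 1),      # EN
        "miscv": bool((status >> 59) & 1),        # MISC register is valid
        "addrv": bool((status >> 58) & 1),        # ADDR register is valid
        "pcc": bool((status >> 57) & 1),          # processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,      # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_code": mca_code,
    }
    # Memory-controller errors match the pattern 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        fields["memory_op"] = MEM_OPS.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        # CCCC == 1111 means the channel is not specified in this field.
        fields["channel"] = None if channel == 0xF else channel
    return fields


if __name__ == "__main__":
    for word in (0x8C0000400001009F, 0x88000040000200CF):
        print(hex(word), decode_mci_status(word))
```

Fed 0x8c0000400001009f, this reports a valid, corrected memory read error with a count of 1 and a valid ADDR, which agrees with what mcelog printed. 0x88000040000200cf decodes as a scrub error with no valid ADDR, matching the one record in the thread that has no ADDR field.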
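
The "does the error follow the slot or the module" check discussed in this thread can also be done mechanically on two saved `mcelog --dmi` captures, one from before and one from after a DIMM swap. What follows is a minimal sketch, not a tested tool: the function names summarize and follows_slot are hypothetical, the parsing assumes the field labels exactly as they appear in the output quoted here, and, as mcelog itself warns, the SMBIOS data it relies on may be unreliable.

```python
import re


def summarize(mcelog_text: str) -> set:
    """Collect (ADDR, Device Locator, Serial Number) triples from `mcelog --dmi` text."""
    triples = set()
    # Each record starts with the "Hardware event." banner.
    for record in mcelog_text.split("Hardware event.")[1:]:
        addr = re.search(r"\bADDR\s+([0-9a-f]+)", record)
        slot = re.search(r"Device Locator:\s*(\S+)", record)
        serial = re.search(r"Serial Number:\s*(\S+)", record)
        # Records without DMI details (or without ADDR) are skipped.
        if addr and slot and serial:
            triples.add((addr.group(1), slot.group(1), serial.group(1)))
    return triples


def follows_slot(before: set, after: set) -> bool:
    """True if some slot reports errors in both runs but with a different module serial."""
    for _, slot_b, serial_b in before:
        for _, slot_a, serial_a in after:
            if slot_a == slot_b and serial_a != serial_b:
                return True
    return False


if __name__ == "__main__":
    # Abbreviated records based on the two runs quoted in this thread.
    before = summarize(
        "Hardware event. This is not a software error.\n"
        "MISC ac29890200046646 ADDR ee2f6e800\n"
        "Device Locator: P2-DIMM2C\nSerial Number: 40F3C20F\n"
    )
    after = summarize(
        "Hardware event. This is not a software error.\n"
        "MISC ac29890200040083 ADDR ee2f6e800\n"
        "Device Locator: P2-DIMM2C\nSerial Number: 642264CD\n"
    )
    print(follows_slot(before, after))
```

With the two runs in this thread, the slot (P2-DIMM2C) and address (ee2f6e800) stay the same while the serial number changes from 40F3C20F to 642264CD, so the check points at the slot, CPU, or socket rather than the module.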