Date:      Tue, 21 Jun 2022 09:52:50 -0700
From:      Ultima <ultima1252@gmail.com>
To:        "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Cc:        Larry Rosenman <ler@lerctr.org>, Freebsd current <freebsd-current@freebsd.org>
Subject:   Re: MCE: Does this look possibly like a slot issue?
Message-ID:  <CANJ8om4_xEjd_Dun%2BOSAGrJMO4sO%2BqO0dChfy4TF=su5nct5Vw@mail.gmail.com>
In-Reply-To: <202206211606.25LG6Out053747@gndrsh.dnsmgr.net>
References:  <c70d0b6f3ef344786f282d7c6500a390@lerctr.org> <202206211606.25LG6Out053747@gndrsh.dnsmgr.net>

Completely agree with you, Rodney. The LGA pins in the motherboard
socket bend very easily when a CPU is moved, so I wanted to recommend
this last.

Larry, as Rodney mentioned, it's more or less your last option. This
is likely the CPU and not the module itself. There is still a small chance
that it is motherboard/socket related. One way to determine this is
by swapping the CPUs between socket 0 and socket 1 and seeing if the
error moves. As I mentioned though, be very cautious. I don't want you
to end up in a worse-off state.

I would reseat the CPU in the problem socket before swapping the CPUs.
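
The swap tests discussed in this thread amount to a small decision table. Here is a minimal Python sketch of that reasoning; the names `Suspect` and `classify` are made up for illustration, and the mapping is only a rule of thumb, not a guaranteed diagnosis:

```python
from enum import Enum
from typing import Optional


class Suspect(Enum):
    DIMM = "DIMM module"
    CPU = "CPU (integrated memory controller)"
    SOCKET_OR_BOARD = "CPU socket or motherboard"
    UNDETERMINED = "undetermined: reseat, then swap the CPUs"


def classify(error_followed_dimm: bool,
             error_followed_cpu: Optional[bool] = None) -> Suspect:
    """Map swap-test outcomes to the most likely faulty part.

    error_followed_dimm: after swapping DIMMs, did the error move with the module?
    error_followed_cpu:  after swapping CPUs, did the error move with the CPU?
                         None means the CPU swap has not been tried yet.
    """
    if error_followed_dimm:
        # The error moved with the module, so the module is the suspect.
        return Suspect.DIMM
    if error_followed_cpu is None:
        # The error stayed with the slot; the CPU swap is the next test.
        return Suspect.UNDETERMINED
    # The error stayed with the slot: it either follows the CPU or stays
    # with the socket/board.
    return Suspect.CPU if error_followed_cpu else Suspect.SOCKET_OR_BOARD


if __name__ == "__main__":
    # Hypothetical outcome: DIMM swap done, error stayed with the slot.
    print(classify(error_followed_dimm=False).value)
```

The CPU swap is only consulted once the DIMM swap has ruled out the module, which matches the order of least risk to the hardware.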

Best regards,
Richard Gallamore

On Tue, Jun 21, 2022 at 9:06 AM Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:

> >
> >
> > Swapped 2 DIMMS, now we wait for the ZFS ARC to fill and start using all
> > the memory.
>
> Depending on the results of that, one thing that is often overlooked
> when trying to troubleshoot memory systems in modern Intel systems
> is the fact that the DIMM now talks directly to the CPU chip that
> has the memory controller built into it.  THUS these "slot" related
> ECC/Parity/blowup errors can actually be the CPU and/or the CPU
> socket and/or the seating of the CPU in the socket.
>
> So if the error sticks with the DIMM slot and not the DIMM
> module, the next thing I would try would be a CPU chip reseat,
> including a good inspection of the socket for a damaged
> pin.  Also look at the lands on the CPU chip itself, and you
> can even try swapping CPU chips to see if it follows the
> CPU or the socket, much as you do with a DIMM.
>
>
> >
> > On 06/20/2022 7:59 pm, Larry Rosenman wrote:
> >
> > > SuperMicro X8DTN+
> > >
> > > 2 Processors, 6-core/12-Thread. CPU: Intel(R) Xeon(R) CPU
> > > E5645  @ 2.40GHz (2400.20-MHz K8-class CPU)
> > >
> > > I'll bring it down and swap DIMMS around
> > >
> > > On 06/20/2022 7:57 pm, Ultima wrote:
> > >
> > > Hey Larry,
> > >
> > > One red flag I am seeing is that the error is being produced on
> > > the same CPU/bank with each error you have provided so far.
> > >
> > > Can you try and follow my original recommendation and swap a
> > > currently installed DIMM into the problem DIMM slot and see
> > > if anything changes?
> > >
> > > Can you also provide the motherboard model? Also, do you
> > > have multiple CPUs installed in this system?
> > >
> > > Best regards,
> > > Richard Gallamore
> > >
> > > On Mon, Jun 20, 2022 at 5:41 PM Larry Rosenman <ler@lerctr.org> wrote:
> > >
> > > Yes and Yes.
> > >
> > > On 06/20/2022 7:37 pm, Ultima wrote:
> > >
> > > Are you sure that the module you replaced it with was good?
> > > Are you sure you replaced the correct module?
> > >
> > > Best regards,
> > > Richard Gallamore
> > >
> > > On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman <ler@lerctr.org> wrote:
> > >
> > > I'm seeing them constantly:
> > >
> > > root@freenas[~]# mcelog --dmi
> > > Hardware event. This is not a software error.
> > > MCE 0
> > > CPU 22 BANK 8 TSC 20aab486464a
> > > MISC ac29890200046444 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 44
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 1
> > > CPU 22 BANK 8 TSC 296dfcc82582
> > > MISC ac29890200041381 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 81
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 2
> > > CPU 22 BANK 8 TSC 2a5604a6a070
> > > MISC ac29890200044281
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory ECC error occurred during scrub
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 81
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 88000040000200cf MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > Hardware event. This is not a software error.
> > > MCE 3
> > > CPU 22 BANK 8 TSC 31e141418eb8
> > > MISC ac29890200046a4a ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 4a
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 4
> > > CPU 22 BANK 8 TSC 3a014afee106
> > > MISC ac29890200046646 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 46
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 5
> > > CPU 22 BANK 8 TSC 41d1dbef1a6a
> > > MISC ac29890200046141 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 41
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 6
> > > CPU 22 BANK 8 TSC 4a1b1ecef446
> > > MISC ac29890200046a4a ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 4a
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 7
> > > CPU 22 BANK 8 TSC 527bc27db776
> > > MISC ac29890200040386 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 86
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > Hardware event. This is not a software error.
> > > MCE 8
> > > CPU 22 BANK 8 TSC 5aa4ecdd795a
> > > MISC ac29890200046646 ADDR ee2f6e800
> > > TIME 1655770989 Mon Jun 20 19:23:09 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 46
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > > root@freenas[~]#
> > >
> > > and I replaced the DIMM yesterday :(
> > >
> > > On 06/20/2022 7:19 pm, Ultima wrote:
> > >
> > > Hey Larry,
> > >
> > > It is possible it's the motherboard itself, but it's rare. The way I
> > > would determine this is to swap the DIMM with the one in another
> > > populated slot on the motherboard and see whether the error migrates
> > > to the new slot. Also, this error doesn't necessarily mean
> > > there is a problem that needs to be addressed. If you have been
> > > running the system for many months and you see ECC errors a
> > > handful of times, it can probably be safely ignored.
> > >
> > > Best regards,
> > > Richard Gallamore
> > >
> > > On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman <ler@lerctr.org> wrote:
> > >
> > > I've gotten a BUNCH of these on my TrueNAS server.  I've replaced this
> > > DIMM a couple of times, and still the MCEs continue.
> > > Is it possible it's a motherboard slot issue?
> > >
> > > Hardware event. This is not a software error.
> > > MCE 8
> > > CPU 22 BANK 8 TSC 5aa4ecdd795a
> > > MISC ac29890200046646 ADDR ee2f6e800
> > > TIME 1655762472 Mon Jun 20 17:01:12 2022
> > > MCG status:
> > > Memory read ECC error
> > > Memory corrected error count (CORE_ERR_CNT): 1
> > > Memory transaction Tracker ID (RTId): 46
> > > Memory DIMM ID of error: 0
> > > Memory channel ID of error: 1
> > > Memory ECC syndrome: ac298902
> > > STATUS 8c0000400001009f MCGSTATUS 0
> > > MCGCAP 1c09 APICID 34 SOCKETID 0
> > > CPUID Vendor Intel Family 6 Model 44 Step 2
> > > DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB
> > > Device Locator: P2-DIMM2C
> > > Bank Locator: BANK14
> > > Manufacturer: Hyundai
> > > Serial Number: 40F3C20F
> > > Asset Tag:
> > > Part Number: HMT151R7BFR4C-H9
> > >
> > > --
> > > Larry Rosenman                     http://www.lerctr.org/~ler
> > > Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
> > > US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>
> --
> Rod Grimes
> rgrimes@freebsd.org
>
