Date:      Tue, 18 Jan 2011 17:02:30 -0500
From:      "Michael Jung" <mikej@paymentallianceintl.com>
To:        "John Baldwin" <jhb@freebsd.org>, <freebsd-current@freebsd.org>
Subject:   Re: unknown mtx_assert at /usr/src/sys/x86/x86/io_apic.c:161
Message-ID:  <C95B7826.2B40F%mikej@paymentallianceintl.com>
In-Reply-To: <C95668AF.2769D%mikej@paymentallianceintl.com>




On 1/14/11 8:55 PM, "Michael Jung" <mikej@paymentallianceintl.com> wrote:

> John:
>
> Thanks, I actually didn't see the MCA errors on the screen as the system had
> reloaded, but noted them in the ddb.txt file last night.
>
> The Motherboard, CPU, Memory and PS were replaced today.  I'll post back
> whether or not this has corrected the problem, but I suspect you are on target
> in that the hardware was defective.  This machine was remote and I found the
> fan in the power supply not working, so I'm suspecting that the CPU or
> other logic was damaged.
>
> Thanks for your reply.
>
> --mikej
>
>
> On 1/14/11 4:13 PM, "John Baldwin" <jhb@freebsd.org> wrote:
>
>> > On Thursday, January 13, 2011 11:26:46 am Michael Jung wrote:
>>>> >> > Links to crash info below.
>>>> >> > http://216.26.153.6/msgbuf.txt
>> >
>> > This might be a hardware problem.  The panic you got is a "should never
>> > happen" panic.  Note that in the code line sourced, the second argument to
>> > mtx_assert() is MA_OWNED.  The panic is saying that it is some invalid value
>> > (i.e. something other than MA_OWNED).  Given that is a constant, that's not
>> > very likely at all barring some hardware glitch.
>> >
>> > You do have a somewhat scary looking machine check logged before your panic:
>> >
>> > MCA: Bank 1, Status 0xd000000000000171
>> > MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
>> > MCA: Vendor "AuthenticAMD", ID 0x20fc2, APIC ID 0
>> > MCA: CPU 0 COR OVER ICACHE L1 EVICT error
>> >
>> > It is a correctable error, but given the nature of the panic I'd suspect a
>> > hardware problem.
>> >
>> > mcelog doesn't provide many more details:
>> >
>> > HARDWARE ERROR. This is *NOT* a software problem!
>> > Please contact your hardware vendor
>> > CPU 0 1 instruction cache
>> >        bit62 = error overflow (multiple errors)
>> >   memory/cache error 'evict mem transaction, instruction transaction, level 1'
>> > STATUS d000000000000171 MCGSTATUS 0
>> > MCGCAP 105 APICID 0 SOCKETID 0
>> > CPUID Vendor AMD Family 15 Model 44
>> >
>> > --
>> > John Baldwin
>> >
>
> The box has run fine since hardware was replaced.  Thanks for your help.
>
> ---mikej
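
For anyone reading this thread later, here is a minimal sketch of how the
status word quoted above (0xd000000000000171) maps to the "COR OVER ICACHE
L1 EVICT" summary.  It assumes the architectural x86 MCi_STATUS layout
(bit 63 valid, bit 62 overflow, bit 61 uncorrected) and the compound cache
error code format 000F 0001 RRRR TTLL in the low 16 bits.  The function and
table names are made up for illustration; this is not the FreeBSD or mcelog
decoder, and it only handles the cache error class seen here.

```python
# Sketch: decode an x86 machine-check bank status (MCi_STATUS) value.
# Names below are illustrative assumptions, not taken from FreeBSD.

VAL = 1 << 63   # register holds a valid error
OVER = 1 << 62  # error overflow (an earlier error was overwritten)
UC = 1 << 61    # error was NOT corrected by hardware

# Field encodings of the cache error code 000F 0001 RRRR TTLL.
REQUESTS = {0: "GEN", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
            5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TYPES = {0: "ICACHE", 1: "DCACHE", 2: "GCACHE"}
LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_mci_status(status):
    """Return a short human-readable summary of a bank status value."""
    if not status & VAL:
        return "no valid error"

    parts = ["UNCOR" if status & UC else "COR"]
    if status & OVER:
        parts.append("OVER")

    code = status & 0xFFFF  # MCA error code lives in bits 15:0
    if (code & 0xEF00) == 0x0100:
        # Cache error: RRRR in bits 7:4, TT in bits 3:2, LL in bits 1:0.
        request = REQUESTS.get((code >> 4) & 0xF, "UNKNOWN")
        ttype = TYPES.get((code >> 2) & 0x3, "?CACHE")
        level = LEVELS[code & 0x3]
        parts.append(f"{ttype} {level} {request} error")
    else:
        parts.append(f"error code 0x{code:04x} (not decoded in this sketch)")
    return " ".join(parts)


if __name__ == "__main__":
    print(decode_mci_status(0xD000000000000171))
```

The top nibble 0xd is binary 1101, so the valid and overflow bits are set
and the uncorrected bit is clear, which is why the error is reported as
correctable even though it overflowed.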


CONFIDENTIALITY NOTE: This message is intended only for the use
of the individual or entity to whom it is addressed and may contain
information that is privileged, confidential, and exempt from
disclosure under applicable law. If the reader of this message is
not the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication
is strictly prohibited. If you have received this transmission
in error, please notify us by telephone at (502) 212-4001 or
notify us at PAI, Dept. 99, 11857 Commonwealth Drive,
Louisville, KY  40299.  Thank you.


