Date:      Wed, 14 Apr 2004 18:48:51 -0700
From:      Kris Kennaway <kris@obsecurity.org>
To:        Rick Updegrove <dislists@updegrove.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 4.9 SMP Stability?
Message-ID:  <20040415014851.GA58873@xor.obsecurity.org>
In-Reply-To: <407DE32B.8040304@updegrove.net>
References:  <407C5AED.9040709@updegrove.net> <407C76A6.5080502@users.sourceforge.net> <407CA3D6.2090803@updegrove.net> <20040414083216.A45296@server.gisp.dk> <407D466E.9060900@updegrove.net> <407DBD39.6020405@updegrove.net> <20040414232312.GA56901@xor.obsecurity.org> <407DCB29.8010109@updegrove.net> <20040415000022.GA57253@xor.obsecurity.org> <407DE32B.8040304@updegrove.net>

On Wed, Apr 14, 2004 at 06:19:39PM -0700, Rick Updegrove wrote:

> I will do this just as soon as I send this reply, which has more
> questions I need answered.  Besides, I need to run the new BIOS with the
> 4.10-BETA kernel until it crashes to eliminate the BIOS as a suspect,
> right?

Yes.

>
> >* The hardware is all in order, you don't have mismatched components
> >like CPUs with different steppings, etc.
>=20
> This may sound silly but how do I verify this?

Run mptable.
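
For reference, a minimal sketch of saving the MP table so the CPU entries
can be compared. It assumes mptable is at /usr/sbin/mptable (where the base
system installs it) and that it runs as root, since the tool reads physical
memory. The function name and output path are made up for illustration.

```shell
#!/bin/sh
# Save the MP table to a file so the per-CPU rows can be compared.
# Assumptions: mptable is at /usr/sbin/mptable and we are root.
# The default output path is only an example.
save_mptable() {
    out=${1:-/tmp/mptable.out}
    if [ -x /usr/sbin/mptable ]; then
        /usr/sbin/mptable > "$out"
    else
        echo "mptable is not installed here" >&2
        return 1
    fi
}
```

After running save_mptable, look at the Processors section of the saved
file: each CPU has a row with family, model and stepping columns, and the
values should be the same on every row.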

> (I have attached dmesg -a at the bottom of this email in case that helps)
>
> >These three points hold *whether or not an older version of FreeBSD
> >works for you*, because different versions of FreeBSD interact in
> >different ways with the hardware, and a previously existing problem
> >may suddenly leap out at you when you run a different version.
>
> Sorry but to me the above paragraph is confusing.  I don't agree with
> what I think it says.
>
> The hardware runs just fine with 4.8-STABLE so I don't think you can
> convince me that my hardware is the cause of this problem.

Here's an example of how it's true:

Older versions of FreeBSD did not make use of hyperthreading on CPUs
that support it.  Some early attempts from motherboard vendors to
support HTT were flawed, and can cause runtime problems.  You'd see
that an older version of FreeBSD "works fine", but that's only because
it's not exercising the bug on your system.

There are other examples.  Even something as simple and "harmless" as
a change in timing of I/O can expose hardware problems.  Note that I'm
not trying to pass off what may well be an OS bug as a hardware fault,
but these possibilities cannot just be discounted.

> >* you're not using out-of-date kernel modules, since in general they
> >must be rebuilt whenever you update your kernel.
>
> How do I verify this?

Look in /modules for stale files, and check kldstat to see what you're
running.
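
One way to spot stale files, as a rough sketch: list modules that were not
modified after the kernel was installed. The defaults /kernel and /modules
assume a stock 4.x layout, the function name is hypothetical, and the
timestamp comparison is only a heuristic (a module copied in by hand can
have any date).

```shell
#!/bin/sh
# List module files that were not modified after the kernel file.
# Defaults assume a stock 4.x layout: /kernel and /modules.
stale_modules() {
    kernel=${1:-/kernel}
    moddir=${2:-/modules}
    # -newer matches files modified after $kernel; negating it leaves
    # the modules that predate (or share a timestamp with) the kernel.
    find "$moddir" -type f -name '*.ko' ! -newer "$kernel" -print
}
```

Run stale_modules and then kldstat, and compare: anything that shows up in
both lists is loaded but probably was not rebuilt with the current kernel.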

> /usr/sbin/config MYKERNEL
> cd ../../compile/MYKERNEL
> make depend
> cd /usr/src
> make -j4 buildworld
> cd /usr/src
> make buildkernel KERNCONF=MYKERNEL
> make installkernel KERNCONF=MYKERNEL
> make installworld
> cd /dev
> /bin/sh MAKEDEV all
> cd /usr/src/release/sysinstall
> make all install
> shutdown -r now
>
> Am I missing anything specific?

This doesn't correspond to the recommended upgrade procedure (yes, in
the handbook :-).  For example, you omit running mergemaster, and you
run the installworld target before rebooting onto the new kernel.  That
shouldn't cause panics, though.

It will, however, cause you to destroy your system if you attempted to
use that method to update to 5.x ;-)
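
For comparison, a dry-run sketch of the order described above, as far as I
understand the handbook procedure; check /usr/src/UPDATING before relying
on it. KERNCONF=MYKERNEL is taken from the quoted commands. The helper
names (step, before_reboot, after_reboot) and the RUN switch are invented
for this example. Nothing is executed unless RUN=yes is set.

```shell
#!/bin/sh
# Dry-run sketch of the 4.x source upgrade order. Commands are only
# printed unless RUN=yes is set, because the real steps rewrite the
# installed system.
step() {
    echo "+ $*"
    if [ "${RUN:-no}" = "yes" ]; then
        "$@"
    fi
}

# Build everything and install only the kernel, then reboot onto it.
before_reboot() {
    step cd /usr/src
    step make buildworld
    step make buildkernel KERNCONF=MYKERNEL
    step make installkernel KERNCONF=MYKERNEL
    step shutdown -r now
}

# Once the new kernel is running (ideally in single-user mode),
# install the world and merge the configuration files.
after_reboot() {
    step cd /usr/src
    step make installworld
    step mergemaster
    step shutdown -r now
}
```

Calling before_reboot and then after_reboot with RUN unset just prints the
sequence, which makes the difference from the quoted steps easy to see:
the reboot comes before installworld, and mergemaster follows it.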

> >You said the machine panicked.
>=20
> I said the machine reboots without any warning and without leaving
> anything useful in any of the logs.

Are you sure that nothing is displayed on the system console before it
reboots?  Hooking up a serial console can be a good way to catch this.
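
A minimal sketch of one way to set that up, assuming the first serial port
(COM1) is free, the kernel's sio0 device has the console flag set as in
GENERIC, and a second machine is connected with a null-modem cable. These
are assumptions about your setup, not something taken from your dmesg.

```shell
# /boot/loader.conf fragment (assumes the stock boot loader is in use).
# Sends console output, including any panic message, to the serial port.
console="comconsole"
```

On the second machine, something like "cu -l /dev/cuaa0 -s 9600" (device
name and speed are assumptions) will show whatever the crashing box prints,
and the terminal scrollback keeps it after the reboot.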

> >When you encounter a panic, the useful
> >thing to do is to obtain a debugging traceback, as described in the
> >developers handbook.
> >
> >  http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> >
> >Your bug report will be more useful the more relevant details you can
> >provide about it.  For example, provide a copy of boot -v, and details
> >of what you are doing to provoke the problem, what you have tried to
> >work around it, and any other partial results you might have.
>
> boot -v
> -bash: boot: command not found

boot -v is given at the boot loader prompt, not in the shell.  See man
boot and the 'v' option.
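
If interrupting the loader on every boot is inconvenient, the same effect
can be had from loader.conf. A small sketch, assuming the stock loader and
its boot_verbose setting:

```shell
# /boot/loader.conf fragment (assumes the stock boot loader is in use).
# Same effect as typing "boot -v" at the loader prompt on every boot.
boot_verbose="YES"
```

After the next reboot, "dmesg > /tmp/boot-v.txt" (the file name is just an
example) captures the verbose probe messages for the bug report.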

> APIC_IO: Testing 8254 interrupt delivery
> APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0
> intpin 2
> APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0

This indicates a problem (I don't know how serious); the BIOS update
may help.

Buggy firmware on RAID controllers can also cause problems, so try
updating that too.

Kris



