Date:      Wed, 15 Oct 2003 13:21:34 +0930
From:      Greg 'groggy' Lehey <grog@FreeBSD.org>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        FreeBSD current users <FreeBSD-current@freebsd.org>
Subject:   Re: Serial debug broken in recent -CURRENT?
Message-ID:  <20031015035134.GB13080@wantadilla.lemis.com>
In-Reply-To: <20031008011915.J961@gamplex.bde.org>
References:  <20030929083007.GA33083@wantadilla.lemis.com> <20030930074500.GY45668@wantadilla.lemis.com> <20030930204919.A4354@gamplex.bde.org> <200309300932.54682.sam@errno.com> <20031008011915.J961@gamplex.bde.org>



On Wednesday,  8 October 2003 at  2:08:55 +1000, Bruce Evans wrote:
> On Tue, 30 Sep 2003, Sam Leffler wrote:
>
>> It reliably locks up for me when you break into a running system; set a
>> breakpoint; and then continue.  Machine is UP+HTT.  Haven't tried other
>> machines.
>
> This seems to be because rev.1.75 of db_interface.c disturbed some much
> larger bugs related to the ones that it fixed.  It takes miracles for
> entering ddb to even sort of work in the SMP case.

Ah, interesting.  I hadn't thought that it might be related to SMP.

> If one of multiple CPUs in kdb_trap() somehow stops the others, then the
> others face different problems when they restart.  They can't just return
> because debugger traps are not restartable (by just returning).  They can't
> just proceed because the first CPU may have changed the state in such a way as
> to make proceeding in the normal way not work (e.g., it may have deleted
> a breakpoint).
>
> These problems are not correctly or completely fixed in:
>
>>>>
> Index: db_interface.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/i386/i386/db_interface.c,v
> retrieving revision 1.75
> diff -u -2 -r1.75 db_interface.c
> --- db_interface.c	7 Sep 2003 13:43:01 -0000	1.75
> +++ db_interface.c	7 Oct 2003 14:11:35 -0000
> ...
> This is supposed to stop the other CPUs either in kdb_trap() or normally.
> The timeouts are hopefully long enough for all the CPUs to stop in 1
> of these ways.  But it doesn't always work.  1 possible problem is
> that stop and start IPIs may be delivered out of order, so CPUs stopped
> in kdb_trap() may end up stopped (since we don't wait for them to see
> the stop IPI).

Correct.  This patch doesn't fix the problem on my system.  I've built
a single processor kernel (comment out SMP and APIC_IO), and that
*does* work with remote gdb, so it's almost certainly an SMP issue.  I
have a dump of a partially hanging system if that's of any help.
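
Commenting out the two options would look roughly like this in an i386
kernel configuration file.  This is a sketch that assumes the stock
GENERIC-style option lines of the time; the trailing comments are
illustrative, and the actual config file used here is not shown in
this message:

```
# Sketch only: with both lines commented out, the kernel builds
# as a uniprocessor kernel.
#options 	SMP		# Symmetric MultiProcessor Kernel
#options 	APIC_IO		# Symmetric (APIC) I/O
```

The kernel then needs to be rebuilt and installed in the usual way
before repeating the remote gdb test.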

Greg
--
See complete headers for address and phone numbers.



