Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 7 Jun 2011 17:33:37 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Marcel Moolenaar <marcel@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r222801 - in head/sys: ddb kern sys
Message-ID:  <20110607161510.M1142@besplex.bde.org>
In-Reply-To: <201106070128.p571SCVI059714@svn.freebsd.org>
References:  <201106070128.p571SCVI059714@svn.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 7 Jun 2011, Marcel Moolenaar wrote:

> Log:
>  Fix making kernel dumps from the debugger by creating a command
>  for it.

It is impossible to fix this.  It is only possible to make it less
broken.  Making kernel dumps touches far to much space to be
controllable by a debugger.  Debugger commands really should put
back everything the way that it was before the command, except
possibly for changing a byte or two of memory, and this is too hard
to do for a complicated operation like making dumps.  The "byte or
two" changed by it should strictly be all bytes in the dump image
but nothing in the system state anywhere else.

>  Do not not expect a developer to call doadump(). Calling
>  doadump does not necessarily work when it's declared static. Nor

The bug that breaks static doadump() is presumably gcc
-finline-functions-called-once.  This breaks lots of other things.
E.g., it breaks stack traces.  A stack tracer with full debugging info
might be able to restore all the inlined functions virtually, but I've
mever seen gdb doing that, and a primitive stack tracer like the one
used by ddb and panic can't do that.  However, we already use the
technique of putting static functions that are normally called not at
all, but might be useful for ddb, by wrapping them as ddb functions
so that they might be called at least once.  They sometimes remain
static, but are still hard to inline and remove if they are called
once since, since the calls to them are via function pointers in the
ddb command tables.

>  does it necessarily do what was intended in the context of text
>  dumps. The dump command always creates a core dump.
>
>  Move printing of error messages from doadump to the dump command,
>  now that we don't have to worry about being called from DDB.

Printing in doadump() (when called from ddb) is still quite broken (like
the entire command.  doadump() still uses printf() for at least status
messages.  I think there are hacks to turn printf() into db_printf()
when in ddb context, so you gain little from moving the first printf()
into a db_printf() in the dump command.  However, since the other printf()s
are actually db_printf()s, they don't operate normally.  In particular,
they don't go to the message buffer.  This may be a feature.

The problem with turning a cally to the any() function into an 'any' command
is that it confuses naive developers into thinking that the 'any' operation
is actually supported by ddb.

ddb should use trampolines for all calls to the any() function and for
unsafe `any' commands.  This involves backing out of ddb context into
a normal context and arranging to come back.  A breakpoint in the
trampoline might be enough to come back, but perhaps there should be
options to keep more control, e.g.:
- keep interrupts masked while running the trampoline.  But this has
   never worked right even for single stepping -- single stepping involves
   backing out of ddb and coming back with a trace trap; it is important
   that the single step steps only a single instruction, but it actually
   allows any number of instructions to execute via interrupts on the
   current CPU (because it enables interrupts on the current CPU, and
   interrupts have precedence over the trace trap on at least i386), and
   it allows any number of instructions to execute on other CPUs (since
   it unstops other CPUs)
- keep other CPUs stopped while running the trampoline.  Otherwise you
   have the same bugs as for trace traps.  It is unclear if an any()
   function can run safely either with all CPUs stopped or without them
   stopped.  In general it can't, since it might be the start function
   or the stop function.

[k]gdb runs in a separate address space and/or machine, so it must use a
a trampoline to call the any() function and would need similar to have
commands.  I think gdb gets this right, but kgdb doesn't.  At best, the
kernel part of kgdb could install the trampoline and back out of kdb to
run the trampoline; it cannot keep enough control to work right since
kdb/ddb doesn't support keeping enough control for even trace traps to
work right.

kgdb also cannot reasonably have a builtin dump command.  It couldn't
do much better than calling doadump().  doadump() needs to be non-inline
to support this, and the problems with calling it directly are obvious.

> Modified:
>  head/sys/ddb/db_command.c
>  head/sys/kern/kern_shutdown.c
>  head/sys/sys/conf.h
>
> Modified: head/sys/ddb/db_command.c
> ==============================================================================
> --- head/sys/ddb/db_command.c	Tue Jun  7 01:06:49 2011	(r222800)
> +++ head/sys/ddb/db_command.c	Tue Jun  7 01:28:12 2011	(r222801)
> @@ -44,6 +44,7 @@ __FBSDID("$FreeBSD$");
> #include <sys/signalvar.h>
> #include <sys/systm.h>
> #include <sys/cons.h>
> +#include <sys/conf.h>
> #include <sys/watchdog.h>
> #include <sys/kernel.h>
>

Style bug (includes more unsorted than before ('f' < 's').

> @@ -526,6 +529,27 @@ db_error(s)
> 	kdb_reenter();
> }
>
> +static void
> +db_dump(db_expr_t dummy, boolean_t dummy2, db_expr_t dummy3, char *dummy4)
> +{
> +	int error;
> +
> +	error = doadump(FALSE);
> +	if (error) {
> +		db_printf("Cannot dump: ");
> +		switch (error) {
> +		case EBUSY:
> +			db_printf("debugger got invoked while dumping.\n");
> +			break;
> +		case ENXIO:
> +			db_printf("no dump device specified.\n");
> +			break;
> +		default:
> +			db_printf("unknown error (error=%d).\n", error);
> +			break;
> +		}
> +	}
> +}
>
> /*
>  * Call random function:
>

Style bugs (kernel messages are not terminated with a ".").

> Modified: head/sys/kern/kern_shutdown.c
> ==============================================================================
> --- head/sys/kern/kern_shutdown.c	Tue Jun  7 01:06:49 2011	(r222800)
> +++ head/sys/kern/kern_shutdown.c	Tue Jun  7 01:28:12 2011	(r222801)
> @@ -233,30 +233,32 @@ print_uptime(void)
> 	printf("%lds\n", (long)ts.tv_sec);
> }
>
> -static void
> -doadump(void)
> +int
> +doadump(boolean_t textdump)
> {
> +	boolean_t coredump;

Style bugs (use of boolean_t.  KNF uses plain int.  boolean_t is a Mach vm
type which I made the mistake of starting to centralize in FreeBSD-1, but
it is still used in only about 10 places in /sys/kern).

>
> -	/*
> -	 * Sometimes people have to call this from the kernel debugger.
> -	 * (if 'panic' can not dump)

This fixes a style bug (the "." was not at the end of the sentence).

> -	 * Give them a clue as to why they can't dump.
> -	 */
> -	if (dumper.dumper == NULL) {
> -		printf("Cannot dump. Device not defined or unavailable.\n");
> -		return;
> -	}

This fixes a style bug (sentence break of 1 space instead of 2).  Termination
with a "." may be acceptable for multiple sentences (but you changed the
output into 1 sentence by replacing the first "." with a ":").

The new message for ENXIO gives less detail than this.

The comment seems to have been wrong in limiting itself to the interactive
case.  doadump() used to return void, so callers couldn't tell if it worked.
Now it returns int, but the kern_reboot() caller still doesn't check.  So
no message is printed when one was before when RB_DUMP is specified but
dumping is not available.  There was never any message for suppression of
dumps due to cold or recursive dumping.  The call to doadump() from
kern_reboot() avoids these cases, but the one from the dump command is as
blind as a developer calling doadump() without knowing about these cases.
This may be a feature.

> +	if (dumping)
> +		return (EBUSY);

Oops, recursive dumping is prevented here, so it is impossible to force
even if it might work.

> +	if (dumper.dumper == NULL)
> +		return (ENXIO);
> ...
> @@ -425,7 +427,7 @@ kern_reboot(int howto)
> 	EVENTHANDLER_INVOKE(shutdown_post_sync, howto);
>
> 	if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)
> -		doadump();
> +		doadump(TRUE);
>
> 	/* Now that we're going to really halt the system... */
> 	EVENTHANDLER_INVOKE(shutdown_final, howto);
>

The check for recursion is now redundant here.  There was no warning for
recursion.  Now there is a warning for the dump command but not for here
(not sure if we can get here with dumping != 0 -- it is probably a recursive
panic then).

doadump() still doesn't check `cold', a blind developer can fall into
a cold dump and a non-blind developer can try to force it.  But maybe
`cold' implies dumper.dumper == NULL, so the check of `cold' here is
just redundant.

> Modified: head/sys/sys/conf.h
> ==============================================================================
> --- head/sys/sys/conf.h	Tue Jun  7 01:06:49 2011	(r222800)
> +++ head/sys/sys/conf.h	Tue Jun  7 01:28:12 2011	(r222801)
> @@ -332,6 +332,7 @@ struct dumperinfo {
> int set_dumper(struct dumperinfo *);
> int dump_write(struct dumperinfo *, void *, vm_offset_t, off_t, size_t);
> void dumpsys(struct dumperinfo *);
> +int doadump(boolean_t);
> extern int dumping;		/* system is dumping */
>
> #endif /* _KERNEL */
>

Style bug (declarations more unsorted than before ('o' < 'u').

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110607161510.M1142>