Date: Tue, 7 Jun 2011 17:33:37 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Marcel Moolenaar <marcel@freebsd.org> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r222801 - in head/sys: ddb kern sys Message-ID: <20110607161510.M1142@besplex.bde.org> In-Reply-To: <201106070128.p571SCVI059714@svn.freebsd.org> References: <201106070128.p571SCVI059714@svn.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 7 Jun 2011, Marcel Moolenaar wrote: > Log: > Fix making kernel dumps from the debugger by creating a command > for it. It is impossible to fix this. It is only possible to make it less broken. Making kernel dumps touches far to much space to be controllable by a debugger. Debugger commands really should put back everything the way that it was before the command, except possibly for changing a byte or two of memory, and this is too hard to do for a complicated operation like making dumps. The "byte or two" changed by it should strictly be all bytes in the dump image but nothing in the system state anywhere else. > Do not not expect a developer to call doadump(). Calling > doadump does not necessarily work when it's declared static. Nor The bug that breaks static doadump() is presumably gcc -finline-functions-called-once. This breaks lots of other things. E.g., it breaks stack traces. A stack tracer with full debugging info might be able to restore all the inlined functions virtually, but I've mever seen gdb doing that, and a primitive stack tracer like the one used by ddb and panic can't do that. However, we already use the technique of putting static functions that are normally called not at all, but might be useful for ddb, by wrapping them as ddb functions so that they might be called at least once. They sometimes remain static, but are still hard to inline and remove if they are called once since, since the calls to them are via function pointers in the ddb command tables. > does it necessarily do what was intended in the context of text > dumps. The dump command always creates a core dump. > > Move printing of error messages from doadump to the dump command, > now that we don't have to worry about being called from DDB. Printing in doadump() (when called from ddb) is still quite broken (like the entire command. doadump() still uses printf() for at least status messages. I think there are hacks to turn printf() into db_printf() when in ddb context, so you gain little from moving the first printf() into a db_printf() in the dump command. However, since the other printf()s are actually db_printf()s, they don't operate normally. In particular, they don't go to the message buffer. This may be a feature. The problem with turning a cally to the any() function into an 'any' command is that it confuses naive developers into thinking that the 'any' operation is actually supported by ddb. ddb should use trampolines for all calls to the any() function and for unsafe `any' commands. This involves backing out of ddb context into a normal context and arranging to come back. A breakpoint in the trampoline might be enough to come back, but perhaps there should be options to keep more control, e.g.: - keep interrupts masked while running the trampoline. But this has never worked right even for single stepping -- single stepping involves backing out of ddb and coming back with a trace trap; it is important that the single step steps only a single instruction, but it actually allows any number of instructions to execute via interrupts on the current CPU (because it enables interrupts on the current CPU, and interrupts have precedence over the trace trap on at least i386), and it allows any number of instructions to execute on other CPUs (since it unstops other CPUs) - keep other CPUs stopped while running the trampoline. Otherwise you have the same bugs as for trace traps. It is unclear if an any() function can run safely either with all CPUs stopped or without them stopped. In general it can't, since it might be the start function or the stop function. [k]gdb runs in a separate address space and/or machine, so it must use a a trampoline to call the any() function and would need similar to have commands. I think gdb gets this right, but kgdb doesn't. At best, the kernel part of kgdb could install the trampoline and back out of kdb to run the trampoline; it cannot keep enough control to work right since kdb/ddb doesn't support keeping enough control for even trace traps to work right. kgdb also cannot reasonably have a builtin dump command. It couldn't do much better than calling doadump(). doadump() needs to be non-inline to support this, and the problems with calling it directly are obvious. > Modified: > head/sys/ddb/db_command.c > head/sys/kern/kern_shutdown.c > head/sys/sys/conf.h > > Modified: head/sys/ddb/db_command.c > ============================================================================== > --- head/sys/ddb/db_command.c Tue Jun 7 01:06:49 2011 (r222800) > +++ head/sys/ddb/db_command.c Tue Jun 7 01:28:12 2011 (r222801) > @@ -44,6 +44,7 @@ __FBSDID("$FreeBSD$"); > #include <sys/signalvar.h> > #include <sys/systm.h> > #include <sys/cons.h> > +#include <sys/conf.h> > #include <sys/watchdog.h> > #include <sys/kernel.h> > Style bug (includes more unsorted than before ('f' < 's'). > @@ -526,6 +529,27 @@ db_error(s) > kdb_reenter(); > } > > +static void > +db_dump(db_expr_t dummy, boolean_t dummy2, db_expr_t dummy3, char *dummy4) > +{ > + int error; > + > + error = doadump(FALSE); > + if (error) { > + db_printf("Cannot dump: "); > + switch (error) { > + case EBUSY: > + db_printf("debugger got invoked while dumping.\n"); > + break; > + case ENXIO: > + db_printf("no dump device specified.\n"); > + break; > + default: > + db_printf("unknown error (error=%d).\n", error); > + break; > + } > + } > +} > > /* > * Call random function: > Style bugs (kernel messages are not terminated with a "."). > Modified: head/sys/kern/kern_shutdown.c > ============================================================================== > --- head/sys/kern/kern_shutdown.c Tue Jun 7 01:06:49 2011 (r222800) > +++ head/sys/kern/kern_shutdown.c Tue Jun 7 01:28:12 2011 (r222801) > @@ -233,30 +233,32 @@ print_uptime(void) > printf("%lds\n", (long)ts.tv_sec); > } > > -static void > -doadump(void) > +int > +doadump(boolean_t textdump) > { > + boolean_t coredump; Style bugs (use of boolean_t. KNF uses plain int. boolean_t is a Mach vm type which I made the mistake of starting to centralize in FreeBSD-1, but it is still used in only about 10 places in /sys/kern). > > - /* > - * Sometimes people have to call this from the kernel debugger. > - * (if 'panic' can not dump) This fixes a style bug (the "." was not at the end of the sentence). > - * Give them a clue as to why they can't dump. > - */ > - if (dumper.dumper == NULL) { > - printf("Cannot dump. Device not defined or unavailable.\n"); > - return; > - } This fixes a style bug (sentence break of 1 space instead of 2). Termination with a "." may be acceptable for multiple sentences (but you changed the output into 1 sentence by replacing the first "." with a ":"). The new message for ENXIO gives less detail than this. The comment seems to have been wrong in limiting itself to the interactive case. doadump() used to return void, so callers couldn't tell if it worked. Now it returns int, but the kern_reboot() caller still doesn't check. So no message is printed when one was before when RB_DUMP is specified but dumping is not available. There was never any message for suppression of dumps due to cold or recursive dumping. The call to doadump() from kern_reboot() avoids these cases, but the one from the dump command is as blind as a developer calling doadump() without knowing about these cases. This may be a feature. > + if (dumping) > + return (EBUSY); Oops, recursive dumping is prevented here, so it is impossible to force even if it might work. > + if (dumper.dumper == NULL) > + return (ENXIO); > ... > @@ -425,7 +427,7 @@ kern_reboot(int howto) > EVENTHANDLER_INVOKE(shutdown_post_sync, howto); > > if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping) > - doadump(); > + doadump(TRUE); > > /* Now that we're going to really halt the system... */ > EVENTHANDLER_INVOKE(shutdown_final, howto); > The check for recursion is now redundant here. There was no warning for recursion. Now there is a warning for the dump command but not for here (not sure if we can get here with dumping != 0 -- it is probably a recursive panic then). doadump() still doesn't check `cold', a blind developer can fall into a cold dump and a non-blind developer can try to force it. But maybe `cold' implies dumper.dumper == NULL, so the check of `cold' here is just redundant. > Modified: head/sys/sys/conf.h > ============================================================================== > --- head/sys/sys/conf.h Tue Jun 7 01:06:49 2011 (r222800) > +++ head/sys/sys/conf.h Tue Jun 7 01:28:12 2011 (r222801) > @@ -332,6 +332,7 @@ struct dumperinfo { > int set_dumper(struct dumperinfo *); > int dump_write(struct dumperinfo *, void *, vm_offset_t, off_t, size_t); > void dumpsys(struct dumperinfo *); > +int doadump(boolean_t); > extern int dumping; /* system is dumping */ > > #endif /* _KERNEL */ > Style bug (declarations more unsorted than before ('o' < 'u'). Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110607161510.M1142>