Date: Fri, 29 Aug 2008 11:44:54 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: Kirk Strauser <kirk@strauser.com>
Subject: Re: System, diagnose thyself: auto-documentation for crashes
Message-ID: <200808291144.54193.jhb@freebsd.org>
In-Reply-To: <BDDFB834-C15F-4E48-B1D1-B644940FBE42@strauser.com>
References: <BDDFB834-C15F-4E48-B1D1-B644940FBE42@strauser.com>

On Friday 29 August 2008 11:13:57 am Kirk Strauser wrote:
> I was having flaky system problems that were driving me to
> distraction. Yesterday, I finally got a panic message with an
> instruction pointer, used addr2line to see that the failure was in
> uma_zfree_internal, searched Google, and learned that it was probably
> due to bad RAM. Half an hour later, memtest86 found the defective
> stick and the problem was solved.
>
> This led me to thinking, though: the OS already had all the
> information needed to figure out where the problem was. If there had
> been an explanation inside that function definition, FreeBSD could
> have automatically gone to the file, searched for that explanation,
> and told me why my system had probably crashed.
>
> I propose that we:
>
> 1) Settle on a standard comment format for metainformation. There are
> already standards like Doxygen if we didn't want to home-roll something.
>
> 2) Write a program that takes an instruction pointer and outputs the
> comment for the associated function.
>
> 3) Modify /etc/rc.d/savecore to run the program from #2.
>
> For instance, suppose the comments in sys/vm/uma_core.c looked like:
>
> /*
>  * Frees an item to an INTERNAL zone or allocates a free bucket
>  *
>  * Arguments:
>  *     zone   The zone to free to
>  *     item   The item we're freeing
>  *     udata  User supplied data for the dtor
>  *     skip   Skip dtors and finis
>  *
>  * Failure:
>  *     Failures in this function are commonly due to defective RAM.
>  */
> static void
> uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
>     enum zfreeskip skip, int flags)
> {
>     ...
> }
>
> If I'd seen that failure message in my syslog, I would have avoided a
> few days of teeth gnashing. What do you think? I think something
> like this could be extremely useful. Benefits:
>
> - There would be zero impact on performance because it would only
> touch comments and not any running code whatsoever.
> - It would require minimal work.
> - It could be done incrementally. Document known common failure
> points and add others with time.
> - It wouldn't affect any other systems.

See /usr/sbin/crashinfo for a start. I have patches to enable it from
/etc/rc.d/savecore after generating a patch (still need to test them though).

-- 
John Baldwin
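
For illustration only, step 2 of the quoted proposal could be sketched roughly as below. This is not how /usr/sbin/crashinfo works; it is a minimal sketch that assumes the instruction pointer has already been resolved to a function name (for example with addr2line, as in the original report) and that comments follow the hypothetical "Failure:" layout from the quoted example. The names find_preceding_comment and failure_note are made up for this sketch.

```python
import re
from typing import Optional


def find_preceding_comment(source: str, func_name: str) -> Optional[str]:
    """Return the block comment immediately before func_name's definition."""
    # Assumption: the function name starts its own line in a definition
    # (return type on the line above), so call sites are not matched.
    pattern = r"^" + re.escape(func_name) + r"\s*\("
    definition = re.search(pattern, source, re.MULTILINE)
    if definition is None:
        return None
    head = source[:definition.start()]
    end = head.rfind("*/")
    start = head.rfind("/*", 0, end) if end != -1 else -1
    if start == -1:
        return None
    # Only the return type may sit between the comment and the name;
    # any statement or brace means the comment belongs to something else.
    between = head[end + 2:]
    if any(ch in between for ch in ";{}"):
        return None
    return head[start:end + 2]


def failure_note(comment: str) -> Optional[str]:
    """Return the text under a 'Failure:' heading in a comment, if any."""
    note = []
    collecting = False
    for raw in comment.splitlines():
        # Drop the comment decoration ("/*", " * ", " */") from each line.
        text = raw.strip().lstrip("/*").rstrip("*/").strip()
        if text == "Failure:":
            collecting = True
        elif collecting and text:
            note.append(text)
        elif collecting and note:
            break  # a blank line ends the Failure section
    return " ".join(note) or None


# Hypothetical input, shortened from the example in the quoted message.
SAMPLE = """\
/*
 * Frees an item to an INTERNAL zone or allocates a free bucket
 *
 * Failure:
 *     Failures in this function are commonly due to defective RAM.
 */
static void
uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
    enum zfreeskip skip, int flags)
{
}
"""

if __name__ == "__main__":
    comment = find_preceding_comment(SAMPLE, "uma_zfree_internal")
    print(failure_note(comment) if comment else "no comment found")
```

A real tool would also need the source tree that matches the running kernel, which this sketch does not address.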