Date: Fri, 29 Aug 2008 10:13:57 -0500
From: Kirk Strauser <kirk@strauser.com>
To: current@freebsd.org
Subject: System, diagnose thyself: auto-documentation for crashes
Message-ID: <BDDFB834-C15F-4E48-B1D1-B644940FBE42@strauser.com>

I was having flaky system problems that were driving me to distraction. Yesterday, I finally got a panic message with an instruction pointer, used addr2line to see that the failure was in uma_zfree_internal, searched Google, and learned that it was probably due to bad RAM. Half an hour later, memtest86 found the defective stick and the problem was solved.

This led me to thinking, though: the OS already had all the information needed to figure out where the problem was. If there had been an explanation inside that function definition, FreeBSD could have automatically gone to the file, searched for that explanation, and told me why my system had probably crashed.

I propose that we:

1) Settle on a standard comment format for metainformation. There are already standards like Doxygen if we don't want to roll our own.

2) Write a program that takes an instruction pointer and outputs the comment for the associated function.

3) Modify /etc/rc.d/savecore to run the program from #2.

For instance, suppose the comments in sys/vm/uma_core.c looked like:

    /*
     * Frees an item to an INTERNAL zone or allocates a free bucket
     *
     * Arguments:
     *     zone   The zone to free to
     *     item   The item we're freeing
     *     udata  User supplied data for the dtor
     *     skip   Skip dtors and finis
     *
     * Failure:
     *     Failures in this function are commonly due to defective RAM.
     */
    static void
    uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
        enum zfreeskip skip, int flags)
    {
        ...
    }

If I'd seen that failure message in my syslog, I would have avoided a few days of teeth gnashing.

What do you think? I think something like this could be extremely useful. Benefits:

- There would be zero impact on performance because it would only touch comments and not any running code whatsoever.
- It would require minimal work.
- It could be done incrementally. Document known common failure points and add others with time.
- It wouldn't affect any other systems.

-- 
Kirk Strauser
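
To make step 2 a little more concrete, here is a minimal sketch of the comment-lookup half of such a program. It assumes the instruction pointer has already been turned into a function name by some other means (for example with addr2line -f against a kernel built with debug symbols) and that the source text is already in memory. The name extract_failure_note, the "Failure:" marker, and the naive "name followed by an open parenthesis" match are all assumptions made for illustration; none of this is an existing FreeBSD tool or an agreed format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the last occurrence of needle that ends at or before limit. */
static const char *
find_last_before(const char *hay, const char *limit, const char *needle)
{
	const char *last = NULL;
	const char *p = hay;
	size_t n = strlen(needle);

	while ((p = strstr(p, needle)) != NULL && p + n <= limit) {
		last = p;
		p++;
	}
	return (last);
}

/*
 * Hypothetical helper: copy the text that follows "Failure:" in the
 * comment immediately preceding the first "func(" in src into out.
 * Returns 0 on success, -1 if no such note is found.
 */
int
extract_failure_note(const char *src, const char *func, char *out,
    size_t out_len)
{
	const char *def, *end, *start, *p;
	size_t flen, used = 0;

	assert(src != NULL && func != NULL && out != NULL && out_len > 0);
	out[0] = '\0';
	flen = strlen(func);

	/* Naive match: the first place the name is followed by '('. */
	for (def = strstr(src, func); def != NULL;
	    def = strstr(def + 1, func))
		if (def[flen] == '(')
			break;
	if (def == NULL)
		return (-1);

	/* The nearest comment that closes before that point. */
	end = find_last_before(src, def, "*/");
	if (end == NULL)
		return (-1);
	start = find_last_before(src, end, "/*");
	if (start == NULL)
		return (-1);

	p = strstr(start, "Failure:");
	if (p == NULL || p >= end)
		return (-1);
	p += strlen("Failure:");

	/* Copy the rest of the comment, dropping the " * " decoration. */
	while (p < end) {
		if (*p == '\n') {
			p++;
			while (p < end &&
			    (*p == ' ' || *p == '\t' || *p == '*'))
				p++;
			if (used > 0 && used + 1 < out_len)
				out[used++] = ' ';
		} else {
			if (used + 1 < out_len)
				out[used++] = *p;
			p++;
		}
	}
	while (used > 0 && out[used - 1] == ' ')
		used--;
	out[used] = '\0';
	return (used > 0 ? 0 : -1);
}
```

A caller would read the source file named by the debug information into a buffer, pass it in together with the function name, and log the returned note. A real tool would also have to tell definitions apart from prototypes and call sites, which this sketch does not attempt.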