From: Kirk Strauser
To: current@freebsd.org
Date: Fri, 29 Aug 2008 10:13:57 -0500
Subject: System, diagnose thyself: auto-documentation for crashes

I was having flaky system problems that were driving me to distraction.
Yesterday, I finally got a panic message with an instruction pointer, used addr2line to see that the failure was in uma_zfree_internal, searched Google, and learned that it was probably due to bad RAM. Half an hour later, memtest86 found the defective stick and the problem was solved.

This led me to thinking, though: the OS already had all the information needed to figure out where the problem was. If there had been an explanation in the comment above that function's definition, FreeBSD could have automatically gone to the file, searched for that explanation, and told me why my system had probably crashed.

I propose that we:

1) Settle on a standard comment format for metainformation. There are already standards like Doxygen if we don't want to roll our own.

2) Write a program that takes an instruction pointer and outputs the comment for the associated function.

3) Modify /etc/rc.d/savecore to run the program from #2.

For instance, suppose the comments in sys/vm/uma_core.c looked like:

    /*
     * Frees an item to an INTERNAL zone or allocates a free bucket
     *
     * Arguments:
     *      zone   The zone to free to
     *      item   The item we're freeing
     *      udata  User supplied data for the dtor
     *      skip   Skip dtors and finis
     *
     * Failure:
     *      Failures in this function are commonly due to defective RAM.
     */
    static void
    uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
        enum zfreeskip skip, int flags)
    {
        ...
    }

If I'd seen that failure message in my syslog, I would have avoided a few days of teeth gnashing.

What do you think? I think something like this could be extremely useful. Benefits:

- There would be zero impact on performance because it would only touch comments and not any running code whatsoever.
- It would require minimal work.
- It could be done incrementally. Document known common failure points and add others with time.
- It wouldn't affect any other systems.

-- 
Kirk Strauser
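
To make step 2 of the proposal concrete, here is a minimal sketch of the comment lookup in C. It is an illustration under several assumptions, not an existing FreeBSD tool: the helper name failure_note is hypothetical, the function name is assumed to have been resolved from the instruction pointer already (for example with addr2line -f), the source file is assumed to be loaded into memory as a string, and definitions are assumed to put the function name at the start of a line, as in the example above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of FreeBSD): copy the text under the
 * "Failure:" heading of the block comment directly above the definition
 * of func into out, joining continuation lines with single spaces.
 * Returns 0 on success, or -1 if the definition, its comment, or a
 * "Failure:" section cannot be found.
 */
static int
failure_note(const char *src, const char *func, char *out, size_t outlen)
{
	const char *p, *q, *def = NULL, *open = NULL, *close = NULL, *note;
	size_t flen = strlen(func), n = 0;

	if (outlen == 0 || flen == 0)
		return (-1);

	/* Assumption: the definition has "func(" at the start of a line. */
	for (p = src; (p = strstr(p, func)) != NULL; p += flen) {
		if (p[flen] == '(' && (p == src || p[-1] == '\n')) {
			def = p;
			break;
		}
	}
	if (def == NULL)
		return (-1);

	/* Find the last comment terminator before the definition. */
	for (p = src; (p = strstr(p, "*/")) != NULL && p < def; p += 2)
		close = p;
	if (close == NULL)
		return (-1);

	/* Only a return type may sit between the comment and the name. */
	for (p = close + 2; p < def; p++)
		if (*p == ';' || *p == '}')
			return (-1);

	/* Find the opener that belongs to that terminator. */
	for (p = src; (p = strstr(p, "/*")) != NULL && p < close; p += 2)
		open = p;
	if (open == NULL)
		return (-1);

	note = strstr(open, "Failure:");
	if (note == NULL || note >= close)
		return (-1);
	note += strlen("Failure:");

	/* Copy the note, dropping newlines and the " * " line decoration. */
	for (q = note; q < close && n + 1 < outlen; ) {
		if (*q == '\n') {
			while (q < close && (*q == '\n' || *q == ' ' ||
			    *q == '\t' || *q == '*'))
				q++;
			if (n > 0 && q < close)
				out[n++] = ' ';
		} else
			out[n++] = *q++;
	}
	out[n] = '\0';
	return (0);
}
```

A savecore hook along the lines of step 3 could call this with the contents of sys/vm/uma_core.c and the name reported for the faulting address, then log the resulting string. The sketch treats everything from "Failure:" to the end of the comment as the note, so a real comment format would need a rule for where a section ends.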