From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Kirk Strauser
Date: Fri, 29 Aug 2008 11:44:54 -0400
Subject: Re: System, diagnose thyself: auto-documentation for crashes
Message-Id: <200808291144.54193.jhb@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On Friday 29 August 2008 11:13:57 am Kirk Strauser wrote:
> I was having flaky
> system problems that were driving me to
> distraction.  Yesterday, I finally got a panic message with an
> instruction pointer, used addr2line to see that the failure was in
> uma_zfree_internal, searched Google, and learned that it was probably
> due to bad RAM.  Half an hour later, memtest86 found the defective
> stick and the problem was solved.
>
> This led me to thinking, though: the OS already had all the
> information needed to figure out where the problem was.  If there had
> been an explanation inside that function definition, FreeBSD could
> have automatically gone to the file, searched for that explanation,
> and told me why my system had probably crashed.
>
> I propose that we:
>
> 1) Settle on a standard comment format for metainformation.  There are
> already standards like Doxygen if we didn't want to home-roll something.
>
> 2) Write a program that takes an instruction pointer and outputs the
> comment for the associated function.
>
> 3) Modify /etc/rc.d/savecore to run the program from #2.
>
> For instance, suppose the comments in sys/vm/uma_core.c looked like:
>
> /*
>  * Frees an item to an INTERNAL zone or allocates a free bucket
>  *
>  * Arguments:
>  *	zone   The zone to free to
>  *	item   The item we're freeing
>  *	udata  User supplied data for the dtor
>  *	skip   Skip dtors and finis
>  *
>  * Failure:
>  *	Failures in this function are commonly due to defective RAM.
>  */
> static void
> uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
>     enum zfreeskip skip, int flags)
> {
> 	...
> }
>
> If I'd seen that failure message in my syslog, I would have avoided a
> few days of teeth gnashing.  What do you think?  I think something
> like this could be extremely useful.  Benefits:
>
> - There would be zero impact on performance because it would only
>   touch comments and not any running code whatsoever.
> - It would require minimal work.
> - It could be done incrementally.  Document known common failure
>   points and add others with time.
> - It wouldn't affect any other systems.

See /usr/sbin/crashinfo for a start.  I have patches to enable it from
/etc/rc.d/savecore after generating a patch (still need to test them
though).

-- 
John Baldwin
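
For illustration only, here is a minimal sketch of the lookup half of step 2
in the quoted proposal.  It assumes the instruction pointer has already been
resolved to a source file and function name (for example with addr2line, as
in the quoted message) and that the file's text is in memory.  The helper
name failure_note() is hypothetical, and the "Failure:" section is the
comment convention suggested in the quoted message, not an existing FreeBSD
convention.  The sketch also assumes the function name starts its own line,
and it simply takes the last block comment that closes before the
definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find the block comment preceding the definition of
 * `func` in `src` and copy the text of its "Failure:" section into `out`.
 * Returns 0 on success, -1 if the definition or the section is missing.
 */
static int
failure_note(const char *src, const char *func, char *out, size_t outlen)
{
	const char *def = NULL, *start = NULL, *end = NULL, *p, *q;
	size_t flen, n = 0;

	assert(src != NULL && func != NULL && out != NULL);
	if (outlen == 0)
		return (-1);
	flen = strlen(func);

	/* Locate "func(" at the beginning of a line. */
	for (p = src; (p = strstr(p, func)) != NULL; p += flen) {
		if ((p == src || p[-1] == '\n') && p[flen] == '(') {
			def = p;
			break;
		}
	}
	if (def == NULL)
		return (-1);

	/* Remember the last comment that closes before the definition. */
	for (p = src; (p = strstr(p, "/*")) != NULL && p < def; p = q + 2) {
		q = strstr(p + 2, "*/");
		if (q == NULL || q > def)
			break;
		start = p + 2;
		end = q;
	}
	if (start == NULL)
		return (-1);

	/* Find the "Failure:" heading inside that comment. */
	p = strstr(start, "Failure:");
	if (p == NULL || p >= end)
		return (-1);
	p += strlen("Failure:");

	/* Copy the section, dropping the " * " decoration on each line. */
	while (p < end && n + 1 < outlen) {
		if (*p == '\n') {
			p++;
			while (p < end &&
			    (*p == ' ' || *p == '\t' || *p == '*'))
				p++;
			if (n > 0 && out[n - 1] != ' ')
				out[n++] = ' ';
		} else
			out[n++] = *p++;
	}
	while (n > 0 && out[n - 1] == ' ')
		n--;
	out[n] = '\0';
	return (0);
}
```

A caller would read the source file into a buffer, pass the function name
reported for the faulting address, and log the returned text if the call
returns 0.  A -1 return just means there is nothing to report for that
function.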