From owner-freebsd-current@FreeBSD.ORG Wed Nov 9 14:14:05 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 89BED106564A for ; Wed, 9 Nov 2011 14:14:05 +0000 (UTC) (envelope-from oliver.pntr@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 262B08FC17 for ; Wed, 9 Nov 2011 14:14:04 +0000 (UTC) Received: by qyc1 with SMTP id 1so5872863qyc.13 for ; Wed, 09 Nov 2011 06:14:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=30Qc0q5bbwQyIS6LfY6Z48J+VqLf6EUfAkvhFIY0Kb0=; b=f9gB4U0FN7K4hcrnLfgSp2Cnf4JIyrvJbJe2GvYCvlz1PICqiJqqGKwSsFMwLX4K9u luFFVwMW6zaFCji4LG0S7atum7mjCqOgsIBf5pExlI8N3YiWBZQz4qwmRL1poJzxQ3h3 MVL8HgjBPOJDsUL1MoqCVNH33lo97ohIzILJw= MIME-Version: 1.0 Received: by 10.182.74.37 with SMTP id q5mr582365obv.32.1320848043901; Wed, 09 Nov 2011 06:14:03 -0800 (PST) Received: by 10.182.42.104 with HTTP; Wed, 9 Nov 2011 06:14:03 -0800 (PST) In-Reply-To: References: <4EB9C469.9070208@freebsd.org> <4EB9E6FE.3060102@freebsd.org> Date: Wed, 9 Nov 2011 15:14:03 +0100 Message-ID: From: Oliver Pinter To: Arnaud Lacombe Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-current@freebsd.org Subject: Re: Using Instruction Pointer address in debug interfaces [Was: Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]] X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Nov 2011 14:14:05 -0000 On 11/9/11, Arnaud Lacombe wrote: > Hi, > > On Tue, Nov 8, 2011 at 9:35 PM, Julian Elischer wrote: >> On 11/8/11 5:52 PM, Arnaud Lacombe wrote: >>> >>> Hi, >>> >>> On Tue, Nov 8, 2011 at 7:08 PM, Julian Elischer >>> wrote: >>>> >>>> On 11/8/11 10:49 AM, Arnaud Lacombe wrote: >>>>> >>>>> Hi, >>>>> To avoid future complaints about the fact that I would be only "talk" >>>>> without "action", I did implement what I suggested above. As it is >>>>> quite a large patch-set, I will not post it directly here, however, it >>>>> is available on github: >>>>> >>>>> https://github.com/lacombar/freebsd/tree/master/topic/kern-lock-debug >>>>> >>>>> It convert a bunch of debug interface to use the caller instruction >>>>> pointer, as well as a proof-of-concept teaching printf(9) to convert >>>>> IP to symbol_name+offset. >>>>> >>>>> It translates in a direct saving of about +250kB on i386's GENERIC, >>>>> just in kernel text size. Even the worst case, ie LOCK_DEBUG == 0, >>>>> translates to a save of +80kB. >>>>> >>>>> Please note that this is still WIP code. >>>> >>>> A couple of comments. >>>> Firstly, the idea of a printf method to print the IP as symbol+offset is >>>> an >>>> interesting idea >>>> that should be followed up in its own right. >>>> >>> FWIW, I have no credit in this idea. It has been in Linux for ages and >>> ages. >> >> yeah as I said at work I use linux and BSD... >> the linux stuff that just prints out IP really annoys me. >> >> the list stuff and netgraph debug (which should be off in any production >> system) > this is, I guess, where we do not agree. You find it acceptable to run > totally different code in production and during debug. I do not. This > is completely insane, even more nowadays where heavy parallelism > increases the likelihood of races, and subtle change in the code, even > optimization, can cause total behavioral change (ie. Heisenbug). > > For the record, we have been tracking for more than 2 months (first > occurrences happened a year ago) an mbuf corruption in the network > stack, present in all released code since at least FreeBSD 7[0]. Each > time we think it is fixed, we are proven wrong by customers within a > few days when the system crashes again. Even the last attempt which > was believed to be bullet-proof failed and crashes daily. > > All that to say that production code should embed enough facilities to > allow the system to be fully debugged with a runtime cost as low as > possible, and a code-size cost as low as possible[1]. I should be able > to connect on a production machine, turn a knob, an see what is going > wrong, without the customer noticing. > > In the worst case, when you have to enable debug-only code, it must > not be done by making the non-debug case more expensive, but wrap > around. The whole original point of the patches was that LOCK_FILE and > LOCK_LINE are a bad answer to a wrong problem. > > `__FILE__, __LINE__' and the bloat introduced is not the problem, > `const char *file, int line' in way too much prototypes is. > > Now, you make me realize that `const char *file, int line' should just > be removed, not replaced by `unsigned long' or anything else. It's > likely to be done in another iteration. > >> just require you to be able to see the console. and have sources nearby. >> if you need the IP use gdb. >> > "console debugging" is yet another abomination which should be hunted > down. Just try to do any useful work at high-pps on a serial > console... > >> it's just what you are used to. You are obviously from the dark side >> ^H^H^H^H^H^H linux. >> > My obedience is totally irrelevant to the problem. > > However, if you want to know, my heart tends to be with BSDs. > Unfortunately, it's a sad love-story where your Beloved keeps > deceiving you day after day. You want to change small bits at a time, > make several iteration of progress to make things brighter, but your > Beloved refuses any change because of too much inertia. Sad. > >> so you are used to doing it that way.. but don't expect us to change just >> because that's what Linux does. >> > again, mentioning Linux is totally irrelevant. Use of Instruction > Pointer are implementation details for a not so intrusive solution to > the problem I pointed out, and which you are totally missing. > > Now, please answer this: do you find any of the bloat to the non-debug > case (ie. passing a NULL pointer and a 0 integer, when `LOCK_DEBUG == > 0') worth the extra debugability comfort to be acceptable ? > > If you do, then your focus is on making things comfortable for > developers, at the expense 100's of users, rather than making things > comfortable for 100's of users, at the expense of developers. > >> When we have a problem at work on teh Linux driver, my first step is >> always >> to try duplicate it on FreeBSD because: >> > well, you're lucky FreeBSD supports your device! Lately, we got lately > a shiny multi-queue network cards with bypass mechanism... that is not > supported in FreeBSD. So currently, we got an expensive paper-weight. > >> 1/ half the time freebsd will just immediatly assert on something and >> present you with the bug.. done. >> > well, certainly not from a release build; assertion are disabled. > >> 2/ I can run gdb through firewire on it on ANY standard unmodified kernel >> and find it, where on Linux I need to >> get a whole universe of stupid patches all aligned and MAYBE I might be >> able >> to see what is going on. >> if it's on redhat I need to do this, on ubuntu that, on suse something >> else >> ,and on different revisions >> of the kernel it all changes anyhow.. >> > machine (even x86-64 machines) I run FreeBSD on have no firewire, > neither do my desktops, this limit the usability of the feature. > Moreover, I do not use mass-distros either, except for desktop[3], but > small embedded firmware (ala. openwrt), even on "middle-end" system, > and stick to vanilla kernel (when not playing with -next). > >>> That said, IP address are barely used in FreeBSD, there is no legacy. >>> As such, the API should not use `unsigned long' but `void *'[0]; this >>> is the natural type returned by `__builtin_return_address()' and the >>> `&&' operator. This would allow to introduce a modifier to `%p' to do >>> the translation. >> >> possibly intptr_t is what should be used. but I'd expect Bruce to drop in >> here and let us us know. >> > whatever will suit you :) I started by using `uintptr_t', but they > cannot be printed in a portable manner and the printf(9) stuff was not > ready yet. > > - Arnaud > > [0]: I am able to crash any kernel between 7-STABLE to 9.STABLE within > minutes, with the right pattern and (mainstream and well supported) > hardware. Maybe this bug related to sk(4) driver? > > [1]: for example, when I read in `sys/kern/kern_fail.c': > > * Failpoints allow for injecting fake errors into running code on the fly, > * without modifying code or recompiling with flags. Failpoints are always > * present, and are very efficient when disabled. > > and see that "very efficient when disabled" translates into a > __predict_false() conditional, well, sorry, but I am very dubious. > Entering the branch _is_ still a cost. A most efficient way of doing > this would be using gcc's `asm goto' statement, in which case the hot > path only ever see a `nop', eventually patched at runtime. > > [2]: soekris-like boards do not qualify for "embedded" :) > > [3]: FreeBSD (8-STABLE) is way to limited and un-integrated to be > anywhere but useful, not to speak about kernel bug which leave the > system so fracked up that you have no other choice but to hard-reboot. > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" >