From: Arnaud Lacombe
To: Julian Elischer
Cc: freebsd-current@freebsd.org
Date: Wed, 9 Nov 2011 00:29:52 -0500
Subject: Re: Using Instruction Pointer address in debug interfaces [Was: Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]]
In-Reply-To: <4EB9E6FE.3060102@freebsd.org>
References: <4EB9C469.9070208@freebsd.org> <4EB9E6FE.3060102@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

Hi,

On Tue, Nov 8, 2011 at 9:35 PM,
Julian Elischer wrote:
> On 11/8/11 5:52 PM, Arnaud Lacombe wrote:
>>
>> Hi,
>>
>> On Tue, Nov 8, 2011 at 7:08 PM, Julian Elischer wrote:
>>>
>>> On 11/8/11 10:49 AM, Arnaud Lacombe wrote:
>>>>
>>>> Hi,
>>>> To avoid future complaints about the fact that I would be only "talk"
>>>> without "action", I did implement what I suggested above. As it is
>>>> quite a large patch-set, I will not post it directly here, however, it
>>>> is available on github:
>>>>
>>>> https://github.com/lacombar/freebsd/tree/master/topic/kern-lock-debug
>>>>
>>>> It converts a bunch of debug interfaces to use the caller instruction
>>>> pointer, as well as a proof-of-concept teaching printf(9) to convert
>>>> IP to symbol_name+offset.
>>>>
>>>> It translates in a direct saving of about +250kB on i386's GENERIC,
>>>> just in kernel text size. Even the worst case, ie LOCK_DEBUG == 0,
>>>> translates to a save of +80kB.
>>>>
>>>> Please note that this is still WIP code.
>>>
>>> A couple of comments.
>>> Firstly, the idea of a printf method to print the IP as symbol+offset is
>>> an interesting idea
>>> that should be followed up in its own right.
>>>
>> FWIW, I have no credit in this idea. It has been in Linux for ages and
>> ages.
>
> yeah as I said at work I use linux and BSD...
> the linux stuff that just prints out IP really annoys me.
>
> the list stuff and netgraph debug (which should be off in any production
> system)

this is, I guess, where we do not agree. You find it acceptable to run
totally different code in production and during debug. I do not. This
is completely insane, even more so nowadays, when heavy parallelism
increases the likelihood of races, and subtle changes in the code, even
optimization, can cause total behavioral change (ie. Heisenbug).

For the record, we have been tracking for more than 2 months (first
occurrences happened a year ago) an mbuf corruption in the network
stack, present in all released code since at least FreeBSD 7[0].
Each time we think it is fixed, we are proven wrong by customers within
a few days when the system crashes again. Even the last attempt, which
was believed to be bullet-proof, failed and crashes daily.

All that to say that production code should embed enough facilities to
allow the system to be fully debugged with a runtime cost as low as
possible, and a code-size cost as low as possible[1]. I should be able
to connect to a production machine, turn a knob, and see what is going
wrong, without the customer noticing. In the worst case, when you have
to enable debug-only code, it must not be done by making the non-debug
case more expensive, but by wrapping around it.

The whole original point of the patches was that LOCK_FILE and
LOCK_LINE are a bad answer to a wrong problem. `__FILE__, __LINE__' and
the bloat they introduce are not the problem; `const char *file, int
line' in way too many prototypes is.

Now, you make me realize that `const char *file, int line' should just
be removed, not replaced by `unsigned long' or anything else. It's
likely to be done in another iteration.

> just require you to be able to see the console. and have sources nearby.
> if you need the IP use gdb.
>
"console debugging" is yet another abomination which should be hunted
down. Just try to do any useful work at high-pps on a serial
console...

> it's just what you are used to. You are obviously from the dark side
> ^H^H^H^H^H^H linux.
>
My obedience is totally irrelevant to the problem. However, if you want
to know, my heart tends to be with BSDs. Unfortunately, it's a sad
love-story where your Beloved keeps deceiving you day after day. You
want to change small bits at a time, make several iterations of
progress to make things brighter, but your Beloved refuses any change
because of too much inertia. Sad.

> so you are used to doing it that way.. but don't expect us to change just
> because that's what Linux does.
>
again, mentioning Linux is totally irrelevant.
The use of the instruction pointer is an implementation detail of a
not-so-intrusive solution to the problem I pointed out, and which you
are totally missing.

Now, please answer this: do you find the bloat added to the non-debug
case (ie. passing a NULL pointer and a 0 integer, when `LOCK_DEBUG ==
0') an acceptable price for the extra debugging comfort? If you do,
then your focus is on making things comfortable for developers, at the
expense of 100's of users, rather than making things comfortable for
100's of users, at the expense of developers.

> When we have a problem at work on the Linux driver, my first step is always
> to try duplicate it on FreeBSD because:
>
well, you're lucky FreeBSD supports your device! Lately, we got a shiny
multi-queue network card with a bypass mechanism... that is not
supported in FreeBSD. So currently, we have an expensive paper-weight.

> 1/ half the time freebsd will just immediately assert on something and
> present you with the bug.. done.
>
well, certainly not from a release build; assertions are disabled.

> 2/ I can run gdb through firewire on it on ANY standard unmodified kernel
> and find it, where on Linux I need to
> get a whole universe of stupid patches all aligned and MAYBE I might be able
> to see what is going on.
> if it's on redhat I need to do this, on ubuntu that, on suse something else,
> and on different revisions
> of the kernel it all changes anyhow..
>
The machines (even x86-64 machines) I run FreeBSD on have no firewire,
neither do my desktops; this limits the usability of the feature.
Moreover, I do not use mass-distros either, except for desktop[3], but
small embedded firmware (ala. openwrt), even on "middle-end" systems,
and stick to vanilla kernels (when not playing with -next).

>> That said, IP address are barely used in FreeBSD, there is no legacy.
>> As such, the API should not use `unsigned long' but `void *'[0]; this
>> is the natural type returned by `__builtin_return_address()' and the
>> `&&' operator. This would allow to introduce a modifier to `%p' to do
>> the translation.
>
> possibly intptr_t is what should be used. but I'd expect Bruce to drop in
> here and let us know.
>
whatever will suit you :) I started by using `uintptr_t', but they
cannot be printed in a portable manner and the printf(9) stuff was not
ready yet.

 - Arnaud

[0]: I am able to crash any kernel from 7-STABLE to 9-STABLE within
minutes, with the right pattern and (mainstream and well supported)
hardware.

[1]: for example, when I read in `sys/kern/kern_fail.c':

 * Failpoints allow for injecting fake errors into running code on the fly,
 * without modifying code or recompiling with flags. Failpoints are always
 * present, and are very efficient when disabled.

and see that "very efficient when disabled" translates into a
__predict_false() conditional, well, sorry, but I am very dubious.
Entering the branch _is_ still a cost. A more efficient way of doing
this would be using gcc's `asm goto' statement, in which case the hot
path only ever sees a `nop', eventually patched at runtime.

[2]: soekris-like boards do not qualify for "embedded" :)

[3]: FreeBSD (8-STABLE) is way too limited and un-integrated to be
anywhere near useful, not to speak about kernel bugs which leave the
system so fracked up that you have no other choice but to hard-reboot.
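
For readers who want to see the two calling conventions discussed above
side by side, here is a minimal userland sketch. It is not the code
from the patch-set: `struct demo_lock', `demo_lock_fl()',
`DEMO_LOCK_FL()', `demo_lock_ip()' and `demo_describe()' are
hypothetical names made up for illustration. The only real facilities
assumed are the `__builtin_return_address()' builtin and the `noinline'
attribute of gcc/clang. The first variant makes every caller push
`__FILE__' and `__LINE__'; the second lets the callee record a `void *'
return address itself, so the call site passes nothing extra.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical lock record, for illustration only. */
struct demo_lock {
	int         held;
	const char *file;	/* filled in by the file/line variant */
	int         line;
	void       *caller;	/* filled in by the return-address variant */
};

/* Variant 1: every call site passes two extra arguments. */
static void
demo_lock_fl(struct demo_lock *lk, const char *file, int line)
{
	lk->held = 1;
	lk->file = file;
	lk->line = line;
}
#define DEMO_LOCK_FL(lk)	demo_lock_fl((lk), __FILE__, __LINE__)

/*
 * Variant 2: no extra arguments; the callee records where it was
 * called from.  noinline keeps this a real call, so that the return
 * address points into the caller.
 */
__attribute__((noinline)) static void
demo_lock_ip(struct demo_lock *lk)
{
	lk->held = 1;
	lk->caller = __builtin_return_address(0);
}

/* Render whichever origin information the lock carries. */
static int
demo_describe(const struct demo_lock *lk, char *buf, size_t len)
{
	if (lk->caller != NULL)
		return (snprintf(buf, len, "locked from %p", lk->caller));
	return (snprintf(buf, len, "locked at %s:%d",
	    lk->file != NULL ? lk->file : "?", lk->line));
}
```

Calling `DEMO_LOCK_FL(&a)' and then `demo_describe(&a, buf,
sizeof(buf))' yields a "locked at file:line" string, while
`demo_lock_ip(&b)' yields "locked from" followed by a raw address.
Turning that address into symbol_name+offset is the separate printf(9)
work mentioned in the thread and is not attempted here.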