Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 9 Nov 2011 15:14:03 +0100
From:      Oliver Pinter <oliver.pntr@gmail.com>
To:        Arnaud Lacombe <lacombar@gmail.com>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Using Instruction Pointer address in debug interfaces [Was: Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]]
Message-ID:  <CAPjTQNEU1cpoTauHvAm9tMY1eJagn-%2BmTLitV3errBq5YcZBng@mail.gmail.com>
In-Reply-To: <CACqU3MUjxFOBGF4zib=Wz6Vrt=Jrsur03g_U4-g4hg0iEnpVSA@mail.gmail.com>
References:  <CACqU3MU_Bk%2BcObCiUa2XtM7fLkSpSDzOZqoZ=khNOR-_6ptYYQ@mail.gmail.com> <4EB9C469.9070208@freebsd.org> <CACqU3MWkcKZ2cjh_VJueWUpOS7dzhu%2BMh8yYWVA8X9B9ykOW7w@mail.gmail.com> <4EB9E6FE.3060102@freebsd.org> <CACqU3MUjxFOBGF4zib=Wz6Vrt=Jrsur03g_U4-g4hg0iEnpVSA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On 11/9/11, Arnaud Lacombe <lacombar@gmail.com> wrote:
> Hi,
>
> On Tue, Nov 8, 2011 at 9:35 PM, Julian Elischer <julian@freebsd.org> wrote:
>> On 11/8/11 5:52 PM, Arnaud Lacombe wrote:
>>>
>>> Hi,
>>>
>>> On Tue, Nov 8, 2011 at 7:08 PM, Julian Elischer<julian@freebsd.org>
>>>  wrote:
>>>>
>>>> On 11/8/11 10:49 AM, Arnaud Lacombe wrote:
>>>>>
>>>>> Hi,
>>>>> To avoid future complaints about the fact that I would be only "talk"
>>>>> without "action", I did implement what I suggested above. As it is
>>>>> quite a large patch-set, I will not post it directly here, however, it
>>>>> is available on github:
>>>>>
>>>>> https://github.com/lacombar/freebsd/tree/master/topic/kern-lock-debug
>>>>>
>>>>> It convert a bunch of debug interface to use the caller instruction
>>>>> pointer, as well as a proof-of-concept teaching printf(9) to convert
>>>>> IP to symbol_name+offset.
>>>>>
>>>>> It translates in a direct saving of about +250kB on i386's GENERIC,
>>>>> just in kernel text size. Even the worst case, ie LOCK_DEBUG == 0,
>>>>> translates to a save of +80kB.
>>>>>
>>>>> Please note that this is still WIP code.
>>>>
>>>> A couple of comments.
>>>> Firstly, the idea of a printf method to print the IP as symbol+offset is
>>>> an
>>>> interesting idea
>>>> that should be followed up in its own right.
>>>>
>>> FWIW, I have no credit in this idea. It has been in Linux for ages and
>>> ages.
>>
>> yeah as I said  at work I use linux and BSD...
>> the linux stuff that just prints out IP really annoys me.
>>
>> the list stuff and netgraph debug (which should be off in any production
>> system)
> this is, I guess, where we do not agree. You find it acceptable to run
> totally different code in production and during debug. I do not. This
> is completely insane, even more nowadays where heavy parallelism
> increases the likelihood of races, and subtle change in the code, even
> optimization, can cause total behavioral change (ie. Heisenbug).
>
> For the record, we have been tracking for more than 2 months (first
> occurrences happened a year ago) an mbuf corruption in the network
> stack, present in all released code since at least FreeBSD 7[0]. Each
> time we think it is fixed, we are proven wrong by customers within a
> few days when the system crashes again. Even the last attempt which
> was believed to be bullet-proof failed and crashes daily.
>
> All that to say that production code should embed enough facilities to
> allow the system to be fully debugged with a runtime cost as low as
> possible, and a code-size cost as low as possible[1]. I should be able
> to connect on a production machine, turn a knob, an see what is going
> wrong, without the customer noticing.
>
> In the worst case, when you have to enable debug-only code, it must
> not be done by making the non-debug case more expensive, but wrap
> around. The whole original point of the patches was that LOCK_FILE and
> LOCK_LINE are a bad answer to a wrong problem.
>
> `__FILE__, __LINE__' and the bloat introduced is not the problem,
> `const char *file, int line' in way too much prototypes is.
>
> Now, you make me realize that `const char *file, int line' should just
> be removed, not replaced by `unsigned long' or anything else. It's
> likely to be done in another iteration.
>
>> just require you to be able to see the console. and have sources nearby.
>> if you need the IP use gdb.
>>
> "console debugging" is yet another abomination which should be hunted
> down. Just try to do any useful work at high-pps on a serial
> console...
>
>> it's just what you are used to. You are obviously from the dark side
>> ^H^H^H^H^H^H linux.
>>
> My obedience is totally irrelevant to the problem.
>
> However, if you want to know, my heart tends to be with BSDs.
> Unfortunately, it's a sad love-story where your Beloved keeps
> deceiving you day after day. You want to change small bits at a time,
> make several iteration of progress to make things brighter, but your
> Beloved refuses any change because of too much inertia. Sad.
>
>> so you are used to doing it that way.. but don't expect us to change just
>> because that's what Linux does.
>>
> again, mentioning Linux is totally irrelevant. Use of Instruction
> Pointer are implementation details for a not so intrusive solution to
> the problem I pointed out, and which you are totally missing.
>
> Now, please answer this: do you find any of the bloat to the non-debug
> case (ie. passing a NULL pointer and a 0 integer, when `LOCK_DEBUG ==
> 0') worth the extra debugability comfort to be acceptable ?
>
> If you do, then your focus is on making things comfortable for
> developers, at the expense 100's of users, rather than making things
> comfortable for 100's of users, at the expense of developers.
>
>> When we have a problem at work on teh Linux driver, my first step is
>> always
>> to try duplicate it on FreeBSD because:
>>
> well, you're lucky FreeBSD supports your device! Lately, we got lately
> a shiny multi-queue network cards with bypass mechanism... that is not
> supported in FreeBSD. So currently, we got an expensive paper-weight.
>
>> 1/ half the time freebsd will just immediatly assert on something and
>> present you with the bug.. done.
>>
> well, certainly not from a release build; assertion are disabled.
>
>> 2/ I can run gdb through firewire on it on ANY standard unmodified kernel
>> and find it, where on Linux I need to
>> get a whole universe of stupid patches all aligned and MAYBE I might be
>> able
>> to see what is going on.
>> if it's on redhat I need to do this, on ubuntu that, on suse something
>> else
>> ,and on different revisions
>> of the kernel it all changes anyhow..
>>
> machine (even x86-64 machines) I run FreeBSD on have no firewire,
> neither do my desktops, this limit the usability of the feature.
> Moreover, I do not use mass-distros either, except for desktop[3], but
> small embedded firmware (ala. openwrt), even on "middle-end" system,
> and stick to vanilla kernel (when not playing with -next).
>
>>> That said, IP address are barely used in FreeBSD, there is no legacy.
>>> As such, the API should not use `unsigned long' but `void *'[0]; this
>>> is the natural type returned by `__builtin_return_address()' and the
>>> `&&' operator. This would allow to introduce a modifier to `%p' to do
>>> the translation.
>>
>> possibly intptr_t is what should be used. but I'd expect Bruce to drop in
>> here and let us us know.
>>
> whatever will suit you :) I started by using `uintptr_t', but they
> cannot be printed in a portable manner and the printf(9) stuff was not
> ready yet.
>
>  - Arnaud
>
> [0]: I am able to crash any kernel between 7-STABLE to 9.STABLE within
> minutes, with the right pattern and (mainstream and well supported)
> hardware.

Maybe this bug related to sk(4) driver?

>
> [1]: for example, when I read in `sys/kern/kern_fail.c':
>
>  * Failpoints allow for injecting fake errors into running code on the fly,
>  * without modifying code or recompiling with flags.  Failpoints are always
>  * present, and are very efficient when disabled.
>
> and see that "very efficient when disabled" translates into a
> __predict_false() conditional, well, sorry, but I am very dubious.
> Entering the branch _is_ still a cost. A most efficient way of doing
> this would be using gcc's `asm goto' statement, in which case the hot
> path only ever see a `nop', eventually patched at runtime.
>
> [2]: soekris-like boards do not qualify for "embedded" :)
>
> [3]: FreeBSD (8-STABLE) is way to limited and un-integrated to be
> anywhere but useful, not to speak about kernel bug which leave the
> system so fracked up that you have no other choice but to hard-reboot.
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAPjTQNEU1cpoTauHvAm9tMY1eJagn-%2BmTLitV3errBq5YcZBng>