Date:      Wed, 26 Dec 2012 21:32:45 -0800
From:      Peter Wemm <peter@wemm.org>
To:        Alfred Perlstein <bright@mu.org>
Cc:        "arch@freebsd.org" <arch@freebsd.org>, Adrian Chadd <adrian@freebsd.org>, Rui Paulo <rpaulo@freebsd.org>, Alfred Perlstein <alfred@ixsystems.com>
Subject:   Re: UPDATE Re: making use of userland dtrace on FreeBSD
Message-ID:  <CAGE5yCrnoNhOh3VaYU3bO6BwA=bpxD5QzkZvD%2BHaUwvXNQ%2BUfw@mail.gmail.com>
In-Reply-To: <50DBD193.7080505@mu.org>
References:  <50D49DFF.3060803@ixsystems.com> <50DBC7E2.1070505@mu.org> <CAGE5yCq46NFKKzSUZq=jz0NwEnWdjPTK_0fpZ%2BwWV9FA0BSQCg@mail.gmail.com> <50DBD193.7080505@mu.org>

On Wed, Dec 26, 2012 at 8:41 PM, Alfred Perlstein <bright@mu.org> wrote:
> On 12/26/12 8:21 PM, Peter Wemm wrote:
>>
>> On Wed, Dec 26, 2012 at 8:00 PM, Alfred Perlstein <bright@mu.org> wrote:
>>
>>> What would be the drawbacks?  I don't want to hurt freebsd for heavy
>>> performance, but I think this functionality should work out of the box
>>> for
>>> most people.
>>
>> The drawbacks are mostly performance related.  It defeats certain
>> hardware optimizations for call/return on leaf functions.  It'll
>> mostly affect things like math, crypto, compression and multimedia
>> libraries (that's ffmpeg, bzip2/gzip/libarchive, openssl, etc) but, we
>> generally don't seem to care about that sort of performance anyway, so
>> what's one more loss?
>
>
> Can you clarify some?  If it was somewhat easy to re-add
> -fomit-frame-pointer to critical libraries like this, then that would be OK?

No, you can't add MD flags like this.  The way to do it is to follow
the pattern of things like PIC, WARNS, etc., where you can override the
defaults on a per-directory basis while still respecting the
system-wide user overrides.

Remember, -fno-omit-frame-pointer is the default on i386 (except at
high -O levels with gcc; I don't know where clang, the default
compiler, draws the line).  Other platforms don't even have frame
pointers.  You can't just scatter that switch around the place.

> To be honest, I'm not sure if you're serious about "generally don't seem to
> care" or just feel defeated on the issue and we should care.

We took quite a performance beating because of not using the
tuned-by-perl assembler code in openssl on amd64, for example.  This
flows through to benchmarks on things like apache throughput with
mod_ssl.  Or throughput on stunnel(1).

My drive-by comment about not seeming to care any more is that people
(except for Bruce) generally don't actually measure the performance
impact of their changes any more.  The last time this was widespread
was when Kris Kennaway used to be constantly abusing machines and
reporting the effects as measured by ministat(1).

If somebody were to say "this change makes world take 15% longer to
compile but has no meaningful effect on things like bzip2, openssl
throughput, etc." and posted the actual ministat output to back it up,
then there wouldn't even be a question on performance at all.  It'd
only be "is 15% more build time worth ubiquitous dtrace?"  And that's a
far easier thing to answer.
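
To make "post the actual numbers" concrete, here is a rough Python
sketch of the kind of comparison ministat(1) reports: the change in the
mean plus a Student's t statistic for two sets of timings.  The timings
below are invented for illustration and the function name is
hypothetical; this is not ministat's implementation, only the same
general idea.

```python
import math
import statistics

def compare(before, after):
    """Return (percent change in mean, pooled-variance Student's t statistic)."""
    n1, n2 = len(before), len(after)
    m1, m2 = statistics.mean(before), statistics.mean(after)
    v1, v2 = statistics.variance(before), statistics.variance(after)
    # Pooled variance assumes both samples share a common variance.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m2 - m1) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    pct = 100.0 * (m2 - m1) / m1
    return pct, t

# Hypothetical wall-clock times in seconds; NOT real measurements.
before = [100.0, 101.0, 99.0, 100.0, 100.0]
after = [115.0, 116.0, 114.0, 115.0, 115.0]

pct, t = compare(before, after)
print(f"mean change: {pct:+.1f}%  t = {t:.1f}")
```

With five runs on each side there are 8 degrees of freedom, so a |t|
above roughly 2.306 would indicate a difference at the 95% confidence
level.  Noisy timings that fall below that threshold are the case where
"no meaningful effect" is the honest answer.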

A hand-wave leads to bikesheds.  Actual numbers are bikeshed repellent.

I myself have killed patches that turned out to be premature
optimizations because they actually didn't make any difference.  For
example, I never committed the lazy tlb shootdown to AMD64 because it
made things slower on the hardware of the day - Opteron silicon had
*hardware* address space tags on its TLB, and the lazy shootdown code
only added more synchronization work, which was pure overhead.  E.g.,
buildworld was around 2% slower with the patches.

Another example was the mtxpool code that caused cache line thrashing.
If we cared about performance that would never have gone in.  Sure, it
compiled and worked, but the costs weren't quantified until much later,
when we realized how much trouble it caused beyond a certain usage
level.

What's 2%?  It multiplies out... 2% here, 1% there... 3% over there,
0.5% somewhere else... before you know it, there's a pretty big overall
hit.
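
As a quick worked example of how those figures multiply out, the
Python sketch below compounds the four slowdowns named above.  It
assumes, purely for illustration, that each hit is independent and
applies to the same workload.

```python
# Slowdowns quoted in the text, as fractions: 2%, 1%, 3%, 0.5%.
slowdowns = [0.02, 0.01, 0.03, 0.005]

total = 1.0
for s in slowdowns:
    total *= 1.0 + s  # each regression scales the running time

overall_pct = (total - 1.0) * 100.0
print(f"overall slowdown: {overall_pct:.2f}%")  # prints "overall slowdown: 6.64%"
```

Four changes that each look like noise add up to about 6.6%.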

>> Of course it wouldn't be required with dwarf unwinding awareness, but
>> we don't have that.
>>
>> We have -fno-omit-frame-pointer on the amd64 kernel whenever debugging
>> is compiled in because there's no unwinder for doing stack traces.  We
>> need a dwarf2+ unwinder and somebody to instrument the call frame
>> state through the remaining assembler code.
>>
> How much work is that exactly?  I've only been a gdb user, not a hacker.

gdb has a stack unwinder.  kdb/ddb/stack(9) do not.  There's
well-established GPL code to do it, as well as libunwind and variants.
Basically what this code has to do is run the dwarf2+ state machine to
find all the call/return frames instead of assuming the compiler did
it.  Heck, even glibc has a dwarf2 unwinder built into it as part of
its exception processing system.
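
For contrast, "assuming the compiler did it" amounts to following the
chain of saved frame pointers.  Below is a toy Python model of that
walk over a simulated stack.  The addresses, the dict-as-memory layout
and the function name are all made up for illustration; the only thing
borrowed from real x86 frames is the convention that the saved frame
pointer sits at [fp] and the return address one word above it.

```python
WORD = 8  # assumed word size, as on amd64

def walk_frames(memory, fp, max_depth=64):
    """Follow saved frame pointers; return the list of return addresses."""
    trace = []
    while fp != 0 and len(trace) < max_depth:
        saved_fp = memory[fp]          # caller's frame pointer
        ret_addr = memory[fp + WORD]   # return address into the caller
        trace.append(ret_addr)
        fp = saved_fp
    return trace

# Hypothetical stack: three nested frames; the outermost has saved fp 0.
memory = {
    0x7000: 0x7020, 0x7008: 0x401234,
    0x7020: 0x7040, 0x7028: 0x401100,
    0x7040: 0x0,    0x7048: 0x401000,
}

trace = walk_frames(memory, 0x7000)
print([hex(a) for a in trace])
```

With -fomit-frame-pointer that chain simply isn't there, which is why
an unwinder then has to compute each caller's frame from the DWARF call
frame information instead.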

I'm not entirely sure what more work src/lib/libelf and
src/lib/libdwarf need.  It looks like it's got just enough implemented
to support ctfconvert etc. and doesn't have an unwinder in it.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell
bitcoin:188ZjyYLFJiEheQZw4UtU27e2FMLmuRBUE


