Date:      Tue, 8 Jan 2013 09:23:26 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        Jason Evans <jasone@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, Hakisho Nukama <nukama@gmail.com>
Subject:   Re: L1 cache thrashing affects performance of HIMENO benchmark
Message-ID:  <CAJ-Vmom77_Pbu9Crutu-ROEyiH6h9UuPjHSsCS2Jyg3viXu53w@mail.gmail.com>
In-Reply-To: <CAJ-Vmo=nwQiDryXnU9RC3OOguM1XuN8xtSh0yjBGXOK3vgX2sg@mail.gmail.com>
References:  <CA%2Bzcas1V0KzcBKjq1u3Rwu_Nm7hVkG%2BG%2BeHpHRDQG0_NcXoOWA@mail.gmail.com> <CAJ-Vmo=O9v==Xpm8F2UUjkEAOaHbKydLMpn6Bz-5rRBt%2B2TEAg@mail.gmail.com> <DFFCA030-3206-4EB2-88C8-262AB298FF9F@freebsd.org> <CAJ-Vmo=nwQiDryXnU9RC3OOguM1XuN8xtSh0yjBGXOK3vgX2sg@mail.gmail.com>

... can someone please file a FreeBSD PR with some example workloads
and the dfbsd list summary?

I'd like to make sure we don't lose this particular thread.

It may be worth "teaching" jemalloc to offset the allocation sizes so
they don't hit this degenerate cache case, then do a bunch of testing
to ensure nothing has regressed.
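
To make the degenerate case concrete, here is a minimal sketch of the
set-index arithmetic. It is not jemalloc code. The cache geometry
(32 KiB, 8-way, 64-byte lines), the 4 MiB spacing between allocations
and the helper names are all assumptions chosen for illustration; real
CPUs and real allocator layouts will differ.

```python
# Assumed L1 data cache geometry (illustrative only; varies by CPU).
LINE_SIZE = 64                            # bytes per cache line
WAYS = 8                                  # associativity
NUM_SETS = 32 * 1024 // (WAYS * LINE_SIZE)  # 64 sets


def cache_set(addr):
    """Return the index of the cache set that addr maps to."""
    return (addr // LINE_SIZE) % NUM_SETS


def layout(n, start, stride, stagger=0):
    """Hypothetical base addresses of n large allocations.

    Allocation i starts at start + i * (stride + stagger).
    """
    return [start + i * (stride + stagger) for i in range(n)]


def max_arrays_per_set(bases):
    """Largest number of base addresses that share one cache set."""
    counts = {}
    for base in bases:
        s = cache_set(base)
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values())


# 16 arrays whose bases are all aligned to a large power of two.
aligned = layout(16, 0x10000000, 4 << 20)
# The same arrays, each shifted by one extra cache line.
staggered = layout(16, 0x10000000, 4 << 20, stagger=LINE_SIZE)

print("aligned:  ", max_arrays_per_set(aligned), "arrays in one set")
print("staggered:", max_arrays_per_set(staggered), "array(s) per set")
```

With these assumed numbers, every aligned base lands in the same set,
so a loop that walks all 16 arrays at the same index asks an 8-way set
to hold 16 lines and keeps evicting its own data. Staggering each base
by one cache line spreads them over 16 different sets, at the cost of
some padding per allocation.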

Thanks,



adrian


On 5 January 2013 18:03, Adrian Chadd <adrian@freebsd.org> wrote:
> On 5 January 2013 13:54, Jason Evans <jasone@freebsd.org> wrote:
>
>>> Jason - any comments?
>>
>> There are many variations on this class of performance problem, and
>> the short of it is that only the application can have adequate
>> understanding of data structure layout and access patterns to
>> reliably make optimal use of the cache.  However, it is possible for
>> the allocator to lay out memory in a more haphazard fashion than
>> jemalloc, phkmalloc, etc. do, such that the application can be
>> cache-oblivious and (usually) not suffer worst case consequences as
>> happened in this case.  Extent-based allocators like dlmalloc often
>> get this "for free" for a significant range of allocation sizes.
>> jemalloc could be modified to this end, but a full solution would
>> necessarily increase internal fragmentation.  It might be worth
>> experimenting with nonetheless.
>
> For at least this particular computational workload, the loss in
> throughput from cache thrashing is significant enough to earn
> FreeBSD a negative mark in computational workloads.
>
> It'd be interesting to see which other workloads FreeBSD behaves
> poorly in.
>
> In fact, it'd be doubly interesting to get some people who _do_
> computational workloads to do some profiling using oprofile/pmc and
> report back. It might help if we wrote a wiki page on how to do this
> kind of profiling and how to interpret the results.
>
> In any case, yes - I think it's worth pursuing this further as it's
> very likely not the only workload that exhibits this kind of cache
> unhappiness.
>
>
>
> Adrian


