FreeBSD Mail Archives

Date:      Mon, 17 Feb 2003 09:40:10 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Alex Rousskov <rousskov@measurement-factory.com>
Cc:        Pawel Jakub Dawidek <nick@garage.freebsd.pl>, Scott Long <scott_long@btc.adaptec.com>, Sam Leffler <sam@errno.com>, Brad Knowles <brad.knowles@skynet.be>, freebsd-current@freebsd.org
Subject:   Polygraph Considered Evil 8^) (was: Re: 5-STABLE Roadmap)
Message-ID:  <3E511E7A.8225ABA9@mindspring.com>
References:  <20030216184257.GZ10767@garage.freebsd.pl> <3E4FFDD3.9050802@btc.adaptec.com> <20030216214322.GB10767@garage.freebsd.pl> <Pine.BSF.4.53.0302162130370.46493@measurement-factory.com>

Alex Rousskov wrote:
> Polygraph is relatively easy to set up on FreeBSD for standard tests,
> using two PCs. Testing with more PCs, with non-standard workloads,
> and/or on a regular basis requires writing scripts and can get pretty
> involved (which lets us sell a pre-configured appliance that does
> Polygraph test management :).

First, I just have a slight editorial comment, about cheating on
Polygraph.

One issue I have with Polygraph is that it intentionally works
for a very long time to get worst case performance out of caches;
basically, it cache-busts on purpose.  Then the test runs.  This
seems to be an editorial comment on end-to-end guarantees, much
more than it seems a valid measurement of actual cache performance.

If you change squid to force random page replacement, then you
end up with a bounded worst case that is a better number than you
would get with your best real-world algorithm (e.g. LRU or whatever),
because you make it arbitrarily hard for the benchmark to
characterize what the worst case would be.

NetApp has a tunable in their cache product which might as well be
labelled "get a good Polygraph score"; all it does is turn on
random page replacement, so that the Polygraph code is unable to
characterize "what would constitute worst case performance on this
cache?", and then intentionally exercise that code path, which is
what it would do, otherwise (i.e. pick a working set slightly larger
than the cache size so everything's a miss, etc.).

Basically, most of the case numbers are 99.xx% miss rates.  With
this modification, that number drops down to closer to 80%.
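
To make the working-set argument concrete, here is a minimal simulation
sketch.  It is not Polygraph and not squid; the function names, the
cache size, and the access pattern are all made up for illustration.
It cycles through a key set slightly larger than the cache, which is
the pattern described above.  Under that pattern strict LRU misses on
every access, while random replacement keeps getting some hits.  The
rates it prints depend entirely on the assumed sizes and should not be
read as the 99.xx% and 80% figures.

```python
import random
from collections import OrderedDict


def lru_miss_rate(cache_size, working_set, passes):
    """Miss rate of an LRU cache under a repeated sequential scan."""
    cache = OrderedDict()
    misses = total = 0
    for _ in range(passes):
        for key in range(working_set):
            total += 1
            if key in cache:
                cache.move_to_end(key)         # mark as most recently used
            else:
                misses += 1
                if len(cache) >= cache_size:
                    cache.popitem(last=False)  # evict least recently used
                cache[key] = True
    return misses / total


def random_miss_rate(cache_size, working_set, passes, seed=0):
    """Miss rate of a random-replacement cache under the same scan."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    slots = []                 # cached keys, by slot
    index = {}                 # key -> slot number
    misses = total = 0
    for _ in range(passes):
        for key in range(working_set):
            total += 1
            if key in index:
                continue       # hit
            misses += 1
            if len(slots) < cache_size:
                index[key] = len(slots)
                slots.append(key)
            else:
                victim = rng.randrange(cache_size)  # evict a random slot
                del index[slots[victim]]
                slots[victim] = key
                index[key] = victim
    return misses / total


if __name__ == "__main__":
    # Hypothetical sizes: working set 10% larger than the cache.
    print("LRU    miss rate:", lru_miss_rate(100, 110, 50))
    print("random miss rate:", random_miss_rate(100, 110, 50))
```

The point is only the direction of the effect: once the eviction
choice is random, a scan sized just past the cache no longer guarantees
that every request is a miss.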

That's kind of evil; but at least it's a level playing field, and
we can make a FreeBSD-specific patch for SQUID to get better numbers
for FreeBSD.  8-) 8-).

> > Yes, on website kernel patches are available for tuning, but for new
> > releases of 4.x this isn't necessary, all could be configured with kernel
> > options and sysctls (for 4.8):
> >
> >       options         MAXFILES=16384
> >       options         NMBCLUSTERS=32678

These I understand, though I think they are on the low end.

> >       options         HZ=1000

This one, I don't understand at all.  The web page says it's for faster
dummynet processing.  But maybe this is an artifact of using NETISR.

> >       kern.ipc.somaxconn=1024

This one, either: it's really very small.

> >       net.inet.ip.portrange.last=40000

This one is OK, but small.  It only affects outbound connections; got
to wonder why it isn't 65535, though.

> >       net.inet.tcp.delayed_ack=0

This seems designed to get a good connection rate.

> >       net.inet.tcp.msl=3000

And this seems designed to get a bad one.  You are aware that, by
default, NT systems cheat on the MSL, right?  For gigabit, this is
a larger number than you want, I think.
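
A rough back-of-the-envelope sketch of why the MSL matters for
connection rate.  The assumptions are mine: the sysctl value is in
milliseconds, a closed connection keeps its local port in TIME_WAIT
for 2*MSL on the side that closes first, and every connection goes to
one remote address and port.  The portrange.first value of 1024 is
also an assumption (it isn't given above), and 30000 is used only as
a comparison point.

```python
def max_sustained_conn_rate(port_first, port_last, msl_ms):
    """Upper bound on new outbound connections per second.

    Each local port is assumed to be unusable for 2*MSL after its
    connection closes, so the port range can turn over at most once
    per TIME_WAIT period.
    """
    ports = port_last - port_first + 1
    time_wait_s = 2 * msl_ms / 1000.0
    return ports / time_wait_s


if __name__ == "__main__":
    for msl_ms in (30000, 3000):
        rate = max_sustained_conn_rate(1024, 40000, msl_ms)
        print("msl=%d ms -> about %.0f connections/sec" % (msl_ms, rate))
```

This ignores everything else that limits the rate (open files, mbufs,
CPU); it only shows how the port range and the MSL trade off against
each other.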

> One of our kernel patches optimizes handling of 1000s of IP aliases
> per FreeBSD box. The patch is required for older 4.x kernels to
> perform at decent levels. IIRC, the patch does not work for recent
> kernels, probably because of the SYN cache changes. I do not know
> whether any alias-related optimizations are still needed for recent
> kernels though. Perhaps the SYN cache solves the original scalability
> problem.

The hash is a reasonable modification; it'd probably be better handled
through the routing code, since it has to be hashed there anyway, if
you planned on using a lot of IP aliases.

I haven't looked at the client code, but you are aware that adding
IP aliases doesn't really do anything, unless you manage your
port space yourself, manually, with a couple of clever tricks?  In
other words, your total number of outbound connections is going to
be limited to your port space (e.g. ~40K), because the port
autoallocation takes place in the same space as the INADDR_ANY
space.  I guess this doesn't matter, if your maxopenfiles is only 16K,
since that's going to end up bounding you well before you run out of
ports...
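
One possible reading of "manage your port space yourself" is to bind
each client socket to an explicit (alias address, port) pair before
connecting, rather than letting the stack pick the port.  A minimal
sketch under that assumption follows; port_plan and connect_from are
hypothetical helper names, and this is not taken from the Polygraph
client code.

```python
import socket


def port_plan(alias_ips, port_first, port_last):
    """Yield every (local_ip, local_port) pair, one alias at a time."""
    for ip in alias_ips:
        for port in range(port_first, port_last + 1):
            yield (ip, port)


def connect_from(local_ip, local_port, remote_addr):
    """Open a TCP connection from an explicitly chosen local address.

    Binding before connect() means the caller, not the stack's
    automatic allocator, decides which local port is used.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((local_ip, local_port))
        sock.connect(remote_addr)
    except OSError:
        sock.close()
        raise
    return sock
```

A client would walk port_plan() over its aliases and hand each pair to
connect_from().  The number of pairs grows with the number of aliases,
but the open file limit still caps how many can be in use at once.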

> Please note that a couple of the results I looked at are invalid from
> PolyMix workload rules/design point of view.

Yes... the MSL setting, for one.  Only Windows gets to cheat.  ;^).

> The first thing to check
> is that you have huge numbers of requests in waiting queue, compared to
> active transactions (shown on the same "xact_lvl" graph). Most likely,
> you overloaded the device under test, and most requests ended up in
> queues instead of on the wire.

Probably the "best" way to handle this is to apply the Duke University
update of the Rice University LRP code.  You will get *much* better
numbers from your FreeBSD box, if you do that.  By a factor of 4, most
likely.  8-).

> I may be missing something though -- I am just looking at your
> results without much knowledge of their history/purpose... See last
> cache-off results for valid examples:
>         http://www.measurement-factory.com/results/
> 
> If you have any Polygraph-specific questions, I would be happy to
> answer them, especially if it can help FreeBSD folks in any way.

IMO, Polygraph is probably not something you want to include in a
standard suite, if the intent is to get numbers that are good for
FreeBSD PR (Sorry, Alex, but it's true: you have to do significant
and clever and sometimes obtuse and counterintuitive things in order
to get good Polygraph numbers for comparison).

I don't think that anything you do in this regard is going to be able
to give you iMimic or NetApp level numbers, which are created by
professional benchmark-wranglers, so any comparison values you get
will likely be poor, compared to commercial offerings.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3E511E7A.8225ABA9>