Date: Mon, 17 Feb 2003 09:40:10 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Alex Rousskov <rousskov@measurement-factory.com>
Cc: Pawel Jakub Dawidek <nick@garage.freebsd.pl>, Scott Long <scott_long@btc.adaptec.com>, Sam Leffler <sam@errno.com>, Brad Knowles <brad.knowles@skynet.be>, freebsd-current@freebsd.org
Subject: Polygraph Considered Evil 8^) (was: Re: 5-STABLE Roadmap)
Message-ID: <3E511E7A.8225ABA9@mindspring.com>
References: <20030216184257.GZ10767@garage.freebsd.pl> <3E4FFDD3.9050802@btc.adaptec.com> <20030216214322.GB10767@garage.freebsd.pl> <Pine.BSF.4.53.0302162130370.46493@measurement-factory.com>

Alex Rousskov wrote:
> Polygraph is relatively easy to setup on FreeBSD for standard tests,
> using two PCs. Testing with more PCs, with non-standard workloads,
> and/or on a regular basis requires writing scripts and can get pretty
> involved (which lets us sell a pre-configured appliance that does
> Polygraph test management :).

First, I just have a slight editorial comment, about cheating on
Polygraph.

One issue I have with Polygraph is that it intentionally works for
a very long time to get worst case performance out of caches;
basically, it cache-busts on purpose. Then the test runs. This
seems to be an editorial comment on end-to-end guarantees, much
more than it seems a valid measurement of actual cache performance.

If you change Squid to force random page replacement, then you
end up with a bounded worst case, which is a better number than you
would be able to get with your best (in terms of real-world
performance) algorithm (e.g. LRU or whatever), because you make it
arbitrarily hard for Polygraph to characterize what the worst case
would be.

NetApp has a tunable in their cache product which might as well be
labelled "get a good Polygraph score"; all it does is turn on
random page replacement, so that the Polygraph code is unable to
characterize "what would constitute worst case performance on this
cache?", and then intentionally exercise that code path, which is
what it would do otherwise (i.e. pick a working set slightly
larger than the cache size so everything's a miss, etc.).

Basically, most of the case numbers are 99.xx% miss rates. With
this modification, that number drops down to closer to 80%.

That's kind of evil; but at least it's a level playing field, and
we can make a FreeBSD-specific patch for Squid to get better
numbers for FreeBSD. 8-) 8-).

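To make the LRU versus random replacement point concrete, here is a minimal simulation sketch. It is not Polygraph and not Squid: the cyclic access pattern, the sizes, and the function names are all assumptions chosen for illustration, and it does not attempt to reproduce the 99.xx% and 80% figures quoted above. It only shows why a working set slightly larger than the cache is the worst case for LRU but not for random eviction.

```python
import random
from collections import OrderedDict


def lru_miss_rate(cache_size, working_set, passes):
    """Miss rate of an LRU cache under a repeated sequential scan."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = total = 0
    for _ in range(passes):
        for obj in range(working_set):
            total += 1
            if obj in cache:
                cache.move_to_end(obj)  # mark as most recently used
            else:
                misses += 1
                if len(cache) >= cache_size:
                    cache.popitem(last=False)  # evict least recently used
                cache[obj] = True
    return misses / total


def random_miss_rate(cache_size, working_set, passes, seed=0):
    """Miss rate of a random-replacement cache under the same scan."""
    rng = random.Random(seed)
    slots = []   # cached objects, stored so a random victim is O(1)
    index = {}   # object -> position in slots
    misses = total = 0
    for _ in range(passes):
        for obj in range(working_set):
            total += 1
            if obj in index:
                continue  # hit
            misses += 1
            if len(slots) < cache_size:
                index[obj] = len(slots)
                slots.append(obj)
            else:
                pos = rng.randrange(cache_size)  # pick a random victim
                del index[slots[pos]]
                slots[pos] = obj
                index[obj] = pos
    return misses / total


if __name__ == "__main__":
    # Hypothetical sizes: working set 10% larger than the cache.
    print("LRU   :", lru_miss_rate(100, 110, 50))
    print("random:", random_miss_rate(100, 110, 50))
```

With these assumed sizes every LRU access is a miss, because each object is evicted just before the scan comes back to it, while random eviction keeps a good share of the working set resident.
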
> > Yes, on website kernel patches are available for tuning, but for new
> > releases of 4.x this isn't necessary, all could be configured with kernel
> > options and sysctls (for 4.8):
> >
> > options MAXFILES=16384
> > options NMBCLUSTERS=32678

These I understand, though I think they are on the low end.

> > options HZ=1000

This one, I don't understand at all. The web page says it's for
faster dummynet processing. But maybe this is an artifact of
using NETISR.

> > kern.ipc.somaxconn=1024

This one, either: it's really very small.

> > net.inet.ip.portrange.last=40000

This one is OK, but small. It only affects outbound connections;
got to wonder why it isn't 65535, though.

> > net.inet.tcp.delayed_ack=0

This seems designed to get a good connection rate.

> > net.inet.tcp.msl=3000

And this seems designed to get a bad one. You are aware that, by
default, NT systems cheat on the MSL, right? For gigabit, this is
a larger number than you want, I think.

> One of our kernel patches optimizes handling of 1000s of IP aliases
> per FreeBSD box. The patch is required for older 4.x kernels to
> perform at decent levels. IIRC, the patch does not work for recent
> kernels, probably because of the SYN cache changes. I do not know
> whether any alias-related optimizations are still needed for recent
> kernels though. Perhaps the SYN cache solves the original scalability
> problem.

The hash is a reasonable modification; it'd probably be better
handled through the routing code, since it has to be hashed there
anyway, if you planned on using a lot of IP aliases.

I haven't looked at the client code, but you are aware that adding
IP aliases doesn't really do anything, unless you manage your
port space yourself, manually, with a couple of clever tricks? In
other words, your total number of outbound connections is going to
be limited to your port space (e.g. ~40K), because port
autoallocation takes place in the same space as the INADDR_ANY
space?

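The port-space argument above comes down to simple arithmetic, sketched below. This only models the claim as stated and is not a measurement of kernel behavior. The function name is hypothetical, and the lower bound of 1024 for the ephemeral port range is an assumption, since only net.inet.ip.portrange.last=40000 is given in the quoted settings.

```python
def max_outbound_connections(first_port, last_port, aliases, per_alias_port_space):
    """Upper bound on simultaneous outbound connections in this model.

    If ports are autoallocated from one shared space (the INADDR_ANY
    space), adding aliases does not raise the bound. If the client
    manages a separate port space for each alias, the bound scales
    with the number of aliases.
    """
    ports = last_port - first_port + 1
    return ports * aliases if per_alias_port_space else ports


if __name__ == "__main__":
    FIRST = 1024   # assumed lower bound of the port range (not in the quoted settings)
    LAST = 40000   # net.inet.ip.portrange.last from the quoted settings
    print(max_outbound_connections(FIRST, LAST, 1000, per_alias_port_space=False))
    print(max_outbound_connections(FIRST, LAST, 1000, per_alias_port_space=True))
```

In the shared case the bound is the same with 1 alias or 1000, which is the sense in which the aliases do not really do anything.
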
I guess this doesn't matter, if your maximum open files (MAXFILES)
is only 16K, since that's going to end up bounding you well before
you run out of ports...

> Please note that a couple of the results I looked at are invalid from
> PolyMix workload rules/design point of view.

Yes... the MSL setting, for one. Only Windows gets to cheat. ;^).

> The first thing to check
> is that you have huge numbers of requests in waiting queue, compared to
> active transactions (shown on the same "xact_lvl" graph). Most likely,
> you overloaded the device under test, and most requests ended up in
> queues instead of on the wire.

Probably the "best" way to handle this is to apply the Duke
University update of the Rice University LRP code. You will get
*much* better numbers from your FreeBSD box, if you do that. By a
factor of 4, most likely. 8-).

> I may be missing something though -- I am just looking at your
> results without much knowledge of their history/purpose... See last
> cache-off results for valid examples:
> http://www.measurement-factory.com/results/
>
> If you have any Polygraph-specific questions, I would be happy to
> answer them, especially if it can help FreeBSD folks in any way.

IMO, Polygraph is probably not something you want to include in
a standard suite, if the intent is to get numbers that are good
for FreeBSD PR (Sorry, Alex, but it's true: you have to do
significant and clever and sometimes obtuse and counterintuitive
things in order to get good Polygraph numbers for comparison).

I don't think that anything you do in this regard is going to be
able to give you iMimic or NetApp level numbers, which are created
by professional benchmark-wranglers, so any comparison values you
get will likely be poor, compared to commercial offerings.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message