Date:      Tue, 14 Nov 1995 12:45:40 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        jgreco@brasil.moneng.mei.com (Joe Greco)
Cc:        terry@lambert.org, luigi@labinfo.iet.unipi.it, hackers@FreeBSD.org
Subject:   Re: Multiple http servers - howto ?
Message-ID:  <199511141945.MAA20656@phaeton.artisoft.com>
In-Reply-To: <199511141851.MAA29115@brasil.moneng.mei.com> from "Joe Greco" at Nov 14, 95 12:51:44 pm

> > You're still doing round-robin address assignment, which expects that
> > clients will behave statistically identical to one another.  And they
> > won't, even if the TTL is honored.
> 
> Somebody else who doesn't really understand that a random function N
> which may not be random for small values of x is still random enough
> for large values of x....  :-)
> 
> The TTL hack simply reduces the definition of "may not be random for small
> values of x".

For P(N1(X) == N2(X)) << 1.

I quantified this as "statistically identical client behaviour".  If
the distribution of clients by duration is not uniform, then the
effective "randomness" is reduced.

> If I have 4 addresses and 5,000 sites do a DNS lookup on me, I will
> state that at least 1,000 sites will get assigned to each address.
> That does not imply that the loading will be identical or totally
> equal, but it should be reasonably distributed.  I may not care if the
> distribution is 1000/1000/1000/2000, because it is still better than
> 5000 against a single box - and I would bet that it would be more
> evenly distributed than I am suggesting, most of the time.

The loading due to each client will be non-identical.  For P(Nn(X))
over 'n' hosts, the probability of divergence is given by:
	n * session duration / sample interval.

I guess if you don't check your per-server connection load too often
relative to the TTL, it will be better balanced on average.  8-).
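
To hang illustrative numbers on that (mine, picked for easy arithmetic,
not measured anywhere): with n = 4 servers, a mean session duration of
30 seconds, and a load sample taken every 60 seconds, you get
4 * 30 / 60 = 2 -- hopelessly divergent at that sampling rate.  Stretch
the sample interval to 20 minutes and it drops to 4 * 30 / 1200 = 0.1,
which is why infrequent sampling flatters the balance.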

This is the problem with connecting to machines instead of to services.

In any case, the point is that distribution of server load by DNS is
non-optimal, since it assumes all clients are equal in terms of
duration and/or server load factor, etc.  The actual tendency is for a
loaded server to service requests more slowly, and so become more
loaded if a truly round-robin assignment scheme is used.

With a machine-oriented connection assignment mechanism (like picking
your DNS response to a potential client), you want to assign clients
based on the inverse of relative server load to optimize per-client
response times.

Maybe you could wire up a special DNS server that knew the WWW server
loads over some sample reporting interval?  This would still be
inferior to a dynamic load balancing mechanism, like service connection
instead of machine connection, but it wouldn't require protocol changes
to implement.  You'd end up with no load increase on an over-used
server, though its load would not fall off in proportion to the
unloading of the other servers in the same assignment group, the way it
would with service connection or some meta-protocol for client handoff
between identical service providers.
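
A minimal sketch of the selection step such a DNS server might perform
(everything here is hypothetical: the load figures are assumed to
arrive from per-host reporting daemons over some side channel, and the
names and addresses are made up):

#include <stdio.h>
#include <stdlib.h>

struct wwwhost {
	const char *addr;	/* address handed back in the DNS answer */
	double load;		/* last load average reported by the host */
};

/*
 * Pick a host with probability proportional to 1/load, so lightly
 * loaded servers absorb proportionally more of the new clients.
 */
static const char *
pick_host(const struct wwwhost *hosts, int nhosts)
{
	double total = 0.0, r;
	int i;

	for (i = 0; i < nhosts; i++)
		total += 1.0 / (hosts[i].load + 0.01);	/* +0.01 avoids 1/0 */
	r = (rand() / ((double)RAND_MAX + 1.0)) * total;
	for (i = 0; i < nhosts; i++) {
		r -= 1.0 / (hosts[i].load + 0.01);
		if (r <= 0.0)
			return (hosts[i].addr);
	}
	return (hosts[nhosts - 1].addr);	/* rounding fallthrough */
}

int
main(void)
{
	struct wwwhost hosts[] = {
		{ "10.0.0.1", 0.5 },	/* lightly loaded */
		{ "10.0.0.2", 2.0 },
		{ "10.0.0.3", 8.0 },	/* heavily loaded */
	};
	int i;

	srand(1);
	for (i = 0; i < 12; i++)
		printf("answer: %s\n", pick_host(hosts, 3));
	return (0);
}

A network distance figure for the querying client could be folded into
the same weight (1 / (load * hops), say), which touches on the topology
problem below.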

The other thing that isn't taken into account is topology management
for geographically separate servers: there is no way to get the least
loaded server closest to your location, to reduce overall network
congestion.

Actually, someone could probably get a nice little paper out of building
an inverse load preferential DNS (and load reporting daemons) if they
wanted to.  8-).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


