Date:      Thu, 21 Nov 2002 17:09:30 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Nate Lawson <nate@root.org>
Cc:        hackers@freebsd.org
Subject:   Re: Changing socket buffer timeout to a u_long?
Message-ID:  <3DDD83CA.4A910E59@mindspring.com>
References:  <Pine.BSF.4.21.0211211637000.69388-100000@root.org>

Nate Lawson wrote:
> On Thu, 21 Nov 2002, Terry Lambert wrote:
> > FWIW: upping the roll-over rate is not a good reason to increase
> > the size of fields, unless you want to increase the TCP sequence
> > number field to 64 bits?  ...it has exactly the same issues at
> > high data rates.
> 
> That's what the timestamp option does and I think it was a good idea,
> given the range of systems TCP needs to work well on.

Setting your HZ to 100,000 instead of 100, and then complaining
because a timer field whose resolution is specified in ticks, rather
than as an interval length, can't handle a value which is way too
large for a fast transport, seems a bit silly to me.
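
To put numbers on the overflow: a minimal sketch of how the longest
representable timeout shrinks as HZ grows. The 16-bit width used for
the older field is an assumption (this message only says the field is
an int in -current), and max_timeout_seconds() is a hypothetical
helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest whole number of seconds that a timeout field counting
 * ticks can represent.  max_ticks is the largest value the field
 * can hold; hz is the number of ticks per second.
 */
static long
max_timeout_seconds(long max_ticks, long hz)
{
	assert(hz > 0);
	return max_ticks / hz;
}
```

With these definitions, max_timeout_seconds(INT16_MAX, 100) is 327,
max_timeout_seconds(INT16_MAX, 100000) is 0 (less than one second),
and max_timeout_seconds(INT32_MAX, 100000) is 21474.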

Call me crazy, but the timer field should not be in ticks; in fact,
the timer field really should not exist, per se, it should be on a
fixed interval timer queue, instead of linked into a callout wheel,
and then if it fires, it fires, along with every other timer of that
interval.
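
A rough sketch of what a fixed interval timer queue could look like.
All names here (fi_queue, fi_arm, fi_run) are hypothetical and this is
not the historical BSD code; tick wraparound is ignored. The point is
that when every timer on a queue shares one interval, appending at the
tail keeps the list sorted by expiry for free, so firing can stop at
the first entry that is not yet due.

```c
#include <assert.h>
#include <stddef.h>

struct fi_timer {
	struct fi_timer *next;
	unsigned long expires;	/* absolute tick at which it fires */
	int fired;
};

struct fi_queue {
	struct fi_timer *head, *tail;
	unsigned long interval;	/* shared by every timer on the queue */
};

static void
fi_init(struct fi_queue *q, unsigned long interval)
{
	assert(interval > 0);
	q->head = q->tail = NULL;
	q->interval = interval;
}

/*
 * Arm a timer.  "now" never decreases and the interval is fixed,
 * so a tail insert keeps the list ordered by expiry.
 */
static void
fi_arm(struct fi_queue *q, struct fi_timer *t, unsigned long now)
{
	t->expires = now + q->interval;
	t->fired = 0;
	t->next = NULL;
	if (q->tail != NULL)
		q->tail->next = t;
	else
		q->head = t;
	q->tail = t;
}

/* Fire everything that is due; return how many entries were examined. */
static int
fi_run(struct fi_queue *q, unsigned long now)
{
	int examined = 0;

	while (q->head != NULL) {
		struct fi_timer *t = q->head;

		examined++;
		if (t->expires > now)
			break;		/* everything behind it is later still */
		q->head = t->next;
		if (q->head == NULL)
			q->tail = NULL;
		t->fired = 1;
	}
	return examined;
}
```

Each run examines the entries that are due plus at most one that is
not, regardless of how many timers are queued.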

At the very least, if you are going to crank the HZ so that things
you multiply by HZ overflow their fields, maybe it's time to scale
those fields by some factor in addition to HZ, rather than bloating
everything?
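
One possible reading of "scale by some factor in addition to HZ",
sketched under assumptions: keep the field narrow and store the
timeout in units of several ticks, so that the stored unit stays at
1/100 second whatever HZ is. The function names, the 16-bit field
width, and the choice of 1/100 second are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks per stored unit: the unit stays 1/100 s as hz grows. */
static unsigned long
timeo_scale(unsigned long hz)
{
	return hz > 100 ? hz / 100 : 1;
}

/*
 * Convert a tick count to stored units.  Rounds up so a timeout
 * never fires early, and saturates instead of wrapping.
 */
static uint16_t
timeo_encode(unsigned long ticks, unsigned long scale)
{
	unsigned long units;

	assert(scale > 0);
	units = ticks / scale + (ticks % scale != 0);
	return units > UINT16_MAX ? UINT16_MAX : (uint16_t)units;
}

/* Convert stored units back to ticks when the timer is armed. */
static unsigned long
timeo_decode(uint16_t units, unsigned long scale)
{
	return (unsigned long)units * scale;
}
```

At HZ = 100,000 the scale is 1000, so a 30 second timeout (3,000,000
ticks) is stored as 3000 and still fits in 16 bits.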

The thing is already an int in -current; making it larger makes
no sense at all to me, unless you are being paid to screw over
FreeBSD by decreasing the high end load it can scale to, for no
good reason.  For example, do you have a good reason why these
fields should not be scaled in terms of MSL instead of HZ ticks?

When I was originally chasing 1,000,000 simultaneous TCP connections
on a single 4G RAM FreeBSD box, one of the biggest and most obvious
bottlenecks that I never dealt with was that when FreeBSD moved from
the historical fixed interval timer list code to the callout wheel,
it really screwed over the TCP timer code big time: the overhead went
way, way up, and "just increase the size of the callout wheel" only
works up to the point where "entries *2 * HZ > MSL".

Eventually, I got to the point I could support 1.6M simultaneous
TCP connections on a single FreeBSD box with 4G of RAM -- 800,000
load balanced clients against a back end server farm, if the data
was simply switched through at L4 -- but most of the excess time
that was in the code was in the timer code, traversing obvious
misses through the wheel lists on each timer firing.

This was because the lists were not -- *could not be* -- ordered,
such that you could stop traversing on the first "later than now"
entry, because the lists were not fixed interval (as they were in
older releases of BSD).
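
A simplified model of why the wheel lists cannot be cut short. This
is not FreeBSD's callout code; the names and the tiny WHEEL_SIZE are
hypothetical. Timers that hash to the same bucket can expire on
different revolutions of the wheel and arrive in any order, so each
tick has to walk the whole bucket and skip the misses.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SIZE 8	/* hypothetical; real wheels are far larger */

struct wh_timer {
	struct wh_timer *next;
	unsigned long expires;	/* absolute tick at which it fires */
	int fired;
};

struct wheel {
	struct wh_timer *bucket[WHEEL_SIZE];
};

/* Arm a timer: hash on expiry tick, push on the bucket (unordered). */
static void
wh_arm(struct wheel *w, struct wh_timer *t, unsigned long expires)
{
	struct wh_timer **head;

	assert(w != NULL && t != NULL);
	head = &w->bucket[expires % WHEEL_SIZE];
	t->expires = expires;
	t->fired = 0;
	t->next = *head;
	*head = t;
}

/*
 * Process one tick.  Every entry in the bucket must be checked,
 * because entries for later revolutions are mixed in with the ones
 * due now.  Returns how many entries were examined.
 */
static int
wh_tick(struct wheel *w, unsigned long now)
{
	struct wh_timer **pp = &w->bucket[now % WHEEL_SIZE];
	int examined = 0;

	while (*pp != NULL) {
		struct wh_timer *t = *pp;

		examined++;
		if (t->expires == now) {
			*pp = t->next;	/* unlink and fire */
			t->fired = 1;
		} else {
			pp = &t->next;	/* a miss: later revolution */
		}
	}
	return examined;
}
```

With timers armed for ticks 3, 11 and 19, all three land in bucket 3;
the tick at 3 examines three entries to fire one, which is the
"traversing obvious misses" cost described above.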

The crap doesn't scale, and piling more crap on top of it, at the
added expense of making it scale *even worse*, is not the way
to fix the problem.

PS: Adding *any* TCP options is bad karma for networking equipment;
the cost in terms of in-transit overhead is immense if you are
trying to use the code later to build a switch or a load balancer.
Doing that sort of thing is fine -- as long as you know beforehand
that what you are doing is making the code less general purpose,
and everyone buys into that idea.

-- Terry
