Date: Tue, 19 Jun 2001 12:05:14 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: Matt Dillon <dillon@earth.backplane.com>, Matthew Hagerty <mhagerty@voyager.net>, freebsd-hackers@FreeBSD.ORG
Subject: Re: Article: Network performance by OS
Message-ID: <3B2FA26A.EA68DDCA@mindspring.com>
References: <Pine.LNX.4.21.0106161712060.2056-100000@imladris.rielhome.conectiva>
Rik van Riel wrote:
> 
> On Sat, 16 Jun 2001, Matt Dillon wrote:
> 
> > This is old.  The guys running the tests blew it in so many ways
> > that you might as well have just rolled some dice.  There's a slashdot
> > article on it too, and quite a few of the reader comments on these
> > bozos are correct.  I especially like comment #41.  Don't worry,
> > FreeBSD stacks up just fine in real environments.
> 
> The only thing that worries me a bit is that both FreeBSD
> and Linux needed to be tuned at all to run these things,
> even if it was just the maximum file descriptor setting.
> 
> A lot of this tuning could easily be done dynamically
> (and is done dynamically on linux 2.4), but lots of it
> still has static maximums which have to be tuned by hand.
> Compile-time tuning for stuff which can be dynamically
> allocated (and freed) is IMHO a big sillyness in the OS.

Use of zalloci() permits allocations to occur at interrupt time, such
as the allocation of replacement mbufs for receive rings (there's a
rough sketch of the idea further down).  It would be very difficult to
maintain FreeBSD's Gigabit Ethernet performance without this type of
thing.

One of the things that worries me about the new mbuf allocator is how
it behaves with regard to lock inversion on a driver lock at interrupt
time.  I'm not saying there is definitely a problem, but this is really
tricky code, and the lock manager has poor deadlock avoidance
characteristics when it comes to inversion, since it does not place
locks onto a DAG arc that would permit cycle detection among N
processes holding N+1 (or more) locks.

Because allocations from a zalloci() zone occur through a page fault
against a preallocated, contiguous KVA range, there is really very
little, short of a full rewrite, that would permit allocations to still
occur at interrupt time while at the same time ensuring that the zone
remained recoverable.

Frankly, with a number of minor modifications, and a bunch more
INVARIANTS code to guard against inversion, we could allocate KVA space
for mbufs, sockets, tcpcb's, and inpcb's (and udpcb's, though they are
not as important to me), and have some possibility of recovering that
memory to the system.  This would have the effect of rendering the
memory no longer type-stable, but if it meant we could continue to
allocate in interrupt context, it would be worth having a cleaner going
behind the allocator, emptying full buckets back to the system.

> Yes, this report was completely useless as a benchmark,
> but it DID highlight a point where Linux and BSD can be
> improved: dynamic allocation (and freeing) of things
> like file descriptors and socket structures.

Many of these default "limitations" are intentional, both in terms of
the risk of running out of KVA space (personally, I run with a 3G KVA
space, which also limits user processes to 1G of address space, the
opposite of the normal arrangement), and in terms of administration.
Even burning this space for zone allocations, you still need to decide
what size to make each zone, given the limitations of zones and the
interrupt allocation requirement discussed above.

From an administrative perspective, you have to make a trade-off on
whether or not you can weather a denial of service attack which
exploits a vulnerability such as having no default limit on the number
of sockets or open file descriptors a process is permitted to consume.
With no limits on these, you open yourself to failure under what, under
ordinary circumstances, would have to be considered grossly abnormal
loads.
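To make the interrupt-time allocation point a bit more concrete, here
is a rough userland sketch of the general technique: reserve the whole
region up front and carve it into fixed-size items on a free list, so
that an allocation in interrupt context is just a list pop and never
has to sleep or call into the general-purpose allocator.  This is only
an illustration; the names (zone_init, zone_alloc, zone_free) and the
code are hypothetical, not the actual vm_zone/zalloci() implementation,
which backs a reserved, contiguous KVA range via page faults.

/*
 * Illustrative model of an interrupt-safe zone allocator: a fixed,
 * "preallocated" region is carved into items kept on a free list, so
 * an allocation never has to sleep.  Hypothetical names, not the real
 * FreeBSD code.
 */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct zitem {
	struct zitem	*next;		/* free-list linkage */
};

struct zone {
	char		*base;		/* start of the reserved region */
	size_t		 itemsize;	/* size of one object */
	size_t		 nitems;	/* total objects in the region */
	struct zitem	*freelist;	/* source for non-sleeping allocs */
};

/* Reserve the whole region up front, then thread it onto the free list. */
static int
zone_init(struct zone *z, size_t itemsize, size_t nitems)
{
	size_t i;

	if (itemsize < sizeof(struct zitem))
		itemsize = sizeof(struct zitem);
	z->base = malloc(itemsize * nitems);	/* stand-in for reserved KVA */
	if (z->base == NULL)
		return (-1);
	z->itemsize = itemsize;
	z->nitems = nitems;
	z->freelist = NULL;
	for (i = 0; i < nitems; i++) {
		struct zitem *it = (struct zitem *)(z->base + i * itemsize);

		it->next = z->freelist;
		z->freelist = it;
	}
	return (0);
}

/*
 * "Interrupt-safe" allocation: pop from the free list, never sleep.
 * If the zone is exhausted, fail rather than block.
 */
static void *
zone_alloc(struct zone *z)
{
	struct zitem *it = z->freelist;

	if (it == NULL)
		return (NULL);		/* caller must tolerate failure */
	z->freelist = it->next;
	return (it);
}

static void
zone_free(struct zone *z, void *p)
{
	struct zitem *it = p;

	it->next = z->freelist;
	z->freelist = it;
}

int
main(void)
{
	struct zone mbuf_zone;
	void *m;

	if (zone_init(&mbuf_zone, 256, 1024) != 0)
		return (1);
	m = zone_alloc(&mbuf_zone);	/* e.g. a replacement "mbuf" for an rx ring */
	printf("allocated %p\n", m);
	zone_free(&mbuf_zone, m);
	free(mbuf_zone.base);
	return (0);
}

The "cleaner" I'm talking about above would be the piece that runs
behind such an allocator and hands fully empty chunks of the region
back to the system; this sketch doesn't attempt that.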
I have done a number of Windows installs, and among other things, the
installer asks you to characterize the load you expect, which I am sure
results in non-default values for a number of tuning parameters.
Similarly, it has the opportunity to notice the network hardware
installed: if you install a Gigabit Ethernet card, it's probably a good
bet that you will be running heavy network services off the machine.
If you install SCSI disks, it's a pretty good bet you will be serving
static content, either as a file server, or as an FTP or web server.
Tuning for mail services is different; the hardware doesn't really tell
you that's the use to which you will put the box.

On the other hand, some of the tuning was front-loaded by the
architecture of the software under test being better suited to
heavy-weight threads implementations.  Contrary to their design claims,
they are effectively running in a bunch of different processes.  Linux
would potentially beat NT on this mix, simply because NT has more
things running in the background to cause context switches to the
non-shared address spaces of other tasks.  Put the same test to a
4-processor box with 4 NIC cards, and I have no doubt that an
identically configured NT box will beat the Linux box hands down.

A common thread in these complaints, that the results were somehow
"FreeBSD's fault" rather than the fault of the tuning and architecture
of the application being run, is, frankly, ridiculous.

-- 
Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message