Date: Tue, 19 Jun 2001 12:05:14 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Rik van Riel <riel@conectiva.com.br>
Cc: Matt Dillon <dillon@earth.backplane.com>, Matthew Hagerty <mhagerty@voyager.net>, freebsd-hackers@FreeBSD.ORG
Subject: Re: Article: Network performance by OS
Message-ID: <3B2FA26A.EA68DDCA@mindspring.com>
References: <Pine.LNX.4.21.0106161712060.2056-100000@imladris.rielhome.conectiva>
Rik van Riel wrote:
> 
> On Sat, 16 Jun 2001, Matt Dillon wrote:
> 
> > This is old.  The guys running the tests blew it in so many ways
> > that you might as well have just rolled some dice.  There's a slashdot
> > article on it too, and quite a few of the reader comments on these
> > bozos are correct.  I especially like comment #41.  Don't worry,
> > FreeBSD stacks up just fine in real environments.
> 
> The only thing that worries me a bit is that both FreeBSD
> and Linux needed to be tuned at all to run these things,
> even if it was just the maximum file descriptor setting.
> 
> A lot of this tuning could easily be done dynamically
> (and is done dynamically on linux 2.4), but lots of it
> still has static maximums which have to be tuned by hand.
> Compile-time tuning for stuff which can be dynamically
> allocated (and freed) is IMHO a big sillyness in the OS.

Use of zalloci() permits allocations to occur at interrupt time, such
as the allocation of replacement mbufs for receive rings (there's a
rough sketch of the idea further down).  It would be very difficult to
maintain FreeBSD's Gigabit Ethernet performance without this type of
thing.

One of the things that worries me about the new mbuf allocator is how
it behaves with regard to lock inversion on a driver lock at interrupt
time.  I'm not saying there is definitely a problem, but this is really
tricky code, and the lock manager has poor deadlock avoidance
characteristics when it comes to inversion, since it does not place
locks onto a DAG arc that would permit cycle detection among N
processes holding N+1 (or more) locks.

Because allocations from a zalloci() zone occur through a page fault
against a preallocated, contiguous KVA range, there is really very
little, short of a full rewrite, that would permit allocations to still
occur at interrupt time while at the same time ensuring that the zone
remained recoverable.

Frankly, with a number of minor modifications, and a bunch more
INVARIANTS code to guard against inversion, we could allocate KVA space
for mbufs, sockets, tcpcb's, and inpcb's (and udpcb's, though they are
not as important to me), and have some possibility of recovering that
memory to the system.  This would have the effect of rendering the
memory no longer type-stable, but if it meant we could continue to
allocate in interrupt context, it would be worth having a cleaner going
behind the allocator, emptying full buckets back to the system.

> Yes, this report was completely useless as a benchmark,
> but it DID highlight a point where Linux and BSD can be
> improved: dynamic allocation (and freeing) of things
> like file descriptors and socket structures.

Many of these default "limitations" are intentional, both in terms of
the risk of running out of KVA space (personally, I run with a 3G KVA
space, which also limits user processes to 1G of address space, the
opposite of the normal arrangement), and in terms of administration.
Even burning this space for zone allocations, you still need to decide
what size to make each zone, given the limitations of zones and the
interrupt allocation requirement discussed above.

From an administrative perspective, you have to make a trade-off on
whether or not you can weather a denial of service attack which
exploits a vulnerability such as having no default limit on the number
of sockets or open file descriptors a process is permitted to consume.
With no limits on these, you open yourself to failure under what, under
ordinary circumstances, would have to be considered grossly abnormal
loads.
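To make the interrupt-time allocation point a bit more concrete, here
is a rough userland sketch of the general technique: reserve the whole
region up front and carve it into fixed-size items on a free list, so
that an allocation in interrupt context is just a list pop and never
has to sleep or call into the general-purpose allocator.  This is only
an illustration; the names (zone_init, zone_alloc, zone_free) and the
code are hypothetical, not the actual vm_zone/zalloci() implementation,
which backs a reserved, contiguous KVA range via page faults.

/*
 * Illustrative model of an interrupt-safe zone allocator: a fixed,
 * "preallocated" region is carved into items kept on a free list, so
 * an allocation never has to sleep.  Hypothetical names, not the real
 * FreeBSD code.
 */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct zitem {
	struct zitem	*next;		/* free-list linkage */
};

struct zone {
	char		*base;		/* start of the reserved region */
	size_t		 itemsize;	/* size of one object */
	size_t		 nitems;	/* total objects in the region */
	struct zitem	*freelist;	/* source for non-sleeping allocs */
};

/* Reserve the whole region up front, then thread it onto the free list. */
static int
zone_init(struct zone *z, size_t itemsize, size_t nitems)
{
	size_t i;

	if (itemsize < sizeof(struct zitem))
		itemsize = sizeof(struct zitem);
	z->base = malloc(itemsize * nitems);	/* stand-in for reserved KVA */
	if (z->base == NULL)
		return (-1);
	z->itemsize = itemsize;
	z->nitems = nitems;
	z->freelist = NULL;
	for (i = 0; i < nitems; i++) {
		struct zitem *it = (struct zitem *)(z->base + i * itemsize);

		it->next = z->freelist;
		z->freelist = it;
	}
	return (0);
}

/*
 * "Interrupt-safe" allocation: pop from the free list, never sleep.
 * If the zone is exhausted, fail rather than block.
 */
static void *
zone_alloc(struct zone *z)
{
	struct zitem *it = z->freelist;

	if (it == NULL)
		return (NULL);		/* caller must tolerate failure */
	z->freelist = it->next;
	return (it);
}

static void
zone_free(struct zone *z, void *p)
{
	struct zitem *it = p;

	it->next = z->freelist;
	z->freelist = it;
}

int
main(void)
{
	struct zone mbuf_zone;
	void *m;

	if (zone_init(&mbuf_zone, 256, 1024) != 0)
		return (1);
	m = zone_alloc(&mbuf_zone);	/* e.g. a replacement "mbuf" for an rx ring */
	printf("allocated %p\n", m);
	zone_free(&mbuf_zone, m);
	free(mbuf_zone.base);
	return (0);
}

The "cleaner" I'm talking about above would be the piece that runs
behind such an allocator and hands fully empty chunks of the region
back to the system; this sketch doesn't attempt that.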
I have done a number of Windows installs, and among other things, the
installer asks you to characterize the load you expect, which I am sure
results in non-default values for a number of tuning parameters.
Similarly, it has the opportunity to notice the network hardware
installed: if you install a Gigabit Ethernet card, it's probably a good
bet that you will be running heavy network services off the machine.
If you install SCSI disks, it's a pretty good bet you will be serving
static content, either as a file server, or as an FTP or web server.
Tuning for mail services is different; the hardware doesn't really tell
you that's the use to which you will put the box.

On the other hand, some of the tuning was front-loaded by the
architecture of the software under test being better suited to
heavy-weight threads implementations.  Contrary to their design claims,
they are effectively running in a bunch of different processes.  Linux
would potentially beat NT on this mix, simply because NT has more
things running in the background to cause context switches to the
non-shared address spaces of other tasks.  Put the same test to a
4-processor box with 4 NIC cards, and I have no doubt that an
identically configured NT box will beat the Linux box hands down.

A common thread in these complaints, that the results were somehow
"FreeBSD's fault" rather than the fault of the tuning and architecture
of the application being run, is, frankly, ridiculous.

-- 
Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message