From owner-freebsd-hackers Mon Dec 20 17:26:19 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from maxwell.syr.edu (maxwell.syr.edu [128.230.129.5])
	by hub.freebsd.org (Postfix) with ESMTP id 02A4A153E0
	for ; Mon, 20 Dec 1999 17:26:08 -0800 (PST)
	(envelope-from cmsedore@maxwell.syr.edu)
Received: from qwerty.maxwell.syr.edu (qwerty.maxwell.syr.edu [128.230.129.248])
	by maxwell.syr.edu (8.9.1a/8.9.1) with ESMTP id UAA14603;
	Mon, 20 Dec 1999 20:22:00 -0500 (EST)
Date: Mon, 20 Dec 1999 20:24:28 -0500 (EST)
From: Chris Sedore
To: Jason Evans
Cc: Alfred Perlstein, Kevin Day, "Ronald F. Guilmette", hackers@FreeBSD.ORG
Subject: Re: Practical limit for number of TCP connections?
In-Reply-To: <19991220164517.F26743@sturm.canonware.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Mon, 20 Dec 1999, Jason Evans wrote:

> On Sun, Dec 19, 1999 at 03:01:41PM -0500, Chris Sedore wrote:
> >
> >
> > On Sat, 18 Dec 1999, Alfred Perlstein wrote:
> > [...TRIM...]
> > > Using a thread per connection has always been a bogus way of programming,
> > > it's easy, but it doesn't work very well.
> >
> > Ahem.  Well, that kind of depends on the threads implementation, how many
> > connections you're talking about, and likely some other factors too.
> > I've got an NT box that handles about 1000 concurrent connections with
> > 1000 (plus a few) threads doing the work.  Runs fine, performs very well.
> >
> > I wouldn't argue that it is the most scalable solution to problems, but it
> > is easier, and scales proportionally to the quality of the threads
> > implementation.
>
> 1000 simultaneous connections really isn't that many, but even at 1000
> threads, you could likely achieve much better performance by using thread
> pools or a simple poll() loop rather than one thread per connection.  Why?
> Locality, locality, locality.
> Consider that each thread has its own stack,
> which in the best of worlds would be 4K, but is more likely at least 16K.
> Now, start switching between threads to handle relatively small amounts of
> I/O for each connection, and consider what that does to the VM, not to
> mention the memory hierarchy of the hardware.  You might as well not even
> have L2 cache, because the program will thrash the cache so badly.  Of
> course, you won't see worst case performance if client activity is unevenly
> distributed, but you just can't get past the fact that the memory footprint
> of one thread per connection is larger than a bounded pool of threads.

In my case, load is reasonably distributed.  Is poll() really that much
better than select()?  I thought that, excepting bit flag manipulations,
it worked basically the same way on the kernel end.

> Some threads implementations are better than others at handling such
> abuses, but the performance of such an approach will almost always suffer
> in comparison to a design that takes locality into consideration.

True enough.  In some cases, this may not be that much of an issue,
though.  Imagine a thread-per-connection design in which each thread does
most of its work in a limited call tree, with most of its working context
within 8k (+/-) of the current stack pointer.  It has to pull this into
cache every time the thread is activated.  In a thread pool
implementation, it would likely have to move about the same 16k into the
cache, only from a "context structure" which would probably be about as
open to thrashing as the thread stack.  Add to that the fact that
thread-pool applications often use more synchronization primitives.

> I disagree with your assessment that scalability of one thread per
> connection is proportional to the quality of the threads implementation.
> An ideal threaded program would have exactly as many threads as available
> processors, and the threads would always be runnable.
> Of course,
> real-world applications almost never work that way, but the goal of a
> programmer should be to have as few threads as possible while still
> achieving maximal parallelism.  If connection scalability is an issue,
> using one thread per connection ignores a critical aspect of high
> performance threaded application design.

I don't disagree with any of what you have written.  I'd expect you to
concede that the scalability is proportional.  For example, LinuxThreads
(that is, rfork()) is probably not anything like optimally scalable, but
something like the last FreeBSD KSE model that I saw bouncing around on
-arch would do a lot better.

I was really responding to the assertion that thread-per-connection is
broken as a methodology.  I've written programs both ways, mixed the two,
etc.  My point was that at least one OS has no problem coping with 1000
threads essentially blocked on sockets, so the base argument of "it
doesn't scale well" is, without further elaboration, hollow.  I would
agree that there is a crossover point for performance, probably below 1000
threads.  I was just injecting a little real-world experience with this in
an application which is somewhere above "small" and somewhere below
"really large".

People are still doing things like this with process-per-connection, so
threads sound much better as an alternative.

-Chris



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
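
For readers who have not seen one, here is a minimal sketch of the
"simple poll() loop" that the quoted text sets against one thread per
connection.  It is written in Python for brevity and is not code from
anyone in this thread.  The function name poll_echo, the echo behaviour,
the 4096-byte read size and the 100 ms timeout are all assumptions made
for illustration, and select.poll() is only available on Unix-like
systems.

```python
import select
import socket


def poll_echo(socks, timeout_ms=100):
    """Echo pending data on every socket in `socks` from a single thread.

    Hypothetical helper for illustration only.  Runs one poll() loop
    until a poll() call reports nothing ready within `timeout_ms`, then
    returns the total number of bytes echoed.
    """
    poller = select.poll()
    by_fd = {s.fileno(): s for s in socks}
    for fd in by_fd:
        poller.register(fd, select.POLLIN)

    echoed = 0
    while by_fd:
        events = poller.poll(timeout_ms)
        if not events:
            break  # nothing became ready before the timeout
        for fd, flags in events:
            sock = by_fd[fd]
            # Only read when POLLIN is set; treat anything else as a hangup.
            data = sock.recv(4096) if flags & select.POLLIN else b""
            if not data:
                # Peer closed the connection (or an error was reported).
                poller.unregister(fd)
                del by_fd[fd]
                continue
            # Simplification: sendall() may block if the peer is slow.
            sock.sendall(data)
            echoed += len(data)
    return echoed
```

All per-connection state here lives in one dictionary walked by one
thread, rather than on a separate stack per connection, which is the
locality argument made above.  A real server would also use
non-blocking sockets and wait for POLLOUT before writing, which this
sketch leaves out.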