From owner-freebsd-current Fri Sep 19 23:27:14 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.7/8.8.7)
	id XAA13663 for current-outgoing; Fri, 19 Sep 1997 23:27:14 -0700 (PDT)
Received: from usr02.primenet.com (tlambert@usr02.primenet.com [206.165.6.202])
	by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id XAA13658
	for ; Fri, 19 Sep 1997 23:27:11 -0700 (PDT)
Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5)
	id XAA21428; Fri, 19 Sep 1997 23:27:07 -0700 (MST)
From: Terry Lambert
Message-Id: <199709200627.XAA21428@usr02.primenet.com>
Subject: Re: FYI: regarding our rfork(2)
To: Shimon@i-Connect.Net (Simon Shapiro)
Date: Sat, 20 Sep 1997 06:27:06 +0000 (GMT)
Cc: tlambert@primenet.com, current@FreeBSD.ORG, nate@mt.sri.com
In-Reply-To: from "Simon Shapiro" at Sep 19, 97 10:39:27 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Sender: owner-freebsd-current@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> The original design (which worked well on Slowlaris and Linux) was to
> have a listener thread waiting for requests, and a group of ``worker
> threads'' which sit and wait for work to be done.  When a request
> arrives, it is handed to a worker thread and the master thread goes
> back to listening for more requests.  The worker thread does what it
> does, replies to the requestor, and then goes back into a queue,
> waiting for the next call to serve.

This was the model I used when I did the process architecture for the
"Pathworks for VMS (NetWare)" product.  It worked well for VMS because:
(1) most VMS machines weren't SMP capable, (2) AST overhead is *vastly*
lower than context switch overhead in VMS (the threading was a
call-conversion implementation using AST's, more like SunOS 4.x's
liblwp than pthreads: aioread/write beats non-blocking I/O for call
concurrency), and (3) VMS didn't have a file system cache.  The library
code that implements the FS cache on the current VMS came out of our
project.

> We tried the approach of spawning new threads on demand but this is
> simply too slow.

The thread-per-client model is flawed in a *lot* of ways that the
"work to do engine" model is not.  8-).

> Our call rate, on a Pentium-90, went from 43/sec (Linux) to 14/sec
> (FreeBSD-2.2).  We then decided to toss the multithreading and
> re-build the server as multi-process.  Same concept, but having
> separate processes, using shared memory to communicate.
>
> The result?  Throughput went up to about 140/sec, and reliability is
> excellent; we have aborted a load test after two weeks of runtime.

You should be able to obtain the same wins under Linux; if you are
still using it, I recommend converting it, as well.

I would be *very* interested in knowing what your Solaris rate was,
considering the high context switch overhead for blocking kernel
calls...  I assume you ran the same number of kernel and user threads,
and used the "tightly bound" relationship model?  If I were a betting
person, I'd bet "more than 43 and less than 140".

> Why the story?  Although multi-threading sounds good for certain
> things, when you compare it to the traditional Unix model, it really
> has very little benefit.  Even in performance.
>
> What we gain instead is the need/requirement to ``play O/S'' in the
> user program (or library);

This is not entirely fair.  "Real threading" will gain you concurrency
wins.  I don't think the pthreads implementation under FreeBSD really
qualifies as "real threading".

Using async I/O with aioread/write/wait/cancel will yield significantly
higher concurrency than using non-blocking descriptors and select.
This is because AIO will schedule the work, while non-blocking
descriptors and select will only tell you when it is possible to
schedule the work without blocking.  The difference is the latency
between the time that work is scheduled and the time it completes.
(A minimal sketch of this appears below.)

I am a firm believer in async call gates, either via aioread/aiowrite,
or a formal alternate gate that may apply to all system calls.  Some
optimizations are possible to reduce reaping latency (namely tagging)
for calls that can run to completion without sleeping.

Probably the majority of your performance win was related to increased
concurrency of the calls in the multiple process case (a toy sketch of
one flavor of that model also appears below).  Unless the system was
seriously loaded with other tasks (unlikely for such a dedicated use),
the performance losses were not quantum-competition related, but
concurrency related.

Of course, I could be wrong, and the box could be your main www server
as well as your phone switch...  8-).
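To make the async I/O point concrete, here is a minimal sketch using
the POSIX aio interface (aio_read/aio_error/aio_return/aio_suspend),
standing in here for SunOS's aioread/aiowait; the file name and buffer
size are arbitrary, and error handling is pared to the bone:

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            static char buf[8192];
            struct aiocb cb;
            const struct aiocb *list[1];
            int fd = open("/etc/motd", O_RDONLY);

            if (fd < 0)
                    return (1);

            memset(&cb, 0, sizeof(cb));
            cb.aio_fildes = fd;
            cb.aio_buf = buf;
            cb.aio_nbytes = sizeof(buf);
            cb.aio_offset = 0;

            /*
             * The kernel schedules the work right here; with
             * non-blocking descriptors and select(), we would only
             * learn when we could schedule the work ourselves
             * without blocking.
             */
            if (aio_read(&cb) < 0)
                    return (1);

            /* ... do other useful work while the I/O is in flight ... */

            list[0] = &cb;
            while (aio_error(&cb) == EINPROGRESS)
                    (void) aio_suspend(list, 1, NULL);  /* reap it */

            printf("read %ld bytes\n", (long)aio_return(&cb));
            close(fd);
            return (0);
    }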
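And here is a toy sketch of the multi-process model, done as a
pre-forked pool of workers sharing one listen socket and an anonymous
shared-memory page.  This is only one flavor of the design (Simon's
server hands requests around through shared memory rather than a
shared accept); the port number is made up and error checking is
omitted:

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <unistd.h>

    #define NWORKERS        8

    int
    main(void)
    {
            struct sockaddr_in sin;
            volatile long *ncalls;
            int i, s;

            /* One anonymous shared page for a call counter. */
            ncalls = mmap(NULL, sizeof(*ncalls), PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);

            s = socket(AF_INET, SOCK_STREAM, 0);
            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_port = htons(9999);     /* arbitrary port */
            bind(s, (struct sockaddr *)&sin, sizeof(sin));
            listen(s, 128);

            for (i = 0; i < NWORKERS; i++) {
                    if (fork() == 0) {
                            /*
                             * Worker: serve one call at a time,
                             * forever.  The kernel serializes accept()
                             * among the workers, so there is no
                             * ``play O/S'' scheduling in user space.
                             */
                            for (;;) {
                                    int c = accept(s, NULL, NULL);

                                    if (c < 0)
                                            continue;
                                    (*ncalls)++;    /* racy; a toy */
                                    write(c, "hello\n", 6);
                                    close(c);
                            }
                    }
            }
            for (;;)
                    pause();        /* parent just sits */
    }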
> My two cents worth of opinion: if we want to maintain Unix in the
> spirit and make FreeBSD useful to a large audience of programmers,
> keep the original semantics.  If I want a messy, complex, tricky
> environment (multi-threading included), I can use NiceTry and get
> about half my management very, very happy (``now we are as mediocre
> as anyone else'').
>
> But this is just a biased opinion.
>
> Where I would LOVE to have more than one thread is in the kernel.

Kernel threading, unless the user programs the async call gate
themselves, in the same way that users can program with aio* on SunOS
and SVR4, is the only mechanism capable of supporting SMP scalability
in the context of a single process.  In the user-async programming
case, kernel contexts, be they aio contexts or async call gate
contexts, can be scheduled to run on different CPUs.

But threading is an easier model to grasp for doing concurrent work
than interleaved system calls.  I think it would be a mistake not to
abstract the issues in overlapped programming behind a procedural
interface -- in other words, a user space threads implementation.
Not everyone can think in terms of concurrent solutions to programming
problems.  Those who can are the people who write threads libraries,
because it makes your brain itch to think that way too long... or they
are entirely warped by the ordeal and become professional
mathematicians; mostly topologists and group theorists.  8-) 8-).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.