From owner-freebsd-hackers@FreeBSD.ORG Wed Mar 19 21:05:30 2008 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3D7F610656C1 for ; Wed, 19 Mar 2008 21:05:30 +0000 (UTC) (envelope-from scode@hyperion.scode.org) Received: from hyperion.scode.org (hyperion.scode.org [85.17.42.115]) by mx1.freebsd.org (Postfix) with ESMTP id AC4E18FC16 for ; Wed, 19 Mar 2008 21:05:29 +0000 (UTC) (envelope-from scode@hyperion.scode.org) Received: by hyperion.scode.org (Postfix, from userid 1001) id 4C1A02394A6; Tue, 18 Mar 2008 22:14:54 +0100 (CET) Date: Tue, 18 Mar 2008 22:14:53 +0100 From: Peter Schuller To: freebsd-hackers@freebsd.org Message-ID: <20080318211453.GA37126@hyperion.scode.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="UlVJffcvxoiEqYs2" Content-Disposition: inline User-Agent: Mutt/1.5.17 (2007-11-01) Subject: Asynchronous syscalls for massive concurrency X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Mar 2008 21:05:30 -0000 --UlVJffcvxoiEqYs2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello, Reading the recent thread on -current regarding dropping KSE and the evolution of application design with respect to concurrency, prompted me to want to ask what people think regarding the desirability and/or feasability of making FreeBSD support an asynchronous interface to *all* system calls. In particular, I would love to hear from experienced kernel developers whether this is a total pipe dream in terms of practically implementing it semi-efficiently with a reasonable amount of work, in a real-world operating system like FreeBSD. It is something that I have secretly coveting for for a long time (generally, not for FreeBSD specifically). Let me explain my motivation. Some will probably disagree, others will think I am pointing out the obvious, but I still want to summarize it here in the hopes of triggering a fruitful and/or interesting discussion. If this has been discussed, feel free to point me in the right direction. Apologies if this reads like a rant (will se if anyone bothers to read it to begin with ;). State machines vs. concurrency in application design: It is my belief that, in general, from a design perspective there is a desire to model a fundamentally concurrent process in the form of concurrency, rather than a state machine. It is my belief that, ignoring efficiency, concurrency tends to lead to cleaner, more modularized and more easily maintainable code. "Concurrency" above does not necessarily refer to operating system threads using the pthread model. I will mention implementation issues later on. Right now, I am arguing strictly from the design perspective of how I want to structure my code. In the simplest cases, a state machine will be pretty small and easy to understand (let's say the state machine parsing simple line-based messages in an IRC server). As the problem grows more complex, you will invariable wish to modularize the code; introduce abstractions; moving details away from the core loop. This can be done to any extent; you might generalize it a bit and be fairly satisfied. Until you want to introduce even more features and you feel the need to further generalize. You end up reaching a point where the framework for writing a modularized state machine is approaching the featurefullness of actual concurrency. If your language supports continuations, you may very well simply implement the state machine in terms of these. In this case, you pretty much have userland co-operative threading. That is one end of the extreme; the other end is the simple core state machine loop without modularizations. In between, you have various levels of generality in explicit state management. In this way, I view "true" concurrency modelling as essentially being the perfect state machine, completely hidden from application code. From this perspective, manually writing explicit state management is essentially a less powerful and less expressive way of achieving the same goal, that I would only resort to if "true" concurrency is not possible or practical. I will readily admit that I have not written a lot of state machine style code, exactly because of my view expressed above. I have however written, and write on a daily basis, a pretty considerable amount of multi-threaded (or otherwise concurrent) code. Some of it could reasonable be re-done in the form of a state machine, while in other cases the complexity would be absolutely tremendous without using some form of state management framework that at least approaches true concurrency (and even then complexity would be considerably higher than now). This is my reason for fundamentally preferring a concurrent approach, and would love for it to be supported by the language/environment being used, to an extent such that there is no need to fall back to explicit state management. Enabling massive concurrency by asynchronous syscalls: Support for asynchronous syscalls, I believe, would help *a lot* in realizing practical massive concurrency in a variety of environments, =66rom C/C++ to more higher level languages. C/C++ and similar w/ pthread concurrency model: One can already come *fairly* far by writing code in a stack conscious manner. Most of the C/C++ I do is written with this in mind, and having tens of thousands of threads or more is not a problem at all, because the stack size can be very very conservative. This is not to say that it is necessarily the most efficient approach (context switching overhead, cache localitly, etc), but it does get you pretty far - particularly when the application is extremely concurrent in relation to the actual amount of work done in total (that is, when your problem is not CPU efficiency, but the ability to e.g. handle 30 000 clients, most of which are mostly idle). That said, there are still issues. Regardless of whether explicit memory management is used, or garbage collection, there is a desire to reduce contention. Many mallocs have thread-local pools (including jemalloc). But of course, if you want to be running with a 32 kbyte stack size, suddenly you do not have a lot of room for thread-local resources of that type. Having asynchronous syscalls, in combination with sufficient support from the runtime/threading libraries, would allow for these resources to scale with the number of *cores*, rather than the number of threads. Even if the nature of the C language more or less requires, as far as I can tell, that you retain a per-thread native stack, it would be a huge plus in several situations to be able to scale based on memory availability relative to stack size, without worrying about secondary affects like higher contention in malloc. Higher level languages with native threads (e.g. Python, Ruby 1.9, some Common Lisps): A similar situation exists here as with C/C++. Higher level languages without native threads (Ruby 1.8, erlang, chicken, etc): This is my primary motivation. Several languages attempt to get away with using non-native threads, and end up being pretty scalable in terms of concurrency - but at a cost. Only certain operations are possible to do in a non-blocking fashion (basically, I/O). In the case of e.g. Ruby and Chicken, the problem is simply lived with, and other syscalls block the interpreter, causing the language to be much less usable than it otherwise would have been, in a variety of "production" situations. In the case of erlang, this problem is dealt with by requiring blocking operations to be performed in a separate process, with pretty expensive communication going on between those processes and the core virtual machine. The fact that you can't just randomly do your syscall, or blocking library call, right off the bat means the threshold, in terms of effort, in interfacing with native non-erlang code is pretty high. (Of course in erlang a major purpose is to have this separation on purpose, to ensure that the core virtual machine is not tained by broken native code. I am ignoring that here.)=20 The reason why your call blocks the entire {Ruby,Chicken,...} interpreter is not that somebody *wants* this to happen, but because of the effort required to fix this problem - especially if you are not willing to take the massive performance hit of requireing native threads. Ruby 1.9 will have native threading, but at a certain cost in terms of concurrency overhead, but also in terms of language power (continuations are no longer feasable to support). Are asynchronous system calls just a bad idea to begin with? Am I failing to identify major problems with this approach?[1] Is it just not feasable at the kernel implementation level? How much of this overlaps with KSE? Would it be correct to say that the primary complexity in implementing this in the kernel, stems from exactly the problem I claim with application development - a need for additional reliance on explicit state management, in the blocking cases where, right now, a lot is kept implicit in the process' kernel stack? If so I guess one could look at it as moving that problem from the applications, to the kernel (and/or the runtime library). [1] One problem I can think of is that one will loose concurrency in memory access, so that the impact of high-latency paging (to/from disk) will likely be a lot higher, since your other threads (in the same native thread/process) will not be able to execute during said paging. A single page-in (and to a lesser extent page-out) effectively blocks any number of "threads". --=20 / Peter Schuller PGP userID: 0xE9758B7D or 'Peter Schuller ' Key retrieval: Send an E-Mail to getpgpkey@scode.org E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org --UlVJffcvxoiEqYs2 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.8 (FreeBSD) iEYEARECAAYFAkfgMMwACgkQDNor2+l1i307AgCeNfRhz41hFV8j6qCFQaI3EUoX BJsAnioU0VOarlWNjN5+JiVOuLcDW5cT =8dLq -----END PGP SIGNATURE----- --UlVJffcvxoiEqYs2--