From: Terry Lambert
To: davids@webmaster.com (David Schwartz)
Cc: tlambert@primenet.com, unknown@riverstyx.net, chat@FreeBSD.ORG
Subject: Re: Known MMAP() race conditions ... ?
Date: Sat, 17 Jul 1999 16:21:26 +0000 (GMT)
In-Reply-To: <000101becfec$605cd280$021d85d1@youwant.to> from "David Schwartz" at Jul 16, 99 05:36:14 pm

> I was under the impression that disk I/O was still blocking in
> FreeBSD's libc_r.  I was also under the impression that the resolver
> was blocking.
>
> If disk I/O really is non-blocking, I would expect the performance to
> suffer, because thread context switches in a user-space threads
> implementation are generally more expensive than a kernel thread
> blocking on I/O.

Why is this?

In a kernel threads implementation, each thread competes as a separate
process with all other kernel threads.  A "process" is a group of one
or more kernel threads.  For all legacy code ("processes"), one kernel
thread competes with all other kernel threads, including those in
threaded programs ("multithreaded processes"), based on the number of
processes out there.

For the sake of avoiding confusion, we should talk about the
competition for quantum in terms of "Kernel Schedulable Entities", or
KSE's.

For a given program which is backed by multiple KSE's, there is no
guarantee of affinity or adjacency.  Any attempt to ensure affinity in
blocking operations could result in starvation of other KSE's -- that
is, KSE's are difficult to group.  When the kernel scheduler goes from
one KSE to another as a result of sleeping, rather than an involuntary
context switch, the chance that you are going to have to do a full
address map and register reload ("context switch between
''processes''") is equal to:

	(total KSE's - process KSE's) / total KSE's

In other words, context switch overhead is generally the same as if
you were running multiple processes instead of multiple KSE's per
process.

Furthermore, even if you were to do preferential scheduling of KSE's
from a single group (equivalent to a thread group in a process, or a
"multithreaded process"), and you used a "quantum counting" technique
to guard against starvation of other KSE's, then, when the scheduler
activation is the result of a sleep rather than a quantum clock tick
("LBOLT"), you will, statistically, achieve only a best case average
utilization of quantum/2 before a context switch that requires a full
task switch.
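To put rough numbers on the two claims above (the KSE counts and the
quantum length here are made-up example values, not measurements of
any real system):

/*
 * Back-of-the-envelope sketch of the argument above; the KSE counts
 * and quantum length are hypothetical example values.
 */
#include <stdio.h>

int
main(void)
{
	double total_kses   = 100.0;	/* all KSE's competing for quantum */
	double process_kses = 10.0;	/* KSE's backing our one process   */
	double quantum_ms   = 10.0;	/* example scheduler quantum        */

	/*
	 * Chance that the KSE scheduled after a sleep belongs to some
	 * other process, forcing a full address map and register
	 * reload rather than just a register reload.
	 */
	double p_full_switch = (total_kses - process_kses) / total_kses;

	/*
	 * With sleeps landing at random points inside the quantum, the
	 * average useful fraction of the quantum is about half of it.
	 */
	double avg_useful_ms = quantum_ms / 2.0;

	printf("P(full context switch) = %.2f\n", p_full_switch);
	printf("average quantum used before switching = %.1f ms\n",
	    avg_useful_ms);
	return (0);
}

With those example numbers, 90% of post-sleep reschedules are
cross-process switches, and on average only half a quantum is used
before paying for one.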
Comparatively, using a user space scheduler, the quantum is fully
utilized (a factor of two improvement in reduction of context switch
overhead), and the context switch between threads is the same as the
best case for kernel threads, which is a register reload.

One need only apply this same metric to cache busting, TLB shootdown,
processor migration, and other dangers, and a strong picture favoring
user space threading appears.

Your claim that user space threads are definitionally more expensive
than kernel space threads is verifiably false.  Further, in the very
best possible case, we see that overhead (not including protection
domain crossing) for kernel threads only begins to approach user space
threads as:

	(total KSE's - process KSE's) / total KSE's

approaches 1.  The approach is asymptotic at best, since we have the
minimum set of system support daemons in the process queue competing
for quantum.

> User-space threads are not inherently bad, they just have different
> tradeoffs than kernel threads.

Yes.  Better tradeoffs.

> > Yes.  The NFS code can return "EWOULDBLOCK", if the operation would
> > block.
>
> But does libc_r do this?  As I see it, there are two answers, and
> both are at least somewhat bad:
>
> 1)	Yes.  Which means that a significant fraction of disk I/O
>	will require extra user-space thread context switches.
>
> 2)	No.  Which means that slow I/O will stall all the threads.

Conversion does not have to be to blocking calls on non-blocking
descriptors.  It can be to non-blocking calls on descriptors.

No, this is not currently done, but the effect of #1 is not as bad as
you would think; it is an additional overhead of 6 us if the data is
not in cache, and an additional overhead of 0 us if it is.  Smart
programmers will organize their code to trigger predictive read-ahead
so that the data will be in cache.  As the "NULL" system call latency
decreases, so does the overhead.

Compare this to a:

	((total KSE's - process KSE's) / total KSE's) * 100

percent risk of taking a full context switch overhead in the kernel
case, plus the management overhead of ensuring that there are no user
space threads in ready-to-run state that are stalled for lack of
kernel space threads to back them, plus the additional scheduler
overhead of the more complex scheme, plus the asymmetric CPU
availability associated with differential CPU load when you attempt
to implement CPU affinity.

> > No.  Both cases should result in an EWOULDBLOCK and a threads
> > context switch, pending the data being present to be read, since
> > non-blocking I/O is being substituted.
>
> Which means unnecessary context switches, when simply waiting would
> be better.

Threads context switches, not process context switches.

> The problem is, if you want to avoid the occasional long delay, you
> have to accept extra context switches all the time.  Not necessarily
> the worst thing in the world, but it's a tradeoff.

You are confusing "pool retention time" (latency) with stalling.

The problem you are not addressing is that latency merely implies that
the I/O requests are interleaved and satisfied after some delay, while
allowing multiple outstanding requests.  Stalling, on the other hand,
means that no scheduled work is occurring.  You aren't stalled if you
are waiting for a scheduled DMA to complete, only if you are waiting
to schedule a DMA.
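For illustration, here is roughly what "non-blocking calls on
descriptors" plus an EWOULDBLOCK-driven threads context switch looks
like.  This is a sketch, not the actual libc_r wrapper: threaded_read()
is a made-up name, and sched_yield() stands in for the user space
scheduler's own thread switch.

#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

ssize_t
threaded_read(int fd, void *buf, size_t nbytes)
{
	ssize_t n;

	/* Ensure the descriptor is non-blocking. */
	(void)fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

	for (;;) {
		n = read(fd, buf, nbytes);
		if (n >= 0)
			return (n);	/* data was already available */
		if (errno != EWOULDBLOCK && errno != EAGAIN)
			return (-1);	/* real error */

		/*
		 * The call would have blocked: do a threads context
		 * switch instead of sleeping in the kernel.  A real
		 * threads library would park this thread on a wait
		 * queue keyed by the descriptor and mark it runnable
		 * when select()/poll() reports the descriptor ready;
		 * sched_yield() merely stands in for that here.
		 */
		(void)sched_yield();
	}
}

When the data is already in the buffer cache, the read() succeeds
immediately and the retry path never runs (the 0 us case above); the
EWOULDBLOCK path is where the extra threads context switch comes from.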
> > > I see them all the time.  'gethostbyname' is a good example.
> >
> > Are you forcing the use of TCP for this?  This results in a spin
> > loop.
> >
> > Please obtain and compile the libresolver from bind 8.x, which is
> > reentrant, and link it before you link libc_r.
>
> I am calling 'gethostbyname'.  Is that wrong?

Yes.

> Bind's license, unfortunately, prohibits me from linking to it.  Once
> I write my own resolver library, this problem goes away.  But not
> everyone can spend the time to do that to optimize for a platform.

It's the same license on the 4.x resolver in libc.  I don't see how
you are prevented from linking with one, but not the other.  The ISC
wrote (and licensed) both.

> That's not what I'm saying.  I'm saying it's a painful tradeoff.
> What you want is a thread to block if the I/O takes too long.  You
> don't have that choice.

If a kernel thread sleeps, for any reason, you have, with a high
statistical probability, lost your quantum and taken a full context
switch overhead.

Disk wait queue completions do not run in the context of the kernel
threads making the call; they run at interrupt level.

> > If it's "not that bad", then it won't take 10 years to fix.
>
> Yes, I've been waiting for fixes in FreeBSD's threads implementation
> for more than a year now.  The vast majority of them have taken
> place, and I'm fairly happy with the current state of FreeBSD's
> threads support.

I can't speak for the FreeBSD development process.

> However, it is really not as good as the threads support on many
> other operating systems, including NT.  If you need stellar threads
> support, FreeBSD is not the operating system you probably want to
> use.

I still think this is based on a false premise.

> At the current rate of progress though, this could change in a few
> months.

Well, that's something, I suppose.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.