Date: Tue, 24 Oct 2000 14:31:54 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: seth@pengar.com (Seth Leigh)
Cc: tlambert@primenet.com (Terry Lambert), jasone@canonware.com (Jason Evans), smp@FreeBSD.ORG
Subject: Re: SA project (was Re: SMP project status)
Message-ID: <200010241431.HAA28242@usr05.primenet.com>
In-Reply-To: <3.0.6.32.20001024094101.00c4d798@hobbiton.shire.net> from "Seth Leigh" at Oct 24, 2000 09:41:01 AM
> >A scheduler activation, on the other hand, can reactivate in the
> >specific process (picking another thread from the thread group),
> >so long as there is quantum remaining, and thus actually deliver
> >on the promise of reduced context switch overhead.  A side benefit
> >is that cache coherency is also maintained, and there is a reduced
> >interprocessor arbitration overhead.
>
> This may seem petty, but if we always use the whole quantum, won't this
> have the effect of driving down the priority of any multi-threaded
> application with respect to single-threaded apps?

No.  If you want to build an application that competes unfairly with
other processes for system resources, the correct approach is to use
multiple processes in order to get multiple quanta, OR you can define
a new scheduler class that implements fair share scheduling or some
other scheduling algorithm that gives your program an unfair advantage
in being selected for quanta.

Realize that the benefit of not paying the context switch overhead
will reduce overall system utilization.

Realize also that you have a hidden assumption here, which is not
necessarily true: that you will always have threads that are ready to
run, and are not all blocked pending I/O or other kernel operations.
Unless you run one thread in a spin loop, this will most likely never
really be the case.

Consider how you would fix the context and cache thrashing problem on
a Linux or an SVR4 derived system: you could preferentially choose a
thread in your thread group when making your scheduling decision.  But
this leads to starvation of other processes, should you make a coding
error and go into a spin loop (or simply have a lot of work to do in
user space which is CPU bound, such as rendering images).
Alternatively, you might implement round-robin scheduling or some
other scheduling policy, and group threads in a single process next to
each other in the runnable queue.
But if you have even a moderate number of threads, then you will
damage interactive response, perhaps to a considerable degree.
Effectively, you are left with a very hard problem.

The size of the quantum was chosen such that interactive response
would not be damaged, even if a process used the entirety of its
allotted CPU doing something compute intensive.

Even if you still balked at the "unfairness" of being unable to have
your one program compete with sendmail or inetd as if it were 16,000
processes (for some definition of "unfairness" 8-)), you could choose
to weight it based on the system calls currently blocked, so that it
becomes the amount of time, on average, that your threads remain
blocked.

I personally wouldn't do this.  If I were worried about my threaded
application, I would either use rtprio to force the issue, or I would
"manufacture" my server load: for example, it's very rare to see an
Oracle server doing anything other than simply running Oracle.

> You will pardon me if I ask dumb questions.  After dabbling and
> reading about it for a long time, I have finally started working on
> my first major multi-threaded application, and so I am thinking a
> lot about them, but I am not necessarily a guru.  Additionally, I
> aspire someday to being a kernel guy, so I want to learn how these
> things work.

You'd do well to study schedulers, context switch overhead, and
cache-busting.  The scheduling algorithms are actually much more
complicated than they first appear, and it's not obvious at first
glance that kernel threads, as an implementation, interact badly with
schedulers, or where the system overhead really lives.  Sun has a
number of good papers on threading that I would recommend looking up
on their web site.

Really, threading tends to make some types of programming easier, but
it isn't terribly useful unless you are trying to achieve SMP scaling.
Even then, many OSs do it wrong.
There's a long-standing fiction that SMP systems start failing to
scale at 4 processors, that they reach a point of diminishing returns
at a relatively small scale.  This isn't really true: mostly it's an
implementation failure when you see a limit that small.

John Sokol actually gave a nice presentation on using finite state
automata in place of threading in the "AfterBurner" web server
product, and backed it up with some nice numbers; but his solution,
while incredibly capable on low-end hardware, would not scale to
better ability on SMP.  I don't think he ever hit a CPU binding
limitation, but if he were to do so, the only thing he'd be able to
do would be to throw bigger iron at it.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.