From: Terry Lambert
Reply-To: tlambert2@mindspring.com
Date: Wed, 08 Aug 2001 00:27:23 -0700
To: void
Cc: freebsd-hackers@freebsd.org
Subject: Re: Allocate a page at interrupt time

void wrote:
> > Can you name one SMP OS implementation that uses an
> > "interrupt threads" approach that doesn't hit a scaling
> > wall at 4 (or fewer) CPUs, due to heavier weight thread
> > context switch overhead?
>
> Solaris, if I remember my Vahalia book correctly (isn't that a favorite
> of yours?).

As usual, IMO...

Yes, I like the Vahalia book; I did technical review of it for Prentice Hall before its publication.

Solaris hits the wall a little later, but it still hits the wall. On Intel hardware, it has historically hit it at the same 4 CPUs where everyone else tends to hit it, for the same reasons; as of Solaris 2.6, they have adopted the hybrid per-CPU pool model recommended in Vahalia (Chapter 12).
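A hybrid per-CPU pool can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not Solaris or FreeBSD code: each CPU keeps a private free list it can use without taking a lock, and falls back to a shared global pool only to refill or drain a whole batch. The names (`obj_alloc`, `obj_free`, `PERCPU_MAX`, `BATCH`) are hypothetical, and the counter `global_lock_taken` only marks the places where a real allocator would acquire the global pool's lock. It also assumes, as a kernel would have to arrange, that the owning CPU cannot be preempted or interrupted while it touches its own list.

```c
#include <stddef.h>

#define NCPU 4
#define PERCPU_MAX 8        /* hypothetical per-CPU high-water mark */
#define BATCH 4             /* hypothetical refill/drain batch size */
#define NOBJS 64

struct obj { struct obj *next; };
struct pool { struct obj *head; int count; };

static struct obj storage[NOBJS];
static struct pool global_pool;          /* shared: would need a lock in a kernel */
static struct pool percpu[NCPU];         /* private to each CPU */
static unsigned long global_lock_taken;  /* counts where a real allocator would lock */

static void push(struct pool *p, struct obj *o) { o->next = p->head; p->head = o; p->count++; }

static struct obj *pop(struct pool *p) {
    struct obj *o = p->head;
    if (o != NULL) { p->head = o->next; p->count--; }
    return o;
}

static void pool_init(void) {
    for (int i = 0; i < NOBJS; i++) push(&global_pool, &storage[i]);
}

/* Fast path touches only this CPU's list; assumes the caller has disabled
 * interrupts/preemption on this CPU rather than holding any lock. */
static struct obj *obj_alloc(int cpu) {
    struct pool *local = &percpu[cpu];
    if (local->count == 0) {
        global_lock_taken++;             /* slow path: refill one batch */
        for (int i = 0; i < BATCH; i++) {
            struct obj *o = pop(&global_pool);
            if (o == NULL) break;
            push(local, o);
        }
    }
    return pop(local);                   /* NULL only if the global pool is empty too */
}

static void obj_free(int cpu, struct obj *o) {
    struct pool *local = &percpu[cpu];
    push(local, o);
    if (local->count > PERCPU_MAX) {
        global_lock_taken++;             /* slow path: return one batch */
        for (int i = 0; i < BATCH; i++) push(&global_pool, pop(local));
    }
}
```

With these parameters, five allocations on one CPU go to the global pool only twice (two batches of four), and a CPU that then alternates allocations and frees stays entirely within its own list, which is the point of the hybrid arrangement: the common case generates no shared-lock traffic.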
While I'm at it, I suppose I should recommend reading the definitive Solaris internals book to date:

	Solaris Internals: Core Kernel Architecture
	Jim Mauro, Richard McDougall
	Prentice Hall
	ISBN: 0-13-022496-0

Solaris does use interrupt threads for some interrupts; I don't like the idea, for the reasons stated previously.

Solaris claims to scale to 64 processors while maintaining SMP, rather than real or virtual NUMA. It's been my own experience that this scaling claim is not entirely accurate if what you are doing is a lot of kernel processing. On the other hand, if you are running a lot of non-intersecting user space code (e.g. JVMs or CGIs), it's not as bad (and I realize that FreeBSD is not that bad in the same situation, either: it's just not as common in practice as it is in theory).

It should be noted that Solaris interrupt threads are only used for interrupts of priority 10 and below; higher priority interrupts (priority levels 11 through 15) are _NOT_ handled by threads. Level 10 is the clock interrupt.

It should also be noted that Solaris maintains a per-processor pool of interrupt threads for each of the lower priority interrupts, with a global thread that is used for handling the clock interrupt. This is _very_ different from taking an interrupt thread and rescheduling it on an arbitrary CPU, and, as others have pointed out, the hardware used to do the scheduling is very different. In the 32-processor Sequent boxes, the actual system bus was different, and directly supported message passing.

There is also specific hardware support for handling interrupts via threads, which is really not applicable to x86 or even the Alpha architectures on which FreeBSD currently runs, nor to the IA64 architecture (port in progress). In particular, there is a single system-wide table, introduced with the UltraSPARC, that doesn't need to be locked to support interrupt handling.
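The routing rules just described can be modelled in a few lines of C: levels above 10 get no thread at all, level 10 (the clock) goes to one global thread, and levels 1 through 9 take a thread from the interrupted CPU's own pool, so nothing is locked and nothing is rescheduled onto another CPU. Only the level numbers come from the description above; the `struct ithread` layout, `intr_dispatch`, `intr_done`, and the pool sizing (one thread per threaded level, assuming a nested interrupt on the same CPU is always at a strictly higher level) are hypothetical, a sketch rather than Solaris source.

```c
#include <stddef.h>

#define NCPU 4
#define CLOCK_LEVEL 10                 /* level 10 is the clock interrupt */
#define NTHREADED (CLOCK_LEVEL - 1)    /* levels 1..9 use per-CPU threads */

enum ihandling {
    IH_PERCPU_THREAD,        /* taken from the interrupted CPU's own pool */
    IH_GLOBAL_CLOCK_THREAD,  /* the single system-wide clock thread */
    IH_HIGH_LEVEL_DIRECT,    /* levels 11..15: no thread, handled directly */
    IH_NO_THREAD_FREE        /* pool exhausted (hypothetical error path) */
};

struct ithread {
    int cpu;                 /* owning CPU; the thread never migrates */
    int level;               /* level being serviced, 0 when idle */
    struct ithread *next;
};

static struct ithread threads[NCPU][NTHREADED];
static struct ithread *cpu_free[NCPU];   /* each list is touched only by its own CPU */
static struct ithread clock_thread = { -1, CLOCK_LEVEL, NULL };

static void ithread_init(void) {
    for (int c = 0; c < NCPU; c++) {
        cpu_free[c] = NULL;
        for (int i = 0; i < NTHREADED; i++) {
            threads[c][i] = (struct ithread){ c, 0, cpu_free[c] };
            cpu_free[c] = &threads[c][i];
        }
    }
}

static enum ihandling intr_dispatch(int cpu, int level, struct ithread **out) {
    *out = NULL;
    if (level > CLOCK_LEVEL)
        return IH_HIGH_LEVEL_DIRECT;
    if (level == CLOCK_LEVEL) {
        *out = &clock_thread;
        return IH_GLOBAL_CLOCK_THREAD;
    }
    struct ithread *t = cpu_free[cpu];   /* no lock, no cross-CPU scheduling */
    if (t == NULL)
        return IH_NO_THREAD_FREE;
    cpu_free[cpu] = t->next;
    t->level = level;
    *out = t;
    return IH_PERCPU_THREAD;
}

static void intr_done(struct ithread *t) {
    if (t == NULL || t == &clock_thread)
        return;
    t->level = 0;
    t->next = cpu_free[t->cpu];          /* back onto its owner's list */
    cpu_free[t->cpu] = t;
}
```

The contrast with the "take an interrupt thread and reschedule it on an arbitrary CPU" approach is that `intr_dispatch` never consults any other CPU's state; exhausting one CPU's pool leaves every other CPU's pool untouched.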
Also, the Sun system is still an IPL system, using level-based blocking rather than masking, and these threads can find themselves blocked on a mutex or condition variable for a relatively long time; if this happens, the system resumes the previous thread _but does not drop its IPL below that of the suspended thread_, which is basically the Dijkstra Banker's Algorithm method of avoiding priority inversion on interrupts (i.e. ugly).

Finally, the Sun system "borrows" the context of the interrupted process (thread) for interrupt handling (the LWP). This is very similar to the technique employed with kernel vs. user space thread associations within the Windows kernels (this was one of the steps I was referring to when I said that NT had dealt with a number of scaling issues before it needed to, so that they would not turn into problems on 8-way and higher systems).

Personally, I think that the Sun system is extremely susceptible to receiver livelock. Network interrupts are at level 7 and disk interrupts are at level 5, which means that so long as you are getting pounded with network interrupts for e.g. NFS read or write requests, you're not going to service the disk interrupts that would let you dispose of the traffic, nor will you run the user space code for things like CGIs or Apache servers trying to service a heavy load of requests for content.

I'm also not terrifically impressed with their callout mechanism when applied to networking, which has a preponderance of fixed, known-interval timers. FreeBSD's isn't really any better when it comes to huge numbers of network connections, since it will end up hashing 2/4/6/8/... into the same bucket, unordered, which means traversing a large list of timers that are not going to end up expiring. (Callout wheels are not a good thing to mix with fixed-interval timers of relatively long durations, like the 2MSL timers that live in the networking code, or most especially the TIME_WAIT timers.)
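The callout wheel problem can be illustrated with a toy hashed timing wheel in C, modelled loosely on the scheme described: each timer goes into bucket `expire & WHEEL_MASK`, and each clock tick walks its whole unordered bucket, firing only the entries whose expiry equals the current tick. The wheel size, the 6000-tick interval (for example 60 seconds at an assumed `hz` of 100), and all function names are hypothetical; the sketch simply counts how many entries are visited without expiring.

```c
#include <stddef.h>

#define WHEEL_SIZE 256UL          /* hypothetical; must be a power of two */
#define WHEEL_MASK (WHEEL_SIZE - 1)
#define MAX_TIMERS 4096

struct callout {
    unsigned long expire;         /* absolute tick at which the timer fires */
    struct callout *next;
};

static struct callout *wheel[WHEEL_SIZE];
static struct callout timers[MAX_TIMERS];

/* Hash by absolute expiry tick: timers several revolutions away share a bucket. */
static void callout_add(struct callout *c, unsigned long now, unsigned long delta) {
    c->expire = now + delta;
    unsigned long slot = c->expire & WHEEL_MASK;
    c->next = wheel[slot];
    wheel[slot] = c;
}

/* One tick: walk the entire (unordered) bucket, firing only exact matches. */
static void callout_tick(unsigned long now, unsigned long *fired, unsigned long *skipped) {
    struct callout **pp = &wheel[now & WHEEL_MASK];
    while (*pp != NULL) {
        struct callout *c = *pp;
        if (c->expire == now) {
            *pp = c->next;        /* unlink; a real wheel would call the handler */
            (*fired)++;
        } else {
            (*skipped)++;         /* visited, but due on a later revolution */
            pp = &c->next;
        }
    }
}

/* n connections, each with one fixed-interval timer, expiries one tick apart. */
static void run_scenario(unsigned long n, unsigned long interval,
                         unsigned long *fired, unsigned long *skipped) {
    if (n > MAX_TIMERS)
        n = MAX_TIMERS;
    for (unsigned long i = 0; i < WHEEL_SIZE; i++)
        wheel[i] = NULL;
    *fired = *skipped = 0;
    for (unsigned long i = 0; i < n; i++)
        callout_add(&timers[i], 0, interval + i);
    for (unsigned long now = 1; now <= interval + n; now++)
        callout_tick(now, fired, skipped);
}
```

For 1000 timers with a 6000-tick interval on a 256-slot wheel, each timer is stepped over roughly two dozen times before it fires (6000 / 256 is about 23), so skipped visits outnumber firings by more than twenty to one; a timer whose interval is shorter than one revolution of the wheel is never stepped over at all.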
-- 
Terry