Date: Tue, 29 Jun 1999 11:05:02 -0600 (MDT)
From: Kevin Van maren <vanmaren@cs.utah.edu>
To: freebsd-smp@FreeBSD.ORG
Subject: Re: high-efficiency SMP locks - submission for review
Message-ID: <199906291705.LAA20627@zane.cs.utah.edu>

I'm really glad to see that there is so much activity on the list!

Just a quick summary (from my point of view):

1) Multi-threading the kernel will require locking, not just spl().
2) Locking will likely slow down uni-processor systems.
3) Removing the BGL will allow more parallelism in the kernel for
   multiple system-bound applications under SMP.
4) It will be a LOT of work to re-write the kernel to be thread-safe.
   Changing the execution environment will violate a lot of
   assumptions.
5) Very few people know enough about both the kernel internals and
   multiprocessor/multi-threaded locking to do the job "right".
6) The method most likely to succeed will be evolutionary; there is
   simply too much code to change everything at once and get it all
   working.

SMP will scale with a BGL as long as we minimize the system time. If
system time with 4 CPUs is under 10%, the kernel is not the problem.
Since it is not always practical to make the kernel fast enough to do
that for many applications/workloads, we need to move enough out from
under the BGL to get the necessary parallelism.

Linux lost to NT because the Linux protocol stack that was tested was
slow. The review said that a multi-threaded stack would have made a
difference. However, Linux lost in the UP case too, so I doubt that's
the only problem. The fact is, if the protocol stack were fast enough,
it wouldn't need to be multi-threaded. I would like to see how FreeBSD
does on that same test -- it has a much faster TCP/IP stack than Linux
(especially using sendfile for the static pages!)

Terry Lambert said:
> No one is currently bothering with anything but the Intel MESI
> coherency model for SMP, anyway, so I don't understand the
> relevance of bus coherency to the argument.

This is mostly true, even on the IA64. Section 4.4.6.2 of the manual
says that I-caches are not coherent with other I-caches or with
D-caches. But at least the D-caches are coherent (at first glance I
thought they weren't either, which really worried me!)
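
As a concrete illustration of the kind of lock point 1 refers to, here
is a minimal sketch of a test-and-set spinlock. It depends on coherent
data caches, so that every CPU sees the same lock word. The names
spin_lock_t, SPIN_LOCK_INIT, spin_lock, spin_trylock and spin_unlock are
hypothetical; they are not taken from the FreeBSD kernel or from the
patch under review. The C11 <stdatomic.h> interface is used only to
keep the sketch portable and is an assumption, not what the kernel uses.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not a FreeBSD structure. */
typedef struct {
    atomic_flag held;   /* set while some CPU owns the lock */
} spin_lock_t;

#define SPIN_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns 1 on success, 0 if already held. */
int spin_trylock(spin_lock_t *l)
{
    /* Atomically set the flag and look at its previous value. */
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired. */
void spin_lock(spin_lock_t *l)
{
    while (!spin_trylock(l))
        ;   /* busy-wait; a real kernel lock would back off here */
}

/* Release the lock so that another CPU can take it. */
void spin_unlock(spin_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would declare "spin_lock_t l = SPIN_LOCK_INIT;" and bracket
the critical section with spin_lock(&l) and spin_unlock(&l). On x86,
compilers typically implement the test-and-set as an xchg with a memory
operand.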

On the x86, you do need to lock the bus to guarantee that operations
are atomic. The exception is xchg (but not its variants), which is
guaranteed to be atomic. The operands must also be naturally aligned.

To clarify: there DO exist some dual-processor 486 systems. They use
APICs, and in *theory* can run FreeBSD without too much difficulty
(there is no mptable; the processor APICs are at different addresses,
so you have to know which processor you are on to access the APIC; and
the AP initialization is a little different). I don't think anyone
cares enough to implement the support code, and Intel discontinued the
parts necessary to build them, so it probably won't be too painful to
break possible support for them.

Alfred Perlstein said:
> getpid would read the contents of a page mapped into the process'
> address space, ie, the kernel sharing info with processes through
> shared mappings?

Because the cost of setting up the mappings, plus the wasted page of
memory, greatly exceeds the cost of a system call that is (usually)
used at most once per process. The only time getpid() is called several
times is during (bad) synthetic benchmarks. Having libc cache the value
would be a more viable solution; it would have to trap fork() calls,
however, to invalidate the stored pid.

Compiling and storing 8 kernels for the loader to choose from sounds
like a bad idea as well; it may be practical for CD-ROM installation,
although I think it is more likely the user will select the right one.

Kevin Van Maren

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message