Date: Sun, 7 Aug 2005 09:41:32 +0100
From: Doug Rabson <dfr@nlsystems.com>
To: Marcel Moolenaar <marcel@FreeBSD.org>
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/ia64/ia64 exception.S interrupt.c machdep.c mp_machdep.c pmap.c trap.c vm_machdep.c src/sys/ia64/include proc.h smp.h
Message-ID: <200508070941.33821.dfr@nlsystems.com>
In-Reply-To: <200508062028.j76KSJtM019032@repoman.freebsd.org>
References: <200508062028.j76KSJtM019032@repoman.freebsd.org>

Excellent! When trying to think about per-cpu VHPT in the past, I could
never quite see how to handle the collision chains sanely. The solution
described below seems ideal.

On Saturday 06 August 2005 21:28, Marcel Moolenaar wrote:
> marcel      2005-08-06 20:28:19 UTC
>
>   FreeBSD src repository
>
>   Modified files:
>     sys/ia64/ia64        exception.S interrupt.c machdep.c
>                          mp_machdep.c pmap.c trap.c vm_machdep.c
>     sys/ia64/include     proc.h smp.h
>   Log:
>   Improve SMP support:
>   o  Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
>      uses to look up translations it can't find in the TLB. As such,
>      the VHPT serves as a level 1 cache (the TLB being a level 0
>      cache) and best results are obtained when it's not shared between
>      CPUs. The collision chain (i.e. the hash bucket) is shared between
>      CPUs, as all buckets together constitute our collection of PTEs. To
>      achieve this, the collision chain does not point to the first PTE in
>      the list anymore, but to a hash bucket head structure. The head
>      structure contains the pointer to the first PTE in the list, as well
>      as a mutex to lock the bucket. Thus, each bucket is locked
>      independently of the others. With at least 1024 buckets in the VHPT,
>      this provides for sufficiently fine-grained locking to make the
>      solution scalable to large SMP machines.
>   o  Add synchronisation to the lazy FP context switching. We do this
>      with a separate per-thread lock. On SMP machines the lazy high
>      FP context switching without synchronisation caused inconsistent
>      state, which resulted in a panic. Since the use of the high FP
>      registers is not common, it's possible that races exist. The ia64
>      package build has proven to be a good stress test, so this will get
>      plenty of exercise in the near future.
>   o  Don't use the local ID of the processor we want to send the IPI
>      to as the argument to ipi_send(). Use the struct pcpu pointer
>      instead. The reason for this is that IPI delivery is unreliable. It
>      has been observed that sending an IPI to a CPU causes it to receive a
>      stray external interrupt. As such, we need a way to make the delivery
>      reliable. The intended solution is to queue requests in the target
>      CPU's per-CPU structure and use a single IPI to inform the CPU that
>      there's a new entry in the queue. If that IPI gets lost, the CPU can
>      check its queue at any convenient time (such as on each clock
>      interrupt). This also allows us to send requests to a CPU without
>      interrupting it, if such would be beneficial.
>
>   With these changes SMP is almost working. There are still some
>   random process crashes and the machine can hang when the IPI that
>   deals with the high FP context switch is lost.
>
>   The overhead of introducing the hash bucket head structure results
>   in a performance degradation of about 1% for UP (extra pointer
>   indirection). This is surprisingly small and is offset by gaining
>   reasonably good, scalable SMP support.
>
>   Revision  Changes    Path
>   1.57      +8 -0      src/sys/ia64/ia64/exception.S
>   1.50      +5 -0      src/sys/ia64/ia64/interrupt.c
>   1.201     +30 -13    src/sys/ia64/ia64/machdep.c
>   1.56      +29 -25    src/sys/ia64/ia64/mp_machdep.c
>   1.161     +227 -272  src/sys/ia64/ia64/pmap.c
>   1.114     +12 -7     src/sys/ia64/ia64/trap.c
>   1.91      +1 -0      src/sys/ia64/ia64/vm_machdep.c
>   1.15      +2 -1      src/sys/ia64/include/proc.h
>   1.10      +4 -2      src/sys/ia64/include/smp.h
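
For illustration only, the hash bucket head arrangement described in the log might look roughly like the userland C sketch below. None of it is taken from the commit: the names (struct pte, struct bucket_head, bucket_insert and so on) are hypothetical, the PTE layout is not the real ia64 one, the modulo hash is a placeholder for the hardware hash, and a pthread mutex stands in for a kernel mutex. The only details drawn from the log are that each chain hangs off a head holding a first-PTE pointer plus a lock, and that there are at least 1024 buckets.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 1024   /* the log mentions "at least 1024 buckets" */

/* Hypothetical PTE with a chain link; not the real ia64 layout. */
struct pte {
    uint64_t    tag;    /* identifies the virtual page (assumption) */
    uint64_t    value;  /* stand-in for the translation itself */
    struct pte *chain;  /* next PTE in the same bucket */
};

/* Bucket head: pointer to the first PTE plus the lock for this chain. */
struct bucket_head {
    struct pte      *first;
    pthread_mutex_t  mtx;
};

/* One set of heads shared by all CPUs; each CPU would own its VHPT. */
static struct bucket_head buckets[NBUCKETS];

static void buckets_init(void)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        buckets[i].first = NULL;
        pthread_mutex_init(&buckets[i].mtx, NULL);
    }
}

/* Placeholder hash: the real hash is computed by the hardware. */
static struct bucket_head *bucket_for(uint64_t tag)
{
    return &buckets[tag % NBUCKETS];
}

/* Insert at the head of the chain, holding only this bucket's lock. */
static void bucket_insert(struct pte *p)
{
    assert(p != NULL);
    struct bucket_head *b = bucket_for(p->tag);

    pthread_mutex_lock(&b->mtx);
    p->chain = b->first;
    b->first = p;
    pthread_mutex_unlock(&b->mtx);
}

/* Walk one chain under its lock; other buckets stay unlocked. */
static struct pte *bucket_lookup(uint64_t tag)
{
    struct bucket_head *b = bucket_for(tag);
    struct pte *p;

    pthread_mutex_lock(&b->mtx);
    for (p = b->first; p != NULL; p = p->chain)
        if (p->tag == tag)
            break;
    pthread_mutex_unlock(&b->mtx);
    return p;
}
```

Because a lookup or insert takes only the lock of the bucket it touches, operations on different buckets never contend. The extra hop from head to first PTE is the pointer indirection the log blames for the roughly 1% UP cost.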
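
The queued-request scheme that the log calls the intended solution for unreliable IPIs could be sketched along these lines. This is a guess at the shape of the idea, not the committed code: struct pcpu_sketch, request_post and request_drain are hypothetical names, requests are plain integers, the queue is a small fixed ring, and a pthread mutex again stands in for a kernel lock. The sender always queues the request first, so a lost IPI only delays handling until the target next drains its queue, for example from its clock interrupt.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 8  /* arbitrary ring size for the sketch */

/* Hypothetical per-CPU structure holding a ring of pending requests. */
struct pcpu_sketch {
    pthread_mutex_t q_mtx;        /* protects the ring */
    int             q[QLEN];      /* pending request codes */
    size_t          head;         /* index of the oldest request */
    size_t          count;        /* number of queued requests */
    int             last_handled; /* most recent request processed */
};

static void pcpu_init(struct pcpu_sketch *pc)
{
    assert(pc != NULL);
    pthread_mutex_init(&pc->q_mtx, NULL);
    pc->head = 0;
    pc->count = 0;
    pc->last_handled = -1;
}

/*
 * Sender side: queue the request in the target CPU's structure.
 * A real sender would then raise one IPI as a hint; the request
 * survives in the queue whether or not that IPI is delivered.
 * Returns false if the ring is full.
 */
static bool request_post(struct pcpu_sketch *pc, int req)
{
    bool ok = false;

    pthread_mutex_lock(&pc->q_mtx);
    if (pc->count < QLEN) {
        pc->q[(pc->head + pc->count) % QLEN] = req;
        pc->count++;
        ok = true;
    }
    pthread_mutex_unlock(&pc->q_mtx);
    return ok;
}

/*
 * Target side: called from the IPI handler, and also from any other
 * convenient point such as the clock interrupt, so that a lost IPI
 * is recovered. Returns the number of requests handled.
 */
static int request_drain(struct pcpu_sketch *pc)
{
    int handled = 0;

    pthread_mutex_lock(&pc->q_mtx);
    while (pc->count > 0) {
        pc->last_handled = pc->q[pc->head];
        pc->head = (pc->head + 1) % QLEN;
        pc->count--;
        handled++;
    }
    pthread_mutex_unlock(&pc->q_mtx);
    return handled;
}
```

Passing a pointer to the per-CPU structure instead of a local ID fits this design, since the sender needs the target's queue and not just its interrupt address.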