From owner-freebsd-smp Fri Mar 29 11:29:39 2002
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 5189E37B405;
	Fri, 29 Mar 2002 11:29:18 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.11.6/8.9.1) id g2TJTIi59785;
	Fri, 29 Mar 2002 11:29:18 -0800 (PST)
	(envelope-from dillon)
Date: Fri, 29 Mar 2002 11:29:18 -0800 (PST)
From: Matthew Dillon
Message-Id: <200203291929.g2TJTIi59785@apollo.backplane.com>
To: freebsd-smp@freebsd.org
Cc: John Baldwin
Subject: Syscall contention tests return, userret() bugs/issues.
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

    Ok, now that the last mess has been dealt with I have resumed my
    examination of the critical path.

    NOTE!!!!!  While I do want to fix this stuff, the patch included below
    is *NOT* a fix.  It exists for testing purposes only and is missing
    things like, oh, necessary memory barriers for certain architectures.

    This examination has also turned up two possible bugs in userret(),
    which I describe later.

				CONTENTION TESTS

    My main goal is to try to remove all memory contention from the
    syscall, interrupt, and trap critical paths for two processes on an
    SMP build.  I am focusing on the syscall path right now.  The recent
    td_ucred stuff fixed some of this.  We still have three pieces of
    contended memory in the standard syscall path.  These are fairly
    important because the contention occurs in the critical path for
    every system call made:

	* the syscall counter
	* userret() / CURSIG handling
	* userret() / NEEDRESCHED handling

    I have rerun my contention tests with the most recent kernel + some
    hacks (FOR TESTING PURPOSES ONLY RIGHT NOW!).  The kernel was
    compiled *without* WITNESS or DIAGNOSTIC.
				[------- SMP build -----]
				1-Process	2-Process

    atomic syscall counter	 860836		 699591		-18.7%	NOTE A
    no syscall counter		 862822		 768512		-10.9%	NOTE A
    per-cpu syscall counter	 882311		 786000		-10.9%	NOTE A
    per-cpu syscall counter	1004050		1001611		 -0.2%	NOTE B

    NOTE A: locking removed from userret for CURSIG case
    NOTE B: locking removed from userret for CURSIG *and* NEEDRESCHED test

    There are two interesting pieces of information I get out of this
    test.  First, the difference between the atomic-locked syscall counter
    and the per-cpu syscall counter in the one-process case is fairly
    minor, but it blows up in our faces the moment one runs more than one
    process.  Specifically, that one atomic op results in an 8.5% loss in
    performance when two cpus are contending for the global.  This loss is
    cumulative, so you can imagine the loss on a 4-cpu system.

    For some odd reason completely unknown to me, performance actually
    increased when I went from *NO* syscall counter to the per-cpu
    syscall counter!  I have no clue as to why.  I tested it several
    times!

    The second interesting piece of information is the performance gain
    that occurs when I don't get the sched_lock in userret() to do the
    initial NEEDRESCHED test (I'm already not getting Giant in the CURSIG
    code in userret() for all of the tests).  Performance increases
    radically, by almost 16.3%, in the single-process case and, more
    importantly, the two-process case shows that there is no longer any
    contention between the processors.

    Not only that, but I think we can be proud of the fact that it is
    possible to achieve sub-1uS syscall overheads under SMP (well, the
    test box is a 1.1GHz machine :-)).  It means that we are doing
    something right.

			    WORKING ON THE ISSUE

    My conclusion is that it should be possible for us to remove
    critical-path contention, at least for system calls, in fairly short
    order.  I would like to pursue this - that is, find real solutions
    for the three areas that my hacks show to be an issue (in regards to
    system calls).
    So I am opening up debate on how to fix these areas:

    * system call counter (and counters in the kernel in general)

      My current hack does not properly protect the counter.  Properly
      speaking, since the increment is not a single instruction, the
      per-cpu update must be protected by critical_enter() /
      critical_exit().  I am thinking of adding a PCPU_ADD_INT() macro
      to all architectures and putting all 'global' system counters in
      the per-cpu area, i.e. embedding the struct vmmeter structure in
      the per-cpu area.  sysctl would combine the information from all
      the per-cpu areas (i.e. for systat / vmstat reporting).  (I am not
      covering things like interface counters here, just global system
      counters.)

      Comments?

    * userret() / CURSIG case.

      My current hack is somewhat messy.  Presumably we will eventually
      be able to remove Giant from this path.  The question is: when?
      And what is the current state of work in that area?

      Also, it seems to me that a signal can come in after the check
      (with or without my hack) and not get recognized until the next
      system call or interrupt.  This is bad.

      Comments?  John?

    * userret() / NEEDRESCHED case.

      This case seems rather odd to me.  I understand why we might need
      to obtain the sched_lock as a memory barrier, but what happens if
      NEEDRESCHED is set after we release the sched_lock?  Interrupts
      are enabled at the point where userret() is called; there is
      nothing preventing either a signal or a NEEDRESCHED from occurring
      after we've tested it but before we've returned to usermode,
      causing us not to honor either until the next interrupt or
      syscall.  Has anyone grappled with this issue yet?
					-Matt
					Matthew Dillon

    (PATCH USED TO REPRODUCE THE ABOVE TESTS)

Index: i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.220
diff -u -r1.220 trap.c
--- i386/i386/trap.c	21 Mar 2002 19:27:15 -0000	1.220
+++ i386/i386/trap.c	29 Mar 2002 19:02:16 -0000
@@ -941,7 +941,14 @@
 	int args[8];
 	u_int code;
 
+	/*
+	 * WARNING! FOR TESTING ONLY, per-cpu increment is not protected by
+	 * critical_*() (add PCPU_INCR_INT macro?)
+	 */
+#if 0
 	atomic_add_int(&cnt.v_syscall, 1);
+#endif
+	++*PCPU_PTR(v_syscall);
 
 #ifdef DIAGNOSTIC
 	if (ISPL(frame.tf_cs) != SEL_UPL) {
Index: kern/subr_trap.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/subr_trap.c,v
retrieving revision 1.212
diff -u -r1.212 subr_trap.c
--- kern/subr_trap.c	21 Mar 2002 06:11:08 -0000	1.212
+++ kern/subr_trap.c	29 Mar 2002 18:43:43 -0000
@@ -72,13 +72,33 @@
 	struct ksegrp *kg = td->td_ksegrp;
 	int sig;
 
+#if 0
 	mtx_lock(&Giant);
 	PROC_LOCK(p);
 	while ((sig = CURSIG(p)) != 0)
 		postsig(sig);
 	PROC_UNLOCK(p);
 	mtx_unlock(&Giant);
+#else
+	PROC_LOCK(p);
+	if ((sig = CURSIG(p)) != 0) {
+		PROC_UNLOCK(p);
+		mtx_lock(&Giant);
+		PROC_LOCK(p);
+		while ((sig = CURSIG(p)) != 0)
+			postsig(sig);
+		PROC_UNLOCK(p);
+		mtx_unlock(&Giant);
+	} else {
+		PROC_UNLOCK(p);
+	}
+#endif
+#if 1
+	if (td->td_priority != kg->kg_user_pri ||
+	    (ke->ke_flags & KEF_NEEDRESCHED) ||
+	    (p->p_sflag & PS_PROFIL)) {
+#endif
 	mtx_lock_spin(&sched_lock);
 	td->td_priority = kg->kg_user_pri;
 	if (ke->ke_flags & KEF_NEEDRESCHED) {
@@ -106,8 +126,12 @@
 		ticks = ke->ke_sticks - oticks;
 		mtx_unlock_spin(&sched_lock);
 		addupc_task(ke, TRAPF_PC(frame), (u_int)ticks * psratio);
-	} else
+	} else {
 		mtx_unlock_spin(&sched_lock);
+	}
+#if 1
+	}
+#endif
 }
 
 /*
Index: sys/pcpu.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/pcpu.h,v
retrieving revision 1.7
diff -u -r1.7 pcpu.h
--- sys/pcpu.h	20 Mar 2002 18:01:52 -0000	1.7
+++ sys/pcpu.h	29 Mar 2002 19:26:39 -0000
@@ -58,6 +58,10 @@
 	u_int	pc_cpuid;		/* This cpu number */
 	u_int	pc_cpumask;		/* This cpu mask */
 	u_int	pc_other_cpus;		/* Mask of all other cpus */
+	/*
+	 * XXX embedd vmmeter structure here?
+	 */
+	u_int	pc_v_syscall;		/* system calls counter */
 	SLIST_ENTRY(pcpu) pc_allcpu;
 	struct lock_list_entry *pc_spinlocks;
 #ifdef KTR_PERCPU

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message