From owner-freebsd-smp Fri Mar 29 11:29:39 2002
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 5189E37B405;
	Fri, 29 Mar 2002 11:29:18 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.11.6/8.9.1) id g2TJTIi59785;
	Fri, 29 Mar 2002 11:29:18 -0800 (PST)
	(envelope-from dillon)
Date: Fri, 29 Mar 2002 11:29:18 -0800 (PST)
From: Matthew Dillon
Message-Id: <200203291929.g2TJTIi59785@apollo.backplane.com>
To: freebsd-smp@freebsd.org
Cc: John Baldwin
Subject: Syscall contention tests return, userret() bugs/issues.
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

    Ok, now that the last mess has been dealt with I have resumed my
    examination of the critical path.

    NOTE!!!!!  While I do want to fix this stuff, the patch included below
    is *NOT* a fix.  It exists for testing purposes only and is missing
    things like, oh, necessary memory barriers for certain architectures.

    This examination has also turned up two possible bugs in userret(),
    which I describe later.

				CONTENTION TESTS

    My main goal is to try to remove all memory contention from the
    syscall, interrupt, and trap critical paths for two processes on an
    SMP build.  I am focusing on the syscall path right now.  The recent
    td_ucred stuff fixed some of this.  We still have three pieces of
    contended memory in the standard syscall path.  These are fairly
    important because the contention occurs in the critical path for
    every system call made:

	* the syscall counter
	* userret() / CURSIG handling
	* userret() / NEEDRESCHED handling

    I have rerun my contention tests with the most recent kernel + some
    hacks (FOR TESTING PURPOSES ONLY RIGHT NOW!).  The kernel was
    compiled *without* WITNESS or DIAGNOSTIC.
				[------- SMP build -----]
				1-Process	2-Process

    atomic syscall counter	 860836		 699591		-18.7%	NOTE A
    no syscall counter		 862822		 768512		-10.9%	NOTE A
    per-cpu syscall counter	 882311		 786000		-10.9%	NOTE A
    per-cpu syscall counter	1004050		1001611		 -0.2%	NOTE B

    NOTE A: locking removed from userret for CURSIG case
    NOTE B: locking removed from userret for CURSIG *and* NEEDRESCHED test

    There are two interesting pieces of information I get out of this
    test.  First, the difference between the atomic-locked syscall counter
    and the per-cpu syscall counter in the one-process case is fairly
    minor, but it blows up in our faces the moment one runs more than one
    process.  Specifically, that one atomic op results in an 8.5% loss in
    performance when two cpus are contending for the global.  This loss is
    cumulative, so you can imagine the loss on a 4-cpu system.

    For some odd reason completely unknown to me, performance actually
    increased when I went from *NO* syscall counter to the per-cpu
    syscall counter!  I have no clue as to why.  I tested it several
    times!

    The second interesting piece of information is the performance gain
    that occurs when I don't get the sched_lock in userret() to do the
    initial NEEDRESCHED test (I'm already not getting Giant in the CURSIG
    code in userret() for all of the tests).  Performance increases
    radically, by almost 16.3%, in the single-process case and, more
    importantly, the two-process case shows that there is no longer any
    contention between the processors.

    Not only that, but I think we can be proud of the fact that it is
    possible to achieve sub-1uS syscall overheads under SMP (well, the
    test box is a 1.1GHz machine :-)).  It means that we are doing
    something right.

			    WORKING ON THE ISSUE

    My conclusion is that it should be possible for us to remove
    critical-path contention, at least for system calls, in fairly short
    order.  I would like to pursue this - that is, find real solutions
    for the three areas that my hacks show to be an issue (in regards to
    system calls).
    So I am opening up debate on how to fix these areas:

    * system call counter (and counters in the kernel in general)

      My current hack does not properly protect the counter.  Properly
      speaking, since the increment is not a single instruction, the
      per-cpu update must be protected by critical_enter() /
      critical_exit().  I am thinking of adding a PCPU_ADD_INT() macro
      to all architectures and putting all 'global' system counters in
      the per-cpu area, i.e. embedding the struct vmmeter structure in
      the per-cpu area.  sysctl would combine the information from all
      the per-cpu areas (i.e. for systat / vmstat reporting).  (I am not
      covering things like interface counters here, just global system
      counters.)

      Comments?

    * userret() / CURSIG case.

      My current hack is somewhat messy.  Presumably we will eventually
      be able to remove Giant from this path.  The question is: when?
      And what is the current state of work in that area?

      Also, it seems to me that a signal can come in after the check
      (with or without my hack) and not get recognized until the next
      system call or interrupt.  This is bad.

      Comments?  John?

    * userret() / NEEDRESCHED case.

      This case seems rather odd to me.  I understand why we might need
      to obtain the sched_lock as a memory barrier, but what happens if
      NEEDRESCHED is set after we release the sched_lock?  Interrupts
      are enabled at the point where userret() is called; there is
      nothing preventing either a signal or a NEEDRESCHED from occurring
      after we've tested it but before we've returned to usermode,
      causing us not to honor either until the next interrupt or
      syscall.  Has anyone grappled with this issue yet?
					-Matt
					Matthew Dillon

    (PATCH USED TO REPRODUCE THE ABOVE TESTS)

Index: i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.220
diff -u -r1.220 trap.c
--- i386/i386/trap.c	21 Mar 2002 19:27:15 -0000	1.220
+++ i386/i386/trap.c	29 Mar 2002 19:02:16 -0000
@@ -941,7 +941,14 @@
 	int args[8];
 	u_int code;
 
+	/*
+	 * WARNING! FOR TESTING ONLY, per-cpu increment is not protected by
+	 * critical_*() (add PCPU_INCR_INT macro?)
+	 */
+#if 0
 	atomic_add_int(&cnt.v_syscall, 1);
+#endif
+	++*PCPU_PTR(v_syscall);
 
 #ifdef DIAGNOSTIC
 	if (ISPL(frame.tf_cs) != SEL_UPL) {
Index: kern/subr_trap.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/subr_trap.c,v
retrieving revision 1.212
diff -u -r1.212 subr_trap.c
--- kern/subr_trap.c	21 Mar 2002 06:11:08 -0000	1.212
+++ kern/subr_trap.c	29 Mar 2002 18:43:43 -0000
@@ -72,13 +72,33 @@
 	struct ksegrp *kg = td->td_ksegrp;
 	int sig;
 
+#if 0
 	mtx_lock(&Giant);
 	PROC_LOCK(p);
 	while ((sig = CURSIG(p)) != 0)
 		postsig(sig);
 	PROC_UNLOCK(p);
 	mtx_unlock(&Giant);
+#else
+	PROC_LOCK(p);
+	if ((sig = CURSIG(p)) != 0) {
+		PROC_UNLOCK(p);
+		mtx_lock(&Giant);
+		PROC_LOCK(p);
+		while ((sig = CURSIG(p)) != 0)
+			postsig(sig);
+		PROC_UNLOCK(p);
+		mtx_unlock(&Giant);
+	} else {
+		PROC_UNLOCK(p);
+	}
+#endif
+#if 1
+	if (td->td_priority != kg->kg_user_pri ||
+	    (ke->ke_flags & KEF_NEEDRESCHED) ||
+	    (p->p_sflag & PS_PROFIL)) {
+#endif
 	mtx_lock_spin(&sched_lock);
 	td->td_priority = kg->kg_user_pri;
 	if (ke->ke_flags & KEF_NEEDRESCHED) {
@@ -106,8 +126,12 @@
 		ticks = ke->ke_sticks - oticks;
 		mtx_unlock_spin(&sched_lock);
 		addupc_task(ke, TRAPF_PC(frame), (u_int)ticks * psratio);
-	} else
+	} else {
 		mtx_unlock_spin(&sched_lock);
+	}
+#if 1
+	}
+#endif
 }
 
 /*
Index: sys/pcpu.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/pcpu.h,v
retrieving revision 1.7
diff -u -r1.7 pcpu.h
--- sys/pcpu.h	20 Mar 2002 18:01:52 -0000	1.7
+++ sys/pcpu.h	29 Mar 2002 19:26:39 -0000
@@ -58,6 +58,10 @@
 	u_int	pc_cpuid;		/* This cpu number */
 	u_int	pc_cpumask;		/* This cpu mask */
 	u_int	pc_other_cpus;		/* Mask of all other cpus */
+	/*
+	 * XXX embedd vmmeter structure here?
+	 */
+	u_int	pc_v_syscall;		/* system calls counter */
 	SLIST_ENTRY(pcpu) pc_allcpu;
 	struct lock_list_entry *pc_spinlocks;
 #ifdef KTR_PERCPU

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message