From owner-freebsd-smp  Fri Mar 29 14: 7:22 2002
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id C386A37B417;
	Fri, 29 Mar 2002 14:07:15 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.11.6/8.9.1) id g2TM7Fi67491;
	Fri, 29 Mar 2002 14:07:15 -0800 (PST)
	(envelope-from dillon)
Date: Fri, 29 Mar 2002 14:07:15 -0800 (PST)
From: Matthew Dillon
Message-Id: <200203292207.g2TM7Fi67491@apollo.backplane.com>
To: John Baldwin
Cc: freebsd-smp@FreeBSD.ORG
Subject: Re: RE: Syscall contention tests return, userret() bugs/issues.
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

:> single-instruction add to the current cpu's BLAH counter).  I could
:> whip this up in about 10 seconds for i386 but would need help on the
:> other architectures.
:
:Actually, I was thinking of just using a single counter but only doing unlocked
:increments on it.  It's just stats so it's not that important.  If people
:really want to make it important then we should worry about getting it perfect,
:but if it doesn't need to be perfect I wouldn't worry about it.

    This won't save you from stalls.  Whenever one cpu writes to the cache
    on IA32, the write goes through to main memory and invalidates that
    cache line in all the other cpus' caches.  The result is that those
    cpus will still stall on the read cycle.  It's easy to demonstrate.
    If you change this:

	#if 0
	atomic_add_int(&cnt.v_syscall, 1);
	#endif
	++*PCPU_PTR(v_syscall);

    to this:

	#if 0
	atomic_add_int(&cnt.v_syscall, 1);
	++*PCPU_PTR(v_syscall);
	#endif
	++cnt.v_syscall;

    and do a comparative syscall rate test on a two-cpu system running two
    getuid() processes, this happens (syscalls/sec):

                              1 process   2 processes
        w/PCPU:                 1004000       1000000
        w/++cnt.v_syscall:      1004000        853000

    BEWM!  This is again because Intel has a write-through cache
    architecture.  In a delayed-write cache architecture performance would
    not be impacted (depending on the cache algorithm, the system might
    allow multiple cpus to have the same cache line marked master/dirty),
    but then you have other problems... dirty cache lines on multiple cpus
    never getting flushed, resulting in seriously broken statistics.

    This is why even obtaining a globally shared mutex in the non-contended
    case, like sched_lock or Giant, can wind up being quite expensive, and
    why we can't simply use a bare ++ on a shared counter in the critical
    path.
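
    If you want to see the same effect from userland, here is a rough
    stand-in for the per-cpu counter idea (it has nothing to do with the
    real kernel counters; every name below is made up for the demo): two
    threads bump a single shared counter, then each bumps its own counter
    padded out to an assumed 64-byte cache line.  Compile with -pthread
    and time the two phases separately; on a two-cpu box the shared phase
    should be measurably slower.

/*
 * Userland illustration of the cache line ping-pong described above.
 * All names are made up for the demo; this is not kernel code.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>

#define ITERS	100000000UL

/* Shared counter: both cpus fight over the same cache line. */
static volatile uint64_t shared_counter;

/* Per-thread counters padded out to an assumed 64-byte line size. */
struct padded_counter {
	volatile uint64_t val;
	char pad[64 - sizeof(uint64_t)];
};
static struct padded_counter percpu_counter[2];

static void *
bump_shared(void *arg)
{
	/* non-atomic, so counts can be lost; that is the "it's just
	 * stats" tradeoff being discussed above */
	for (uint64_t i = 0; i < ITERS; i++)
		shared_counter++;	/* line bounces between cpus */
	return (NULL);
}

static void *
bump_percpu(void *arg)
{
	struct padded_counter *c = arg;

	for (uint64_t i = 0; i < ITERS; i++)
		c->val++;		/* line stays in this cpu's cache */
	return (NULL);
}

int
main(void)
{
	pthread_t t[2];

	/* phase 1: both threads hammer the shared counter */
	pthread_create(&t[0], NULL, bump_shared, NULL);
	pthread_create(&t[1], NULL, bump_shared, NULL);
	pthread_join(t[0], NULL);
	pthread_join(t[1], NULL);

	/* phase 2: each thread hammers its own padded counter */
	pthread_create(&t[0], NULL, bump_percpu, &percpu_counter[0]);
	pthread_create(&t[1], NULL, bump_percpu, &percpu_counter[1]);
	pthread_join(t[0], NULL);
	pthread_join(t[1], NULL);

	printf("shared: %llu  percpu: %llu + %llu\n",
	    (unsigned long long)shared_counter,
	    (unsigned long long)percpu_counter[0].val,
	    (unsigned long long)percpu_counter[1].val);
	return (0);
}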

:> get bogged down on the ktrace stuff (e.g. 3-4 week timeframe) and need
:> help just ring me up.  I'll remind you then.  In the meantime I would
:> like to work on the stats counter and NEEDRESCHED userret issue.
:
:The ktrace stuff is done; I just want to get the td_ucred stuff done first
:since it will include some slight changes to the ktrace stuff as far as
:arguments passed around to make suser and friends happier.  Once that is
:done I'll update the ktrace stuff, polish it up (the ktrgenio case still
:needs some work to make sure records still stay in order) and then it can
:be committed.

    Cool.

:> Hmm.  Ok, I see how it works.  int0x80_syscall calls syscall() and
:> then jumps to doreti, which checks for KEF_ASTPENDING or KEF_NEEDRESCHED
:> with interrupts disabled prior to the iret.
:
:Right.
:
:> In that case, why is userret() checking KEF_NEEDRESCHED at all?  Is it
:> needed for non-i386 architectures or is it simply a lazy optimization?
:> (lazy == technical term).  It looks like the rescheduling code could be
:> moved out of userret() and into ast() directly.
:
:Yes, it could.  In fact, most of userret() probably could just move into
:ast() now.  Part of this is due to the fact that ast() used to be an actual
:trap of type T_AST and relied on the fact that the trap return called
:userret().  The new ast() should only be checking NEEDRESCHED now inside
:of userret().  Either that or we merge ast() and userret() and call the
:result userret() (I think userret() is the more sensible name, personally).
:
:> Are signals dealt with the same way?  If not we have a race in the signal
:> handling code.  If so we can probably move that to a KEF flag / ast()
:> as well (which would either move the CURSIG/postsig loop from userret()
:> to ast(), or move it and retain it in userret() as a lazy optimization
:> whose initial CURSIG() check can be done without a lock).
:
:Signals set ASTPENDING.  Bruce does want to use a separate flag for signals
:so that we don't have to call CURSIG() except when a signal is actually
:pending.
:--
:
:John Baldwin <><  http://www.FreeBSD.org/~jhb/

    This looks like a good area of work.  If nobody has code worked up for
    it I can research the issue more, add the flag, and shift things into
    ast().  (After I finish the stage-2 cpu_critical*() stuff, which
    re-inlines cpu_critical_enter() and cpu_critical_exit() and which I've
    been talking to Jake about.)

						-Matt
						Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message
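
    For reference, here is a rough, standalone sketch of the return-to-user
    flow being discussed: the doreti-style loop re-checks pending-work flags
    before the iret and calls ast(), and ast() only runs the signal-delivery
    path when a separate signal-pending flag is set, so the common case pays
    for a flag test rather than a locked CURSIG() check.  All the flag and
    function names below are illustrative only (KEF_SIGPENDING in particular
    stands in for the separate flag Bruce wants); this is not the actual
    FreeBSD code.

/*
 * Standalone sketch of the pending-work check on return to user mode.
 * Names are illustrative; this is not kernel code.
 */
#include <stdio.h>

#define KEF_ASTPENDING	0x01	/* some AST work is queued */
#define KEF_NEEDRESCHED	0x02	/* a reschedule was requested */
#define KEF_SIGPENDING	0x04	/* hypothetical: a signal is pending */

struct fake_thread {
	int	ke_flags;
};

static void
deliver_signals(struct fake_thread *td)
{
	/* stand-in for the CURSIG()/postsig() loop */
	printf("delivering pending signals\n");
	td->ke_flags &= ~KEF_SIGPENDING;
}

static void
reschedule(struct fake_thread *td)
{
	/* stand-in for switching away after a preemption request */
	printf("switching to another thread\n");
	td->ke_flags &= ~KEF_NEEDRESCHED;
}

/* one pass of AST processing as the thread heads back to user mode */
static void
ast(struct fake_thread *td)
{
	td->ke_flags &= ~KEF_ASTPENDING;

	if (td->ke_flags & KEF_SIGPENDING)
		deliver_signals(td);	/* only when the flag says so */
	if (td->ke_flags & KEF_NEEDRESCHED)
		reschedule(td);
}

/* stand-in for doreti: loop until no work remains, then "iret" */
static void
return_to_user(struct fake_thread *td)
{
	/* in the kernel this check is done with interrupts disabled so
	 * no new AST can sneak in between the test and the iret */
	while (td->ke_flags & (KEF_ASTPENDING | KEF_NEEDRESCHED))
		ast(td);
	printf("iret to user mode\n");
}

int
main(void)
{
	struct fake_thread td = {
		.ke_flags = KEF_ASTPENDING | KEF_SIGPENDING | KEF_NEEDRESCHED
	};

	return_to_user(&td);
	return (0);
}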