Date:      Fri, 29 Mar 2002 17:17:52 -0500 (EST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        freebsd-smp@FreeBSD.ORG
Subject:   Re: RE: Syscall contention tests return, userret() bugs/issues.
Message-ID:  <XFMail.20020329171752.jhb@FreeBSD.org>
In-Reply-To: <200203292207.g2TM7Fi67491@apollo.backplane.com>


On 29-Mar-2002 Matthew Dillon wrote:
>:Actually, I was thinking of just using a single counter but only doing
>:unlocked increments on it.  It's just stats so it's not that important.
>:If people really want to make it important then we should worry about
>:getting it perfect, but if it doesn't need to be perfect I wouldn't
>:worry about it.
> 
>     This won't save you from stalls.  Whenever one cpu writes to the cache,
>     on IA32, it's write-through to main-memory which invalidates all other
>     cpu's caches for that cache line.  The result is that those cpu's will
>     still stall on the read cycle.  It's easy to demonstrate.  If you
>     change this:
> 
>#if 0
>         atomic_add_int(&cnt.v_syscall, 1);
>#endif
>       ++*PCPU_PTR(v_syscall);
> 
>     to this:
> 
>#if 0
>         atomic_add_int(&cnt.v_syscall, 1);
>       ++*PCPU_PTR(v_syscall);
>#endif
>       ++cnt.v_syscall;
> 
>     And do a comparative syscall rate test on a two-cpu system running
>     two getuid() processes, this happens:
> 
>                               1 process       2 processes
>     w/PCPU:                   1004000         1000000
>     w/++cnt.v_syscall:        1004000          853000
> 
>     BEWM!  This is again because Intel has a write-through cache
>     architecture.  In a delayed-write cache architecture performance
>     would not be impacted (depending on the cache algorithm the system
>     might allow for multiple cpu's to have the same cache line marked
>     master/dirty) but then you have other problems...  dirty cache
>     lines on multiple cpu's never getting flushed, resulting in
>     seriously broken statistics.
> 
>     This is why even obtaining a globally shared mutex in the non-contended
>     case, like sched_lock or Giant, can wind up being quite expensive, and
>     why we can't simply use ++<global_counter> in the critical path.
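
For anyone who wants to reproduce numbers along those lines, a crude
userland harness like the sketch below is enough.  (This is just an
illustration typed from memory, not anything from the tree; the getuid()
calls are batched so the time(2) checks don't dominate the loop.)

        /*
         * Crude syscall rate harness: each child spins on getuid() for
         * ~10 seconds and reports its rate.  Compare the output with
         * one child vs. two on a 2-cpu box.
         */
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <unistd.h>

        int
        main(int argc, char **argv)
        {
                int i, nprocs = (argc > 1) ? atoi(argv[1]) : 1;

                for (i = 0; i < nprocs; i++) {
                        if (fork() == 0) {
                                time_t stop = time(NULL) + 10;
                                unsigned long n = 0;
                                int j;

                                while (time(NULL) < stop) {
                                        for (j = 0; j < 1000; j++)
                                                (void)getuid();
                                        n += 1000;
                                }
                                printf("%lu syscalls/sec\n", n / 10);
                                _exit(0);
                        }
                }
                while (wait(NULL) > 0)
                        ;
                return (0);
        }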

Hmm, I'm not sure of the best way of handling this stat then.  ++*PCPU_PTR()
might actually be ok, at least for all of our current archs.  (If you migrate
mid-update you end up bumping another CPU's counter, but that would be rare.)
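
Just to pin down the shape of what I mean, something like the sketch below
(names like pcpu_vmcnt and vmcnt_syscall are made up for illustration, and
the padding is hand-rolled rather than our actual PCPU layout):

        /*
         * Sketch: one counter per CPU, each padded out to its own cache
         * line, so unlocked increments never bounce a line between CPUs.
         * The only cross-CPU traffic comes from the (rare) stats readers.
         */
        #define CACHE_LINE_SIZE 64      /* illustrative value */

        struct pcpu_vmcnt {
                u_int   pc_syscall;
                char    pc_pad[CACHE_LINE_SIZE - sizeof(u_int)];
        };

        static struct pcpu_vmcnt pcpu_vmcnt[MAXCPU];

        /*
         * Fast path: touches only the current CPU's line, no atomics.
         * If we migrate between the PCPU_GET() and the increment we
         * charge the wrong CPU, but that's harmless for stats.
         */
        static __inline void
        vmcnt_syscall(void)
        {
                pcpu_vmcnt[PCPU_GET(cpuid)].pc_syscall++;
        }

        /* Slow path, for stats readers: sum the per-CPU values. */
        static u_int
        vmcnt_syscall_total(void)
        {
                u_int total;
                int i;

                total = 0;
                for (i = 0; i < mp_ncpus; i++)
                        total += pcpu_vmcnt[i].pc_syscall;
                return (total);
        }

A sysctl handler would do the summing, so the syscall path itself stays
free of locks and atomics.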

>:>     In that case, why is userret() checking KEF_NEEDRESCHED at all?  Is it
>:>     needed for non-i386 architectures or is it simply a lazy optimization?
>:>     (lazy == technical term).  It looks like the rescheduling code could be
>:>     moved out of userret() and into ast() directly.
>:
>:Yes, it could.  In fact, most of userret() probably could just move into
>:ast() now.  Part of this is due to the fact that ast() used to be an
>:actual trap of type T_AST and relied on the fact that the trap return
>:called userret().  The new ast() should only be checking NEEDRESCHED now
>:inside of userret().  Either that or we merge ast() and userret() and
>:call them userret() (I think userret() is the more sensible name
>:personally.)
>:
>:>     Are signals dealt with the same way?  If not we have a race in the
>:>     signal handling code.  If so we can probably move that to a KEF flag /
>:>     ast() as well (which would either move the CURSIG/postsig loop from
>:>     userret() to ast(), or move it and retain it in userret() as a lazy
>:>     optimization whose initial CURSIG() check can be done without a lock).
>:
>:Signals set ASTPENDING.  Bruce does want to use a separate flag for signals
>:so that we don't have to call CURSIG() except when a signal is actually
>:pending.
> 
>     This looks like a good area of work.  If nobody has code worked up
>     for it I can research the issue more, add the flag, and shift things
>     into ast().  (After I finish the stage-2 cpu_critical*() stuff which
>     re-inlines cpu_critical_enter() and cpu_critical_exit(), which I've
>     been talking to Jake about).

Sure.  Bruce already has some patches for this stuff, so you might want to ask
him about it.  Jake might also have some ideas on the topic.  Either way, it
sounds good to me.
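
FWIW, the shape I'd expect the signal flag to take is roughly the sketch
below (KEF_NEEDSIGCHK is a made-up name, and the locking here is only my
guess at what Bruce's patches do):

        /* Where a signal is posted, instead of just requesting the AST: */
        mtx_lock_spin(&sched_lock);
        ke->ke_flags |= KEF_ASTPENDING | KEF_NEEDSIGCHK;
        mtx_unlock_spin(&sched_lock);

        /*
         * In ast(): test-and-clear the flag under sched_lock, then run
         * the CURSIG()/postsig() loop only when a signal is actually
         * pending, instead of calling CURSIG() on every return to
         * user mode.
         */
        mtx_lock_spin(&sched_lock);
        sigchk = (ke->ke_flags & KEF_NEEDSIGCHK) != 0;
        ke->ke_flags &= ~KEF_NEEDSIGCHK;
        mtx_unlock_spin(&sched_lock);
        if (sigchk) {
                PROC_LOCK(p);
                while ((sig = CURSIG(p)) != 0)
                        postsig(sig);
                PROC_UNLOCK(p);
        }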

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
