Date:      Mon, 11 Jun 2012 13:08:04 +0200
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        "Robert N. M. Watson" <rwatson@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>, arch@freebsd.org
Subject:   scheduler/context switch cost (Re: Fast gettimeofday(2) and clock_gettime(2))
Message-ID:  <20120611110804.GA8085@onelab2.iet.unipi.it>
In-Reply-To: <E316FFF8-7718-45C8-88F7-3A725B54E976@freebsd.org>
References:  <20120606165115.GQ85127@deviant.kiev.zoral.com.ua> <alpine.BSF.2.00.1206110952570.78881@fledge.watson.org> <20120611091811.GA2337@deviant.kiev.zoral.com.ua> <E316FFF8-7718-45C8-88F7-3A725B54E976@freebsd.org>

On Mon, Jun 11, 2012 at 10:22:31AM +0100, Robert N. M. Watson wrote:
> 
> On 11 Jun 2012, at 10:18, Konstantin Belousov wrote:
...
> > The per-process page looks almost undoable. I think that what could be
> > made working, although with some hacks, is per-CPU page, and the page
> > content update on context switch. This could benefit trivial system calls
> > like getpid(), getppid() and others, but obviously cause increased context
> > switch latency.
> > 
> > Per-CPU page would then solve the proposal of having an indicator of
> > other threads running. I am not sure how much do we care of the potential
> > information leak there.
> 
> FYI, the FreeBSD/MIPS kernel already makes use of an MD per-thread page using a reserved TLB entry switched on each kernel context switch. Interestingly, this model effectively conflicts (semantically) with the higher-level MI per-CPU mechanism. It would be nice to unify across the layers within the kernel, even if not all the way to userspace.

Since you mention context switch times:
when doing latency tests with netmap/VALE I notice horrible RTT
values (relatively speaking -- I am talking about 6us for VALE, and
perhaps 12-14us for netmap), which are most likely the responsibility
of the slow scheduler and of the way we implement poll() in the kernel.

I still need to do some more measurements, but for instance,
on ixgbe the delay between the interrupt notification (the fast
handler, if you like) and the start of the interrupt thread is never
below 2500 ticks (on a 2.93 GHz machine) and is usually around
6000-8000 ticks. This really seems high, and I wonder whether it
is an inherent problem or the result of some implementation
or design oversight.
A poll() that may need to block (thus needlessly calling selrecord,
then cleaning up when it finds a ready descriptor afterwards)
seems similarly slow (on the order of a couple of microseconds,
if I remember correctly).

Do people have experience with the performance of the scheduler etc.,
and ideas on where to look to improve it?

cheers
luigi


