From owner-freebsd-arch@FreeBSD.ORG  Mon Jun 11 10:49:21 2012
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 28247106566B;
	Mon, 11 Jun 2012 10:49:21 +0000 (UTC)
	(envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
	by mx1.freebsd.org (Postfix) with ESMTP id D9DEB8FC0A;
	Mon, 11 Jun 2012 10:49:19 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
	id 2036F7300A; Mon, 11 Jun 2012 13:08:04 +0200 (CEST)
Date: Mon, 11 Jun 2012 13:08:04 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: "Robert N. M. Watson" <rwatson@freebsd.org>,
	Konstantin Belousov <kostikbel@gmail.com>, arch@freebsd.org
Message-ID: <20120611110804.GA8085@onelab2.iet.unipi.it>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua>
	<alpine.BSF.2.00.1206110952570.78881@fledge.watson.org>
	<20120611091811.GA2337@deviant.kiev.zoral.com.ua>
	<E316FFF8-7718-45C8-88F7-3A725B54E976@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <E316FFF8-7718-45C8-88F7-3A725B54E976@freebsd.org>
User-Agent: Mutt/1.4.2.3i
Cc: 
Subject: scheduler/context switch cost (Re: Fast gettimeofday(2) and
	clock_gettime(2))
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 11 Jun 2012 10:49:21 -0000

On Mon, Jun 11, 2012 at 10:22:31AM +0100, Robert N. M. Watson wrote:
> 
> On 11 Jun 2012, at 10:18, Konstantin Belousov wrote:
...
> > The per-process page looks almost undoable. I think that what could be
> > made working, although with some hacks, is per-CPU page, and the page
> > content update on context switch. This could benefit trivial system calls
> > like getpid(), getppid() and others, but obviously cause increased context
> > switch latency.
> > 
> > Per-CPU page would then solve the proposal of having an indicator of
> > other threads running. I am not sure how much do we care of the potential
> > information leak there.
> 
> FYI, the FreeBSD/MIPS kernel already makes use of an MD per-thread page using a reserved TLB entry switched on each kernel context switch. Interestingly, this model effectively conflicts (semantically) with the higher-level MI per-CPU mechanism. It would be nice to unify across the layers within the kernel, even if not all the way to userspace.

Since you mention context switch times:
when doing latency tests with netmap/VALE i notice horrible RTT
values (relatively speaking -- i am talking about 6us for VALE, and
perhaps 12-14us for netmap), which are most likely caused by
a slow scheduler and by the way we implement poll() in the kernel.

I still need to do some more measurements, but for instance
on ixgbe the delay between the interrupt notification (the fast handler,
if you like) and the start of the interrupt thread is never
below 2500 ticks (on a 2.93 GHz machine) and usually around 6000-8000
ticks. This really seems high, and i wonder if it is an
inherent problem or the result of some implementation
or design oversight.
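
For scale, the tick counts above can be turned into time with simple
arithmetic. This is only a sketch: it assumes the "ticks" are CPU
timestamp-counter cycles running at the nominal 2.93 GHz clock, and
ticks_to_ns is a made-up helper name, not a FreeBSD API.

```c
#include <stdint.h>

/*
 * Convert a cycle count to nanoseconds for a given clock frequency.
 * Assumption: "ticks" are TSC cycles counted at the nominal CPU clock.
 * ticks_to_ns is a hypothetical helper used only for illustration.
 */
static double
ticks_to_ns(uint64_t ticks, double cpu_hz)
{
	return ((double)ticks * 1e9 / cpu_hz);
}
```

Under that assumption, ticks_to_ns(2500, 2.93e9) is roughly 850 ns, and
6000-8000 ticks correspond to roughly 2.0-2.7 us.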
A poll() that may need to block (thus needlessly calling selrecord,
whose registration is then cleaned up when a ready descriptor is
found afterwards) seems similarly slow (on the order of a couple of
microseconds, if i remember correctly).

Do people have experience with the performance of the scheduler etc.,
and ideas on where to look to improve it?

cheers
luigi