Date: Mon, 11 Jun 2012 13:08:04 +0200
From: Luigi Rizzo
To: "Robert N. M. Watson", Konstantin Belousov, arch@freebsd.org
Message-ID: <20120611110804.GA8085@onelab2.iet.unipi.it>
References: <20120606165115.GQ85127@deviant.kiev.zoral.com.ua> <20120611091811.GA2337@deviant.kiev.zoral.com.ua>
Subject: scheduler/context switch cost (Re: Fast gettimeofday(2) and clock_gettime(2))
List-Id: Discussion related to FreeBSD architecture

On Mon, Jun 11, 2012 at 10:22:31AM +0100, Robert N. M. Watson wrote:
>
> On 11 Jun 2012, at 10:18, Konstantin Belousov wrote:
...
> > The per-process page looks almost undoable. I think that what could be
> > made working, although with some hacks, is per-CPU page, and the page
> > content update on context switch. This could benefit trivial system calls
> > like getpid(), getppid() and others, but obviously cause increased context
> > switch latency.
> >
> > Per-CPU page would then solve the proposal of having an indicator of
> > other threads running.
> > I am not sure how much do we care of the potential
> > information leak there.
>
> FYI, the FreeBSD/MIPS kernel already makes use of an MD per-thread page using a reserved TLB entry switched on each kernel context switch. Interestingly, this model effectively conflicts (semantically) with the higher-level MI per-CPU mechanism. It would be nice to unify across the layers within the kernel, even if not all the way to userspace.

Since you mention context switch times: when doing latency tests
with netmap/VALE I notice horrible RTT values (relatively speaking --
I am talking about 6us for VALE, and perhaps 12-14us for netmap),
which are most likely the responsibility of a slow scheduler and of
the way we implement poll() in the kernel.

I still need to do some more measurements, but for instance, on ixgbe
the delay between interrupt notification (the fast handler, if you
like) and the start of the interrupt thread is never below 2500 ticks
(on a 2.93 GHz machine) and usually around 6000-8000 ticks.
This really seems high, and I wonder if it is an inherent problem
or the result of some implementation or design oversight.

A poll() that may need to block (thus needlessly calling selrecord(),
which is then cleaned up when it finds a ready descriptor afterwards)
seems similarly slow (on the order of a couple of microseconds,
if I remember correctly).

Do people have experience with the performance of the scheduler etc.,
and ideas on where to look to improve it?

cheers
luigi
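
P.S. For scale, the tick counts above can be turned into time, assuming the "ticks" are TSC cycles at the nominal 2.93 GHz clock (the message does not say so explicitly). A minimal sketch of that arithmetic; ticks_to_us is a hypothetical helper name used only for illustration:

```python
def ticks_to_us(ticks, cpu_ghz=2.93):
    """Convert a CPU cycle count to microseconds.

    Assumption: the "ticks" quoted above are TSC cycles counted at the
    nominal 2.93 GHz clock rate of the test machine.
    """
    cycles_per_us = cpu_ghz * 1000.0  # 1 GHz = 1000 cycles per microsecond
    return ticks / cycles_per_us


# The three figures quoted for the interrupt-to-ithread delay on ixgbe.
for ticks in (2500, 6000, 8000):
    print(f"{ticks} ticks = {ticks_to_us(ticks):.2f} us")
```

Under that assumption the delay is roughly 0.85 us at best and about 2.0 to 2.7 us in the usual case, which is a sizeable share of the 6 us VALE round trip.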