From: "Poul-Henning Kamp"
Sender: phk@critter.freebsd.dk
To: Bruce Evans
Cc: Garrett Wollman, Andrew Gallatin, net@FreeBSD.org
Subject: Re: Call for performance evaluation: net.isr.direct (fwd)
Date: Fri, 14 Oct 2005 12:39:30 +0200
Message-ID: <12907.1129286370@critter.freebsd.dk>
In-Reply-To: Your message of "Fri, 14 Oct 2005 19:41:48 +1000." <20051014192509.F80520@delplex.bde.org>
List-Id: Networking and TCP/IP with FreeBSD

In message <20051014192509.F80520@delplex.bde.org>, Bruce Evans writes:
>On Fri, 14 Oct 2005, Poul-Henning Kamp wrote:
>
>> In message <17230.62415.991707.840932@grasshopper.cs.duke.edu>, Andrew Gallatin
>> writes:
>>
>>> Linux already takes care of syncing the TSC between SMP cpus, so we
>>> know it is possible. This seems like a much more doable optimization.
>>> And it is likely to have other benefits..
>
>The timestamps in mi_switch() are taken on the same CPU and only their
>differences are used, so they don't even need to be synced.
>If they use the TSC, then the TSCs just need to have the same
>almost-constant frequency (or different almost-constant frequencies
>if timecounters were per-CPU).

Actually, I think we need to go back a step further.

The task of the scheduler is to hand out a finite resource according
to a set policy.

The finite resource is "instructions executed by a CPU".

It used to be that CPUs ran at constant clock rates, and therefore
implementors made the simplifying assumption that

	instructions = a * time

for some arbitrary but constant a, and made their scheduling decisions
based on time.

Today CPUs do not run at constant rates, but they have counters which
count the number of instruction cycles.

Therefore talking about computer effort in terms of "CPU seconds" is
like selling rubber band by the inch.

The scheduler has a side job of accounting for CPU usage, and the API
for accessing this info has unfortunately been specified in terms of
time rather than instructions.

The best compromise solution therefore is to change the scheduler
to make decisions based on TSC ticks (or the equivalent on other
archs) and at regular intervals figure out how fast the CPU ran in
the last period and convert the accumulated TSC ticks to a time unit
suitable for resource accounting.

The bad solution is to try to do timekeeping based on hardware
counters which are unsuitable for the purpose, the TSC being the
primary suspect here, and we will not do that.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
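
For illustration only, the compromise described in the message (charge
threads in cycle counts, then periodically convert the accumulated
counts to time using the rate measured over the last period) might be
sketched roughly as follows. This is not FreeBSD code: the struct
cpu_acct record and the acct_charge() and acct_convert() functions are
hypothetical names made up for this sketch, and the cycle values are
passed in as plain integers rather than read from a real TSC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread accounting record; not a FreeBSD structure. */
struct cpu_acct {
	uint64_t pending_cycles;	/* cycles charged since the last conversion */
	uint64_t total_ns;		/* converted time, for time-based reporting */
};

/*
 * Charge one run interval.  Both readings are assumed to come from the
 * same CPU, so only their difference matters and no cross-CPU
 * synchronisation of the counters is needed.
 */
static void
acct_charge(struct cpu_acct *a, uint64_t start_cycles, uint64_t end_cycles)
{
	a->pending_cycles += end_cycles - start_cycles;
}

/*
 * At the end of a calibration period, convert the pending cycles to
 * nanoseconds using the rate observed over that period: the counter
 * advanced by period_cycles while period_ns of real time elapsed.
 * The division is split into quotient and remainder so the intermediate
 * product stays small; a sketch-level precaution, not a full overflow proof.
 */
static void
acct_convert(struct cpu_acct *a, uint64_t period_cycles, uint64_t period_ns)
{
	uint64_t q, r;

	assert(period_cycles > 0);
	q = a->pending_cycles / period_cycles;
	r = a->pending_cycles % period_cycles;
	a->total_ns += q * period_ns + (r * period_ns) / period_cycles;
	a->pending_cycles = 0;
}
```

In this sketch scheduling decisions would look only at pending_cycles,
while the time-based accounting API would report total_ns. The same
number of cycles therefore converts to less time in a period where the
CPU ran fast than in one where it ran slowly.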