From owner-cvs-src@FreeBSD.ORG Tue Jul 22 16:32:59 2003
To: "Poul-Henning Kamp"
In-Reply-To: <16372.1058915887@critter.freebsd.dk>
Date: Tue, 22 Jul 2003 16:32:58 -0700
From: Peter Wemm
Message-Id: <20030722233258.6913E2A7EA@canning.wemm.org>
cc: "Alan L. Cox"
cc: src-committers@FreeBSD.org
cc: Bosko Milekic
cc: Bruce Evans
cc: cvs-src@FreeBSD.org
cc: Steve Kargl
cc: cvs-all@FreeBSD.org
cc: Marcel Moolenaar
Subject: Re: cvs commit: src/sys/kern init_main.c kern_malloc.c md5c.c
	subr_autoconf.c subr_mbuf.c subr_prf.c tty_subr.c vfs_cluster.c
	vfs_subr.c
List-Id: CVS commit messages for the src tree

"Poul-Henning Kamp" wrote:
> If Y < X, then you have by definition a performance gain.

Only if you look at the classic model where you ignore things like
speculation and assume that every instruction is executed exactly once etc.
Mainframe optimization strategy is not necessarily applicable to
contemporary CPUs.

To consider:
- costs of branches and branch prediction hits and misses
- cache effects
- memory bandwidth effects.  e.g. uninlining the VOP_* stuff costs a ~5%
  world slowdown due to extra memory IO for argument processing on i386.
- speculative execution
- not all the code is executed
and so on.

If adding 2K of code to the kernel for 3 inlines means that the fast path
execution through the extra code is in fact faster in the usual case, then
it's worth it.  We don't have to execute or cache all of that extra 2K of
code.  Cache line granularity and hardware prefetch are limited to 64 or 128
bytes for a reason.

I suspect Alan Cox already knows the answer to 'which is faster' in the
vm_object_backing_scan() case and he's waiting for you to put your foot
in it. :-)

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
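
The fast-path argument above can be sketched in C. This is only an illustration under stated assumptions, not code from the FreeBSD tree: `struct lookup_cache`, `lookup()`, `lookup_slow()`, `slow_calls` and the `NOINLINE` macro are all hypothetical names, and the hash multiply merely stands in for expensive work. The idea is that the inlined check adds a few instructions at every call site, while the larger slow path stays out of line and is only fetched and executed on a miss. Whether this is actually a win on a given CPU has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ask GCC/Clang to keep the slow path out of line; a no-op elsewhere. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical one-entry cache; not a FreeBSD structure. */
struct lookup_cache {
    uint32_t key;
    uint32_t value;
    int      valid;
};

/* Counts how often the slow path runs, so the effect can be observed. */
unsigned long slow_calls;

/*
 * Slow path: the bulk of the code. It lives in one place and only
 * occupies instruction cache lines when a lookup actually misses.
 */
NOINLINE uint32_t
lookup_slow(struct lookup_cache *c, uint32_t key)
{
    assert(c != NULL);
    slow_calls++;
    uint32_t v = key * 2654435761u;   /* stand-in for expensive work */
    c->key = key;
    c->value = v;
    c->valid = 1;
    return v;
}

/*
 * Fast path: small enough to inline at every call site. In the usual
 * case it is a compare and a load, with no call and no argument
 * marshalling through memory.
 */
static inline uint32_t
lookup(struct lookup_cache *c, uint32_t key)
{
    if (c->valid && c->key == key)
        return c->value;
    return lookup_slow(c, key);
}
```

Calling `lookup()` twice with the same key should run the slow path only once; the second call is served entirely by the inlined check. The code size grows at each call site, but only the few bytes of the check need to be executed and cached in the common case.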