Date: Tue, 22 Jul 2003 16:32:58 -0700
From: Peter Wemm <peter@wemm.org>
To: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
Cc: Marcel Moolenaar <marcel@xcllnt.net>
Subject: Re: cvs commit: src/sys/kern init_main.c kern_malloc.c md5c.c subr_autoconf.c subr_mbuf.c subr_prf.c tty_subr.c vfs_cluster.c vfs_subr.c
Message-ID: <20030722233258.6913E2A7EA@canning.wemm.org>
In-Reply-To: <16372.1058915887@critter.freebsd.dk>
"Poul-Henning Kamp" wrote: > If Y < X, then you have by definition a performance gain. Only if you look at the classic model where you ignore things like speculation and assume that every instruction is executed exactly once etc. Mainframe optimization strategy is not necessarily applicable to to contemporary cpus. To consider: - costs of branches and branch prediction hits and misses - cache effects - memory bandwidth effects. eg: uninlining the VOP_* stuff costs a ~5% world slowdown due to extra memory IO for argument processing on i386. - speculative execution - not all the code is executed and so on. If adding 2K of code to the kernel for 3 inlines means that the fast path execution through the extra code is in fact faster in the usual case, then its worth it. We dont have to execute or cache all of that extra 2K of code. cache line granularity and hardware prefetch is limited to 64 or 128 bytes for a reason. I suspect Alan Cox already knows the answer to 'which is faster' in the vm_object_backing_scan() case and he's waiting for you to put your foot in it. :-) Cheers, -Peter -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030722233258.6913E2A7EA>