From owner-cvs-src@FreeBSD.ORG Thu Nov 29 02:01:21 2007 Return-Path: Delivered-To: cvs-src@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B391E16A41A; Thu, 29 Nov 2007 02:01:21 +0000 (UTC) (envelope-from bde@FreeBSD.org) Received: from repoman.freebsd.org (repoman.freebsd.org [IPv6:2001:4f8:fff6::29]) by mx1.freebsd.org (Postfix) with ESMTP id BA58913C4DD; Thu, 29 Nov 2007 02:01:21 +0000 (UTC) (envelope-from bde@FreeBSD.org) Received: from repoman.freebsd.org (localhost [127.0.0.1]) by repoman.freebsd.org (8.14.1/8.14.1) with ESMTP id lAT21LWN078967; Thu, 29 Nov 2007 02:01:21 GMT (envelope-from bde@repoman.freebsd.org) Received: (from bde@localhost) by repoman.freebsd.org (8.14.1/8.14.1/Submit) id lAT21LgY078966; Thu, 29 Nov 2007 02:01:21 GMT (envelope-from bde) Message-Id: <200711290201.lAT21LgY078966@repoman.freebsd.org> From: Bruce Evans Date: Thu, 29 Nov 2007 02:01:21 +0000 (UTC) To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org X-FreeBSD-CVS-Branch: HEAD Cc: Subject: cvs commit: src/sys/i386/isa prof_machdep.c src/sys/amd64/amd64 prof_machdep.c X-BeenThere: cvs-src@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Nov 2007 02:01:21 -0000 bde 2007-11-29 02:01:21 UTC FreeBSD src repository Modified files: sys/i386/isa prof_machdep.c sys/amd64/amd64 prof_machdep.c Log: Don't use plain "ret" instructions at targets of jump instructions, since the branch caches on at least Athlon XP through Athlon 64 CPU's don't understand such instructions and guarantee a cache miss taking at least 10 cycles. Use the documented workaround "ret $0" instead ("nop; ret" also works, but "ret $0" is probably faster on old CPUs). Normal code (even asm code) doesn't branch to "ret", since there is usually some cleanup to do, but the __mcount, .mcount and .mexitcount entry points were optimized too well to have the minimum number of instructions (3 instructions each if profiling is not enabled) and they did this. I didn't see a significant number of cache misses for .mexitcount, but for the shared "ret" for __mcount and .mcount I observed cache misses costing 26 cycles each. For a send(2) syscall that makes about 70 function calls, the cost of these cache misses alone increased the syscall time from about 4000 cycles to about 7000 cycles. 4000 is for a profiling (GUPROF) kernel with profiling disabled; after this fix, configuring profiling only costs about 600 cycles in the 4000, which is consistent with almost perfect branch prediction in the mcounting calls. Revision Changes Path 1.31 +2 -2 src/sys/amd64/amd64/prof_machdep.c 1.32 +2 -2 src/sys/i386/isa/prof_machdep.c