From: Venkatesh Srinivas
Date: Sun, 12 Dec 2010 11:52:42 -0500 (EST)
To: freebsd-hackers@freebsd.org
Subject: amd64 pmap pagecopy() optimization()?

Hi,

In svn r127653, a micro-optimized pagecopy() implementation was added to amd64's support.S. This pagecopy() prefetches the entire page first and then uses a partly unrolled loop of loads and non-temporal stores. The commit message notes 'it is roughly four times faster than bcopy() for uncached pages'.

Just wondering, how was this measured? I ported the routine to i386 and tried it out in userland, but found it to be between four and six times slower than the BSD and GNU libc bcopy() implementations. I admit I did not try very hard to restrict the measurement to uncached pages, though...

Also, why prefetch the entire page before the load / NT store loop? If I read the Intel optimization guide correctly, a loop of prefetch(n+1) / load / store would be a better call? (I tried this on i386 as well; it was a bit faster than the current style, but still nowhere near bcopy()...)

Thanks!
-- vs
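
For illustration, the two orderings discussed above can be sketched in userland C. This is not the support.S routine: it is a minimal sketch that assumes a 4096-byte page, a 64-byte cache line, 16-byte-aligned buffers, and SSE2 intrinsics (with a plain memcpy() fallback when SSE2 is unavailable). The names pagecopy_prefetch_all() and pagecopy_prefetch_next() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 loads and streaming stores; pulls in xmmintrin.h */
#endif

#define PAGE_BYTES 4096u  /* assumed page size */
#define LINE_BYTES 64u    /* assumed cache-line size */

/* Hint that the line at p will be read soon (no-op without SSE2). */
static inline void prefetch_line(const unsigned char *p)
{
#if defined(__SSE2__)
    _mm_prefetch((const char *)p, _MM_HINT_NTA);
#else
    (void)p;
#endif
}

/* Copy one 64-byte line: aligned loads, then non-temporal stores. */
static inline void copy_line_nt(unsigned char *dst, const unsigned char *src)
{
#if defined(__SSE2__)
    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    __m128i r0 = _mm_load_si128(s + 0);
    __m128i r1 = _mm_load_si128(s + 1);
    __m128i r2 = _mm_load_si128(s + 2);
    __m128i r3 = _mm_load_si128(s + 3);
    _mm_stream_si128(d + 0, r0);
    _mm_stream_si128(d + 1, r1);
    _mm_stream_si128(d + 2, r2);
    _mm_stream_si128(d + 3, r3);
#else
    memcpy(dst, src, LINE_BYTES);
#endif
}

/* Make the streaming stores globally visible before returning. */
static inline void store_fence(void)
{
#if defined(__SSE2__)
    _mm_sfence();
#endif
}

/* Ordering 1: prefetch the whole source page, then run the copy loop. */
void pagecopy_prefetch_all(void *dstp, const void *srcp)
{
    unsigned char *dst = dstp;
    const unsigned char *src = srcp;
    size_t off;

    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    for (off = 0; off < PAGE_BYTES; off += LINE_BYTES)
        prefetch_line(src + off);
    for (off = 0; off < PAGE_BYTES; off += LINE_BYTES)
        copy_line_nt(dst + off, src + off);
    store_fence();
}

/* Ordering 2: prefetch line n+1 while copying line n. */
void pagecopy_prefetch_next(void *dstp, const void *srcp)
{
    unsigned char *dst = dstp;
    const unsigned char *src = srcp;
    size_t off;

    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    for (off = 0; off < PAGE_BYTES; off += LINE_BYTES) {
        if (off + LINE_BYTES < PAGE_BYTES)
            prefetch_line(src + off + LINE_BYTES);
        copy_line_nt(dst + off, src + off);
    }
    store_fence();
}
```

Both functions produce the same destination contents; they differ only in when the prefetch hints are issued. Any timing comparison against bcopy() would also need to control whether the source page is already cached, which this sketch does not attempt.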