From owner-freebsd-performance@FreeBSD.ORG  Wed Jun 25 19:49:58 2003
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Date: 26 Jun 2003 02:50:29 -0000
Message-ID: <20030626025029.71392.qmail@cr.yp.to>
Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html.
From: "D. J. Bernstein" <djb@cr.yp.to>
To: freebsd-performance@freebsd.org
References: <20030625060629.51087.qmail@cr.yp.to>
	<20030625023621.N17881-100000@mail.chesapeake.net>
	<20030625094301.56349.qmail@cr.yp.to>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: ten thousand small processes

Jon Mini writes:
> I'm sorry, but you are way off here.  First of all, caches are *much
> larger* than the size of the processes you are talking about.

I'm sorry, but you are being misled by a naive model of CPU performance.
On a typical Pentium in our department, the following program becomes
three times faster when SPACING is changed from 4096 to 128:

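   #define SPACING 4096
   /* volatile keeps an optimizing compiler from deleting the loop */
   volatile char data[8 * SPACING];
   int main(void)
   {
     int i;
     for (i = 0;i < 10000000;++i) {
       /* four loads and four stores, each SPACING bytes apart */
       data[0] = data[SPACING];
       data[2 * SPACING] = data[3 * SPACING];
       data[4 * SPACING] = data[5 * SPACING];
       data[6 * SPACING] = data[7 * SPACING];
     }
     return 0;
   }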

From an asm programmer's perspective, when FreeBSD decides to spread a
small program's variables between

   * the beginning of a data page,
   * the beginning of a bss page,
   * the beginning of a malloc mmap page,
   * the beginning of a heap page,
   * the beginning of the next heap page,
   * the beginning of yet another heap page,

et cetera, it is actively trying (with varying degrees of success) to
damage cache performance in exactly the same way that this program does.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago