From owner-freebsd-performance@FreeBSD.ORG Wed Jun 25 02:42:29 2003 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6DDC137B401 for ; Wed, 25 Jun 2003 02:42:29 -0700 (PDT) Received: from stoneport.math.uic.edu (stoneport.math.uic.edu [131.193.178.160]) by mx1.FreeBSD.org (Postfix) with SMTP id B48C543FDF for ; Wed, 25 Jun 2003 02:42:28 -0700 (PDT) (envelope-from djb-dsn-1056534181.56349@cr.yp.to) Received: (qmail 56350 invoked by uid 1017); 25 Jun 2003 09:43:01 -0000 Date: 25 Jun 2003 09:43:01 -0000 Message-ID: <20030625094301.56349.qmail@cr.yp.to> Automatic-Legal-Notices: See http://cr.yp.to/mailcopyright.html. From: "D. J. Bernstein" To: freebsd-performance@freebsd.org References: <20030625060629.51087.qmail@cr.yp.to> <20030625023621.N17881-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: Re: ten thousand small processes X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jun 2003 09:42:29 -0000 In case my subject line isn't clear enough: I'm not talking about browsers. I'm not talking about programs that ``use up stack space,'' or that would feel even slightly ``restricted'' by a 1GB limit on the heap, or that use threads, or that allocate many file descriptors per process. I'm talking about _small_ processes. I'm talking about programs that might have quite a bit of code, and might read and write quite a lot of data, but that don't use much memory per process. The problem is that I'm talking about ten thousand of these processes running at once. In this situation, if malloc() fritters away 24K of RAM in one process, it's actually frittering away more than 234 _megabytes_ of RAM. This memory is _not_ going to be ``used later.'' The behavior of malloc() here is not ``quite good.'' For the program that prompted this thread, malloc() achieved 4.3% fill, 95.7% external fragmentation. In this situation, squeezing variables into a single page---rather than spreading them among the first few bytes of several separate pages--- often produces huge improvements in CPU cache effectiveness. Yes, I know that doesn't make a difference for your browser; I'm not talking about your browser. In this situation, if the kernel is maintaining pages of memory that look like process 1: 1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... process 2: 4 5 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... process 3: 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... then it's wasting an incredible amount of space. As I said before: Instead of allocating space for a bunch of nearly identical page tables (or anything else showing this type of waste), why not overlap page tables, with the changes copied on a process switch? ---D. J. Bernstein, Associate Professor, Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago