From owner-freebsd-current  Sat Sep 14 02:00:54 1996
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id CAA24911
          for current-outgoing; Sat, 14 Sep 1996 02:00:54 -0700 (PDT)
Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id CAA24897
          for <current@freebsd.org>; Sat, 14 Sep 1996 02:00:49 -0700 (PDT)
Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id KAA13017; Sat, 14 Sep 1996 10:31:45 +0200
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Message-Id: <199609140831.KAA13017@labinfo.iet.unipi.it>
Subject: Re: pentium-optimized bzero and bcopy
To: bde@zeta.org.au (Bruce Evans)
Date: Sat, 14 Sep 1996 10:31:44 +0200 (MET DST)
Cc: current@freebsd.org
In-Reply-To: <199609140649.QAA06910@godzilla.zeta.org.au> from "Bruce Evans" at Sep 14, 96 04:49:02 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On the bzero subject, does anyone have statistics on how often
bzero is used on large (> 2 pages) chunks of memory?

Here is why I am asking. If you declare

	char foo[A_LARGE_CONSTANT];

this goes into BSS, which is zero-filled on demand. Hence, if you
only use a part of this large buffer, the untouched pages do not
consume physical memory or swap space. (I have tried to force a core
dump on many programs, and I noticed that no program "uses" fewer
than 20 fully-zeroed pages. I guess this is something which comes
with libc, probably because of some memory overallocation.)

On the other hand, you might think that it is cleaner to initialize
foo[] explicitly, and call

	bzero(foo, sizeof(foo));

in your program. At this point, I think all of these pages become
mapped and zeroed, thus consuming memory, unless bzero() can detect
such a case and invoke a system call to unmap the affected pages,
so that they are zero-filled on demand again.
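
To put rough numbers on the difference, here is a small sketch of the
page arithmetic. It assumes the buffer starts on a page boundary and
that the page size is passed in by the caller; the function names are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages spanned by `bytes` bytes, assuming the region
 * starts on a page boundary (an assumption made for simplicity). */
static size_t pages_spanned(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Pages that are never backed by memory or swap if only the first
 * `used` bytes of a `total`-byte BSS array are ever written.
 * Calling bzero() over the whole array reduces this saving to zero. */
static size_t pages_saved(size_t total, size_t used, size_t page_size)
{
    assert(used <= total);
    return pages_spanned(total, page_size) - pages_spanned(used, page_size);
}
```

For example, with 4096-byte pages a 1 MB array spans 256 pages. If only
the first 10000 bytes are used, 3 pages are touched and 253 are saved;
after bzero(foo, sizeof(foo)) all 256 are touched.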

This is not much of a problem for ordinary programs, or for library
code, as the system's architects are likely to be aware of the
difference and to allocate memory in the most efficient way. But what
about user programs? As an example, I often run simulations using
large hash tables, and I think I could gain some performance from a
modified bzero(). However, I agree that if this is going to be a very
rare occurrence, then it is not worth changing things and (probably)
adding another system call.
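
A modified bzero() along these lines could be approximated in user
space without a new system call. The sketch below is only an
illustration: lazy_bzero() is a hypothetical name, and it assumes the
region is private anonymous memory (BSS, or memory obtained from an
anonymous mmap). Partial pages at the edges are cleared with memset();
whole pages in the middle are replaced with fresh zero-fill-on-demand
pages using mmap() with MAP_FIXED.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes MAP_ANON on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an existing libc routine.
 * Zeroes [p, p + n).  Only valid for private anonymous memory:
 * remapping discards whatever mapping was there before.
 * Returns 0 on success, -1 if mmap() fails.
 */
static int lazy_bzero(void *p, size_t n)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    uintptr_t mask  = (uintptr_t)ps - 1;    /* page size is a power of two */
    uintptr_t start = (uintptr_t)p;
    uintptr_t end   = start + n;
    uintptr_t first = (start + mask) & ~mask;   /* first whole page */
    uintptr_t last  = end & ~mask;              /* end of last whole page */

    if (first >= last) {            /* no whole page inside the range */
        memset(p, 0, n);
        return 0;
    }
    memset(p, 0, first - start);            /* leading partial page */
    memset((void *)last, 0, end - last);    /* trailing partial page */

    /* Replace the whole pages with untouched zero-fill pages. */
    void *m = mmap((void *)first, last - first, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
    return m == MAP_FAILED ? -1 : 0;
}
```

Each call costs one mmap(), so this would only pay off for regions of
many pages that are then used sparsely; for small regions a plain
bzero() is almost certainly cheaper.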

Any comments ?

	Luigi
====================================================================
Luigi Rizzo                     Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it       Universita' di Pisa
tel: +39-50-568533              via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522              http://www.iet.unipi.it/~luigi/
====================================================================