From owner-freebsd-smp  Sun Apr 30  6:23:44 2000
Delivered-To: freebsd-smp@freebsd.org
Received: from relativity.student.utwente.nl (wit389306.student.utwente.nl [130.89.234.166])
	by hub.freebsd.org (Postfix) with ESMTP id C28DF37B7F3
	for <smp@FreeBSD.ORG>; Sun, 30 Apr 2000 06:23:40 -0700 (PDT)
	(envelope-from djb@wit389306.student.utwente.nl)
Received: by relativity.student.utwente.nl (Postfix, from userid 1000)
	id 90BE75DEE; Sun, 30 Apr 2000 15:23:39 +0200 (CEST)
Date: Sun, 30 Apr 2000 15:23:39 +0200
From: Dave Boers <djb@ifa.au.dk>
To: Steve Passe <smp@csn.net>
Cc: smp@FreeBSD.ORG
Subject: Re: hlt instructions and temperature issues
Message-ID: <20000430152339.A453@relativity.student.utwente.nl>
Reply-To: djb@ifa.au.dk
References: <200004300350.VAA13194@Ilsa.StevesCafe.com> <20000430122943.A52481@relativity.student.utwente.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <20000430122943.A52481@relativity.student.utwente.nl>; from djb@ifa.au.dk on Sun, Apr 30, 2000 at 12:29:43PM +0200
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sun, Apr 30, 2000 at 12:29:43PM +0200, Dave Boers wrote:
> I will now go do a different kind of testing (one I usually use to test
> new memory DIMM's: large matrix inversion in double precision). I will do
> some benchmarking with and without the hlt instruction. 
> 
> You'll all hear from me shortly. 

The test consists of filling a large (n x n) real-valued double precision
matrix with random doubles between 0 and 1 (I used srandomdev()) and
adding the unit matrix to that, which strengthens the diagonal and is
meant to keep the matrix safely invertible. 

This matrix is inverted using LU factorization and the inverted result is
multiplied with the original. Then a test is performed to determine the
maximum difference between the result and the unit matrix. 
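
For readers who want to try the idea without the C program, here is a
minimal sketch of the test described above. It is not the original code:
it is plain Python rather than C with meschach, the function names are
made up for illustration, and it uses LU factorization with partial
pivoting, which is an assumption about the details. Seeding with None
draws from the OS entropy source, roughly in the spirit of srandomdev().

```python
import random


def make_test_matrix(n, rng):
    """n x n matrix of uniform randoms in [0, 1) plus the unit matrix."""
    return [[rng.random() + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def lu_decompose(a):
    """LU factorization with partial pivoting. Returns (lu, perm)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Choose the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier, stored in L
            f = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= f * lu[k][j]  # eliminate below the pivot
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the factorization of A."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]    # apply the row permutation
    for i in range(n):                    # forward substitution (L)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):          # back substitution (U)
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def invert(a):
    """Invert a by solving A x = e_j for every unit vector e_j."""
    n = len(a)
    lu, perm = lu_decompose(a)
    cols = [lu_solve(lu, perm, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain n**3 matrix product."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt]
            for row in a]


def max_deviation_from_identity(m):
    """Largest absolute difference between m and the unit matrix."""
    n = len(m)
    return max(abs(m[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


def run_test(n, seed=None):
    """Build the matrix, invert it, multiply back, report the error."""
    a = make_test_matrix(n, random.Random(seed))
    return max_deviation_from_identity(matmul(invert(a), a))


if __name__ == "__main__":
    print("max deviation for n = 60:", run_test(60))
```

Pure Python is far too slow for n = 2400, so this only illustrates the
procedure; for a real memory test you want a compiled version.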

This is a test I usually use to check memory integrity. This test uses a
lot of memory very intensively and memory errors will hopefully trigger
errors in the result that are bigger than the standard numerical noise
(which is of the order of 1e-12 for n = 2400). 

The system tested is an Abit BP6 dual Celeron, not overclocked, 400 MHz.
There is 256 MB RAM in the system and n = 2400 is about the maximum I can
do without paging. The calculation is a single program, so effectively this
uses about half the system's processing power.

The program and the meschach matrix library used are in C and the
optimization level used for both is simply -O. 

I removed the apic line (and the #defines, for that matter) from swtch.s so
that I now use exactly the same modifications as you, Steve. 

Here are the results: 

			       Unmodified kernel

Matrix size:  Output from time command:         T:            T/n**3:	 
-----------------------------------------------------------------------
n = 2400      1323.063u 2.476s 22:06.53 99.9%   1325.539 s.   9.5887e-8
n = 1500       326.161u 0.757s  5:27.16 99.9%    326.918 s.   9.6865e-8
n =  500        12.438u 0.070s  0:12.53 99.7%     12.508 s.   1.0006e-7

Average CPU temperature during calculations: 50 degrees Celsius. 


				Modified kernel

Matrix size:  Output from time command:         T:            T/n**3:	 
-----------------------------------------------------------------------
n = 2400      1381.708u 2.250s 23:04.52 99.9%   1383.958 s.   1.0011e-7
n = 1500       342.293u 0.781s  5:43.37 99.9%    343.074 s.   1.0165e-7
n =  500        12.982u 0.124s  0:13.12 99.8%     13.106 s.   1.0485e-7

Average CPU temperature during calculations: 45 degrees Celsius. 

It can be seen from the last column that the calculations roughly scale
with n**3, as they should because matrix inversion and multiplication 
are n**3 algorithms. 
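
For anyone who wants to check the T and T/n**3 columns, the arithmetic
can be sketched as below. The numbers are copied from the two tables
above; T is taken as user plus system time from the time command, and
the variable and function names are only illustrative.

```python
# (n, user seconds, system seconds) as reported by the time command.
runs = {
    "unmodified": [(2400, 1323.063, 2.476),
                   (1500, 326.161, 0.757),
                   (500, 12.438, 0.070)],
    "modified": [(2400, 1381.708, 2.250),
                 (1500, 342.293, 0.781),
                 (500, 12.982, 0.124)],
}


def cpu_seconds(user, system):
    """Total CPU time T = user + system."""
    return user + system


def time_per_n_cubed(n, user, system):
    """T / n**3; roughly constant if the work scales as n**3."""
    return cpu_seconds(user, system) / n ** 3


for kernel, rows in runs.items():
    for n, user, system in rows:
        t = cpu_seconds(user, system)
        ratio = time_per_n_cubed(n, user, system)
        print(f"{kernel:10s} n = {n:4d}  T = {t:9.3f} s  "
              f"T/n**3 = {ratio:.4e}")
```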


				Differences 

Matrix size:   Relative difference: (T_modified/T_unmodified)*100-100 %
-----------------------------------------------------------------------
n = 2400                                                       + 4.41 %
n = 1500                                                       + 4.94 %
n = 500                                                        + 4.78 %

Conclusion: the modifications have made matrix inversion and multiplication
            less than 5 % slower. 
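
The relative differences follow directly from the T columns of the two
tables. A small sketch of that calculation, with the T values copied
from above and names that are only illustrative:

```python
# Total CPU time T in seconds, keyed by matrix size n.
t_unmodified = {2400: 1325.539, 1500: 326.918, 500: 12.508}
t_modified = {2400: 1383.958, 1500: 343.074, 500: 13.106}


def relative_difference_percent(t_mod, t_unmod):
    """(T_modified / T_unmodified) * 100 - 100, in percent."""
    return (t_mod / t_unmod) * 100 - 100


for n in sorted(t_unmodified, reverse=True):
    diff = relative_difference_percent(t_modified[n], t_unmodified[n])
    print(f"n = {n:4d}   {diff:+.2f} %")
```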

Comparing this to your results, Steve, for which we have 

    (36.9 / 38.7 minutes) * 100 - 100 % = - 4.65 %

I think we can draw the conclusion that for NCPU of the order of 2 to 4,
the hlt modifications change system performance by roughly 5 % in either
direction, depending on the application and system specifications. 

Note that the temperature even during full load is 5 degrees Celsius lower
(45 instead of 50, i.e. 10 % of the reading) with the modifications than
without. Of course temperature results will vary widely.

I personally wouldn't worry about a 5 % performance difference, and would
go for the lower temperature and increased stability. But others may have different
opinions. My proposal is therefore to make the hlt instruction a kernel
option for SMP systems. That way everyone can experiment for themselves and
possible problems may be detected. 

My system appears perfectly stable, and since I removed the apic lines from
swtch.s I haven't seen a single stray irq 7 anymore. 

Regards, 

    Dave Boers. 


-- 
 djb@ifa.au.dk                              d.j.boers@tn.utwente.nl
 PGP key:  ftp://relativity.student.utwente.nl:/pub/pgpkeys/djb.asc

