Date: Sun, 30 Apr 2000 15:23:39 +0200
From: Dave Boers
To: Steve Passe
Cc: smp@FreeBSD.ORG
Subject: Re: hlt instructions and temperature issues
Message-ID: <20000430152339.A453@relativity.student.utwente.nl>
Reply-To: djb@ifa.au.dk
References: <200004300350.VAA13194@Ilsa.StevesCafe.com> <20000430122943.A52481@relativity.student.utwente.nl>
In-Reply-To: <20000430122943.A52481@relativity.student.utwente.nl>; from djb@ifa.au.dk on Sun, Apr 30, 2000 at 12:29:43PM +0200

On Sun, Apr 30, 2000 at 12:29:43PM +0200, Dave Boers wrote:
> I will now go do a different kind of testing (one I usually use to test
> new memory DIMM's: large matrix inversion in double precision). I will do
> some benchmarking with and without the hlt instruction.
>
> You'll all hear from me shortly.

The test consists of filling a large (n x n) real-valued double precision
matrix with random doubles between 0 and 1 (I used srandomdev()) and adding
the unit matrix to that, so that the matrix becomes diagonally dominant,
which will ensure that it's invertible. This matrix is inverted using LU
factorization and the inverted result is multiplied by the original. Then
a test is performed to determine the maximum difference between the result
and the unit matrix. This is a test I usually use to check memory integrity.
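
As a rough illustration of the procedure described above, here is a minimal
sketch in Python. It is not the program used for the measurements (that one
is written in C against the meschach library); the function names
(make_test_matrix, lu_invert, max_deviation_from_identity, run_test) are
hypothetical, and the LU factorization with partial pivoting is written out
by hand only to keep the example self-contained.

```python
import random


def make_test_matrix(n, rng):
    """Random entries in [0, 1) with the unit matrix added on top."""
    return [[rng.random() + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def lu_invert(a):
    """Invert a square matrix via in-place LU factorization (partial pivoting)."""
    n = len(a)
    lu = [row[:] for row in a]      # work on a copy
    perm = list(range(n))           # row i of P*A is row perm[i] of A
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]    # multiplier, stored in the L part
            f = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= f * lu[k][j]
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        # Forward substitution: L y = P e_c
        y = [0.0] * n
        for i in range(n):
            s = 1.0 if perm[i] == c else 0.0
            for j in range(i):
                s -= lu[i][j] * y[j]
            y[i] = s
        # Back substitution: U x = y, where x is column c of the inverse
        for i in range(n - 1, -1, -1):
            s = y[i]
            for j in range(i + 1, n):
                s -= lu[i][j] * inv[j][c]
            inv[i][c] = s / lu[i][i]
    return inv


def max_deviation_from_identity(a, b):
    """Largest |(a*b)[i][j] - delta_ij| over all entries."""
    n = len(a)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            s = sum(a[i][k] * b[k][j] for k in range(n))
            worst = max(worst, abs(s - (1.0 if i == j else 0.0)))
    return worst


def run_test(n, seed=None):
    """Build the matrix, invert it, multiply back, report the worst error."""
    rng = random.Random(seed)
    a = make_test_matrix(n, rng)
    return max_deviation_from_identity(lu_invert(a), a)
```

Calling run_test(50) returns the maximum deviation for a 50 x 50 matrix. On
sound hardware that number should stay near double precision rounding noise;
a much larger value would be the kind of symptom the test is looking for.
Pure Python is far too slow for n = 2400, so this only shows the logic.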

This test uses a lot of memory very intensively and memory errors will
hopefully trigger errors in the result that are bigger than the standard
numerical noise (which is of the order of 1e-12 for n = 2400).

The system tested is an Abit BP6 dual Celeron, not overclocked, 400 MHz.
There is 256 MB RAM in the system and n = 2400 is about the maximum I can
do without paging. The calculation is a single program, so effectively this
uses about half the system's processing power. The program and the meschach
matrix library used are in C and the optimization level used for both is
simply -O.

I removed the apic line (and the #defines for that matter) from swtch.s so
that I now use exactly the same modifications as you, Steve.

Here are the results:

Unmodified kernel

Matrix size:  Output from time command:          T (user + sys):  T/n**3:
-----------------------------------------------------------------------
n = 2400      1323.063u 2.476s 22:06.53 99.9%    1325.539 s       9.5887e-8
n = 1500       326.161u 0.757s  5:27.16 99.9%     326.918 s       9.6865e-8
n =  500        12.438u 0.070s  0:12.53 99.7%      12.508 s       1.0006e-7

Average CPU temperature during calculations: 50 degrees Celsius.

Modified kernel

Matrix size:  Output from time command:          T (user + sys):  T/n**3:
-----------------------------------------------------------------------
n = 2400      1381.708u 2.250s 23:04.52 99.9%    1383.958 s       1.0011e-7
n = 1500       342.293u 0.781s  5:43.37 99.9%     343.074 s       1.0165e-7
n =  500        12.982u 0.124s  0:13.12 99.8%      13.106 s       1.0485e-7

Average CPU temperature during calculations: 45 degrees Celsius.

It can be seen from the last column that the calculations roughly scale
with n**3, as they should because matrix inversion and multiplication are
n**3 algorithms.

Differences

Matrix size:  Relative difference: (T_modified/T_unmodified)*100 - 100 %
-----------------------------------------------------------------------
n = 2400      + 4.41 %
n = 1500      + 4.94 %
n =  500      + 4.78 %

Conclusion: the modifications have made matrix inversion and multiplication
less than 5 % slower.
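
For anyone who wants to recheck the arithmetic in the tables above, the
following Python sketch recomputes T, T/n**3 and the relative difference
from the raw time output. The helper names (parse_cpu_seconds, per_n_cubed,
relative_difference_percent) are hypothetical, and the parser assumes the
csh-style "<user>u <system>s <elapsed> <cpu%>" layout shown in the tables.

```python
def parse_cpu_seconds(time_line):
    """Sum user and system CPU seconds from a csh-style `time` line."""
    user, system = time_line.split()[:2]
    return float(user.rstrip("u")) + float(system.rstrip("s"))


def per_n_cubed(seconds, n):
    """CPU time divided by n**3, the scaling column of the tables."""
    return seconds / n ** 3


def relative_difference_percent(t_modified, t_unmodified):
    """(T_modified / T_unmodified) * 100 - 100, as in the Differences table."""
    return (t_modified / t_unmodified) * 100 - 100


# Raw time output copied from the two tables above.
unmodified = {
    2400: "1323.063u 2.476s 22:06.53 99.9%",
    1500: "326.161u 0.757s 5:27.16 99.9%",
    500: "12.438u 0.070s 0:12.53 99.7%",
}
modified = {
    2400: "1381.708u 2.250s 23:04.52 99.9%",
    1500: "342.293u 0.781s 5:43.37 99.9%",
    500: "12.982u 0.124s 0:13.12 99.8%",
}

for n in unmodified:
    t_u = parse_cpu_seconds(unmodified[n])
    t_m = parse_cpu_seconds(modified[n])
    print(f"n = {n:4d}  T_unmod = {t_u:9.3f} s  T_mod = {t_m:9.3f} s  "
          f"T_unmod/n**3 = {per_n_cubed(t_u, n):.4e}  "
          f"diff = {relative_difference_percent(t_m, t_u):+.2f} %")
```

The same relative_difference_percent helper can be applied to any other
pair of timings for a like-for-like comparison.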

Comparing this to your results, Steve, for which we have

  (36.9 / 38.7 minutes) * 100 - 100 % = - 4.65 %

I think we can draw the conclusion that for NCPU on the order of 2 to 4,
the hlt modifications have a plus or minus 5 % impact on system
performance, depending on the application and system specifications.

Note that the temperature even during full load is 5 degrees Celsius
(10 %) lower with the modifications than without. Of course temperature
results will vary widely.

I personally wouldn't worry about a 5 % performance difference, and would
go for the lower temperature and increased stability. But others may have
different opinions.

My proposal is therefore to make the hlt instruction a kernel option for
SMP systems. That way everyone can experiment for themselves and possible
problems may be detected.

My system appears perfectly stable and since I removed the apic lines from
swtch.s I haven't seen a single stray irq 7 anymore.

Regards,

Dave Boers.

-- 
djb@ifa.au.dk
d.j.boers@tn.utwente.nl
PGP key: ftp://relativity.student.utwente.nl:/pub/pgpkeys/djb.asc