From owner-freebsd-bugs@FreeBSD.ORG Fri Nov 18 06:00:32 2005
X-Original-To: freebsd-bugs@hub.freebsd.org
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A1BAB16A41F;
	Fri, 18 Nov 2005 06:00:32 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4E4F443D45;
	Fri, 18 Nov 2005 06:00:32 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.3/8.13.3) with ESMTP id jAI60WIc048668;
	Fri, 18 Nov 2005 06:00:32 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.13.3/8.13.1/Submit) id jAI60WtR048667;
	Fri, 18 Nov 2005 06:00:32 GMT
	(envelope-from gnats)
Date: Fri, 18 Nov 2005 06:00:32 GMT
Message-Id: <200511180600.jAI60WtR048667@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: "Walter Roberts"
Subject: Re: misc/89103: gcc segmentation fault errors
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Walter Roberts
List-Id: Bug reports
X-List-Received-Date: Fri, 18 Nov 2005 06:00:32 -0000

The following reply was made to PR misc/89103; it has been noted by GNATS.

From: "Walter Roberts"
Subject: Re: misc/89103: gcc segmentation fault errors
Date: Fri, 18 Nov 2005 00:55:21 -0500

Ruled out hardware issue:

1.  Ran memtest86: 7 full cycles (roughly 18 hours).
2.  Reduced memory from 512 MB to 256 MB, then repeated with a different
    memory chip.
3.  Ran full burncpu, passed.

Power supplies are operating at nominal voltages.

The system is apparently not using swap space for this process.

Replaced the AMD K6 200 with an older, slower K6 processor.

Same failure.  CPU temps are below 33 C in all cases.  I don't know the
exact numbers, but it's typically around 28 C.

This simply does not smell like a hardware problem, and I've been around
these beasts for a long time.  The first machine I programmed used
magnetic core memory and had a whopping 8K of memory with 12-bit words.
When I ran high energy physics codes on Intel processors quite a few
years ago, I got inconsistent answers from the same code (all Fortran)
between the i386 (Intel) Unix machine and the other machines (DEC, Cray,
Tandem and i386 (AMD)).  I finally said that was hardware, but couldn't
get Intel to believe me until several others of us, all running the same
code, had discussed the issue.  Intel finally admitted that their chips
couldn't add (and quickly reported to the world that it only affected
certain 'scientific' uses which most people don't use, so they were safe
for balancing your checkbook).  I'm willing to believe you, but I'd like
to know why you're so convinced this is a hardware issue.

The factors pointing against a hardware issue are:

1.  The machine runs everything else without a problem.
2.  The machine ran non-stop (no reboots) on a UPS for over half a year
    without a glitch (take that, NT), and it seems to run f90 OK, and
    most cc's OK.
3.  The system runs very compute- and memory-intensive Monte Carlo high
    energy physics code that stores lots and lots of numbers to be
    written to files at the end of the day, and it works consistently.
    If the hardware weren't working properly, I would expect something
    to be amiss elsewhere: a panic at some point, or the system just
    plain stopping.
4.  From the archives it appears that more than one of us is having a
    similar problem.
5.  This exact system ran for years without a glitch running FreeBSD 2.2
    and FreeBSD 3.2.

Is it safe to upgrade to GCC 4?  Would that solve the problem?  I'd be
happy to get it from GNU and try it, if it won't break anything.  I
don't have the time I used to have to go messing in operating system
innards, much as I'd like to.

It is certainly possible that a pointer is misprogrammed (or perhaps the
fixed-point register in the AMD chip doesn't work right??) and picks up
something funny that causes the compiler to fail with "segmentation
fault 11".  That fault is consistent!

Thanks