Date: Fri, 18 Nov 2005 06:00:32 GMT
From: "Walter Roberts" <wroberts@securenym.net>
To: freebsd-bugs@FreeBSD.org
Subject: Re: misc/89103: gcc segmentation fault errors
Message-ID: <200511180600.jAI60WtR048667@freefall.freebsd.org>
The following reply was made to PR misc/89103; it has been noted by GNATS.

From: "Walter Roberts" <wroberts@securenym.net>
To: <bug-followup@FreeBSD.org>, <wroberts@securenym.net>
Cc:
Subject: Re: misc/89103: gcc segmentation fault errors
Date: Fri, 18 Nov 2005 00:55:21 -0500

Ruled out hardware issue:

1. Ran memtest86 -- 7 full cycles (18 hours +/-).
2. Reduced memory from 512 MB to 256 MB, repeated with a different memory chip.
3. Ran full burncpu, passed.

Power supplies operating at nominal voltages.

System is apparently not using swap space for this process.

Replaced AMD K6 200 with an old, slow K6 processor. Same failure. CPU temps are <33C in all cases. I don't know the exact numbers, but it's typically around 28C.

This simply does not smell like a hardware problem, and I've been around these beasts for a long time. The first machine I programmed used magnetic core memory and had a whopping 8K of memory with 12-bit words. When I ran high energy physics codes on Intel processors quite a few years ago, I got inconsistent answers using the same code (all Fortran) between the i386 (Intel)/Unix and other machines (DEC, Cray, Tandem and i386 (AMD)). I finally said that was hardware, but couldn't get Intel to believe me until several of us, all running the same code, had discussed the issue. Intel finally admitted that their chips couldn't add (and quickly reported to the world that it only affected certain 'scientific' uses which most people don't use, so they were safe for balancing your checkbook). I'm willing to believe you, but I'd like to know why you're so convinced this is a hardware issue.

The factors pointing against a hardware issue are:

1. The machine runs everything else without a problem.
2. The machine ran non-stop (no reboot) on a UPS for over half a year without a glitch (take that, NT), and it seems to run f90 OK, and most cc's OK.
3. The system runs very compute/memory-intensive Monte Carlo high energy physics code that stores lots and lots of numbers to be written to files at the end of the day, and it works consistently. I would expect that if it weren't working properly, something would be amiss elsewhere, and I would expect a panic at some point, or the system to just plain stop working.
4. From the archives it appears that more than one of us is having a similar problem.
5. This exact system ran for years without a glitch running FreeBSD 2.2 and FreeBSD 3.2.

Is it safe to upgrade to GCC 4? Would that solve the problem? I'd be happy to get it from GNU and try it, if it won't break anything. I don't have the time I used to have to go messing in operating system innards, much as I'd like to.

It is certainly possible that a pointer is misprogrammed (or perhaps the fixed point register in the AMD chip doesn't work right??) and picks up something funny that causes the compiler to fail with the "segmentation fault 11". That fault is consistent!

Thanks
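One way to put the "that fault is consistent" observation on firmer ground is to rerun the failing compile several times and compare the outcomes. Flaky hardware tends to fail at different points from run to run, while a compiler bug usually fails the same way every time. What follows is only a minimal sketch in Python, not anything taken from the PR: the helper names `run_once` and `failure_is_consistent` are made up for illustration, and the command to test has to be the compile line that actually faults on the affected machine.

```python
import subprocess

def run_once(cmd):
    """Run cmd once and return (exit status, stderr text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr

def failure_is_consistent(cmd, runs=5):
    """Return True if every run ends with the same exit status and stderr.

    Caveat (assumption): if the compiler prints temporary file names or
    other run-specific text on stderr, the outputs will differ even when
    the fault itself is identical, so the messages may need trimming first.
    """
    outcomes = {run_once(cmd) for _ in range(runs)}
    return len(outcomes) == 1
```

For example, `failure_is_consistent(["cc", "-c", "failing_file.c"], runs=10)` would rerun a hypothetical failing compile ten times; `failing_file.c` is a placeholder, not a file named in this report. A `True` result supports the software explanation but does not prove it.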
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200511180600.jAI60WtR048667>