Date: Mon, 21 Nov 2005 16:46:48 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Walter Roberts <wroberts@securenym.net>
Cc: freebsd-bugs@FreeBSD.org
Subject: Re: misc/89103: gcc segmentation fault errors
Message-ID: <20051121214648.GC7696@xor.obsecurity.org>
In-Reply-To: <200511180600.jAI60WtR048667@freefall.freebsd.org>
References: <200511180600.jAI60WtR048667@freefall.freebsd.org>
[-- Attachment #1 --]
On Fri, Nov 18, 2005 at 06:00:32AM +0000, Walter Roberts wrote:
> The following reply was made to PR misc/89103; it has been noted by GNATS.
>
> From: "Walter Roberts" <wroberts@securenym.net>
> To: <bug-followup@FreeBSD.org>, <wroberts@securenym.net>
> Cc:
> Subject: Re: misc/89103: gcc segmentation fault errors
> Date: Fri, 18 Nov 2005 00:55:21 -0500
>
> Ruled out hardware issue:
>
> 1. Ran memtest 86 -- 7 full cycles (18 hours +/-).
> 2. Reduced memory from 512Mb to 256Mb, repeated with different memory
>    chip.
> 3. Ran full burncpu, passed.
>
> Power supplies operating at nominal voltages.
>
> System is apparently not using swap space for this process.
>
> Replaced AMD K6 200 with old K6 slow processor
>
> Same failure. CPU temps are <33C in all cases. I don't know the exact
> numbers, but it's typically around 28C.
>
> This simply does not smell like a hardware problem

[Snip historical anecdotes]

> I'm willing to believe you, but I'd like to know why you're so
> convinced this is a hardware issue.

Because I've been answering these questions for years, and I've seen
dozens of people start out saying "I'm convinced it's not a hardware
problem" and then working their way around to "it was a hardware
problem, sorry for wasting your time".

> The factors pointing against a hardware issue are:
> 1. The machine runs everything else without a problem.
> 2. The machine ran non-stop (non-reboot) on a UPS for over half a
>    year without a glitch (take that, NT), and it seems to run f90 ok,
>    and most cc's ok.
> 3. The system runs very compute/memory intensive Monte Carlo high
>    energy physics code that stores lots and lots of numbers to be
>    written to files at the end of the day, and works consistently. I
>    would expect that if it weren't working properly, something would
>    be amiss elsewhere, and would expect a panic at some point, or the
>    system to just plain stop working.
> 4. From the archives it appears that more than one of us is having a
>    similar problem.

Not that I've seen. Where are these other reports?

> 5. This exact system ran for years without a glitch running
>    FreeBSD 2.2 and FreeBSD 3.2.

This kind of problem can be *very* workload-specific, i.e. everything
will work fine except one task that tickles the machine in exactly the
right way to trigger the hardware failure. Yes, I've seen exactly this
scenario happen many times.

> Is it safe to upgrade to GCC 4? Would that solve the problem? I'd be
> happy to get it from gnu and try it, if it won't break anything. I
> don't have the time I used to have to go messing in operating system
> innards, much as I'd like to.

It won't fix a hardware problem, naturally. You can't use a
non-system compiler to compile FreeBSD, although you could compile
your own code with it.

> It is certainly possible that a pointer is misprogrammed (or perhaps
> the fixed point register in the AMD chip doesn't work right??) and
> picks up something funny that causes the compiler to have the
> "segmentation fault 11". That fault is consistent!

I'm sure it's consistent on this machine, but you're really reaching
by suggesting that it's a CPU bug affecting thousands of users :-)

Kris

P.S. Did you say in a previous email that the machine worked fine when
it was running at a site at high altitude, but stopped working when
you moved it and then upgraded it? That's a big clue that says
something broke at that point (or before, but was masked by lower
ambient temperatures, or something).

[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQFDgkBIWry0BWjoQKURApoEAKCf8k8Rr7BmCSdba5re6bb815q9hACdHVsO
UTFTHF+G/NJsWx7rQQp3ZFE=
=9GkX
-----END PGP SIGNATURE-----
