From owner-freebsd-current@FreeBSD.ORG Thu Jul 17 22:23:10 2014
Date: Fri, 18 Jul 2014 00:22:52 +0200
From: "O. Hartmann"
To: Willem Jan Withagen
Subject: Re: [CURRENT]: weird memory/linker problem?
Message-ID: <20140718002252.09f55fc1.ohartman@zedat.fu-berlin.de>
In-Reply-To: <53B2D262.2040502@digiware.nl>
References: <20140622165639.17a1ba1e.ohartman@zedat.fu-berlin.de>
 <20140623163115.03bdd675.ohartman@zedat.fu-berlin.de>
 <20140701150755.548ed6b9.ohartman@zedat.fu-berlin.de>
 <53B2D262.2040502@digiware.nl>
Organization: FU Berlin
X-Mailer: Claws Mail 3.10.1 (GTK+ 2.24.22; amd64-portbld-freebsd11.0)
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 boundary="Sig_/j9Oi4X442YBhPfD4RtIhknH"; protocol="application/pgp-signature"
Cc: "Rang, Anton", Adrian Chadd, FreeBSD CURRENT, Dimitry Andric
List-Id: Discussions about the use of FreeBSD-current

--Sig_/j9Oi4X442YBhPfD4RtIhknH
Content-Type: text/plain; charset=US-ASCII

On Tue, 01 Jul 2014 17:23:14 +0200, Willem Jan Withagen wrote:

> On 2014-07-01 16:48, Rang, Anton wrote:
> > DOT => DOD
> >
> > 444F54 => 444F44
> >
> > That's a single-bit flip. Bad memory, perhaps?
>
> Very likely, especially if the system does not have ECC....
> It just happens on rare occasions that an alpha particle, power cycle, or
> anything else disruptive damages a memory cell. And it could be that
> it requires a special pattern of accesses to actually exhibit the error.
>
> In the past (199x's) 'make buildworld' used to be a rather good memory
> tester. But nowadays look at
> http://www.memtest.org/
>
> This tool has found all of the bad memory in all the systems I used and/or
> built for others...
> Note that it might take a few runs and some more heat to actually
> trigger the faulty cell, but memtest86 will usually find it.
>
> Note that on big systems with lots of memory it can take a loooooong
> time to run just one full testset to completion.
>
> --WjW
>
>
> >
> > Anton
> >
> > -----Original Message-----
> > From: owner-freebsd-current@freebsd.org [mailto:owner-freebsd-current@freebsd.org] On Behalf Of O. Hartmann
> > Sent: Tuesday, July 01, 2014 8:08 AM
> > To: Dimitry Andric
> > Cc: Adrian Chadd; FreeBSD CURRENT
> > Subject: Re: [CURRENT]: weird memory/linker problem?
> >
> > On Mon, 23 Jun 2014 17:22:25 +0200, Dimitry Andric wrote:
> >
> >> On 23 Jun 2014, at 16:31, O. Hartmann wrote:
> >>> On Sun, 22 Jun 2014 10:10:04 -0700, Adrian Chadd wrote:
> >>>> When they segfault, where do they segfault?
> >> ...
> >>> GIMP, LaTeX work, nothing special, but a bit memory consuming
> >>> regarding GIMP) I tried updating the ports tree and surprisingly the
> >>> tree is left over in an unclean condition while /usr/bin/svn segfaults
> >>> (on console: pid 18013 (svn), uid 0: exited on signal 11 (core dumped)).
> >>>
> >>> Using /usr/local/bin/svn, which is from the devel/subversion port,
> >>> performs well, while FreeBSD 11's svn contribution dies as described. It did not
> >>> hours ago!
> >>
> >> I think what Adrian meant was: can you run svn (or another crashing
> >> program) in gdb, and post a backtrace? Or maybe run ktrace, and see
> >> where it dies?
> >>
> >> Alternatively, put a core dump and the executable (with debug info) in
> >> a tarball, and upload it somewhere, so somebody else can analyze it.
> >>
> >> -Dimitry
> >>
> >
> > It's me again, with the same weird story.
> >
> > After a couple of days' silence, the mysterious entity in my computer is back. This
> > time it is again a weird compiler failure message (trying to buildworld):
> >
> > [...]
> > c++ -O2 -pipe -O3 -O3
> > c++ -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include
> > -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/tools/clang/include
> > -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support -I.
> > -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/../../lib/clang/include
> > -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS
> > -fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd11.0\"
> > -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd11.0\" -DDEFAULT_SYSROOT=\"\"
> > -Qunused-arguments -I/usr/obj/usr/src/tmp/legacy/usr/include -std=c++11
> > -fno-exceptions -fno-rtti -Wno-c++11-extensions
> > -c /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/Host.cpp -o
> > Host.o --- GraphWriter.o --- In file included
> > from /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/GraphWriter.cpp:14: /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:269:10:
> > error: use of undeclared identifier 'DOD'; did you mean 'DOT'? O <<
> > DOD::EscapeString(Label); ^~~
> > DOT /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:35:11:
> > note: 'DOT' declared here namespace DOT { // Private functions... ^ 1 error
> > generated. *** [GraphWriter.o] Error code 1
> >
> >
> > Well, in the past I saw many of those messages, especially not found labels of
> > routines in shared objects/libraries or even those "funny" misspelled messages shown
> > above.
> >
> > I can not reproduce them after a reboot, but once the error has occurred, it sticks
> > for as long as the system keeps running. So in order to compile the OS successfully, I
> > reboot.
> >
> > Does anyone have an idea what this could be?
> > Since it affects at the moment only one
> > machine (the other CoreDuo has been retired in the meanwhile), it feels a bit like a
> > miscompilation on a certain type of CPU.
> >
> > Thanks for your patience,
> >
> > Oliver

Hello all.

Well, I'd like to update some information. It doesn't resolve the specific concern, but
it might add to the pool of experience.

The box in question now runs with only 4 GB and is operable as expected. With 8 GB, I see
the weird bugs I reported, and they have indeed turned out to be bit flips. I can not
reproduce them on demand; they occur spontaneously, but I can raise their frequency by
permuting the RAM sticks. So far. As reported, the memtest86+ test doesn't show anything
even after three days(!) of testing!

The box was built in 2009 as a development system with 4 GB RAM. At that time, the
developer ordered special and expensive overclocker RAM, Ballistix, from Crucial.
Usually, I purchase JEDEC-conformant RAM - I have an allergic reaction to this stupid
overclocking and "planned destruction with fun" of silicon by overdriving it. Especially
when it concerns equipment we have to rely on. The box was then upgraded with a further
4 GB RAM (two sticks) of the same type and brand, consuming 2+ volts (as far as I know).
Last summer, after 4 years of problem-free operation, I suddenly had to fight with
spontaneous crashes and blamed FBSD CURRENT, but very quickly the memory was revealed to
be the culprit. The funny thing was: the box literally "roasted" the upper 4 GB bank, and
the sticks got so hot that you could seriously burn your fingers when touching them (I
did!). The end of that game was, after a cascade of tests and swapping RAM sticks, that
the sticks in the upper slots (B1 and B2) were destroyed! After I replaced the RAM
completely with JEDEC-conformant 8 GB, the system ran perfectly, until this summer again.
When at the end of May the temperatures went beyond 20 degrees Celsius in my lab, the box
started having issues with these bit flips.

I guess that there is a temperature-triggered problem with the voltage regulation or
something slowly killing the RAM modules/sticks. This is only a guess. As I reported,
the chipset itself reports 81 - 85 degrees C (in BIOS and with healthd). This high
temperature occurred suddenly last year and I first thought that it could be a
mismeasurement.

After testing VBox and occupying all available memory without any obvious error or
crash, I tried compiling the OS, and it seems that the notable load LLVM/CLANG produces
when building world/kernel in parallel also triggers this bit flip, which very quickly
results in strange errors as reported earlier in this thread. The ultimate failure arose
when I tried to install Windows 7 on a free hard drive with 8 GB of RAM installed: the
install process died with a file corruption or a file that was not copied. I didn't dare
to try the FreeBSD installation since I know from the past that even FreeBSD's copying
very quickly triggers hardware issues, if there are any (overheating and its siblings).
With only 4 GB everything works as expected, but 4 GB is a pain in the ass with ZFS and
11.0-CURRENT alone, not to mention the pain when doing some memory-intensive
calculations/simulations or even VBox.

At the end, there is a mixed conclusion. I realise that I can not trust the expertise of
memtest86+. There is no suitable "burn-in" test for FreeBSD that consumes, stresses and
tortures memory and bus systems as well as all cores of the CPU, starting with Core2Duo
CPUs, since cpuburn/burncpu from the ports do not utilise AVX/SIMD or other "hot"
facilities of modern Intel-like CPUs, nor do they stress the integrated memory controller
in a "brutal" way. Prime95 is only available for i386 - and that is a pity on amd64 and
> 4GB RAM.
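
To make concrete what such a userland memory check would do at its core, here is a
minimal sketch in Python. It is only an illustration, not an existing port and not a
replacement for memtest86+: the names flipped_bits and pattern_pass are made up for this
example, and a serious tool would additionally have to get past the CPU caches, keep all
cores busy and run for hours under heat. The sketch fills a buffer with a repeating
pattern, reads it back, and XORs the expected bytes against the observed ones, so every
1 bit in the result marks one flipped bit.

```python
def flipped_bits(expected, observed):
    """Return (byte_offset, bit_index) for every bit that differs."""
    if len(expected) != len(observed):
        raise ValueError("buffers must have the same length")
    if expected == observed:          # fast path: nothing flipped
        return []
    flips = []
    for offset, (want, got) in enumerate(zip(expected, observed)):
        diff = want ^ got             # 1 bits mark the positions that changed
        for bit in range(8):
            if diff & (1 << bit):
                flips.append((offset, bit))
    return flips


def pattern_pass(size_bytes, pattern=b"\xaa\x55"):
    """Hypothetical single test pass: fill a buffer, read it back, compare."""
    reps = size_bytes // len(pattern)
    expected = pattern * reps
    buf = bytearray(expected)         # second copy, living elsewhere in RAM
    return flipped_bits(expected, buf)


if __name__ == "__main__":
    # The corruption from this thread: 'DOT' (444F54) read back as 'DOD' (444F44).
    print(flipped_bits(b"DOT", b"DOD"))
    # One pass over 1 MiB; an empty list means no mismatch was seen this time.
    print(pattern_pass(1024 * 1024))
```

For the identifiers quoted earlier in this thread, flipped_bits(b"DOT", b"DOD") gives
[(2, 4)]: byte 2, bit 4, that is 0x54 versus 0x44, which is exactly one bit. An empty
result from pattern_pass proves very little on its own, for the same reason three days of
memtest86+ proved little here: the fault only shows up under the right load and heat.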

At the end, there is no reason to purchase again a Workstation-grade mainboard, as
advertised by ASUS, for instance, with this overclocking crap. It leaves a very bitter
taste - in my personal view. Since the memory problems I experienced do not show up as
permanent, "steady-state" problems, I fear data corruption that is not indicated by any
protection - so for the future, ECC is a must. And this means, even for "low end"
workstations, bye-bye cheap crappy Intel toy CPUs! At least a XEON-type, ECC-capable
processor is a prerequisite, and I wish AMD had not followed the cheap man's path of
ripping the ECC facilities off their consumer CPUs. It is a matter of fact that even in
the academic environment "cheap" ECC-less systems are purchased for "cost
effectiveness".

At the end, I personally wish for some massive torture tools like cpuburn or something
more sophisticated to stress the CPU to its limit - to test the reliability, the cooling
facilities and the power supply (flaky under heavy load, etc.?). FreeBSD's ports do not
even have the simplest Prime95 in a 64-bit version as it is available for Linux or
Windows. I'm sure some professionals are capable of pulling together some massive
stress-test tools, but please could this be made available for the not so professional
and more "common" users? Maybe a naive Christmas wish?

I need to replace the system since I can not rely on that flaky box anymore, even when
using encrypted devices. That is, after a painful time and hopes, the final conclusion.

Regards and thanks for the patience reading this far,

Oliver

--Sig_/j9Oi4X442YBhPfD4RtIhknH
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBAgAGBQJTyEzDAAoJEOgBcD7A/5N8GcgH/1ULRP8IMJR+8fH8CJkYhArW
+CmCH9WFp7IMisKKcjqzWsOjPz1rE5ubg6AA+aFP7yvyTW3IrWxF0YzpMVFiV3+6
BhO77RIxYcuVye+F+Hf5W5QcRdBdGjiZe0nGdTdF1SvEvjh5F6KChMkhWJkHJDZP
zYYWmne/HAQxUIxRnc9PDOcdMANbqVCYOero9VhkexbzHuBsNIDELjsDuHUOZE7z
6opVrkznB5MVpawcaidxYVJeFO1odukA4UYxXHjfwtPgpL25dT8W04QsCPI+hShr
wPFzciWw3hDJos3XTKKTtH9dX0OOPQwJViHVM/S1duGXZzEE8ReHHuQLO3qowAc=
=nksm
-----END PGP SIGNATURE-----

--Sig_/j9Oi4X442YBhPfD4RtIhknH--