Date: Thu, 22 Oct 2015 14:36:02 -0700
From: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To: galtsev@kicp.uchicago.edu
Cc: Andrea Venturoli <ml@netfence.it>, "questions@freebsd.org" <questions@freebsd.org>, Ernie Luzar <luzar722@gmail.com>
Subject: Re: Spontaneous reboots with splash
Message-ID: <CAOgwaMu%2B=EEn9OBtA64uGwedpF0OdwTfHGESrOU7argjFQZAgw@mail.gmail.com>
In-Reply-To: <16867.128.135.52.6.1445533699.squirrel@cosmo.uchicago.edu>
References: <5627D8B8.7030901@netfence.it> <5628CD2B.2000902@gmail.com> <5628CFA7.6040704@netfence.it> <CAOgwaMvH5RbAghKCrhWQ7B=8TUVBxoeAXtrQHGK8qWkwCyXUsg@mail.gmail.com> <5628FD40.1030701@netfence.it> <CAOgwaMvG0VoafNjme_c6dEhQ%2BZsKAO0_Q0i97=ta9=TPF=ZhBw@mail.gmail.com> <16867.128.135.52.6.1445533699.squirrel@cosmo.uchicago.edu>
On Thu, Oct 22, 2015 at 10:08 AM, Valeri Galtsev <galtsev@kicp.uchicago.edu> wrote:

> On Thu, October 22, 2015 11:18 am, Mehmet Erol Sanliturk wrote:
> > On Thu, Oct 22, 2015 at 8:14 AM, Andrea Venturoli <ml@netfence.it> wrote:
> >
> >> On 10/22/15 14:18, Mehmet Erol Sanliturk wrote:
> >>
> >>> If you have two identical computers with the same programs running:
> >>> one is working correctly, but the other one is rebooting arbitrarily.
> >>
> >> I've got another identical box; I'll restore a dump on this and see if
> >> the behaviour is the same.
> >>
> >>> Therefore, there is a necessity to check that
> >>>
> >>> - processor is working correctly
> >>
> >> CPU Burn-in says yes.
> >>
> >>> - memories are working correctly
> >>
> >> Memtest 86+ says so.
> >>
> >>> - memory management chips are working correctly.
> >>
> >> I have no idea how to check. How do I do this?
> >
> > If the memory tests show that the memories are working correctly, it is
> > possible to say that the memory management chips are also working
> > correctly. Otherwise, it would not be possible to write into and read
> > from the chips correctly.
> >
> > If the memory tests fail, then by testing with chips known to work
> > correctly, the problem may be attributed to the memory management chips.
> >
> > Another possibility is the wattage of the power supply: if the required
> > wattage exceeds what the existing power supply can deliver, it may cause
> > reboots when power use increases beyond its capacity.
> >
> > Another possibility is that the power supply is cutting power
> > spontaneously or causing fluctuations.
>
> Yes, I've seen this even if the PS is only marginally pushed to its
> capacity, and it is old, therefore the filtering capacitors have lost some
> of their capacitance. Excessive ripple on the bus power leads (resulting
> from the above) and possibly aged capacitors of the system board (I still
> call it that way even though long ago the jargon "motherboard" became a
> standard) are partly to blame. I've seen machines starting to consume more
> power some 5 years down the road merely because hard drives age, and start
> consuming more power.
>
> Incidentally, memtest86 may pass successfully in the above case, as it
> runs with zero load, hence much less power consumption.
>
> I also wouldn't discard the possibility that the BIOS temperature
> sensor(s) is (are) tripped - investigate that (simply increasing threshold
> levels would be the way to test if this is the case). If you have AMD
> CPUs, you should be safe. I heard someone say you can boil water on them
> and they still keep running. I once had to live with 96F in the server
> room for 2 hours (to let some maintenance be completed) and none of the
> Opteron boxes got sick. A few of the Intel ones did...
>
> Valeri
>
> >>> Another problem may be a program which is causing generation of an
> >>> invalid address pointing to the boot start code and jumping into it.
> >>> This is very easy for an i386 real mode program.
> >>
> >> In that case this program would be FreeBSD! That's why I'm asking here.
> >
> > If you can isolate the program causing the reboots, it will be possible
> > to check its sources and binary file.
> >
> >>> Another possibility is that a program is broken (contains an invalid
> >>> address) on the HDD. When it starts working, it jumps to that broken
> >>> address and this may start the boot.
> >>
> >> Would a userland program be allowed to do this???
> >
> > Let's assume that the CPU is not over-heated and is not rebooting the
> > computer like when the motherboard is powered on.
> >
> > Let's assume that there is no malicious program part causing the
> > reboots.
> >
> > A broken network card may corrupt data and may cause serious problems.
> >
> > The remaining possibility is that the instruction counter value is
> > corrupted in a program and points to the BIOS boot code area. To reboot
> > the computer, it is necessary to start the BIOS boot code.
> >
> > This may also occur during BIOS related calls. Instead of the proper
> > interrupt code, the boot part is invoked.
> >
> > Otherwise we will say that within the FreeBSD OS parts, there is a point
> > that, instead of a proper shut down, is directly rebooting the computer
> > by calling the BIOS boot code. Checking panic points and searching the OS
> > sources for the existence of such reboot code (without any error message
> > or request for approval from the user) may help.
> >
> > Here the most important part is to find the program part which is
> > causing the reboots. Studying this program part will reveal the reason
> > and, therefore, the cure.
> >
> > I can not say anything definite here about FreeBSD internals (I do not
> > have sufficient knowledge).
> >
> > Since that computer is not working properly, you can do the following:
> > reinstall the OS onto a spare disk and check with it.
> >
> > This will identify whether the problem is caused by the presently
> > installed OS or not.
> > If it can execute a 64-bit OS, testing with such an OS will identify
> > the effect of OS or hardware.
> >
> >> bye & Thanks
> >> av.
> >
> > Mehmet Erol Sanliturk
> > _______________________________________________
> > freebsd-questions@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> > To unsubscribe, send any mail to
> > "freebsd-questions-unsubscribe@freebsd.org"
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++

Another important trouble point is the HDD cables: they may be badly
corrupting loaded programs. Checking (replacing) the HDD cables may be
useful.

Mehmet Erol Sanliturk