Date:      Thu, 22 Oct 2015 14:36:02 -0700
From:      Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To:        galtsev@kicp.uchicago.edu
Cc:        Andrea Venturoli <ml@netfence.it>, "questions@freebsd.org" <questions@freebsd.org>,  Ernie Luzar <luzar722@gmail.com>
Subject:   Re: Spontaneous reboots with splash
Message-ID:  <CAOgwaMu+=EEn9OBtA64uGwedpF0OdwTfHGESrOU7argjFQZAgw@mail.gmail.com>
In-Reply-To: <16867.128.135.52.6.1445533699.squirrel@cosmo.uchicago.edu>
References:  <5627D8B8.7030901@netfence.it> <5628CD2B.2000902@gmail.com> <5628CFA7.6040704@netfence.it> <CAOgwaMvH5RbAghKCrhWQ7B=8TUVBxoeAXtrQHGK8qWkwCyXUsg@mail.gmail.com> <5628FD40.1030701@netfence.it> <CAOgwaMvG0VoafNjme_c6dEhQ+ZsKAO0_Q0i97=ta9=TPF=ZhBw@mail.gmail.com> <16867.128.135.52.6.1445533699.squirrel@cosmo.uchicago.edu>

On Thu, Oct 22, 2015 at 10:08 AM, Valeri Galtsev <galtsev@kicp.uchicago.edu>
wrote:

>
> On Thu, October 22, 2015 11:18 am, Mehmet Erol Sanliturk wrote:
> > On Thu, Oct 22, 2015 at 8:14 AM, Andrea Venturoli <ml@netfence.it> wrote:
> >
> >> On 10/22/15 14:18, Mehmet Erol Sanliturk wrote:
> >>
> >>> If you have two identical computers running the same programs, and
> >>> one works correctly while the other reboots arbitrarily:
> >>>
> >>
> >> I've got another identical box; I'll restore a dump on this and see if
> >> the behaviour is the same.
> >>
> >>
> >>
> >>
> >>> It is therefore necessary to check that
> >>>
> >>> - processor is working correctly
> >>>
> >>
> >> CPU Burn-in says yes.
> >>
> >>
> >>
> >>> - the memory is working correctly
> >>>
> >>
> >> Memtest 86+ says so.
> >>
> >>
> >>
> >>> - the memory management chips are working correctly.
> >>>
> >>
> >> I have no idea how to check. How do I do this?
> >>
> >>
> >>
> >
> > If the memory tests show that the memory is working correctly, it is
> > reasonable to conclude that the memory management chips are working
> > correctly as well. Otherwise it would not be possible to write to and
> > read from the memory chips correctly.
> >
> > If the memory tests fail, retesting with memory chips known to be good
> > can show whether the problem should be attributed to the memory
> > management chips.
> >
> > Another possibility is the wattage of the power supply: if the system
> > requires more watts than the power supply can deliver, it may reboot
> > when power use rises beyond that capacity.
> >
> >
> > Another possibility is that the power supply is cutting power
> > spontaneously or causing fluctuations.
> >
>
> Yes, I've seen this even when the PS is only marginally pushed to its
> capacity but is old, so its filtering capacitors have lost some of their
> capacitance. Excessive ripple on the bus power leads (resulting from the
> above) and possibly aged capacitors on the system board (I still call it
> that, even though the jargon "motherboard" long ago became standard) are
> partly to blame. I've seen machines start to consume more power some 5
> years down the road merely because hard drives age and start consuming
> more power.
>
> Incidentally, memtest86 may pass successfully in the above case, as it
> runs with zero load, hence much less power consumption.
>
> I also wouldn't discard the possibility that BIOS temperature sensor(s) is
> (are) tripped - investigate that (simply increasing threshold levels would
> be the way to test if this is the case). If you have AMD CPUs, you should
> be safe. I heard someone say you can boil water on them and they still
> keep running. I once had to live with 96F in the server room for 2 hours
> (to let some maintenance be completed) and none of the Opteron boxes got
> sick. A few of the Intel ones did...
>
> Valeri
>
> >
> >
> >>
> >>> Another problem may be a program that generates an invalid address
> >>> pointing to the boot start code and jumps to it. This is very easy
> >>> for an i386 real-mode program.
> >>>
> >>
> >> In that case this program would be FreeBSD! That's why I'm asking here.
> >>
> >>
> >>
> >>
> >
> > If you can isolate the program causing the reboots, it will be possible
> > to check its sources and binary file.
> >
> >
> >
> >
> >>
> >>> Another possibility is that a program is broken (contains an invalid
> >>> address) on the HDD. When it starts running, it jumps to that broken
> >>> address, and this may start the boot.
> >>
> >> Would a userland program be allowed to do this???
> >>
> >>
> >>
> >
> > Let's assume that the CPU is not overheated and is not rebooting the
> > computer as it would when the motherboard is powered on.
> >
> > Let's assume that no malicious program part is causing the reboots.
> >
> > A broken network card may corrupt data and may cause serious problems.
> >
> > The remaining possibility is that the instruction counter value is
> > destroyed in a program and points into the BIOS boot code area. To
> > reboot the computer, it is necessary to start the BIOS boot code.
> >
> > This may also occur during BIOS-related calls: instead of the proper
> > interrupt code, the boot part is invoked.
> >
> > Otherwise we would have to say that somewhere within FreeBSD there is a
> > point that, instead of performing a proper shutdown, reboots the
> > computer directly by calling the BIOS boot code. Checking the panic
> > points and searching the OS sources for such reboot code (code that
> > reboots without any error message or request for approval from the
> > user) may help.
> >
> > The most important step here is to find the program part that is causing
> > the reboots. Studying that part will reveal the reason and therefore
> > the cure.
> >
> >
> > I cannot say anything reliable here about FreeBSD internals because I
> > do not have sufficient knowledge of them.
> >
> >
> > Since that computer is not working properly, you can do the following:
> > reinstall the OS onto a spare disk and test with it.
> >
> > This will identify whether the problem is caused by the presently
> > installed OS or not.
> > If the machine can run a 64-bit OS, testing with one will help separate
> > the effect of the OS from that of the hardware.
> >
> >
> >
> >
> >>
> >>  bye & Thanks
> >>         av.
> >>
> >
> >
> > Mehmet Erol Sanliturk
> > _______________________________________________
> > freebsd-questions@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> > To unsubscribe, send any mail to
> > "freebsd-questions-unsubscribe@freebsd.org"
> >
>
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++
>



Another important trouble point is the HDD cables: a bad cable may
corrupt programs as they are loaded from disk.
Checking (replacing) the HDD cables may be useful.
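
One possible way to look for cable trouble before replacing anything: on
many SATA/PATA drives, SMART attribute 199 (commonly named
UDMA_CRC_Error_Count) counts transfer errors on the link between drive and
controller, and a rising raw value often points at a bad cable or connector.
The sketch below is Python and only parses text in the attribute-table
layout printed by `smartctl -A` (from smartmontools). The function name and
the sample table are made up for illustration, and not every drive reports
this attribute or uses this name.

```python
from typing import Optional


def udma_crc_errors(smartctl_output: str) -> Optional[int]:
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None.

    Hypothetical helper: assumes the usual 10-column attribute table
    (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw).
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            try:
                return int(fields[9])  # raw value is the 10th column
            except ValueError:
                return None  # raw value not a plain integer on this drive
    return None  # attribute not reported


# Invented sample in the smartctl attribute-table layout (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       17
"""

if __name__ == "__main__":
    print(udma_crc_errors(SAMPLE))
```

On FreeBSD the real input would come from something like
`smartctl -A /dev/ada0` (the device name is an assumption). A count that
keeps growing between runs is the signal to look at, rather than a single
nonzero value left over from an earlier cable.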


Mehmet Erol Sanliturk


