Date: Sat, 19 Jan 2013 12:30:17 +0200
From: Marin Atanasov Nikolov <dnaeon@gmail.com>
To: Warren Block <wblock@wonkity.com>
Cc: ml-freebsd-stable <freebsd-stable@freebsd.org>, Ian Lepore <ian@freebsd.org>, kpneal@pobox.com, Ronald Klop <ronald-freebsd8@klop.yi.org>
Subject: Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0
Message-ID: <CAJ-UWtRRfCKg9GBR_ppvtjvJGadiOXMXBFBpX7tAvLEXDoZHQg@mail.gmail.com>
In-Reply-To: <alpine.BSF.2.00.1301181313560.1604@wonkity.com>
References: <CAJ-UWtSANRMsOqwW9rJ6Eebta6=AiHeNO6fhPO0mhYhZiMmn4A@mail.gmail.com> <op.wq3zxn038527sy@ronaldradial.versatec.local> <alpine.BSF.2.00.1301180758460.96418@wonkity.com> <1358527685.32417.237.camel@revolution.hippie.lan> <20130118173602.GA76438@neutralgood.org> <alpine.BSF.2.00.1301181313560.1604@wonkity.com>
Hi,

Re-sending this one, as I had attached an image which was too large to pass the mailing lists - sorry about that :)

After starting the system last night I kept monitoring the memory usage in case something strange turned up, and I noticed a significant drop in free memory between 03:00am and 03:05am. I've taken a screenshot of the graph, which you can also see at the link below:

* http://users.unix-heaven.org/~dnaeon/memory-usage.jpg

At 03:00am I can see that periodic(8) runs, but I don't see what could have taken so much of the free memory. I'm also running this system on ZFS and have daily rotating ZFS snapshots created - currently the number of ZFS snapshots is > 1000, and I'm not sure if that could be causing this.

Here's a list of the periodic(8) daily scripts that run at 03:00am:

% ls -1 /etc/periodic/daily
100.clean-disks
110.clean-tmps
120.clean-preserve
130.clean-msgs
140.clean-rwho
150.clean-hoststat
200.backup-passwd
210.backup-aliases
220.backup-pkgdb
300.calendar
310.accounting
330.news
400.status-disks
404.status-zfs
405.status-ata-raid
406.status-gmirror
407.status-graid3
408.status-gstripe
409.status-gconcat
420.status-network
430.status-rwho
440.status-mailq
450.status-security
460.status-mail-rejects
470.status-named
480.status-ntpd
490.status-pkg-changes
500.queuerun
800.scrub-zfs
999.local

% ls -1 /usr/local/etc/periodic/daily
402.zfSnap
403.zfSnap_delete
411.pkg-backup
smart

I'll keep monitoring the memory usage and will see whether the free memory drops again by more than 50% on the next periodic(8) daily run. If the memory drop keeps its current trend, the system should crash within the next 1-2 days; if that happens and memory was low at the time, I'll start debugging the periodic(8) scripts to see which one might be causing this.
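One way I could narrow this down without waiting for another crash is to disable the suspect daily scripts one at a time in /etc/periodic.conf and watch whether the 03:00am drop goes away. A minimal sketch - daily_scrub_zfs_enable and daily_status_security_enable are standard knobs from /etc/defaults/periodic.conf, while the zfSnap knob name is only my guess at how the port's script is toggled:

```shell
# /etc/periodic.conf -- sketch: disable one suspect at a time, then
# re-enable it once the nightly memory graph rules it out.
daily_scrub_zfs_enable="NO"        # 800.scrub-zfs
daily_status_security_enable="NO"  # 450.status-security
#daily_zfsnap_delete_enable="NO"   # assumed knob for 403.zfSnap_delete; check the port
```

With > 1000 snapshots my first suspects would be the zfSnap scripts, since they have to enumerate the whole snapshot list on every run.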
Thanks and regards,
Marin

On Fri, Jan 18, 2013 at 10:23 PM, Warren Block <wblock@wonkity.com> wrote:
> On Fri, 18 Jan 2013, kpneal@pobox.com wrote:
>> On Fri, Jan 18, 2013 at 09:48:05AM -0700, Ian Lepore wrote:
>>> I tend to agree, a machine that starts rebooting spontaneously when
>>> nothing significant changed and it used to be stable is usually a sign
>>> of a failing power supply or memory.
>>
>> Agreed.
>>
>>> But I disagree about memtest86. It's probably not completely without
>>> value, but to me its value is only negative: if it tells you memory is
>>> bad, it is. If it tells you it's good, you know nothing. Over the
>>> years I've had 5 dimms fail. memtest86 found the error in one of them,
>>> but said all the others were fine in continuous 48-hour tests. I even
>>> tried running the tests on multiple systems.
>>>
>>> The thing that always reliably finds bad memory for me
>>> is /usr/ports/math/mprime run in test/benchmark mode. It often takes 24
>>> or more hours of runtime, but it will find your bad memory.
>>
>> I've had "good" luck with gcc showing bad memory. If compiling a new
>> kernel produces seg faults then I know I have a hardware problem. I've
>> seen compilers at work failing due to bad memory as well.
>>
>> Some problems only happen with particular access patterns. So if a
>> compiler works fine then, like memtest86, it doesn't say anything about
>> the health of the hardware.
>
> Most test tools are like that. They might diagnose something as bad, but
> they often can't prove it is good. SMART has a reputation for not finding
> any problems on disks that are failing, and capacitors that aren't swollen
> or leaking still may not be working.
>
> But diagnostic tools can at least give a hint. In my case, memtest
> indicated a problem--a big problem. I removed one DIMM at random (there
> were only two) and the problems and memtest errors both went away. Replace
> the DIMM, and both came back.
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

--
Marin Atanasov Nikolov

dnaeon AT gmail DOT com
http://www.unix-heaven.org/