From: Marin Atanasov Nikolov
To: kpneal@pobox.com
Cc: Warren Block, ml-freebsd-stable, Ian Lepore, Ronald Klop
Date: Fri, 18 Jan 2013 20:36:56 +0200
Subject: Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0
In-Reply-To: <20130118173602.GA76438@neutralgood.org>
References: <1358527685.32417.237.camel@revolution.hippie.lan> <20130118173602.GA76438@neutralgood.org>
List-Id: Production branch of FreeBSD source code

Hello,

Thanks everyone for the input.

Here's what I did:

* Checked the power cables - everything okay
* Checked for bad capacitors - everything okay
* Ran a memtest86+ test - everything okay

I should also mention that the machine's hardware is all new: HDDs,
motherboard, memory, power supply, etc. The only thing I've replaced on
this machine is the 2 x 750 GB old Seagate disks, swapped for brand new
2 x 1000 GB Seagate ones. That was about a month ago, I think, and I
didn't have any issues until recently. smartd doesn't report anything
worrying about these disks either.

I have also been using this machine as a build host for months, building
different projects on it with gcc, clang, scan-build and others, and I've
never had problems building or testing anything, so I would have noticed
any issues with gcc or clang, for example.

On one of my old machines I had issues with memory and bad disks in the
past, and then the system simply halted. I could see on the terminal that
the system had halted, along with some info about the problem itself, but
I've never had a system that rebooted due to hardware issues. If it were a
problem with memory or disks I would expect the system to simply halt, not
reboot itself, so I'm a bit puzzled about what the root cause could be.

This system very rarely changes in terms of hardware or software. On the
software side, the one change made a few weeks ago was to allow System V
IPC primitives to be used in jails, but I don't see how that could cause
these issues.

I had a power outage more than 2 weeks ago, but the UPS successfully shut
down the system, and its batteries were exhausted during the outage. They
have been recharged since then, but in the meantime I'm thinking of
unplugging this system from the UPS control cables, just to be sure that
the outage 2 weeks ago didn't break the UPS in some way.

Thanks again,
Marin

On Fri, Jan 18, 2013 at 7:36 PM, wrote:

> On Fri, Jan 18, 2013 at 09:48:05AM -0700, Ian Lepore wrote:
> > I tend to agree, a machine that starts rebooting spontaneously when
> > nothing significant changed and it used to be stable is usually a sign
> > of a failing power supply or memory.
>
> Agreed.
>
> > But I disagree about memtest86. It's probably not completely without
> > value, but to me its value is only negative: if it tells you memory is
> > bad, it is. If it tells you it's good, you know nothing. Over the
> > years I've had 5 dimms fail. memtest86 found the error in one of them,
> > but said all the others were fine in continuous 48-hour tests. I even
> > tried running the tests on multiple systems.
> >
> > The thing that always reliably finds bad memory for me
> > is /usr/ports/math/mprime run in test/benchmark mode. It often takes 24
> > or more hours of runtime, but it will find your bad memory.
>
> I've had "good" luck with gcc showing bad memory. If compiling a new kernel
> produces seg faults then I know I have a hardware problem. I've seen
> compilers at work failing due to bad memory as well.
>
> Some problems only happen with particular access patterns. So if a compiler
> works fine then, like memtest86, it doesn't say anything about the health
> of the hardware.
>
> --
> Kevin P. Neal                         http://www.pobox.com/~kpn/
>
> 'Concerns about "rights" and "ownership" of domains are inappropriate.
> It is appropriate to be concerned about "responsibilities" and "service"
> to the community.' -- RFC 1591, page 4: March 1994

--
Marin Atanasov Nikolov

dnaeon AT gmail DOT com
http://www.unix-heaven.org/