Subject: Re: 10.3-stable random server reboots - finding the reason
To: freebsd-questions@freebsd.org
From: Dave B
Date: Wed, 19 Apr 2017 13:26:51 +0100

On 18/04/17 19:26, freebsd-questions-request@freebsd.org wrote:
> On 04/18/17 12:59, tech-lists wrote:
>> I have an up-to-date 10.3 server that is randomly rebooting, after being
>> up for days. Previously it had been up for many months. The problem is,
>> nothing seems to be left in the logs to indicate why it's doing this. I
>> have all.log and console.log enabled.
>>
>> So, what I'm asking is, how can I capture its last gasp?
>>

Start by overhauling it: clean any resident colony of dust bunnies out of the case and out of all the heat sinks/coolers and, as others have said, check and clean the fan(s).

Check that all the power and other connections are firm and solid. (Come to that, are there any anecdotal reports of any electrical disturbances about the same time as the server dies?
It may be falling rather than jumping off the edge.)

If it's powered by a dedicated UPS, are its batteries OK? (After 2 years they should be treated as suspect at best! Easy to change on most UPSs, though.)

Some cheap (copper over aluminium) SATA cables can die over time too, causing all sorts of weird hard-disk-related mayhem.

Examine the mobo's electrolytic caps (the round cans standing vertically). If any show bulging ends(!), have actually split, or have a brown mess around their base, you'll need to replace them (not trivial to do!) or replace the mobo. (Still a surprisingly common failure, even after the rogue producer of such items was "sorted out".) Such failures often manifest themselves as bad RAM!

If you have to start swapping assemblies, start with the power supply, but *ONLY change one thing at a time* between tests, to be sure you identify the cause. And if you think you've found it, swap the part you last removed back in, to see if it fails again.

If this is in a "production" environment (other people use it), make a clone and swap the entire machine out, so you can run diags on the suspect machine at your leisure without causing any grief to your users. It's always good to have a clone as a backup for anything serious.

Best Regards.
Dave B.
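
The one-thing-at-a-time swap testing described above is easier to keep honest with a little bookkeeping. What follows is only a rough sketch in Python, not a tool anyone in this thread has mentioned: the `Trial` record, the `confirmed_culprit` helper and the sample log entries are all made up for illustration. It treats a part as the confirmed cause only when fitting a spare stopped the reboots and putting the original back brought them back.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    part: str       # the ONE part changed for this test (hypothetical labels)
    fitted: str     # "spare", or "original" when the old part is swapped back in
    rebooted: bool  # did the machine reboot unexpectedly during this test?


def confirmed_culprit(trials: List[Trial]) -> Optional[str]:
    """Return the part whose spare cured the fault and whose original
    brought the fault back, or None if nothing is confirmed yet."""
    cured = set()  # parts whose spare stopped the reboots
    for t in trials:
        if t.fitted == "spare" and not t.rebooted:
            cured.add(t.part)
        elif t.fitted == "original" and t.rebooted and t.part in cured:
            return t.part
    return None


# Hypothetical test log, one change per entry.
log = [
    Trial("power supply", "spare", True),    # still reboots: not the PSU
    Trial("SATA cable", "spare", False),     # reboots stop
    Trial("SATA cable", "original", True),   # reboots return: confirmed
]

print(confirmed_culprit(log))
```

With only the first two entries recorded the helper returns `None`, which reflects the advice above: a fix is not confirmed until the suspect part has been swapped back in and the fault has returned.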