From: matthew@FreeBSD.org
Date: Wed, 22 Jan 2020 16:36:52 +0000
To: freebsd-questions@freebsd.org
Subject: Re: 12.1 RELEASE General Protection Fault (Trap 9)

On 22/01/2020 15:12, Jason Van Patten wrote:
> Since sometime before Christmas (as far as I know), my NAS has started
> randomly crashing, reloading, and saving cores in /var/crash.  It was
> doing this with 12.0 and now with 12.1.  My gut tells me it's hardware
> related, but I'm not quite sure.  The various bits and pieces are:

Given the crashes do not appear to be associated with any particular activity, I think you're on the money with your diagnosis that it is hardware related.

Did you change any of the hardware on this system recently? If you've added more disks or such, then you may have overloaded the PSU. If the PSU can't produce voltages in spec, then you will see random crashes, although I doubt in that case you'd always see 'General Protection Fault'. Unless this is a new machine, or you've changed some of the hardware, this is unlikely to be the diagnosis.

Otherwise, suspect hardware problems.
In rough order of expense, least to most:

   * Bad heatsink, failed case fan, CPU thermal paste not up to snuff or other cause that may lead to your system overheating
   * Bad memory
   * Bad CPU

The first of these is relatively cheap and easy to handle: make sure you're getting unimpeded airflow through the chassis -- clean any filters, make sure fans are spinning correctly and that heatsinks have good thermal contact, if necessary by renewing any thermal paste. Monitoring the CPU temperature will help here -- if you see the CPU temperature increasing just before everything goes kaput, that's a fairly solid diagnostic. For an i7, you should be able to use the coretemp(4) kernel module and read off the temperature from the dev.cpu.%d.temperature sysctls.

Memory problems can frequently be diagnosed by use of a memory checker like sysutils/memtest86+ -- if this says you have a problem, then you do have a problem. However, it may not catch every possible memory problem, so it can wrongly give you an 'all clear'. It's pretty accurate in practice though. A more definitive test is to swap out any suspect RAM modules and see if the problem goes away.

The worst case is a bad CPU. memtest86+ will diagnose some CPU faults, but it is less effective on CPU problems. If there is a CPU problem, it will be a pretty subtle one, as the typical symptoms of CPU problems are that the system won't boot and the BIOS makes horrible beeping noises when you try.

Even so, this isn't a definitive list. I've heard tales about trying to diagnose this sort of problem where someone had, bit by bit, swapped out all of the components of a system except for the case, and the problem still occurred. Turned out the case was slightly bent, and that put enough stress on the motherboard to cause intermittent electrical connectivity problems.

	Cheers,

	Matthew
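
For the temperature monitoring suggested above, here is a minimal sketch in Python. It assumes coretemp(4) is already loaded (e.g. via `kldload coretemp`) and that `sysctl -n dev.cpu.0.temperature` prints a value of the form `45.0C`. The function names and the log path in the usage note are made up for illustration. Each reading is synced to disk so that the last few lines should survive a crash.

```python
import os
import subprocess
import time


def parse_temperature(raw):
    """Convert a sysctl value such as '45.0C' to a float in degrees Celsius."""
    return float(raw.strip().rstrip("C"))


def read_cpu_temperature(cpu=0):
    """Read dev.cpu.<cpu>.temperature via sysctl(8); assumes coretemp(4) is loaded."""
    out = subprocess.run(
        ["sysctl", "-n", f"dev.cpu.{cpu}.temperature"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_temperature(out)


def log_temperatures(path, interval=10, cpu=0):
    """Append a timestamped reading every `interval` seconds until interrupted."""
    with open(path, "a") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} cpu{cpu} {read_cpu_temperature(cpu):.1f}C\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible crash
            time.sleep(interval)
```

Calling `log_temperatures("/var/log/cputemp.log")` (a hypothetical path) and leaving it running lets you compare the last logged readings with the timestamps of the cores in /var/crash.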