From owner-freebsd-current@freebsd.org Mon Jan 4 14:17:47 2016
Subject: Re: FreeBsd MCA Panic Crash !!
To: freebsd-current@freebsd.org
References: <1451903649383-6064691.post@n5.nabble.com>
From: Steven Hartland
Message-ID: <568A7F0F.6060307@multiplay.co.uk>
Date: Mon, 4 Jan 2016 14:17:51 +0000
In-Reply-To: <1451903649383-6064691.post@n5.nabble.com>
List-Id: Discussions about the use of FreeBSD-current

Bank 5 seems to be common to all the crashes, which may suggest you have
some dodgy RAM or possibly a problem with the memory controller of the CPU
driving it. As the error says, this is a hardware issue.

One thing we've used in the past to narrow down issues like this is to
remove as much RAM as possible and to disable all but one CPU core using
/boot/loader.conf hints, where X is the number of the CPU core to disable,
as reported by the boot process:

hint.lapic.X.disabled="1"

    Regards
    Steve

On 04/01/2016 10:34, shahzaibcb wrote:
> Hi,
>
> We've switched to FreeBSD recently to accommodate large video storage as we
> are running video streaming website.
> So the job of the FreeBSD is to
> transcode the uploaded videos using ffmpeg and serve them to users via nginx
> webserver but so far our experience is not very good with it. It crashes
> every 2-3 days and we're unable to track down the problem. The server specs
> are pretty high :
>
>
> Supermicro X5690 (12 cores, 24 threads - 2u)
> 96GB RAM
> 12x3TB RAID-10 (HBA-LSI9211)
>
> Here is the screenshot of recent crash :
>
> http://prntscr.com/9er3pk
>
> One thing worth mentioning is, before going down there's no load on server,
> more or less free RAM usually is around 12GB. We've tried following
> solutions so far :
>
>
> - Updated FreeBSD OS
> - Replaced 800W PS with 900W
> - We've reduced CMOS from MAX(26x) to 18x as suggested in this post
> http://unix.stackexchange.com/questions/60574/determining-cause-of-linux-kernel-panic
>
> The solution we've not performed so far is :
>
> - Disable mca using (hw.mca.enabled: 0) - As we're getting MCA panics.
>
> Here is the crash dump :
>
> [root@cw001 /var/crash]# mcelog --no-dmi --ascii --file core.txt.1
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 3 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 3 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 2 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 2 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 3 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 3 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 2 BANK 5
> MISC 0 ADDR 802bf6a69
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> MCi_ADDR register valid
> Processor context corrupt
> MCA: Internal Timer error
> STATUS be00000000800400 MCGSTATUS 4
> MCGCAP 1c09 APICID 2 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 44
>
> -----------------------------------------------------------------------------------
>
> I showed those Hardware errors to Vendor from whom we purchased Supermicro
> servers . This is what he has to say :
>
> -----------------------------------
> Why do you not made one test environment with CentOS or one other Linux that
> you know to use, and see if you have same errors ??? if not than you know
> that the errors come from OS not from hardware. ( CentOS, RedHead….work
> diferend like FreeBSD – work direct on hardware if you don’t have the right
> kernel settings can the server crashed. CentOS , RedHead…. don’t work direct
> on hardware and distribute the resource load better and you have better
> control and you can better debug one situation)
> -----------------------------------
>
> Now we're on a black hole and unable to find that either issue with FreeBSD
> or Hardware. We're thinking to disable mca in loader.conf but ppl are not
> suggesting it. If you guys can help us, it'd be very kind.
>
>
>
> --
> View this message in context: http://freebsd.1045724.n5.nabble.com/FreeBsd-MCA-Panic-Crash-tp6064691.html
> Sent from the freebsd-current mailing list archive at Nabble.com.
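
The STATUS value in the quoted mcelog records can be cross-checked by hand, and the loader.conf hints suggested in the reply can be generated rather than typed. The following is a minimal Python sketch of both, not a definitive tool. It assumes the architectural IA32_MCi_STATUS bit layout from the Intel SDM (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16) and that X in hint.lapic.X.disabled is the local APIC ID printed at boot. The function names and the example APIC ID list are hypothetical; take the real IDs from the boot messages.

```python
# Sketch only: helper names and the APIC ID list below are illustrative.
# Flag positions assume the architectural IA32_MCi_STATUS layout (Intel SDM).
STATUS_FLAGS = [
    (63, "VAL"),    # status register contents are valid
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]

# Simple MCA error code that mcelog reports as "Internal Timer error".
INTERNAL_TIMER_ERROR = 0x0400


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flag names and error codes."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


def is_internal_timer_error(mca_code):
    """True when the MCA error code is the internal timer error encoding."""
    return mca_code == INTERNAL_TIMER_ERROR


def lapic_disable_hints(apic_ids, keep):
    """Return loader.conf lines disabling every listed APIC ID except `keep`."""
    return [
        'hint.lapic.%d.disabled="1"' % apic_id
        for apic_id in sorted(apic_ids)
        if apic_id != keep
    ]


if __name__ == "__main__":
    info = decode_mci_status(0xBE00000000800400)  # STATUS from the dump
    print(info["flags"], hex(info["mca_code"]))
    # Hypothetical machine with APIC IDs 0..3; keep only APIC ID 0 enabled.
    for line in lapic_disable_hints(range(4), keep=0):
        print(line)
```

The decoded flags should line up with what mcelog already printed (uncorrected, enabled, MISC and ADDR valid, context corrupt), which is a quick sanity check that the record was parsed consistently. The generated hint lines go into /boot/loader.conf and take effect at the next boot; remove them again once the faulty component has been isolated.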