Date: Thu, 12 May 2016 20:03:07 +0100
From: Steven Hartland
To: Nikolaj Hansen
Cc: "freebsd-stable@freebsd.org"
Subject: Re: HP DL 585 / ACPI ID / ECC Memory / Panic
In-Reply-To: <57349ED3.7060606@barnabas.dk>
References: <57349D5B.50202@barnabas.dk> <57349ED3.7060606@barnabas.dk>

I wouldn't rule out a bad CPU; we had a very similar issue and that's
what it turned out to be.

A quick way to confirm is to move all the DRAM from the disabled CPU to
one of the other CPUs and see if the issue stays away with the current
CPU still disabled. If it does, the DIMMs themselves are working behind
a different controller, so it's likely the on-chip memory controller
has developed a fault.

On Thursday, 12 May 2016, Nikolaj Hansen wrote:

> Hi,
>
> I recently added a ZFS disk array to my old HP 585 G1 server.
> Immediately there were kernel panics, and I have spent quite a bit of
> time figuring out what was really wrong.
>
> The system has 4 CPU cards with dual-core Opteron processors. Each
> card has 4x2 gigabytes of memory: 4x2x4 = 32 gigabytes of total system
> memory. The memory is DDR400 ECC.
>
> The panic was very easily reproducible. I just had to issue enough
> reads to the system until the faulty memory was accessed.
>
> Strangely, I can run memtest86+ with the DDR setting on and I find no
> error whatsoever.
>
> Adding
>
> hint.lapic.2.disabled=1 > /boot/loader.conf
>
> immediately mitigates the error for FreeBSD. So here is my conclusion:
>
> If you can make the system stable by disabling one core on one CPU
> card:
>
> 1) The other cards / memory must be OK.
> 2) The mainboard must be OK, since one of the cores on the CPU is
>    still running / not barfing panics.
> 3) The CPU core with ACPI ID 2 is probably also OK: it is on the same
>    chip as a non-disabled core.
> 4) It is likely down to a rotten DIMM.
>
> Instead of mindlessly trying to find the culprit by swapping DIMMs, I
> would really like to identify the CPU, card and memory module from
> the OS.
>
> Info here:
>
> http://pastebin.com/jqufNKck
>
> Thank you for your time and help.
>
> --
>
> Med venlig hilsen / with regards
>
> Nikolaj Hansen
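
On the question in the quoted message of working out which CPU card an
ID belongs to, here is a rough sketch of the arithmetic. It assumes the
number in hint.lapic.2.disabled is a local APIC ID and that IDs are
handed out contiguously, socket by socket, with 2 cores per socket and
4 sockets as described above. Neither assumption is confirmed anywhere
in this thread, and the BIOS may number things differently, so treat
the result as a starting guess to check against the hardware. The
function name locate_apic_id is made up for illustration.

```python
def locate_apic_id(apic_id, cores_per_socket=2, sockets=4):
    """Return (socket_index, core_index) for a local APIC ID.

    ASSUMPTION: IDs are contiguous and assigned socket by socket,
    i.e. apic_id = socket_index * cores_per_socket + core_index.
    Real firmware may use a different layout.
    """
    total = cores_per_socket * sockets
    if not 0 <= apic_id < total:
        raise ValueError(f"APIC ID {apic_id} outside 0..{total - 1}")
    # divmod gives the quotient (socket) and remainder (core) in one step
    return divmod(apic_id, cores_per_socket)


if __name__ == "__main__":
    socket_index, core_index = locate_apic_id(2)
    print(f"APIC ID 2 -> socket {socket_index}, core {core_index}")
```

Under those assumptions the disabled ID would sit on the second CPU
card (index 1), which would be the card to pull the DIMMs from for the
swap test suggested at the top of this message.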