From: Alan Somers
Date: Thu, 29 Jun 2017 07:43:10 -0600
Subject: Re: redundant zfs pool, system traps and tonns of corrupted files
To: "Eugene M. Zheganin"
Cc: FreeBSD <freebsd-stable@freebsd.org>

On Thu, Jun 29, 2017 at 6:04 AM, Eugene M. Zheganin wrote:
> Hi,
>
> On 29.06.2017 16:37, Eugene M. Zheganin wrote:
>>
>> Hi.
>>
>> Say I'm having a server that traps more and more often (different
>> panics: zfs panics, GPFs, fatal traps while in kernel mode, etc.),
>> and then I realize it has tons of permanent errors on all of its
>> pools that scrub is unable to heal. Does this situation mean it's a
>> bad memory case?
>> Unfortunately I switched the hardware to an identical server before
>> I noticed that the zpools have errors, so I'm not sure when they
>> appeared. Right now I'm about to run a memtest on the old hardware.
>>
>> So, what do you say - does it point at the memory as the root
>> problem?

Certainly a good guess.

> I'm also not quite getting the situation where I have errors at the
> vdev level, but 0 errors at the lower device layer (could someone
> please explain this):

ZFS checksums whole records at a time. On RAIDZ, each record is spread
over multiple disks, usually the entire RAID stripe. So when ZFS
detects a checksum error on a record stored in RAIDZ, it doesn't know
which individual disk was actually responsible. Instead, it blames the
RAIDZ vdev. That's why you have thousands of checksum errors on your
raidz vdevs. The few checksum errors you do have on individual disks
might have come from the labels or uberblocks, which are not raided.

-Alan
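
P.S. To make that concrete, here is roughly the kind of "zpool status"
output this pattern produces. The pool and device names below are
invented for illustration, and your exact counts and status text will
differ:

      pool: tank
     state: ONLINE
    status: One or more devices has experienced an unrecoverable
            error.  An attempt was made to correct the error.
    config:

            NAME        STATE     READ WRITE CKSUM
            tank        ONLINE       0     0     0
              raidz1-0  ONLINE       0     0 4.1K
                da0     ONLINE       0     0     2
                da1     ONLINE       0     0     0
                da2     ONLINE       0     0     0

The unattributable record-level errors pile up on the raidz1-0 row
while the member disks stay at or near zero. "zpool status -v" should
also list any files with permanent errors, and "zdb -l /dev/da0" will
dump an individual disk's vdev labels if you want to inspect those
directly.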