From: Alan Somers
Date: Thu, 29 Jun 2017 07:43:10 -0600
Subject: Re: redundant zfs pool, system traps and tonns of corrupted files
To: "Eugene M. Zheganin"
Cc: FreeBSD <freebsd-stable@freebsd.org>

On Thu, Jun 29, 2017 at 6:04 AM, Eugene M. Zheganin wrote:
> Hi,
>
> On 29.06.2017 16:37, Eugene M. Zheganin wrote:
>>
>> Hi.
>>
>> Say I'm having a server that traps more and more often (different
>> panics: zfs panics, GPFs, fatal traps while in kernel mode, etc.),
>> and then I realize it has tons of permanent errors on all of its
>> pools that scrub is unable to heal. Does this situation mean it's a
>> bad memory case?
>> Unfortunately I switched the hardware to an identical server before
>> I noticed that the zpools have errors, so I'm not sure when they
>> appeared. Right now I'm about to run a memtest on the old hardware.
>>
>> So, what do you say - does it point at the memory as the root
>> problem?

Certainly a good guess.

> I'm also not quite getting the situation where I have errors at the
> vdev level, but 0 errors at the lower device layer (could someone
> please explain this):

ZFS checksums whole records at a time. On RAIDZ, each record is spread
over multiple disks, usually the entire RAID stripe. So when ZFS
detects a checksum error on a record stored in RAIDZ, it doesn't know
which individual disk was actually responsible. Instead, it blames the
RAIDZ vdev. That's why you have thousands of checksum errors on your
raidz vdevs. The few checksum errors you do have on individual disks
might have come from the labels or uberblocks, which are not raided.

-Alan
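
P.S. To make that concrete, here is roughly the kind of "zpool status"
output this pattern produces. The pool and device names below are
invented for illustration, and your exact counts and status text will
differ:

      pool: tank
     state: ONLINE
    status: One or more devices has experienced an unrecoverable
            error.  An attempt was made to correct the error.
    config:

            NAME        STATE     READ WRITE CKSUM
            tank        ONLINE       0     0     0
              raidz1-0  ONLINE       0     0 4.1K
                da0     ONLINE       0     0     2
                da1     ONLINE       0     0     0
                da2     ONLINE       0     0     0

The unattributable record-level errors pile up on the raidz1-0 row
while the member disks stay at or near zero. "zpool status -v" should
also list any files with permanent errors, and "zdb -l /dev/da0" will
dump an individual disk's vdev labels if you want to inspect those
directly.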