From: Bob Friesenhahn
Date: Sun, 6 May 2012 10:43:21 -0500 (CDT)
To: Simon
Cc: "freebsd-fs@freebsd.org"
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

On Sun, 6 May 2012, Simon wrote:
>
> So if you have a 50TB ZFS filesystem and your memory goes bad, even if ECC,
> your entire 50TB is gonna go bunkers? disks fail, but memory doesn't? CPUs
> don't fail?
>
> There are many things in a server that can fail and cause corruption, but that
> shouldn't take down entire zpool. I'm okay with a few missing files ending up
> in lost+found, but entire filesystem?
> That renders the entire thing useless if you
> ask me.

By your definition, computers would be useless. :-)

There is no telling what might happen if a program (including kernel code) were to execute wrong instructions or read wrong data. This is not specific to zfs.

Zfs caches large amounts of data in its in-memory ARC, which is susceptible to in-memory corruption. If it tried to detect and prevent memory corruption, it would be extremely slow and likely not work at all if there were actual failures. Part of the metadata structure of the pool needs to be cached in RAM for performance reasons.

On the zfs-discuss list we sometimes hear of zfs checksum errors which are due to memory errors rather than disk errors. Zfs can be used without ECC memory, but pool reliability will suffer.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
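
A toy illustration of the point about in-memory corruption, not a description of how zfs is implemented: the sketch below uses SHA-256 as a stand-in checksum, and the names checksum, flip_bit, and the sample block are hypothetical. It shows two cases. A bit flipped in a cached copy after the checksum was recorded shows up as a checksum error even though the disk is fine. A bit flipped before the checksum is computed at write time goes unnoticed, because the checksum faithfully describes the bad data.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Stand-in block checksum (SHA-256), purely illustrative."""
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of block with one bit inverted, simulating a RAM error."""
    corrupted = bytearray(block)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


original = b"pool metadata block " * 8

# Case 1: the checksum was recorded while the data was good, then the
# cached copy is damaged in RAM. Re-verifying reports a mismatch, which
# looks like a disk error although the disk did nothing wrong.
stored_sum = checksum(original)
cached = flip_bit(original, 42)
detected = checksum(cached) != stored_sum

# Case 2: the buffer is damaged in RAM before the checksum is computed
# for a write. The checksum matches the bad data, so later verification
# passes and the corruption is silent.
dirty = flip_bit(original, 42)
written_sum = checksum(dirty)
undetected = checksum(dirty) == written_sum

print("case 1 mismatch reported:", detected)
print("case 2 corruption passes verification:", undetected)
```

The second case is why a checksum alone cannot protect against bad RAM: it can only vouch for whatever bytes were in memory when it was computed.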