From owner-freebsd-fs@FreeBSD.ORG  Fri Aug 31 21:05:07 2007
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9146316A420;
	Fri, 31 Aug 2007 21:05:07 +0000 (UTC)
	(envelope-from kvs@binarysolutions.dk)
Received: from solow.pil.dk (relay.pil.dk [195.41.47.164])
	by mx1.freebsd.org (Postfix) with ESMTP id 585B013C458;
	Fri, 31 Aug 2007 21:05:07 +0000 (UTC)
	(envelope-from kvs@binarysolutions.dk)
Received: from coruscant.local (naboo.binarysolutions.dk [80.196.17.173])
	by solow.pil.dk (Postfix) with ESMTP id B3C8E1CC117;
	Fri, 31 Aug 2007 23:04:38 +0200 (CEST)
Received: by coruscant.local (Postfix, from userid 502)
	id 04BBF5D656A; Fri, 31 Aug 2007 23:04:37 +0200 (CEST)
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
References: <m1wsvtkviw.fsf@binarysolutions.dk>
	<20070820112946.GC16977@garage.freebsd.pl>
	<m1ps1iz9bi.fsf@binarysolutions.dk>
From: Kenneth Vestergaard Schmidt <kvs@pil.dk>
Date: Fri, 31 Aug 2007 23:04:37 +0200
In-Reply-To: <m1ps1iz9bi.fsf@binarysolutions.dk> (Kenneth Vestergaard
	Schmidt's message of "Mon\, 20 Aug 2007 14\:20\:33 +0200")
Message-ID: <m13axza00q.fsf@binarysolutions.dk>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (darwin)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: 'checksum mismatch' all over the place
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Aug 2007 21:05:07 -0000

Kenneth Vestergaard Schmidt <kvs@pil.dk> writes:
>> How do you know it was fine? Did you have something that did
>> checksumming? You could try geli with integrity verification feature
>> turned on, fill the disks with some random data and then read it back,
>> if your controller corrupts the data, geli should tell you this.
>
> I may have to do this. The previous drive was almost filled to the brim
> with data, which rsync looked at each day, and we didn't have a lot of
> re-transfer, but that doesn't necessarily mean anything.

*blush*

This turned out to be a firmware issue with the Eonstor RAID
enclosure. After upgrading to v3.47, everything is fine in the
checksum department.

Now, however, I can't seem to keep the box running. We've rsync'd 1.56
TB of data to an 8.18 TB raidz2 pool, and we're getting panics all the
time.

It's an x86 with 4 GB RAM. I've got the following in /boot/loader.conf:

  vfs.zfs.prefetch_disable="1"
  vfs.zfs.arc_max="107772160"
  vm.kmem_size_max="629145600"
  vm.kmem_size_min="629145600"

and kern.maxvnodes is set to 50000. When the machine is finished
booting, 'vmstat -m' says:

         Type InUse MemUse HighUse Requests  Size(s)
      solaris 49972 158199K       -   455307  16,32,64,128,256,512,1024,2048,4096

and after about an hour's worth of rsync'ing, we get:

         Type InUse MemUse HighUse Requests  Size(s)
      solaris 198797 449675K       - 404226785  16,32,64,128,256,512,1024,2048,4096
  panic: kmem_malloc(28672): kmem_map too small: 614682624 total allocated
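
To put the figures above side by side, here is a minimal Python sketch
that converts them to MiB and computes how much the 'solaris' malloc
type grew between the two samples. It only does arithmetic on numbers
quoted in this message; the helper name parse_vmstat_row and the
constant names are made up for illustration, and the script assumes
MemUse is always reported with a trailing "K". It does not diagnose
the panic.

```python
MIB = 1024 * 1024

# Values copied from the loader.conf lines and the panic message above.
KMEM_SIZE_MAX = 629145600   # vm.kmem_size_max
ARC_MAX = 107772160         # vfs.zfs.arc_max
PANIC_TOTAL = 614682624     # "total allocated" in the panic message


def parse_vmstat_row(row):
    """Return (type, in_use, mem_use_bytes, requests) for one 'vmstat -m' row.

    Hypothetical helper: assumes the column order
    Type InUse MemUse HighUse Requests Size(s) shown above.
    """
    name, in_use, mem_use, _high_use, requests = row.split()[:5]
    if not mem_use.endswith("K"):
        raise ValueError("expected MemUse in kilobytes, got %r" % mem_use)
    return name, int(in_use), int(mem_use[:-1]) * 1024, int(requests)


AT_BOOT = parse_vmstat_row(
    "solaris 49972 158199K       -   455307  "
    "16,32,64,128,256,512,1024,2048,4096")
AFTER_RSYNC = parse_vmstat_row(
    "solaris 198797 449675K       - 404226785  "
    "16,32,64,128,256,512,1024,2048,4096")

# Growth of the 'solaris' type between the two samples, in bytes.
growth = AFTER_RSYNC[2] - AT_BOOT[2]

print("vm.kmem_size_max    : %.1f MiB" % (KMEM_SIZE_MAX / MIB))
print("vfs.zfs.arc_max     : %.1f MiB" % (ARC_MAX / MIB))
print("solaris at boot     : %.1f MiB" % (AT_BOOT[2] / MIB))
print("solaris after rsync : %.1f MiB" % (AFTER_RSYNC[2] / MIB))
print("solaris growth      : %.1f MiB" % (growth / MIB))
print("kmem_map at panic   : %.1f MiB" % (PANIC_TOTAL / MIB))
```

Read this way, the 'solaris' usage after rsync is several times the
configured vfs.zfs.arc_max, and the kmem_map total at the panic is
close to vm.kmem_size_max.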

I'm not quite sure which knobs to twiddle or which values to watch, so
any help in this department would be much appreciated. It would also be
good to update the Wiki with that information, since the values listed
there don't keep this machine stable.

-- 
Kenneth Schmidt
pil.dk