From: David Duchscher
Date: Thu, 13 Dec 2007 07:59:35 -0600
To: Benjamin Close
Cc: freebsd-current@freebsd.org, Hugo Silva
Subject: Re: ZFS melting under postgres...
Message-Id: <06CAC7FC-DB58-441D-A6E0-76D1D8133393@tamu.edu>
In-Reply-To: <4760B444.1080604@clearchain.com>
References: <47606C09.2070209@isc.org> <47609F0A.7010805@clearchain.com> <47609FE3.8040606@barafranca.com> <4760B444.1080604@clearchain.com>
List-Id: Discussions about the use of FreeBSD-current

On Dec 12, 2007, at 10:25 PM, Benjamin Close wrote:

> Hugo Silva wrote:
>> Benjamin Close wrote:
>>> Peter Losher wrote:
>>>> Hi,
>>>>
>>>> As part of our testing 7.0/ZFS we tried putting it through its paces
>>>> having ZFS act as our storage medium for some test pgsql db's
>>>> (like for sqlgrey, etc) and in both BETA2 and BETA4 (amd64) we get
>>>> the same results with a RAIDZ2 container:
>>>>
>>>> -=-
>>>> Dec 12 14:24:12 nsa sqlgrey: fatal: setconfig error at /usr/local/sbin/sqlgrey line 186.
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad4 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad6 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad8 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad10 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad12 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad14 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad16 offset=3665128448 size=21504
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad18 offset=3665128448 size=21504
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad4 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad6 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad8 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad10 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad12 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad14 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad16 offset=3665128448 size=21504
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad18 offset=3665128448 size=21504
>>>> Dec 12 16:49:53 nsa root: ZFS: zpool I/O failure, zpool=vault error=86
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad4 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad6 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad8 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad10 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad12 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad14 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad16 offset=3665128448 size=21504
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad18 offset=3665128448 size=21504
>>>> Dec 12 16:49:53 nsa postgres[50527]: [5-1] PANIC: could not write to log file 2, segment 53 at offset 7864320, length 8192: Input/output error
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad4 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad6 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad8 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad10 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad12 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad14 offset=3665128448 size=22016
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad16 offset=3665128448 size=21504
>>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault path=/dev/ad18 offset=3665128448 size=21504
>>>> Dec 12 16:49:53 nsa root: ZFS: zpool I/O failure, zpool=vault error=86
>>>> Dec 12 16:49:53 nsa postgres[50596]: [1-1] FATAL: the database system is starting up
>>>> Dec 12 16:49:53 nsa kernel: pid 50527 (postgres), uid 70: exited on signal 6 (core dumped)
>>>> -=-
>>>>
>>>> It basically corrupts the container from the inside until it fails
>>>> completely (usually within 24-48 hours depending on how busy
>>>> the db is)
>>>>
>>>> I had thought it was a bad SATA replicator/controller, but we
>>>> had that replaced w/ one from Supermicro. So it's either the
>>>> disks, or something in ZFS. Anyone used ZFS to backend any db's
>>>> (mysql or pgsql?)
>>>>
>>>> If you need more info, let me know...
>>>>
>>>>
>>> Try turning off zil, whilst I don't use a db, I have zfs under
>>> high load. I've found without zil turned off I see checksum
>>> corruption as well:
>>>
>>> /boot/loader.conf
>>>
>>> vfs.zfs.zil_disable=1
>>>
>>> Cheers,
>>> Benjamin
>>
>> Wouldn't it be a bad idea to disable ZIL?
>>
>> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
>
> A good read is:
>
> http://blogs.sun.com/perrin/entry/the_lumberjack
>
> Which shows why zil exists.
>
> Cheers,
> Benjamin

So does anybody know of a battery backed NVRAM card that can be used
with FreeBSD that the ZIL could be offloaded to?

--
DaveD