From: Benjamin Close
Date: Thu, 13 Dec 2007 14:55:40 +1030
To: Hugo Silva
Message-ID: <4760B444.1080604@clearchain.com>
References: <47606C09.2070209@isc.org> <47609F0A.7010805@clearchain.com> <47609FE3.8040606@barafranca.com>
In-Reply-To: <47609FE3.8040606@barafranca.com>
Cc: 
 freebsd-current@FreeBSD.ORG
Subject: Re: ZFS melting under postgres...

Hugo Silva wrote:
> Benjamin Close wrote:
>> Peter Losher wrote:
>>> Hi,
>>>
>>> As part of our testing 7.0/ZFS we tried putting it thru its paces
>>> having ZFS act as our storage medium for some test pgsql db's (like for
>>> sqlgrey, etc) and in both BETA2 and BETA4 (amd64) we get the same
>>> results with a RAIDZ2 container:
>>>
>>> -=-
>>> Dec 12 14:24:12 nsa sqlgrey: fatal: setconfig error at
>>> /usr/local/sbin/sqlgrey line 186.
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad8 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad10 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad12 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad14 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad16 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad18 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad8 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad10 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad12 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad14 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad16 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad18 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: zpool I/O failure, zpool=vault error=86
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad8 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad10 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad12 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad14 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad16 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad18 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa postgres[50527]: [5-1] PANIC: could not write to
>>> log file 2, segment 53 at offset 7864320, length 8192: Input/output
>>> error
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad8 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad10 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad12 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad14 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad16 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad18 offset=3665128448 size=21504
>>> Dec 12 16:49:53 nsa root: ZFS: zpool I/O failure, zpool=vault error=86
>>> Dec 12 16:49:53 nsa postgres[50596]: [1-1] FATAL: the database system
>>> is starting up
>>> Dec 12 16:49:53 nsa kernel: pid 50527 (postgres), uid 70: exited on
>>> signal 6 (core dumped)
>>> -=-
>>>
>>> It basically corrupts the container from the inside until it fails
>>> completely (usually within 24-48 hours depending on how busy the
>>> db is)
>>>
>>> I had thought it was a bad SATA replicator/controller, but we had that
>>> replaced w/ one from Supermicro. So it's either the disks, or
>>> something in ZFS. Anyone used ZFS to backend any db's (mysql or pgsql?)
>>>
>>> If you need more info, let me know...
>>>
>>>
>> Try turning off the ZIL. Whilst I don't use a db, I have ZFS under high
>> load. I've found that unless the ZIL is turned off I see checksum
>> corruption as well:
>>
>> /boot/loader.conf
>>
>> vfs.zfs.zil_disable=1
>>
>> Cheers,
>> Benjamin
>
> Wouldn't it be a bad idea to disable the ZIL?
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
>

A good read is:

http://blogs.sun.com/perrin/entry/the_lumberjack

which shows why the ZIL exists.

Cheers,
Benjamin
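
P.S. For anyone comparing their own logs against the one quoted above: every mismatch there is reported at the same offset (3665128448) on all eight devices, ad4 through ad18. Below is a minimal Python sketch for tallying such entries per device and offset. It assumes the syslog format exactly as quoted in this thread; `tally_mismatches` is a hypothetical helper name, not part of ZFS or any FreeBSD tool.

```python
import re
from collections import Counter

# Matches the body of a "ZFS: checksum mismatch" syslog entry as quoted above.
MISMATCH = re.compile(
    r"checksum mismatch, zpool=(?P<pool>\S+)\s+"
    r"path=(?P<path>\S+)\s+offset=(?P<offset>\d+)\s+size=(?P<size>\d+)"
)

def tally_mismatches(log_text):
    """Count checksum-mismatch entries per (device path, offset)."""
    # Drop mail quote markers (">>>") and re-join wrapped lines so that the
    # "path=..." continuation sits next to its "checksum mismatch" prefix.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    return Counter((m["path"], int(m["offset"])) for m in MISMATCH.finditer(flat))

sample = """\
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad4 offset=3665128448 size=22016
>>> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
>>> path=/dev/ad6 offset=3665128448 size=22016
"""

counts = tally_mismatches(sample)
for (path, offset), n in sorted(counts.items()):
    print(path, offset, n)
```

Feeding it a full /var/log/messages excerpt instead of `sample` gives one count per device and offset, which makes it easy to see whether errors cluster on one disk or span the whole pool.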