Date: Thu, 13 Dec 2007 13:25:06 +1030
From: Benjamin Close <Benjamin.Close@clearchain.com>
To: Peter Losher <Peter_Losher@isc.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: ZFS melting under postgres...
Message-ID: <47609F0A.7010805@clearchain.com>
In-Reply-To: <47606C09.2070209@isc.org>
References: <47606C09.2070209@isc.org>
Peter Losher wrote:
> Hi,
>
> As part of our testing of 7.0/ZFS we tried putting it through its paces,
> having ZFS act as the storage medium for some test pgsql dbs (like for
> sqlgrey, etc.), and in both BETA2 and BETA4 (amd64) we get the same
> results with a RAIDZ2 container:
>
> -=-
> Dec 12 14:24:12 nsa sqlgrey: fatal: setconfig error at
> /usr/local/sbin/sqlgrey line 186.
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad4 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad6 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad8 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad10 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad12 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad14 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad16 offset=3665128448 size=21504
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad18 offset=3665128448 size=21504
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad4 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad6 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad8 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad10 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad12 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad14 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad16 offset=3665128448 size=21504
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad18 offset=3665128448 size=21504
> Dec 12 16:49:53 nsa root: ZFS: zpool I/O failure, zpool=vault error=86
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad4 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad6 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad8 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad10 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad12 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad14 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad16 offset=3665128448 size=21504
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad18 offset=3665128448 size=21504
> Dec 12 16:49:53 nsa postgres[50527]: [5-1] PANIC: could not write to
> log file 2, segment 53 at offset 7864320, length 8192: Input/output error
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad4 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad6 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad8 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad10 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad12 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad14 offset=3665128448 size=22016
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad16 offset=3665128448 size=21504
> Dec 12 16:49:53 nsa root: ZFS: checksum mismatch, zpool=vault
> path=/dev/ad18 offset=3665128448 size=21504
> Dec 12 16:49:53 nsa root: ZFS: zpool I/O failure, zpool=vault error=86
> Dec 12 16:49:53 nsa postgres[50596]: [1-1] FATAL: the database system
> is starting up
> Dec 12 16:49:53 nsa kernel: pid 50527 (postgres), uid 70: exited on
> signal 6 (core dumped)
> -=-
>
> It basically corrupts the container from the inside until it fails
> completely (usually within 24-48 hours, depending on how busy the db is).
>
> I had thought it was a bad SATA replicator/controller, but we had that
> replaced with one from Supermicro. So it's either the disks or something
> in ZFS. Has anyone used ZFS as the backend for any dbs (mysql or pgsql)?
>
> If you need more info, let me know...
>
>
Try turning off the ZIL. Whilst I don't use a db, I do have ZFS under high load,
and I've found that unless the ZIL is turned off I see checksum corruption as
well. In /boot/loader.conf:
vfs.zfs.zil_disable=1
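
If you'd rather script the change than edit the file by hand, something
like the following might do. It is only a rough sketch, assuming a POSIX sh
with awk available; set_loader_tunable is a made-up helper name, not a
FreeBSD tool. It writes the value in the quoted form and makes sure the
tunable appears only once, so running it twice is harmless.

```sh
#!/bin/sh
# set_loader_tunable FILE NAME VALUE
# Hypothetical helper (not shipped with FreeBSD): make sure NAME="VALUE"
# appears exactly once in a loader.conf-style file, leaving every other
# line untouched.
set_loader_tunable() {
    conf=$1
    name=$2
    value=$3
    tmp="${conf}.tmp.$$"

    if [ -f "$conf" ]; then
        # Drop any existing assignment of this tunable. index() does a
        # literal prefix match, so the dots in NAME are not treated as
        # regex wildcards.
        awk -v n="$name" 'index($0, n "=") != 1' "$conf" > "$tmp"
    else
        : > "$tmp"
    fi

    # Append the new assignment.
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"

    # Write back in place so the file keeps its owner and mode.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example (run as root; takes effect at the next boot):
#   set_loader_tunable /boot/loader.conf vfs.zfs.zil_disable 1
```

After a reboot, `sysctl vfs.zfs.zil_disable` should report the new value,
assuming the tunable is also exposed as a sysctl on your build.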
Cheers,
Benjamin
