Date:      Sat, 18 Feb 2012 12:06:03 -0800
From:      Dennis Glatting <dg17@penx.com>
To:        Michael Shuey <shuey@fmepnet.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS size reduced, 100% full, on fbsd9 upgrade
Message-ID:  <1329595563.42839.28.camel@btw.pki2.com>
In-Reply-To: <CAELRr5kPXjqTooLbjPC1oPB3e2TfRC=eE+zvsu-tW54Pz42xFg@mail.gmail.com>
References:  <CAELRr5kPXjqTooLbjPC1oPB3e2TfRC=eE+zvsu-tW54Pz42xFg@mail.gmail.com>

I'm not a ZFS wiz but...


On Sat, 2012-02-18 at 10:25 -0500, Michael Shuey wrote:
> I'm upgrading a server from 8.2p6 to 9.0-RELEASE, and I've tried both
> make in the source tree and freebsd-update and I get the same strange
> result.  As soon as I boot to the fbsd9 kernel, even booting into
> single-user mode, the pool's size is greatly reduced.  All filesystems
> show 100% full (0 bytes free space), nothing can be written to the
> pool (probably a side-effect of being 100% full), and dmesg shows
> several of "Solaris: WARNING: metaslab_free_dva(): bad DVA
> 0:5978620460544" warnings (with different numbers).  Switching kernels
> back to the 8.2p6 kernel restores things to normal, but I'd really
> like to finish my fbsd9 upgrade.
> 
> The system is a 64-bit Intel box with 4 GB of memory, and 8 disks in a
> raidz2 pool called "pool".  It's booted to the 8.2p6 kernel now, and
> scrubbing the pool, but last time I did this (roughly a week ago) it
> was fine.  / is a gmirror, but /usr, /tmp, and /var all come from the
> pool.  Normally, the pool has 1.2 TB of free space, and is version 15
> (zfs version 4).  Some disks are WD drives, with 4k native sectors,
> but some time ago I rebuilt the pool to use a native 4k sector size
> (ashift=12).
> 

I believe 4GB of memory is the minimum. More is better. When you use the
minimum of anything, expect dodginess.
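
If you do stay at 4GB, it may be worth pinning the ARC down in
/boot/loader.conf rather than letting it fight the rest of the system
for memory. A rough sketch -- the value is illustrative, not a
recommendation:

  # /boot/loader.conf (illustrative; tune for your workload)
  vfs.zfs.arc_max="2147483648"   # cap the ARC at ~2 GB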

You should upgrade your pool -- bug fixes and all that.
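
One caveat before you do: on-disk version upgrades are one-way. Going
from v15 to the v28 that 9.0 supports means your 8.2p6 kernel can no
longer import the pool, so I'd sort out the 9.0 boot problem first. The
commands themselves are just:

  zpool upgrade          # list pools running an older on-disk version
  zpool upgrade pool     # upgrade "pool" to the newest version this kernel supports
  zfs upgrade -r pool    # bump the filesystem (zfs) version as well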

Are all the disks 4k sectors? I found that a mix of 512 and 4k works,
but performance is best when they are all the same. I have also found
that 512-byte emulation isn't a sensible choice performance-wise (i.e.,
set the pool up for 4k).
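
If you want to double-check what you actually have, something along
these lines should show it (substitute your own device names):

  zdb -C pool | grep ashift   # 12 = 4k-aligned vdevs, 9 = 512
  diskinfo -v /dev/ada0       # sector/stripe size as the kernel sees the drive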

Different people have different opinions, but I personally do not use
ZFS for the OS; rather, I RAID1 the OS. The question you have to ask is,
if /usr goes kablooie, whether you have the skills to put it back
together. I do not, so "simple" (i.e., hardware RAID1) for the OS is
good for me -- it isn't the OS that's being worked in my setups, rather
the data areas.


> Over time, I've been slowly replacing disks (1 at a time) to increase
> the free space in the pool.  Also, the system experienced severe
> failure recently; the power supply blew, and took out the memory (and
> presumably motherboard).  I replaced these last week with known-good
> board/memory/processor/PS, and it's been running fine since.
> 

Expect mixed results with mixed disks, at least from my experience,
particularly when it comes to performance.

Is the MB the same? I have had mixed results. I find the Gigabyte
boards work well, but ASUS boards are dodgy when it comes to heavy
interrupt handling. Server boards with ECC memory are the most reliable.


> Any suggestions?  Is it possible I've got some nasty pool corruption
> going on - and if so, how do I go about fixing it?  Any advice would
> be appreciated.  This is a backup server, so I could rebuild its
> contents from the primary, but I'd rather fix it if possible (since I
> want to do a fbsd9 upgrade on the primary next).

I screw around with my setups. What I have found is that rebuilding the
pool (when I screw it up) is the least troublesome approach.
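
Since you can repopulate from the primary anyway, the rebuild itself is
mostly a send/receive exercise, assuming the primary is also ZFS. A
sketch only -- the "backuphost" name is made up:

  # on the primary, after recreating the pool on the backup box
  zfs snapshot -r pool@resync
  zfs send -R pool@resync | ssh backuphost zfs receive -dF pool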

Recently I found a bad tray on one of my servers. It drove me nuts for
two weeks. It could be a loose cable, a bad cable, or a crimped cable,
but I am not yet in a position to open the case. Most of my ZFS
weirdnesses have been hardware related.

It could be that your blowout impacted your disks or wiring. Do you run
SMART? I have found that, generally, SMART is goodness, but I presently
have a question mark when it comes to the Hitachi 4TB disks (I misbehaved
on that system, so the issue could be my own; on another system, however,
there weren't any errors).
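
smartmontools from ports is the usual route. A quick pass over each
drive (adjust device names) will tell you whether the blowout left
marks:

  smartctl -a /dev/ada0        # watch reallocated/pending sectors and UDMA CRC errors
  smartctl -t long /dev/ada0   # start a long self-test; read the result later with -a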

I have found that, when I have multiple identical controllers, running
the same firmware across all of them is a good approach; otherwise you
get weirdness, and different MBs manifest this problem in different
ways. Also, make sure your MB's BIOS is recent.
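
A quick way to compare what everything is running, assuming your HBAs
report their firmware at attach (the LSI mps/mpt drivers do):

  dmesg | grep -i firmware   # most HBA drivers print their firmware revision at attach
  camcontrol devlist         # inquiry strings include each disk's firmware revision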

YMMV






