Date:      Thu, 4 Jul 2013 13:56:44 -0700
From:      Freddie Cash <fjwcash@gmail.com>
To:        Jeremy Chadwick <jdc@koitsu.org>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: Slow resilvering with mirrored ZIL
Message-ID:  <CAOjFWZ7dr45twGCP=9_6wTy_RewVJsoK__KCVDtLYEQ3RAVeDQ@mail.gmail.com>
In-Reply-To: <20130704191203.GA95642@icarus.home.lan>
References:  <CABBFC07-68C2-4F43-9AFC-920D8C34282E@unixconn.com> <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <E5CCC8F551CA4627A3C7376AD63A83CC@multiplay.co.uk> <CBCA1716-A3EC-4E3B-AE0A-3C8028F6AACF@alumni.chalmers.se> <20130704000405.GA75529@icarus.home.lan> <C8C696C0-2963-4868-8BB8-6987B47C3460@alumni.chalmers.se> <20130704171637.GA94539@icarus.home.lan> <2A261BEA-4452-4F6A-8EFB-90A54D79CBB9@alumni.chalmers.se> <20130704191203.GA95642@icarus.home.lan>

On Thu, Jul 4, 2013 at 12:12 PM, Jeremy Chadwick <jdc@koitsu.org> wrote:

> I believe -- but I need someone else to chime in here with confirmation,
> particularly someone who is familiar with ZFS's internals -- once your
> pool is ashift 12, you can do a disk replacement ***without*** having to
> do the gnop procedure (because the pool itself is already using ashift
> 12).  But again, I need someone to confirm that.
>

Correct.  The ashift property of a vdev is set when the vdev is created and
cannot (AFAIK) be changed without destroying/recreating the pool.  Thus, you
can use gnop to create the vdev with ashift=12, and from then on just do a
normal "zpool replace" or "zpool detach/attach" to replace drives in the
vdev (whether 512-byte or 4K drives) without gnop.

Haven't read the code :) but I have done many, many drive replacements on
ashift=9 and ashift=12 vdevs and watched what happens via zdb.  :)
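For anyone following along, the procedure above looks roughly like this
sketch.  The pool name (tank) and device names (ada0-ada2) are examples,
not from this thread:

```shell
# One-time pool creation: force ashift=12 by layering a 4K-sector
# gnop device over one of the disks.
gnop create -S 4096 /dev/ada0
zpool create tank mirror ada0.nop ada1

# Remove the gnop shim; the vdev's ashift is already baked in.
zpool export tank
gnop destroy ada0.nop
zpool import tank

# Verify the vdev's ashift (12 means 4K sectors).
zdb -C tank | grep ashift

# Later drive replacements inherit the vdev's ashift, so no gnop
# is needed:
zpool replace tank ada1 ada2
```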

The WD10EARS are known for excessively parking their heads, which
> causes massive performance problems with both reads and writes.  This is
> known by PC enthusiasts as the "LCC issue" (LCC = Load Cycle Count,
> referring to SMART attribute 193).
>
> On these drives there are ways to work around this issue -- it
> specifically involves disabling drive-level APM.  To do so, you have to
> initiate a specific ATA CDB to the drive using "camcontrol cmd", and
> this has to be done every time the system reboots.  There is one
> drawback to disabling APM as well: the drives run hotter.
>
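For reference, the APM-disable command Jeremy describes is usually given
as the following incantation (a sketch; ada1 is an example device, and
the byte layout should be checked against camcontrol(8) before use):

```shell
# ATA SET FEATURES (0xEF) with subcommand 0x85 = disable APM.
# This does not persist across reboots, so it typically goes in
# /etc/rc.local or similar.
camcontrol cmd ada1 -a "EF 85 00 00 00 00 00 00 00 00 00 00"
```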

On some WD Green drives, depending on the firmware and manufacturing date,
you can use the wdidle3.exe program (via a DOS boot) to set the timeout to
either "disabled" or "15 minutes" which is usually enough to prevent most
of the head-parking wear-out issues.  However, I believe this only worked
up until Dec 2011 or Dec 2012?

We had the misfortune of using 12 of these in a ZFS storage box when they
were first released (2 TB for under $150?  Hell Yeah!  Oops, you get what
you pay for ...).  We quickly replaced them.

You really need to be running stable/9 if you want to use SSDs with ZFS.
> I cannot stress this enough.  I will not bend on this fact.  I do not
> care if what people have are SLC rather than MLC or TLC -- it doesn't
> matter.  TRIM on ZFS is a downright necessity for long-term reliability
> of an SSD.  Anyway...
>

One can mitigate this a little by leaving 25% of the SSD
unpartitioned/unformatted, allowing the drive's background GC process to
work without impacting performance, and providing long-term performance
that's close to (though not quite matching) after-TRIM performance.  Takes
a lot of will-power to leave 8-16-odd GB free on an SSD that cost close to
$200, though.  :)

It's not perfect, it's not as good as using TRIM, but at least it's doable
on FreeBSD pre-9.1-STABLE.
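A minimal sketch of that manual over-provisioning, assuming a fresh (or
secure-erased) drive; ada2 and the sizes are examples:

```shell
# Partition only ~75% of a 240 GB SSD and leave the rest untouched,
# so the controller can use the unallocated flash for garbage
# collection and wear levelling.
gpart create -s gpt ada2
gpart add -t freebsd-zfs -a 1m -s 180G -l ssd0 ada2
```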

>
> You should probably be made aware of the fact that SSDs need to be
> kept roughly 30-40% unused to get the most benefits out of wear
> levelling.  Once you hit the 20% remaining mark, performance takes a
> hit, and the drive begins hurting more and more.  Low-capacity SSDs
> are therefore generally worthless given the capacity limitation need.
>

Ah, I see you mention what I did above.  :)  Guess that's what I get for
not reading all the way through before starting a reply.  :)

-- 
Freddie Cash
fjwcash@gmail.com


