Date:      Fri, 29 Apr 2011 09:43:47 +0930
From:      "Daniel O'Connor" <doconnor@gsoft.com.au>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-stable List <freebsd-stable@freebsd.org>
Subject:   Re: ZFS vs OSX Time Machine
Message-ID:  <AF725CFF-86A4-4D65-A26E-496F6B9BD33E@gsoft.com.au>
In-Reply-To: <20110428195601.GA31807@icarus.home.lan>
References:  <537A8F4F-A302-40F9-92DF-403388D99B4B@gsoft.com.au> <20110428195601.GA31807@icarus.home.lan>

On 29/04/2011, at 5:26, Jeremy Chadwick wrote:
>> I have the following ZFS related tunables
>>
>> vfs.zfs.arc_max="3072M"
>> vfs.zfs.prefetch_disable="1"
>> vfs.zfs.txg.timeout=5
>> vfs.zfs.cache_flush_disable=1
>
> Are the last two actually *working* in /boot/loader.conf?  Can you
> verify by looking at them via sysctl?  AFAIK they shouldn't work, since
> they lack double-quotes around the values.  Parsing errors are supposed
> to throw you back to the loader prompt.  See loader.conf(5) for the
> syntax.

Yep, they're working :)
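
For completeness, reading them back is just this (and the values do match
what's in loader.conf):

% sysctl vfs.zfs.txg.timeout vfs.zfs.cache_flush_disable
vfs.zfs.txg.timeout: 5
vfs.zfs.cache_flush_disable: 1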

> I'm also not sure why you're setting cache_flush_disable at all.

I think I was wondering if it would help the abysmal write performance of these disks...

>> Any help appreciated, thanks :)
>
> Others seem to be battling over the claim that "NFS doesn't work for
> TM", but that isn't what you're complaining about.  You're complaining
> that FreeBSD with ZFS + NFS performs extremely poorly when trying to
> do backups from an OS X client using TM (writing to the NFS mount).

Yes, and also TM is over AFP, not NFS (I forgot to mention that...)

> I have absolutely no experience with TM or OS X, so if it's actually a
> client-level problem (which I'm doubting) I can't help you there.
>
> Just sort of a ramble here about different things...
>
> It would be useful to provide ZFS ARC sysctl data from the FreeBSD
> system where you're seeing performance issues.  "sysctl -a
> kstat.zfs.misc.arcstats" should suffice.

kstat.zfs.misc.arcstats.hits: 236092077
kstat.zfs.misc.arcstats.misses: 6451964
kstat.zfs.misc.arcstats.demand_data_hits: 98087637
kstat.zfs.misc.arcstats.demand_data_misses: 1220891
kstat.zfs.misc.arcstats.demand_metadata_hits: 138004440
kstat.zfs.misc.arcstats.demand_metadata_misses: 5231073
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 0
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 0
kstat.zfs.misc.arcstats.mru_hits: 15041670
kstat.zfs.misc.arcstats.mru_ghost_hits: 956048
kstat.zfs.misc.arcstats.mfu_hits: 221050407
kstat.zfs.misc.arcstats.mfu_ghost_hits: 3269042
kstat.zfs.misc.arcstats.allocated: 15785717
kstat.zfs.misc.arcstats.deleted: 4690878
kstat.zfs.misc.arcstats.stolen: 4990300
kstat.zfs.misc.arcstats.recycle_miss: 2142423
kstat.zfs.misc.arcstats.mutex_miss: 518
kstat.zfs.misc.arcstats.evict_skip: 2251705
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 470396116480
kstat.zfs.misc.arcstats.evict_l2_ineligible: 2048
kstat.zfs.misc.arcstats.hash_elements: 482679
kstat.zfs.misc.arcstats.hash_elements_max: 503063
kstat.zfs.misc.arcstats.hash_collisions: 19593315
kstat.zfs.misc.arcstats.hash_chains: 116103
kstat.zfs.misc.arcstats.hash_chain_max: 16
kstat.zfs.misc.arcstats.p: 1692798721
kstat.zfs.misc.arcstats.c: 3221225472
kstat.zfs.misc.arcstats.c_min: 402653184
kstat.zfs.misc.arcstats.c_max: 3221225472
kstat.zfs.misc.arcstats.size: 3221162968
kstat.zfs.misc.arcstats.hdr_size: 103492088
kstat.zfs.misc.arcstats.data_size: 2764591616
kstat.zfs.misc.arcstats.other_size: 353079264
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_read_bytes: 0
kstat.zfs.misc.arcstats.l2_write_bytes: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 19
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 1
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
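
(For what it's worth, that works out to an overall ARC hit rate of

  hits / (hits + misses) = 236092077 / 242544041 ~= 97.3%

so the cache itself looks healthy enough.)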

> You should also try executing "zpool iostat -v 1" during the TM backup
> to see if there's a particular device which is behaving poorly.  There
> have been reports of ZFS pools behaving poorly when a single device
> within the pool has slow I/O (e.g. 5 hard disks, one of which has
> internal issues, resulting in the entire pool performing horribly).
> You should let this run for probably 60-120 seconds to get an idea.
> Given your parameters above (assuming vfs.zfs.txg.timeout IS in fact
> 5!), you should see "bursts" of writes every 5 seconds.

OK.
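
For reference, the per-vdev output of "zpool iostat -v 1" looks roughly
like this ("tank" and the raidz/ada names here are placeholders, not my
actual pool):

               capacity     operations    bandwidth
pool         alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
tank          ...    ...    ...    ...    ...    ...
  raidz1      ...    ...    ...    ...    ...    ...
    ada0      ...    ...    ...    ...    ...    ...
    ada1      ...    ...    ...    ...    ...    ...

The tell-tale would be one device whose write ops/bandwidth sit well
below its siblings' while the pool stalls.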

> I know that there are some things on ZFS that perform badly overall.
> Anything that involves excessive/large numbers of files (not file
> sizes, but actual files themselves) seems to perform not-so-great with
> ZFS.  For example, Maildir on ZFS = piss-poor performance.  There are
> ways to work around this issue (if I remember correctly, by adding a
> dedicated "log" device to your ZFS pool, but be aware your log devices
> need to be reliable (if you have a single log device and it fails the
> entire pool is damaged, if I remember right)), but I don't consider it
> feasible.  So if TM is creating tons of files on the NFS mount (backed
> by ZFS), then I imagine the performance isn't so great.

Hmm, the sparse disk image does have ~80000 files in a single directory...
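
If I do end up trying a dedicated log device, my understanding is it's a
one-liner (pool and device names hypothetical, and mirrored because of
the failure caveat above):

# zpool add tank log mirror ada4 ada5

although since TM here goes over AFP rather than sync-heavy NFS, I'm not
sure how much a separate log would actually buy.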

> Could you please provide the following sysctl values?  Thanks.
>
> kern.maxvnodes
> kern.minvnodes
> vfs.freevnodes
> vfs.numvnodes

kern.maxvnodes: 204477
kern.minvnodes: 51119
vfs.freevnodes: 51118
vfs.numvnodes: 66116
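
(numvnodes is comfortably under maxvnodes, so the vnode cache doesn't
look like the ceiling; if it were, I gather it can be raised on the fly,
e.g. "sysctl kern.maxvnodes=400000".)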

> If the FreeBSD machine has a wireless card in it, if at all possible
> could you try ruling that out by hooking up wired Ethernet instead?
> It's probably not the cause, but worth trying anyway.  If you have a
> home router or something doing 802.11, don't bother with this idea.

The FreeBSD box is wired, although it's using an re(4) card as the em(4) card died (!!).

The OSX box is connected via an Airport Express (11n).

> Next, you COULD try using Samba/CIFS on the FreeBSD box to see if you
> can narrow the issue down to bad NFS performance.  Please see this
> post of mine about tuning Samba on FreeBSD (backed by ZFS) to get
> extremely good performance.  Many people responded and said their
> performance drastically improved (you can see the thread yourself).
> The trick is AIO.  You can ignore the part about setting vm.kmem_size
> in loader.conf; that advice is now old/deprecated (does not pertain to
> you given the date of your kernel), and
> vfs.zfs.txg.write_limit_override is something you shouldn't mess with
> unless absolutely needed; leave it at the default:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061642.html

OK. I don't think TM can use CIFS; I will try iSCSI as someone else suggested, perhaps it will help.
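
For anyone who can use CIFS, my reading of that thread is that the AIO
trick amounts to loading the aio kernel module and setting the AIO
thresholds in smb.conf, something like the below (the 16384 byte values
are ones I've seen suggested, not tested here):

# kldload aio        (or put aio_load="YES" in /boot/loader.conf)

[global]
    aio read size = 16384
    aio write size = 16384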

> Finally, when was the last time this FreeBSD machine was rebooted?
> Some people have seen horrible performance that goes away after a
> reboot.  There's some speculation that memory fragmentation has
> something to do with it.  I simply don't know.  I'm not telling you to
> reboot the box (please don't; it would be more useful if it could be
> kept up in case folks want to do analysis of it).

I think performance does improve after a reboot :(

top looks like...
last pid: 16112;  load averages:  0.24,  0.22,  0.23    up 8+16:11:50  09:43:19
653 processes: 1 running, 652 sleeping
CPU:  3.6% user,  0.0% nice,  3.4% system,  0.6% interrupt, 92.5% idle
Mem: 1401M Active, 578M Inact, 4143M Wired, 4904K Cache, 16M Buf, 1658M Free
Swap: 4096M Total, 160M Used, 3936M Free, 3% Inuse

although free memory does drop very low (~250MB) at times.

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C