Date:      Sat, 30 Apr 2011 11:44:06 -0400
From:      Pierre Lamy <pierre@userid.org>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-fs@freebsd.org, Volodymyr Kostyrko <c.kworr@gmail.com>
Subject:   Re: ZFS v28 for 8.2-STABLE
Message-ID:  <4DBC2E46.9060404@userid.org>
In-Reply-To: <20110430001524.GA58845@icarus.home.lan>
References:  <4DB8EF02.8060406@bk.ru> <ipf6i6$54v$1@dough.gmane.org> <20110430001524.GA58845@icarus.home.lan>

On 4/29/2011 8:15 PM, Jeremy Chadwick wrote:
> On Fri, Apr 29, 2011 at 11:20:21PM +0300, Volodymyr Kostyrko wrote:
>> 28.04.2011 07:37, Ruslan Yakovlev wrote:
>>> Does actually patch exist for 8.2-STABLE ?
>>> I probe
>>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20110317.patch.xz
>>>
>>> Building failed with:
>>> can't cd to /usr/src/cddl/usr.bin/zstreamdump
>>> Also sys/cddl/compat/opensolaris/sys/sysmacros.h failed to patch.
>>>
>>> Current FreeBSD 8.2-STABLE #35 Mon Apr 18 03:40:38 EEST 2011 i386
>>> periodically frozen on high load like backup by rsync or find -sx ...
>>> (from default cron tasks).
>> Well ZFSv28 should be very close to STABLE for now?
>>
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-February/023152.html
> It's now a matter of opinion.  The whole idea of ZFSv28 being committed
> to HEAD was to be tested.  I haven't seen any indication of a progress
> report provided for anything on HEAD that pertains to ZFSv28, have you?
>
> Furthermore, the FreeBSD Quarterly Status Report just came out on 04/27
> for the months of January-March (almost a 2 month delay, sigh):
>
> 1737     04/27 10:58  Daniel Gerzo        ( 41K) FreeBSD Status Report January-March, 2011
>
> http://www.freebsd.org/news/status/report-2011-01-2011-03.html
>
> Which states that ZFSv28 is "now available in CURRENT", which we've
> known for months:
>
> http://www.freebsd.org/news/status/report-2011-01-2011-03.html#ZFSv28-available-in-FreeBSD-9-CURRENT
>
> But again, no progress report, so nobody except those who follow
> HEAD/CURRENT know what the progress is.  And that progress has not been
> relayed to any of the non-HEAD/CURRENT lists.
>
> I'm a total hard-ass about this stuff, and have been for years, because
> it all boils down to communication (or lack there-of).  It seems very
> hasty to say "Yeah! MFC this!" when we (folks who only follow STABLE)
> have absolutely no idea if what's in CURRENT is actually broken in some
> way or if there are outstanding problems -- and if there are, what those
> are so users can be aware of them in advance.
>

Hello,

Here's a summary of my recent end-user work with ZFS on -current. I was 
recently lucky enough to purchase two NAS systems: two cheap new PCs, 
each loaded with six drives, one 1 TB disk as a simple GPT boot device 
and five 2 TB data drives. The mobo has six SATA connectors, but I 
needed to purchase an additional PCI-E SATA adapter since the DVD drive 
also uses a SATA port. Each system has 4 GB of memory and a new 
inexpensive quad-core AMD CPU.

I've been running recent -current on it for a couple of weeks with 
heavy single-user use; the pool currently sits at 2.5 TB used of 7.1 TB.

The only problem I found was that deleting a file-backed log device 
from a degraded pool would immediately panic the system. Since I'm not 
running stock -current, I didn't report it.

Resilvering seems absurdly slow, but since I won't be doing it often I 
didn't care. My NAS setup is side-by-side redundant, so if resilvering 
took more than two days I would just replicate from my other NAS.

Throughput without a log device was in the range of 30 MB/s (about 3% 
of my 1 Gb interface). Adding a file-backed log device on the UFS 
partition used for boot resulted in a 10x jump, saturating the SATA bus 
I was sending data from over the network. Throughput spiked up to 30% 
of the interface maximum (the disk's bus speed) and did not vary much. 
This resolved the very spiky data transfers that a lot of other people 
have posted about on the internet. I first used a USB device with about 
40 MB/s throughput as the log device, which smoothed the transfers 
dramatically but still left ~15-second gaps where no data would 
transfer while the log was flushed from USB to disk. After some 
research I discovered that I could use a file-backed log device 
instead, and that fixed the spiky transfers entirely.

Before that I had tuned the sysctls, as the poor out-of-the-box 
settings were giving me very slow speeds (in the range of 1% of network 
throughput, before the log device). I played around with the vfs.zfs 
tunables but found that I did not need them once I added the log 
device; the out-of-the-box settings for that sysctl tree were just fine.

I had first set this up before CAM became the default in -current, and 
did not use labels. While troubleshooting some unrelated disk issues I 
switched to CAM without problems, and subsequently labeled the disks 
(recreating the zpool after the labeling). I am now using CAM and AHCI 
without any issues.

Here are some personal notes on the tunables I set; I am sure they are 
not all helpful. I didn't add them one by one, I simply changed them in 
bulk and saw a positive result. Also noted are the commands I used and 
the current system status.

sysctl -w net.inet.tcp.sendspace=373760
sysctl -w net.inet.tcp.recvspace=373760
sysctl -w net.local.stream.sendspace=82320
sysctl -w net.local.stream.recvspace=82320
sysctl -w vfs.zfs.prefetch_disable=1
sysctl -w net.local.stream.recvspace=373760
sysctl -w net.local.stream.sendspace=373760
sysctl -w net.local.inflight=1
sysctl -w net.inet.tcp.ecn.enable=1
sysctl -w net.inet.flowtable.enable=0
sysctl -w net.raw.recvspace=373760
sysctl -w net.raw.sendspace=373760
sysctl -w net.inet.tcp.local_slowstart_flightsize=10
sysctl -w net.inet.tcp.delayed_ack=0
sysctl -w kern.maxvnodes=600000
sysctl -w net.local.dgram.recvspace=8192
sysctl -w net.local.dgram.maxdgram=8192
sysctl -w net.inet.tcp.slowstart_flightsize=10
sysctl -w net.inet.tcp.path_mtu_discovery=0
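To make settings like these survive a reboot, they belong in 
/etc/sysctl.conf rather than one-off sysctl -w calls. A minimal sketch, 
writing to a scratch file for illustration (and picking an arbitrary 
subset of the values above; on a real system you would edit 
/etc/sysctl.conf itself):

```shell
# Persist a subset of the tunables above. Scratch path used here so the
# example doesn't touch a live system's /etc/sysctl.conf.
conf=/tmp/sysctl.conf.example
cat > "$conf" <<'EOF'
net.inet.tcp.sendspace=373760
net.inet.tcp.recvspace=373760
vfs.zfs.prefetch_disable=1
kern.maxvnodes=600000
EOF
# Sanity check: four key=value entries written.
grep -c '=' "$conf"
```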

<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada0 /dev/ada0
<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada1 /dev/ada1
<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada3 /dev/ada3
<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada4 /dev/ada4
<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada5 /dev/ada5

The labels let me more easily identify disks later. My mobo has a 
single ATA bus with a slave port for SATA, and the disk on that port 
would "disappear" from the box. Moving the drive to a master SATA port 
resolved the issue (? very odd).

gnop create -S 4096 /dev/label/g_ada0
mkdir /var/preserve/zfs
dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000
zpool create -f tank raidz /dev/label/g_ada0.nop /dev/label/g_ada1 \
    /dev/label/g_ada3 /dev/label/g_ada4 /dev/label/g_ada5 \
    log /var/preserve/zfs/log_device

The four commands above set the alignment to 4 KB, create a file-backed 
log device, and create the pool.
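For what it's worth, the usual follow-up to the gnop trick is to drop 
the .nop provider again once the pool exists, since the ashift=12 
setting is recorded in the pool metadata at creation time. A sketch of 
that, not the exact commands I ran:

```shell
# The 4 KB alignment (ashift=12) is stored in the pool's metadata when
# the pool is created, so the gnop layer is only needed once. Export
# the pool, remove the .nop device, and re-import from the plain label:
zpool export tank
gnop destroy /dev/label/g_ada0.nop
zpool import tank
# Verify the alignment stuck (look for "ashift: 12"):
zdb -C tank | grep ashift
```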

zfs set atime=off tank

I decided not to use dedup, because my files don't contain many 
duplicates; they're mostly large media files, ISOs, etc.
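If you're unsure whether dedup would pay off, zdb can estimate the 
savings without enabling anything. A sketch (I didn't run this myself, 
and it can take a long time on a full pool):

```shell
# Simulate dedup across the pool's existing data and print a block
# histogram plus an estimated dedup ratio. Read-only; nothing changes.
zdb -S tank
```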

<root.wheel@zfs-slave> [/var/preserve/root] # zpool status
   pool: tank
  state: ONLINE
  scan: none requested
config:

         NAME                            STATE     READ WRITE CKSUM
         tank                            ONLINE       0     0     0
           raidz1-0                      ONLINE       0     0     0
             label/g_ada0                ONLINE       0     0     0
             label/g_ada1                ONLINE       0     0     0
             label/g_ada3                ONLINE       0     0     0
             label/g_ada4                ONLINE       0     0     0
             label/g_ada5                ONLINE       0     0     0
         logs
           /var/preserve/zfs/log_device  ONLINE       0     0     0

errors: No known data errors
<root.wheel@zfs-slave> [/var/preserve/root] #

<root.wheel@zfs-slave> [/var/preserve/root] # df
Filesystem          Size    Used   Avail Capacity  Mounted on
/dev/gpt/pyros-a    9.7G    3.3G    5.6G    37%    /
/dev/gpt/pyros-c    884G    6.1G    808G     1%    /var
tank                7.1T    2.5T    4.6T    35%    /tank
<root.wheel@zfs-slave> [/var/preserve/root] #


ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1 at ahcich2 bus 0 scbus3 target 0 lun 0
ada1: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2 at ahcich3 bus 0 scbus4 target 0 lun 0
ada2: <ST31000520AS CC32> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich4 bus 0 scbus5 target 0 lun 0
ada3: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4 at ahcich5 bus 0 scbus6 target 0 lun 0
ada4: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5 at ata1 bus 0 scbus8 target 0 lun 0
ada5: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada5: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)

CPU: AMD Phenom(tm) II X4 920 Processor (2800.19-MHz K8-class CPU)
...
real memory  = 4294967296 (4096 MB)
avail memory = 3840598016 (3662 MB)

ZFS filesystem version 5
ZFS storage pool version 28


Best practices:

Tune the sysctls related to buffer sizes / queue depth.
Label your disks before you build the zpool.
Use gnop to 4 KB-align the disks. Only one disk in the pool needs this 
before you create it.
Use CAM.
*** USE A LOG DEVICE! ***

-Pierre








