Date: Sat, 30 Apr 2011 11:44:06 -0400
From: Pierre Lamy <pierre@userid.org>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-fs@freebsd.org, Volodymyr Kostyrko <c.kworr@gmail.com>
Subject: Re: ZFS v28 for 8.2-STABLE
Message-ID: <4DBC2E46.9060404@userid.org>
In-Reply-To: <20110430001524.GA58845@icarus.home.lan>
References: <4DB8EF02.8060406@bk.ru> <ipf6i6$54v$1@dough.gmane.org> <20110430001524.GA58845@icarus.home.lan>
On 4/29/2011 8:15 PM, Jeremy Chadwick wrote:
> On Fri, Apr 29, 2011 at 11:20:21PM +0300, Volodymyr Kostyrko wrote:
>> 28.04.2011 07:37, Ruslan Yakovlev wrote:
>>> Does a patch actually exist for 8.2-STABLE? I tried
>>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20110317.patch.xz
>>>
>>> Building failed with:
>>> can't cd to /usr/src/cddl/usr.bin/zstreamdump
>>> Also sys/cddl/compat/opensolaris/sys/sysmacros.h failed to patch.
>>>
>>> Current FreeBSD 8.2-STABLE #35 Mon Apr 18 03:40:38 EEST 2011 i386
>>> periodically freezes under high load, such as backups via rsync or
>>> find -sx ... (from the default cron tasks).
>>
>> Well, ZFSv28 should be very close to STABLE by now?
>>
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-February/023152.html
>
> That's a matter of opinion. The whole idea of committing ZFSv28 to HEAD
> was for it to be tested. I haven't seen any indication of a progress
> report for anything on HEAD that pertains to ZFSv28, have you?
>
> Furthermore, the FreeBSD Quarterly Status Report just came out on 04/27
> for the months of January-March (almost a 2-month delay, sigh):
>
> 1737 04/27 10:58 Daniel Gerzo ( 41K) FreeBSD Status Report January-March, 2011
>
> http://www.freebsd.org/news/status/report-2011-01-2011-03.html
>
> It states that ZFSv28 is "now available in CURRENT", which we've known
> for months:
>
> http://www.freebsd.org/news/status/report-2011-01-2011-03.html#ZFSv28-available-in-FreeBSD-9-CURRENT
>
> But again, there is no progress report, so nobody except those who
> follow HEAD/CURRENT knows what the progress is. And that progress has
> not been relayed to any of the non-HEAD/CURRENT lists.
>
> I'm a total hard-ass about this stuff, and have been for years, because
> it all boils down to communication (or the lack thereof). It seems very
> hasty to say "Yeah! MFC this!" when we (folks who only follow STABLE)
> have absolutely no idea whether what's in CURRENT is actually broken in
> some way or whether there are outstanding problems, and if there are,
> what they are, so users can be aware of them in advance.

Hello,

Here's a summary of my recent end-user work with ZFS on -CURRENT.

I was recently lucky enough to purchase two NAS systems: two cheap new
PCs, each loaded with six hard drives, one 1 TB drive as a simple GPT
boot device and five 2 TB data drives. The motherboard has six SATA
connectors, but I needed to purchase an additional PCI-E SATA adapter
since the DVD drive also uses a SATA port. The system has 4 GB of
memory and an inexpensive new quad-core AMD CPU. I've been running
recent -CURRENT on it for a couple of weeks with heavy single-user use;
the pool is at 2.5 TB of 7.1 TB.

The only problem I found was that deleting a file-backed log device
from a degraded pool would immediately panic the system. I'm not
running stock -CURRENT, so I didn't report it. Resilvering seems
absurdly slow, but since I won't be doing it much I didn't care either;
my NAS setup is side-by-side redundant, so if resilvering took more
than two days I would just replicate off my other NAS.

Throughput without a log device was in the range of 30 Mbit/sec (3% of
my 1 Gbit interface). Adding a file-backed log device on a UFS
partition that is used for boot resulted in a 10x jump, saturating the
SATA bus on the machine I was sending data from over the network.
Throughput spiked up to 30% of interface capacity (the max bus speed of
that source disk) and did not vary much. This resolved the very spiky
data transfers that I saw and that a lot of other people have posted
about on the internet.
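For anyone who wants to try the same thing on an existing pool, here is
a minimal sketch (the backing file path is hypothetical; any UFS-backed
location with a few GB free will do):

# create the backing file, then attach it to the pool as a log vdev
dd if=/dev/zero of=/some/ufs/path/zfs_slog bs=1m count=4096
zpool add tank log /some/ufs/path/zfs_slog

v28 supports log device removal, so it can be taken back out later with
"zpool remove tank /some/ufs/path/zfs_slog". That removal is exactly
the operation that panicked my box while the pool was degraded, so only
do it on a healthy pool.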
I first used a 40 Mbit/sec USB device as the log device, which made the
data transfers dramatically smoother, but there were still ~15-second
stretches where no data would transfer while the log was flushed from
USB to disk. After researching, I discovered that I could use a
file-backed log device instead, and this fixed all the problems with
spiky data transfers.

Before that I had tuned the sysctls, since the poor out-of-the-box
settings were giving me very slow speeds (in the range of 1% of network
throughput, before the log device). I played around with the vfs.zfs
tunables but found that I did not need to once I added the log device;
the out-of-the-box settings for that sysctl tree were just fine.

I had first set this up before CAM became the default in -CURRENT, and
did not use labels. While troubleshooting some unrelated disk issues, I
ended up switching to CAM without problems and subsequently labeled the
disks (recreating the zpool after the labeling). I am now using CAM and
AHCI without any issues.

Here are some personal notes on the tunables I set; I am sure they are
not all helpful. I didn't add them one by one, I simply changed them in
bulk and saw a positive result. Also noted are the commands I used and
the current system status. (A note on making these settings persistent
follows below.)

sysctl -w net.inet.tcp.sendspace=373760
sysctl -w net.inet.tcp.recvspace=373760
sysctl -w net.local.stream.sendspace=82320
sysctl -w net.local.stream.recvspace=82320
sysctl -w vfs.zfs.prefetch_disable=1
sysctl -w net.local.stream.recvspace=373760
sysctl -w net.local.stream.sendspace=373760
sysctl -w net.local.inflight=1
sysctl -w net.inet.tcp.ecn.enable=1
sysctl -w net.inet.flowtable.enable=0
sysctl -w net.raw.recvspace=373760
sysctl -w net.raw.sendspace=373760
sysctl -w net.inet.tcp.local_slowstart_flightsize=10
sysctl -w net.inet.tcp.delayed_ack=0
sysctl -w kern.maxvnodes=600000
sysctl -w net.local.dgram.recvspace=8192
sysctl -w net.local.dgram.maxdgram=8192
sysctl -w net.inet.tcp.slowstart_flightsize=10
sysctl -w net.inet.tcp.path_mtu_discovery=0

<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada0 /dev/ada0
<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada1 /dev/ada1
<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada3 /dev/ada3
<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada4 /dev/ada4
<root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada5 /dev/ada5

The labels are so that I can more easily identify disks later. As an
aside, my mobo has a single legacy ATA bus whose slave port I used for
one SATA disk; that disk would "disappear" from the box, and moving the
drive to a master SATA port resolved the issue (very odd).

gnop create -S 4096 /dev/label/g_ada0
mkdir /var/preserve/zfs
dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000
zpool create -f tank raidz /dev/label/g_ada0.nop /dev/label/g_ada1 /dev/label/g_ada3 /dev/label/g_ada4 /dev/label/g_ada5 log /var/preserve/zfs/log_device

The four lines above set the alignment to 4 KB, create a file-backed
log device, and create the pool.

zfs set atime=off tank

I decided not to use dedup, because my files don't contain many
duplicates; they're mostly large media files, ISOs, etc.
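One note on the sysctl list above: sysctl -w only lasts until reboot.
To make the same settings persistent, they can go in /etc/sysctl.conf
in key=value form (a partial sketch with the same values as above):

# /etc/sysctl.conf -- applied automatically at boot
net.inet.tcp.sendspace=373760
net.inet.tcp.recvspace=373760
vfs.zfs.prefetch_disable=1
kern.maxvnodes=600000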
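For anyone else moving to CAM/AHCI on a system where it is not yet the
default, the stock mechanism is a line in /boot/loader.conf:

ahci_load="YES"

With the ahci driver attached, the disks appear as ada* instead of ad*,
which is why the labels above point at /dev/ada*.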
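One more detail on the gnop trick: the .nop device only needs to exist
when the pool is created, because ZFS records the alignment (ashift) in
the vdev metadata. A sketch of dropping the gnop layer and verifying
the alignment, assuming the pool can be exported briefly:

zpool export tank
gnop destroy /dev/label/g_ada0.nop
zpool import tank
zdb tank | grep ashift    # ashift: 12 means 4 KB, ashift: 9 means 512b

After the import, ZFS picks the vdev up on the plain label device.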
<root.wheel@zfs-slave> [/var/preserve/root] # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                            STATE     READ WRITE CKSUM
        tank                            ONLINE       0     0     0
          raidz1-0                      ONLINE       0     0     0
            label/g_ada0                ONLINE       0     0     0
            label/g_ada1                ONLINE       0     0     0
            label/g_ada3                ONLINE       0     0     0
            label/g_ada4                ONLINE       0     0     0
            label/g_ada5                ONLINE       0     0     0
        logs
          /var/preserve/zfs/log_device  ONLINE       0     0     0

errors: No known data errors
<root.wheel@zfs-slave> [/var/preserve/root] #
<root.wheel@zfs-slave> [/var/preserve/root] # df
Filesystem          Size    Used   Avail Capacity  Mounted on
/dev/gpt/pyros-a    9.7G    3.3G    5.6G    37%    /
/dev/gpt/pyros-c    884G    6.1G    808G     1%    /var
tank                7.1T    2.5T    4.6T    35%    /tank
<root.wheel@zfs-slave> [/var/preserve/root] #

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1 at ahcich2 bus 0 scbus3 target 0 lun 0
ada1: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2 at ahcich3 bus 0 scbus4 target 0 lun 0
ada2: <ST31000520AS CC32> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich4 bus 0 scbus5 target 0 lun 0
ada3: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4 at ahcich5 bus 0 scbus6 target 0 lun 0
ada4: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5 at ata1 bus 0 scbus8 target 0 lun 0
ada5: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada5: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)

CPU: AMD Phenom(tm) II X4 920 Processor (2800.19-MHz K8-class CPU)
...
real memory  = 4294967296 (4096 MB)
avail memory = 3840598016 (3662 MB)

ZFS filesystem version 5
ZFS storage pool version 28

Best practices:
- Tune the sysctls related to buffer sizes / queue depth.
- Label your disks before you build the zpool.
- Use gnop to 4 KB-align the disks; only one disk in the pool needs
  this before you create the pool.
- Use CAM.
- *** USE A LOG DEVICE! ***

-Pierre
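PS: a quick way to sanity-check the labels and the pool layout after a
reboot (using the names above):

glabel status
zpool status tank
zpool iostat -v tank 5

zpool iostat -v breaks the traffic out per vdev, which makes it easy to
confirm that the log device is actually absorbing writes.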