Date: Sun, 01 May 2011 02:09:18 +0200
From: Martin Matuska <mm@FreeBSD.org>
To: Pierre Lamy <pierre@userid.org>
Cc: freebsd-fs@freebsd.org, Volodymyr Kostyrko <c.kworr@gmail.com>
Subject: Re: ZFS v28 for 8.2-STABLE
Message-ID: <4DBCA4AE.3090506@FreeBSD.org>
In-Reply-To: <4DBC2E46.9060404@userid.org>
References: <4DB8EF02.8060406@bk.ru> <ipf6i6$54v$1@dough.gmane.org> <20110430001524.GA58845@icarus.home.lan> <4DBC2E46.9060404@userid.org>
We plan to MFC v28. But as this change is quite intrusive for users, there
is no way back once you upgrade your pool (not upgrading the bootcode =
not able to boot = saved by mfsBSD). It will happen when we think it is
stable enough to be in STABLE.

As for me, I am not using it in serious production yet (I am very happy
with v15 + the latest patches), but my development servers running v28
seem pretty stable. I have updated the patch to reflect the latest changes
(grab the latest one):
http://people.freebsd.org/~mm/patches/zfs/v28/

As to your setup, have you tried using a partition as a log device?
File-backed log devices are generally considered experimental in all ZFS
implementations (including Solaris).
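Something along these lines should do it -- an untested sketch, where the
spare disk ada6, the 8G size and the slog0 label are only placeholders,
while "tank" and the log file path are taken from your mail below:

gpart create -s gpt ada6
gpart add -t freebsd-zfs -l slog0 -s 8G ada6
zpool remove tank /var/preserve/zfs/log_device
zpool add tank log gpt/slog0

Given the panic you describe when removing the file-backed log from a
degraded pool, I would only swap the log devices while the pool is healthy.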
On 30.04.2011 17:44, Pierre Lamy wrote:
> On 4/29/2011 8:15 PM, Jeremy Chadwick wrote:
>> On Fri, Apr 29, 2011 at 11:20:21PM +0300, Volodymyr Kostyrko wrote:
>>> 28.04.2011 07:37, Ruslan Yakovlev wrote:
>>>> Does a patch actually exist for 8.2-STABLE? I tried
>>>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20110317.patch.xz
>>>>
>>>> Building failed with:
>>>> can't cd to /usr/src/cddl/usr.bin/zstreamdump
>>>> Also sys/cddl/compat/opensolaris/sys/sysmacros.h failed to patch.
>>>>
>>>> FreeBSD 8.2-STABLE #35 Mon Apr 18 03:40:38 EEST 2011 i386 also
>>>> periodically freezes under high load, such as backups by rsync or
>>>> find -sx ... (from the default cron tasks).
>>> Well, ZFSv28 should be very close to STABLE by now?
>>>
>>> http://lists.freebsd.org/pipermail/freebsd-current/2011-February/023152.html
>>>
>> It's now a matter of opinion. The whole idea of committing ZFSv28 to
>> HEAD was for it to get tested. I haven't seen any indication of a
>> progress report for anything on HEAD that pertains to ZFSv28, have you?
>>
>> Furthermore, the FreeBSD Quarterly Status Report just came out on 04/27
>> for the months of January-March (almost a 2-month delay, sigh):
>>
>> 1737 04/27 10:58 Daniel Gerzo ( 41K) FreeBSD Status Report January-March, 2011
>>
>> http://www.freebsd.org/news/status/report-2011-01-2011-03.html
>>
>> It states that ZFSv28 is "now available in CURRENT", which we have
>> known for months:
>>
>> http://www.freebsd.org/news/status/report-2011-01-2011-03.html#ZFSv28-available-in-FreeBSD-9-CURRENT
>>
>> But again, there is no progress report, so nobody except those who
>> follow HEAD/CURRENT knows what the progress is. And that progress has
>> not been relayed to any of the non-HEAD/CURRENT lists.
>>
>> I'm a total hard-ass about this stuff, and have been for years, because
>> it all boils down to communication (or the lack thereof). It seems very
>> hasty to say "Yeah! MFC this!" when we (folks who only follow STABLE)
>> have absolutely no idea whether what's in CURRENT is actually broken in
>> some way or whether there are outstanding problems -- and if there are,
>> what they are, so users can be aware of them in advance.
>>
> Hello,
>
> Here's a summary of my recent end-user work with ZFS on -current. I was
> recently lucky enough to purchase two NAS systems, consisting of two
> cheap new PCs, each loaded with 6 hard drives: one 1 TB disk as a simple
> GPT boot device and five 2 TB data drives. The motherboard has 6 SATA
> connectors, but I needed to buy an additional PCI-E SATA adapter since
> the DVD drive also uses a SATA port. The system has 4 GB of memory and a
> new, inexpensive quad-core AMD CPU.
>
> I've been running it (recent -current) for a couple of weeks with heavy
> single-user use; 2.5 TB of the 7.1 TB pool is in use.
>
> The only problem I found was that deleting a file-backed log device from
> a degraded pool would immediately panic the system. I'm not running
> stock -current, so I didn't report it.
>
> Resilvering seems absurdly slow, but since I won't be doing it much I
> didn't care about that either. My NAS setup is side-by-side redundant,
> so if resilvering took more than 2 days I would just replicate from my
> other NAS.
>
> Throughput without a log device was in the range of 30 Mb/s (about 3%
> of my 1 Gb interface). Adding a file-backed log device on the UFS
> partition that is used for boot resulted in a 10x jump, saturating the
> SATA bus of the disk I was sending data from over the network.
> Throughput spiked up to 30% of the interface speed (the maximum bus
> speed of the source disk) and did not vary much. This resolved the issue
> that a lot of other people have posted about on the internet -- very
> spiky data transfers. I first used a USB device with ~40 Mb/s of
> throughput as the log device, which made the transfers dramatically
> smoother, but there were still ~15-second gaps where no data would
> transfer while the log was flushed from USB to disk. After some research
> I discovered that I could use a file-backed log device, and that fixed
> all the problems with spiky data transfers.
>
> Before that I had tuned the sysctls, as the poor out-of-the-box settings
> were giving me very slow speeds (in the range of 1% of network
> throughput, before the log device). I played around with the vfs.zfs
> tunables but found that I did not need them after I added the log
> device; the out-of-the-box settings for that sysctl tree were just fine.
>
> I had first set this up before CAM was made the default in -current,
> and did not use labels. While troubleshooting some unrelated disk
> issues, I ended up switching to CAM without problems, and subsequently
> labeled the disks (and recreated the zpool after the labeling). I am now
> using CAM and AHCI without any issues.
>
> Here are some personal notes about the tunables I set; I am sure they
> are not all helpful. I didn't add them one by one -- I simply
> mass-changed them and saw a positive result. Also noted are the commands
> I used and the current system status.
>
> sysctl -w net.inet.tcp.sendspace=373760
> sysctl -w net.inet.tcp.recvspace=373760
> sysctl -w net.local.stream.sendspace=82320
> sysctl -w net.local.stream.recvspace=82320
> sysctl -w vfs.zfs.prefetch_disable=1
> sysctl -w net.local.stream.recvspace=373760
> sysctl -w net.local.stream.sendspace=373760
> sysctl -w net.local.inflight=1
> sysctl -w net.inet.tcp.ecn.enable=1
> sysctl -w net.inet.flowtable.enable=0
> sysctl -w net.raw.recvspace=373760
> sysctl -w net.raw.sendspace=373760
> sysctl -w net.inet.tcp.local_slowstart_flightsize=10
> sysctl -w net.inet.tcp.delayed_ack=0
> sysctl -w kern.maxvnodes=600000
> sysctl -w net.local.dgram.recvspace=8192
> sysctl -w net.local.dgram.maxdgram=8192
> sysctl -w net.inet.tcp.slowstart_flightsize=10
> sysctl -w net.inet.tcp.path_mtu_discovery=0
>
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada0 /dev/ada0
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada1 /dev/ada1
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada3 /dev/ada3
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada4 /dev/ada4
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada5 /dev/ada5
>
> The labels are there so that I can identify the disks more easily later.
> My motherboard has one SATA port that sits on the slave position of a
> single legacy ATA bus; the disk on that port would "disappear" from the
> box. Moving the drive to a master SATA port resolved the issue (? very
> odd).
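> A quick way to double-check the labels before building the pool (the
> output will obviously differ per machine) is:
>
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel status
>
> The new labels show up under /dev/label/, which is what the zpool create
> below refers to.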
> gnop create -S 4096 /dev/label/g_ada0
> mkdir /var/preserve/zfs
> dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000
> zpool create -f tank raidz /dev/label/g_ada0.nop /dev/label/g_ada1 \
>     /dev/label/g_ada3 /dev/label/g_ada4 /dev/label/g_ada5 \
>     log /var/preserve/zfs/log_device
>
> The four lines above set the alignment to 4 KB, create a file-backed log
> device, and create the pool.
>
> zfs set atime=off tank
>
> I decided not to use dedup because my files don't contain many
> duplicates; they're mostly large media files, ISOs, etc.
>
> <root.wheel@zfs-slave> [/var/preserve/root] # zpool status
>   pool: tank
>  state: ONLINE
>  scan: none requested
> config:
>
>         NAME                              STATE     READ WRITE CKSUM
>         tank                              ONLINE       0     0     0
>           raidz1-0                        ONLINE       0     0     0
>             label/g_ada0                  ONLINE       0     0     0
>             label/g_ada1                  ONLINE       0     0     0
>             label/g_ada3                  ONLINE       0     0     0
>             label/g_ada4                  ONLINE       0     0     0
>             label/g_ada5                  ONLINE       0     0     0
>         logs
>           /var/preserve/zfs/log_device    ONLINE       0     0     0
>
> errors: No known data errors
> <root.wheel@zfs-slave> [/var/preserve/root] #
>
> <root.wheel@zfs-slave> [/var/preserve/root] # df
> Filesystem          Size    Used   Avail Capacity  Mounted on
> /dev/gpt/pyros-a    9.7G    3.3G    5.6G    37%    /
> /dev/gpt/pyros-c    884G    6.1G    808G     1%    /var
> tank                7.1T    2.5T    4.6T    35%    /tank
> <root.wheel@zfs-slave> [/var/preserve/root] #
>
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada0: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada1 at ahcich2 bus 0 scbus3 target 0 lun 0
> ada1: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada2 at ahcich3 bus 0 scbus4 target 0 lun 0
> ada2: <ST31000520AS CC32> ATA-8 SATA 2.x device
> ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada2: Command Queueing enabled
> ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada3 at ahcich4 bus 0 scbus5 target 0 lun 0
> ada3: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada3: Command Queueing enabled
> ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada4 at ahcich5 bus 0 scbus6 target 0 lun 0
> ada4: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada4: Command Queueing enabled
> ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada5 at ata1 bus 0 scbus8 target 0 lun 0
> ada5: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada5: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
> ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
>
> CPU: AMD Phenom(tm) II X4 920 Processor (2800.19-MHz K8-class CPU)
> ...
> real memory  = 4294967296 (4096 MB)
> avail memory = 3840598016 (3662 MB)
>
> ZFS filesystem version 5
> ZFS storage pool version 28
>
> Best practices:
>
> Tune the sysctls related to buffer sizes / queue depth.
> Label your disks before you build the zpool.
> Use gnop to 4 KB-align the disks. Only one disk in the pool needs this
> before you create the pool.
> Use CAM.
> *** USE A LOG DEVICE! ***
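> To confirm that the 4 KB alignment actually took, and to watch the log
> device doing its job, something like this should work (pool name from
> the setup above; output omitted):
>
> zdb | grep ashift
> zpool iostat -v tank 5
>
> zdb dumps the cached pool configuration and should report ashift: 12 for
> the raidz vdev if the gnop trick worked; zpool iostat -v shows per-vdev
> traffic, including the writes hitting the log device.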
>
> -Pierre
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"