Date: Wed, 11 Oct 2017 12:08:01 -0600
From: markham breitbach <markham_breitbach@ssimicro.com>
To: freebsd-questions@freebsd.org
Subject: Re: FreeBSD ZFS file server with SSD HDD
Message-ID: <d0c4a978-5fab-ef66-89c0-7ee956ff5b24@ssimicro.com>
In-Reply-To: <e99b1b0c-7d8a-90b4-d49b-24a9d8428864@holgerdanske.com>
References: <20171011130512.GE24374@apple.rat.burntout.org> <e99b1b0c-7d8a-90b4-d49b-24a9d8428864@holgerdanske.com>
I ran into some problems with disks choking on heavy IO under VMware.
It turned out to be an issue with firmware on the SSDs and backplane in
a Dell server. It's probably worth making sure those are all up to date.

-M

On 2017-10-11 11:30 AM, David Christensen wrote:
> On 10/11/17 06:05, Kate Dawson wrote:
>> Currently running a FreeBSD NFS server with a zpool comprising
>> 12 x 1TB hard disk drives, arranged as pairs of mirrors in a stripe
>> set (RAID 10).
>
> That should do 6+ Gb/s.
>
> bonnie++ should be able to measure that. (It's been a while, but I
> seem to recall that bonnie++ expects raw drives and nukes your data.
> So, it could take some effort to use it.)
>
> https://www.coker.com.au/bonnie++/
>
>> An additional 2 x 960GB SSDs have been added. These two SSDs are
>> partitioned with a small partition being used for a ZIL log, and a
>> larger partition arranged as L2ARC cache.
>
> Assuming the ZIL is mirrored, that should do 5+ Gb/s.
>
> Assuming the L2ARC is striped, that should do 10+ Gb/s.
>
> I don't know how to test ZIL and L2ARC in isolation, but dbench should
> be able to test what ZFS exposes, both locally and over NFS:
>
> https://dbench.samba.org/
>
>> Additionally the host has 64GB RAM and 16 CPU cores (AMD Opteron 2GHz).
>
> That should do 20+ Gb/s.
>
> Memtest86+ should be able to measure that:
>
> http://www.memtest.org/
>
>> A dataset from the pool is exported via NFS to a number of Debian
>> GNU/Linux hosts running a Xen hypervisor. These run several
>> disk-image-based virtual machines.
>>
>> In general use, the FreeBSD NFS host sees very little read IO, which
>> is to be expected, as the RAM cache and L2ARC are designed to
>> minimise the read load on the disks.
>>
>> However, we're starting to see high load (mostly IO WAIT) on the
>> Linux virtualisation hosts and virtual machines, with kernel
>> timeouts occurring, resulting in crashes and instability.
>>
>> I believe this may be due to the limited number of random write IOPS
>> available on the zpool NFS export.
>>
>> I can get sequential writes and reads to and from the NFS server at
>> speeds that approach the maximum the network provides (currently
>> 1Gb/s + jumbo frames, and I could increase this by bonding multiple
>> interfaces together).
>>
>> However, day-to-day usage does not show network utilisation anywhere
>> near this maximum.
>>
>> If I look at the output of `zpool iostat -v tank 1` I see that,
>> every five seconds or so, the number of write operations goes to >2k.
>>
>> I think this shows that I'm hitting the limit that the spinning
>> disks can provide under this workload.
>>
>> As a cost-effective way to improve this (rather than replacing the
>> whole chassis), I was considering replacing the 1TB HDDs with 1TB
>> SSDs, for the improved IOPS.
>>
>> I wonder if there are any opinions within the community here on:
>>
>> 1. What metrics can I gather to confirm disk write IO as the
>> bottleneck?
>>
>> 2. If the proposed solution will have the required effect? That is,
>> a decrease in the IOWAIT on the GNU/Linux virtualisation hosts.
>
> I infer your network to be:
>
> - 1 host running FreeBSD (freebsd-version? uname -a?) and an NFS
> server (version?).
>
> - N (how many?) Debian GNU/Linux hosts (/etc/debian_version? uname
> -a?), each running a Xen hypervisor (version?) and an NFS client.
>
> - The VMs are configured to see their drives as local devices (e.g.
> the VMs are not running NFS clients connected to the FreeBSD NFS
> server).
>
> - Gigabit switch (make? model?).
>
> - 1 Gigabit connection between switch and each host.
>
> As you have correctly stated, you need visibility into the relevant
> performance metrics to make informed decisions. In addition to the
> above tools:
>
> - For networking, I'd try netstat:
>
> http://netstat.net/
>
> - For drive I/O, I use nmon on Debian:
>
> https://en.wikipedia.org/wiki/Nmon
>
> - I believe iostat is available on both:
>
> https://en.wikipedia.org/wiki/Iostat
>
> - For CPUs, RAM, and swap, I use top:
>
> https://en.wikipedia.org/wiki/Top_(software)
>
> - You seem to have found at least one ZFS tool.
>
> As others have stated, you will want to ensure that all the pieces
> are reasonably in tune -- VM, NFS client, Xen, Debian networking,
> switch, FreeBSD networking, NFS server, ZFS, etc. I'd start by
> looking for errors and/or warnings in the usual places (dmesg,
> /var/log, etc.). I typically leave the settings at the installer
> defaults, unless I have some compelling reason to make a change (at
> least one reader made a suggestion). Be sure to keep good notes if
> you're going to muck with the settings.
>
> As for 'zpool iostat -v tank 1', I suspect ZFS is telling you that it
> is flushing writes to the HDDs every five seconds. If flushes always
> complete before the next scheduled flush, replacing the HDDs with
> SSDs probably will not help with the VM IO WAIT and kernel timeout
> problems. But, if the flushes are overrunning each other during peak
> usage, you may have found the bottleneck.
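
To put numbers on question 1 above, here is a rough sketch of what
could be run on the FreeBSD host while the VMs are busy (the pool name
"tank" comes from the quoted zpool iostat output; the rest is stock
FreeBSD tooling):

  # Per-provider latency, queue depth and %busy; sustained ~100% busy
  # on the HDDs during the five-second bursts points at the spinners.
  gstat -p

  # Per-vdev bandwidth and IOPS for the pool, sampled every second.
  zpool iostat -v tank 1

  # Extended per-device statistics (also available on the Debian side
  # via the sysstat package).
  iostat -x 1

If the HDD mirrors sit near 100% busy while the SSD log device stays
mostly idle, that would support the random-write-IOPS theory.
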
> That said, I suspect that the root cause of the VM IO WAIT and kernel
> timeout problems is that the virtual machines need a low-latency
> connection to their system drives, temporary file systems, and/or
> swap devices, and they aren't getting it. I would not bet on NFS to
> provide this, even with SSDs instead of HDDs. I would bet on local
> resources. I suggest:
>
> 1. Put 2 mirrored SSDs in each Xen server.
>
> 2. Put VM system drives on the local SSD mirror.
>
> 3. Put VM /tmp file systems on the local SSD mirror, or on RAM:
>
> https://en.wikipedia.org/wiki/Tmpfs
>
> 4. Put VM swap devices on the local SSD mirror, or on RAM:
>
> https://en.wikipedia.org/wiki/Zram
>
> 5. Put VM data drives on NFS.
>
> I am unsure if it is better to do the "on RAM" and "on NFS" ideas at
> the Xen level or within each VM. Performance is one consideration.
> Other considerations are security and accountability -- e.g. do
> customers have root on the VMs?
>
> To improve NFS performance:
>
> 1. Enlarge the pipe between the NFS server and the switch -- bonding
> (your idea), an upgrade to 10 Gb/s, etc.
>
> 2. Enlarge the pipes between the Xen hosts and the switch.
>
> 3. Add NICs to the NFS server, add switches, and divide up the Xen
> hosts across the switches.
>
> 4. Add NICs to the NFS server, one per Xen host, and make direct
> connections between the NFS server and each Xen host.
>
> Please let us know how it goes. :-)
>
> David
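
For what it's worth, if the interface-bonding route is taken on the
FreeBSD side, lagg(4) with LACP is the usual approach. A rough sketch
for /etc/rc.conf, assuming two igb(4) ports and an LACP-capable switch
(the interface names and the address are placeholders, not taken from
the setup described above):

  ifconfig_igb0="up"
  ifconfig_igb1="up"
  cloned_interfaces="lagg0"
  # Jumbo frames (mtu 9000) would need to be applied to the lagg and
  # its member ports as well, to match the existing NFS setup.
  ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 192.0.2.10 netmask 255.255.255.0"

Bear in mind that LACP hashes traffic per flow, so a single NFS client
will still see roughly one link's worth of bandwidth; the aggregate
mainly helps when several Xen hosts hit the server at once.
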