Date: Wed, 3 Sep 2014 11:07:20 +1000
From: Paul Koch <paul.koch@akips.com>
To: freebsd-stable@freebsd.org
Subject: Re: 10.0 interaction with vmware
Message-ID: <20140903110720.2bd1b373@akips.com>
In-Reply-To: <ltpuji$lus$1@ger.gmane.org>
References: <20140826171657.0c79c54d@akips.com> <ltpuji$lus$1@ger.gmane.org>
On Fri, 29 Aug 2014 15:18:32 +0200
Ivan Voras <ivoras@freebsd.org> wrote:

> On 26/08/2014 09:16, Paul Koch wrote:
>
> > How does this actually work?  Does it only take back what
> > FreeBSD considers to be "free" memory, or can the host start taking
> > back "inactive", "wired" or ZFS ARC memory?  We tend to rely on
> > stuff being in inactive and the ZFS ARC.  If we start swapping, we
> > are dead.
>
> Under memory pressure, VMware's ballooning will cause FreeBSD's
> internal "memory low" triggers to fire, which will release ARC memory,
> which will probably degrade your performance.  But from what I've
> seen, for some reason, it's pretty hard to actually see the VMware
> host activate ballooning, at least on FreeBSD servers.  I've been
> using this combination for years and I only saw it once, for a
> trivial amount of memory.  It's probably a last-resort measure.

Yer, releasing ARC memory would be tragic because it would already
contain useful data for us, and going back to disk/SAN would be a hit.

We set limits on the ARC size at install time because ZFS appears to be
very "aggressive" at consuming memory.  We also constantly monitor/graph
memory usage, so the customer can get some idea of what is happening on
their FreeBSD VM.  eg.

  http://www.akips.com/gz/downloads/sys-graph.html
  http://www.akips.com/gz/downloads/poller-graph.html

On that machine, the ARC has been limited to ~2G, and it appears to
always hover around there.  If ballooning was turned on and memory got
tight enough to cause the ARC to drop, at least they'd be able to go
back in time and see that something tragic happened.

> Also, VMware will manage guest memory even without any guest software
> at all.  It keeps track of recently active memory pages and may swap
> the unused ones out.

In computing time, how long is "recently"?

We have very few running processes and a handful of largish mmap'ed
files.  Most of the mmap'ed files are read ~40 times a second, so we'd
assume they are always "recently" active.  Our largest mmap'ed file is
only written to once a minute, with every polled statistic.  Every
memory page gets updated, but once a minute may not be considered
"recently" in computing time.  If ballooning caused paging out of that
mmap'ed file, we'd be toast.

> FWIW, I think ZFS's crazy memory footprint makes it unsuitable for VMs
> (or actually most non-file-server workflows...), but I'm sure most
> people here will not agree with me :D  If you have the opportunity to
> try it out in production, just run a regular UFS2+SU in your VM for a
> couple of days and see the difference.

We actually started out with UFS2+SU on our data partition, but wanted a
"one size fits all" FreeBSD install configuration that would work ok on
bare metal and in a VM.  We have zero control over the platform the
customer uses - ranging from a throw-away old desktop PC to high end
dedicated bare metal, or a VM in the data centre.  Since we are mostly
CPU bound, ZFS doesn't appear to be a performance problem for us in a VM.
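(For reference, the ARC cap mentioned above is just a loader tunable.
A minimal sketch of what we set - the 2G figure is only what suits that
particular box, tune to taste:)

  # /boot/loader.conf
  # Cap the ZFS ARC so it can't eat the whole VM.  Value is in bytes;
  # 2147483648 = 2G, which is what the graphs above happen to show.
  vfs.zfs.arc_max="2147483648"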
On a side note, one of the reasons why we switched to ZFS is because we
"thought" we had a data corruption problem with UFS2 when shutting down.
It took a while to discover what we were doing wrong.  Doh!!

At shutdown, running on physical hardware or in a VM, we'd get to
"All buffers synced" and the machine would hang for ages before powering
off or rebooting.  When it came back up, the file system was dirty and
hadn't been unmounted properly.  Googling for 'all buffers synced' came
up with various issues related to USB.  But what was actually happening
was...

We have largish mmap'ed files (eg. 2G), which we mmap with the
MAP_NOSYNC flag.  The memory pages are written to constantly, and we
fsync() them every 600 seconds so we can control when the disk writes
occur.  It appears that fsync writes out the entire mmap'ed file
sequentially, because a quick calculation on the file size and raw disk
write speed generally matches.

But at shutdown, we were forgetting to do a final fsync on those big
files, which meant the OS had to write them out.  That doesn't appear to
happen until after the "all buffers synced" message though.  On real
hardware it just looks like the machine has hung, but we did notice the
disk LED hard on.  Running in a VirtualBox VM, at shutdown we ran
gstat/systat on the FreeBSD host, which showed the disk stuck at 100%
busy for ages and ages after the "all buffers synced" message.  It was
taking so long that the VM was being killed ungracefully by the
shutdown scripts.

We use MAP_NOSYNC because without it, the default syncing behaviour on
large mmap'ed files sucks.  It seems the shutdown behaviour is similar,
or much worse.  The problem on physical hardware was that there were no
obvious messages about what the machine was doing after the
"all buffers synced" message!

Now we just do a fsync(1) of every mmap'ed file in our shutdown script,
and the machine shuts down clean and fast.
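(In case the pattern helps anyone, here's roughly what our pollers do,
sketched from memory - the path, sizes and names below are made up, and
most error handling is trimmed:)

  /*
   * Sketch of the MAP_NOSYNC + periodic fsync() pattern described
   * above.  Hypothetical path and sizes; error handling trimmed.
   */
  #include <sys/mman.h>

  #include <fcntl.h>
  #include <signal.h>
  #include <stdlib.h>
  #include <unistd.h>

  static volatile sig_atomic_t quitting;

  static void
  on_term(int sig)
  {
      (void)sig;
      quitting = 1;
  }

  int
  main(void)
  {
      size_t db_size = 2UL * 1024 * 1024 * 1024;  /* ~2G stats file */
      char *db;
      int fd;

      fd = open("/data/poller.db", O_RDWR);       /* hypothetical path */
      if (fd == -1)
          exit(1);

      /*
       * MAP_NOSYNC stops the syncer trickling our dirty pages out;
       * we decide when the disk writes happen.
       */
      db = mmap(NULL, db_size, PROT_READ | PROT_WRITE,
          MAP_SHARED | MAP_NOSYNC, fd, 0);
      if (db == MAP_FAILED)
          exit(1);

      signal(SIGTERM, on_term);

      while (!quitting) {
          /* ... pollers scribble all over db[] here ... */
          sleep(600);
          fsync(fd);    /* flush on our schedule, not the syncer's */
      }

      /*
       * The step we forgot: a final fsync() at shutdown.  Without it,
       * the kernel writes out every dirty MAP_NOSYNC page itself at
       * unmount, after "All buffers synced", which looks like a hang.
       */
      fsync(fd);
      munmap(db, db_size);
      close(fd);
      return (0);
  }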
Paul.

--
Paul Koch | Founder, CEO
AKIPS Network Monitor    http://www.akips.com
Brisbane, Australia