Date: Fri, 5 Apr 2013 12:17:27 +0200
From: Joar Jegleim <joar.jegleim@gmail.com>
To: freebsd-fs@freebsd.org
Subject: Regarding regular zfs
Message-ID: <CAFfb-hpt4iKSb0S2fgQ16Hp51KLWJew1Se32yX1cUPYi6pp72g@mail.gmail.com>
Hi FreeBSD!

I've already sent this one to questions@freebsd.org, but realised this list
would be a better option.

I've got a setup where a storage server delivers about 2 million JPEGs
(~1 TB of data) as the backend for a website. The storage server runs ZFS,
and every 15 minutes it does a zfs send to a 'slave'; our proxy will fail
over to the slave if the main storage server goes down.

I've got a script that initially sends the whole ZFS dataset, and every
send after that only sends the incremental diff (a rough sketch of the
pattern is appended as a PS below my signature). After the initial zfs
send, the diffs usually take less than a minute to transfer.

I've had increasing problems on the 'slave': it seems to grind to a halt
for anywhere between 5 and 20 seconds after every zfs receive. Everything
on the server hangs completely. I've had a couple of goes at figuring out
what's happening, without luck, and this third time I've invested even
more time in the problem.

To sum it up:

- The server was initially on 8.2-RELEASE.

- I've set some sysctl / loader.conf tunables, such as:

  # 16 GB arc_max (the server has 30 GB of RAM, but we had a couple of
  # 'freeze' situations; I suspect the ZFS ARC ate too much memory)
  vfs.zfs.arc_max=17179869184

  # 8.2 defaults to 30 here; setting it to 5, which is the default from
  # 8.3 onwards
  vfs.zfs.txg.timeout="5"

  # Set the TXG write limit to a lower threshold. This helps "level out"
  # the throughput rate (see "zpool iostat"). A value of 256 MB works well
  # for systems with 4 GB of RAM, while 1 GB works well for us with 8 GB
  # on disks which have a 64 MB cache.
  # NOTE: in <v28 this tunable is called 'vfs.zfs.txg.write_limit_override'.
  #vfs.zfs.txg.write_limit_override=1073741824   # for 8.2
  vfs.zfs.write_limit_override=1073741824        # for 8.3 and above

- I've implemented mbuffer for the zfs send / receive operations (that
  pipeline is also sketched in the PS below). With mbuffer the sync goes a
  lot faster, but the symptoms are the same: when the zfs receive finishes,
  the hang / unresponsiveness returns for 5-20 seconds.

- I've upgraded to 8.3-RELEASE (plus zpool upgrade and zfs upgrade to v28);
  same symptoms.

- I've upgraded to 9.1-RELEASE; still the same symptoms.

I suspected the length of the unresponsive period after a zfs receive
would correlate with the amount of data being sent, but even when only a
couple of MB are transferred the hang / unresponsiveness is still
substantial.

I suspect it may have something to do with the fact that the dataset being
sent is mounted on the slave, and that I'm also doing the backups from the
slave, which means that much of the time the backup server is rsyncing
from the very dataset that is being updated. I've noticed that the
unresponsiveness / hang situations occur while the backup server is
rsyncing from the dataset being updated; when the backup server is done
and nothing is touching files in that dataset, I hardly notice any of the
symptoms (maybe a minor lag of much less than a second, hardly
noticeable).

So my question to the list would be: in my setup, have I taken the use
case for zfs send / receive too far? As in, it's not meant for this kind
of syncing, this often, so there's actually nothing 'wrong'?

--
----------------------
Joar Jegleim
Homepage: http://cosmicb.no
Linkedin: http://no.linkedin.com/in/joarjegleim
fb: http://www.facebook.com/joar.jegleim
AKA: CosmicB @Freenode
----------------------
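PS: For reference, this is roughly the pattern the sync script follows.
It's just a minimal sketch, not the real script; the dataset name
(tank/jpegs), the host name (slave) and the @prev/@cur snapshot naming are
placeholders:

  #!/bin/sh
  # Minimal sketch of the 15-minute incremental send/receive cycle.
  # Assumes the initial full send has already seeded the slave and that
  # both sides currently hold a snapshot called @prev.
  DATASET=tank/jpegs
  SLAVE=slave

  # Mark this cycle with a new snapshot on the master.
  zfs snapshot ${DATASET}@cur

  # Send only the diff between the previous and current snapshots and
  # apply it on the slave (-F rolls the slave back to @prev first so the
  # receive always applies cleanly).
  zfs send -i ${DATASET}@prev ${DATASET}@cur | \
      ssh ${SLAVE} "zfs receive -F ${DATASET}"

  # Rotate snapshots on both sides for the next cycle.
  zfs destroy ${DATASET}@prev
  zfs rename ${DATASET}@cur ${DATASET}@prev
  ssh ${SLAVE} "zfs destroy ${DATASET}@prev && \
      zfs rename ${DATASET}@cur ${DATASET}@prev"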
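And the mbuffer variant of the same pipeline, as I understand the usual
recipe (block/buffer sizes and the port are examples, not the exact values
we run with):

  # On the slave: listen on a TCP port, buffer the stream, feed zfs receive.
  mbuffer -I 9090 -s 128k -m 1G | zfs receive -F tank/jpegs

  # On the master: pipe the incremental stream into mbuffer, which ships
  # it to the slave over TCP.
  zfs send -i tank/jpegs@prev tank/jpegs@cur | \
      mbuffer -s 128k -m 1G -O slave:9090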
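If anyone wants numbers for the 5-20 second hangs, these are the sort of
commands I could run on the slave during a receive and report back with
(just a sketch of what I'd collect, all stock FreeBSD / ZFS tools; 'tank'
is again a placeholder pool name):

  # Per-vdev bandwidth and IOPS, once a second, while the receive runs.
  zpool iostat -v tank 1

  # GEOM-level disk busy% / latency, to see whether the disks are pegged.
  gstat

  # How big the ARC actually is versus the 16 GB cap I set.
  sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max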