From owner-freebsd-fs@FreeBSD.ORG Sun Sep 12 21:01:49 2010
Message-ID: <3FBF66BF11AA4CBBA6124CA435A4A31B@multiplay.co.uk>
From: "Steven Hartland"
To: "Andriy Gapon"
References: <5DB6E7C798E44D33A05673F4B773405E@multiplay.co.uk>
 <4C85E91E.1010602@icyb.net.ua> <4C873914.40404@freebsd.org>
 <20100908084855.GF2465@deviant.kiev.zoral.com.ua>
 <4C874F00.3050605@freebsd.org> <4C8D087B.5040404@freebsd.org>
 <03537796FAB54E02959E2D64FC83004F@multiplay.co.uk>
 <4C8D280F.3040803@freebsd.org>
Date: Sun, 12 Sep 2010 22:01:42 +0100
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek, jhell
Subject: Re: zfs very poor performance
compared to ufs due to lack of cache?

----- Original Message -----
From: "Andriy Gapon"
>
> All :-)
> Revision of your code, all the extra patches, workload, graphs of ARC and memory
> dynamics and that's just for the start.
> Then, analysis similar to that of Wiktor. E.g. trying to test with a single
> file and then removing it, or better yet, examining with DTrace actual code
> paths taken from sendfile(2).

All of those have been given in past posts on this thread, but they're quite
fragmented, sorry about that, so here's the current summary for reference:-

The machine is a stream server whose job is to serve mp4 HTTP streams via
nginx. It also exports the fs via NFS to an encoding box which does all the
grunt work of creating the streams, but that doesn't seem relevant here as
this was not in use during these tests.

We currently have two such machines: one which has been updated to zfs and
one which is still on ufs.

After upgrading to 8.1-RELEASE and zfs all seemed ok until we had a bit of
a traffic hike, at which point we noticed the machine in question really
struggling even though it was serving fewer than 100 clients at under 3mbps
for a few popular streams, which should all have easily fitted in cache.

Upon investigation it seems that zfs wasn't caching anything, so all streams
were being read directly from disk, overloading the Areca controller backed
by a 7-disk RAID6 volume.
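
As a rough sanity check on the load described above, the worst-case aggregate
throughput can be worked out as follows. This is only a sketch: it assumes
"3mbps" means 3 megabits per second per client, which the post does not spell
out, and it uses the stated upper bounds rather than measured values.

```python
# Rough upper bound on the aggregate streaming load described above.
# Assumption: "3mbps" is 3 megabits per second per client.
clients = 100          # stated upper bound on concurrent clients
mbps_per_client = 3    # stated upper bound per stream, in Mbit/s

aggregate_mbps = clients * mbps_per_client   # total in Mbit/s
aggregate_mbytes = aggregate_mbps / 8        # total in MB/s

print(f"aggregate: {aggregate_mbps} Mbit/s = {aggregate_mbytes} MB/s")
```

Under those assumptions the total is modest, which is why a few popular files
served from cache would not be expected to load the disks at all.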

After my original post we've done a number of upgrades and we are now
running 8-STABLE as of 06/09 plus the following:

http://people.freebsd.org/~mm/patches/zfs/v15/stable-8-v15.patch
http://people.freebsd.org/~mm/patches/zfs/zfs_metaslab_v2.patch
http://people.freebsd.org/~mm/patches/zfs/zfs_abe_stat_rrwlock.patch
needfree.patch and vm_paging_needed.patch posted by jhell

> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> @@ -500,6 +500,7 @@ again:
>                                 sched_unpin();
>                         }
>                         VM_OBJECT_LOCK(obj);
> +                       if (error == 0)
> +                               vm_page_set_validclean(m, off, bytes);
>                         vm_page_wakeup(m);
>                         if (error == 0)
>                                 uio->uio_resid -= bytes;

When nginx is active and using sendfile we see a large amount of memory,
seemingly equivalent to the size of the files being accessed, slip into
inactive according to top, and the size of ARC drops to at most the
configured minimum and sometimes even less.

The machine now has 7GB of RAM and these are the loader.conf settings
currently in use:-

# As we have battery backed cache we can do this
vfs.zfs.cache_flush_disable=1
vfs.zfs.prefetch_disable=0
# Physical Memory * 1.5
vm.kmem_size="11G"
vfs.zfs.arc_min="5G"
vfs.zfs.arc_max="6656M"
vfs.zfs.vdev.cache.size="20M"

Currently arc_summary reports the following after being idle for several
hours:-

ARC Size:
        Current Size:                   76.92%  5119.85M (arcsize)
        Target Size: (Adaptive)         76.92%  5120.00M (c)
        Min Size (Hard Limit):          76.92%  5120.00M (c_min)
        Max Size (High Water):          ~1:1    6656.00M (c_max)

Column details as requested previously:-
cnt, time, kstat.zfs.misc.arcstats.size, vm.stats.vm.v_pdwakeups,
vm.stats.vm.v_cache_count, vm.stats.vm.v_inactive_count,
vm.stats.vm.v_active_count, vm.stats.vm.v_wire_count,
vm.stats.vm.v_free_count

1,1284323760,5368902272,72,49002,156676,27241,1505466,32523
2,1284323797,5368675288,73,51593,156193,27612,1504846,30682
3,1284323820,5368675288,73,51478,156248,27649,1504874,30671
4,1284323851,5368670688,74,22994,184834,27609,1504794,30698
5,1284323868,5368670688,74,22990,184838,27605,1504792,30698
6,1284324024,5368679992,74,22246,184624,27663,1505177,31171
7,1284324057,5368679992,74,22245,184985,27663,1504844,31170

Point notes:
1. Initial values
2. single file request size: 692M
3. repeat request #2
4. request for second file 205M
5. repeat request #4
6. multi request #2
7. complete

top details after tests:-
Mem: 106M Active, 723M Inact, 5878M Wired, 87M Cache, 726M Buf, 124M Free
Swap: 4096M Total, 836K Used, 4095M Free

arc_summary snip after test
ARC Size:
        Current Size:                   76.92%  5119.97M (arcsize)
        Target Size: (Adaptive)         76.92%  5120.09M (c)
        Min Size (Hard Limit):          76.92%  5120.00M (c_min)
        Max Size (High Water):          ~1:1    6656.00M (c_max)

If I turn the box on so it gets a real range of requests, after about an
hour I see something like:-
Mem: 104M Active, 2778M Inact, 3065M Wired, 20M Cache, 726M Buf, 951M Free
Swap: 4096M Total, 4096M Free

ARC Size:
        Current Size:                   34.37%  2287.36M (arcsize)
        Target Size: (Adaptive)         100.00% 6656.00M (c)
        Min Size (Hard Limit):          76.92%  5120.00M (c_min)
        Max Size (High Water):          ~1:1    6656.00M (c_max)

As you can see the size of ARC has even dropped below c_min. The results of
the live test were gathered directly after a reboot, in case that's relevant.

If someone could suggest a set of tests that would help I'll be happy to
run them, but from what's been said thus far it seems that the use of
sendfile is forcing memory use other than that coming from ARC, which is
what's expected?

Would running the same test with sendfile disabled in nginx help?

    Regards
    Steve
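
For anyone wanting to compare the sampled counters with the top output, the
vm.stats.vm.*_count values above are page counts, so they need converting
before they line up with the figures top prints. The following is a minimal
sketch of that conversion, assuming 4 KiB pages (check with
`sysctl hw.pagesize`); the short column names are hypothetical labels
standing in for the full sysctl names.

```python
import csv
import io

PAGE_SIZE = 4096      # assumption: 4 KiB pages; confirm with `sysctl hw.pagesize`
MIB = 1024 * 1024

# Rows copied from the column details in the post; the short column names
# are hypothetical labels, not real sysctl names.
SAMPLES = """cnt,time,arc_size,pdwakeups,cache,inactive,active,wire,free
1,1284323760,5368902272,72,49002,156676,27241,1505466,32523
2,1284323797,5368675288,73,51593,156193,27612,1504846,30682
3,1284323820,5368675288,73,51478,156248,27649,1504874,30671
4,1284323851,5368670688,74,22994,184834,27609,1504794,30698
5,1284323868,5368670688,74,22990,184838,27605,1504792,30698
6,1284324024,5368679992,74,22246,184624,27663,1505177,31171
7,1284324057,5368679992,74,22245,184985,27663,1504844,31170
"""

PAGE_COLUMNS = ("cache", "inactive", "active", "wire", "free")


def pages_to_mib(pages):
    """Convert a page count to MiB using the assumed page size."""
    return pages * PAGE_SIZE / MIB


def convert(text):
    """Return one dict per sample with all sizes expressed in MiB."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        # arcstats.size is already in bytes, the rest are page counts.
        rec = {"cnt": int(row["cnt"]), "arc": int(row["arc_size"]) / MIB}
        for col in PAGE_COLUMNS:
            rec[col] = pages_to_mib(int(row[col]))
        out.append(rec)
    return out


rows = convert(SAMPLES)
for rec in rows:
    print(
        f"{rec['cnt']}: arc={rec['arc']:.0f}M wired={rec['wire']:.0f}M "
        f"inact={rec['inactive']:.0f}M cache={rec['cache']:.0f}M "
        f"free={rec['free']:.0f}M"
    )
```

With that page size, the last sample's wired, inactive and cache values come
out close to the 5878M Wired, 723M Inact and 87M Cache that top reported
after the tests, which suggests the 4 KiB assumption holds on this box.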