Date: Tue, 14 Jun 2005 22:20:03 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Glenn Dawson
Cc: freebsd-performance@freebsd.org
Subject: Re: vn(4) performance on 4.11 versus md(4) on 5.4

On Sat, 4 Jun 2005, Glenn Dawson wrote:

> I have a number of systems running 4.11 that have file-backed virtual
> disks, each of which contains a jail.  I need to start using 5.4 for
> new servers.  The catch is, file-backed virtual disks using md(4) seem
> to be much slower than similar virtual disks on 4.11 using vn(4).
> vn(4) on 4.11 is about 2.24 times faster than the equivalent setup
> using md(4) on 5.4.
>
> I've posted the results of some tests that I ran at
> http://www.antimatter.net/md-versus-vn.txt
>
> Is this decrease in performance known?  Is there something I can do in
> order to come close to the performance that 4.11 has?  I've tried
> changing some of the parameters of the filesystem on the virtual disk,
> but the performance didn't change.

Writes by md are now synchronous.  Try turning this off using
"mdconfig -o async ...", though that is probably too dangerous to use
in production -- the sync writes are a hack to work around hangs, and
my system hung almost instantly while testing this.

For copying a cached copy of /usr/src/sys/ (~100MB) on an old de-GEOMed
version of -current, with all file systems mounted -async -noatime, I
got the following times:

# ffs1 fs on ad2s2d
        6.21 real         0.52 user         3.39 sys
# ffs2 fs on md2 (default) on file zz on the previous fs
       63.83 real         0.56 user         3.34 sys
# ffs2 fs on md3 (-o async) on the same file (after detaching md2)
       16.10 real         0.50 user         3.40 sys

Syncing of the last fs deadlocked the file systems on md3 and ad2s2d
:-( but not the others.

For dd'ing /dev/zero to a large file, the sync writes gave a loss of
performance of almost exactly your factor of 2.24 relative to the
non-md fs: the raw disk speed is about 55MB/sec; writing to the native
ffs gave 54MB/sec by mostly writing with a physical block size of 64K,
while writing via md2 gave only 25MB/sec by always writing with a
physical block size of 16K.  The size of 64K results from clustering,
and the size of 16K results from sync writes breaking clustering (md
always writes the fs block size, which is 16K in my tests, and since
the writes are sync they must be done individually, so they cannot be
clustered).
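For concreteness, the two md setups above correspond roughly to
commands like these (a sketch only; the backing file path and mount
point are invented here, and remember that "-o async" risks
deadlocking the entire kernel):

    # file-backed md with the default (sync) write behaviour
    mdconfig -a -t vnode -f /var/tmp/zz -u 2
    newfs /dev/md2
    mount -o async,noatime /dev/md2 /mnt

    # detach, then reattach the same file with sync writes disabled
    umount /mnt
    mdconfig -d -u 2
    mdconfig -a -t vnode -f /var/tmp/zz -u 3 -o async
    newfs /dev/md3
    mount -o async,noatime /dev/md3 /mnt

    # simple sequential-write throughput test
    dd if=/dev/zero of=/mnt/bigfile bs=1m count=1024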
From mdconfig(1):

% -o [no]option
%         Set or reset options.
%
%         [no]async
%                 For vnode backed devices: avoid IO_SYNC for increased
%                 performance but at the risk of deadlocking the entire
%                 kernel.
% ...
%         [no]cluster
%                 Enable clustering on this disk.

A nearby bug in md is that "-o cluster" has always been silently
ignored.  I think we decided that it is the user's responsibility to
mount md-backed file systems (and other file systems on non-physical or
memory-like devices) with -o noclusterw -o noclusterr to prevent
wasteful clustering (see the P.S. below for an example invocation).
This is easy to forget, however.

vn used to turn off clustering unconditionally to avoid some deadlock
problems, but this was removed long before 4.11, when the deadlock
problems were supposed to be fixed, so turning off clustering was
supposed to be only a small optimization.  Try turning it off to see
if it reduces deadlocks.

From md.c's cvs history:

% RCS file: /home/ncvs/src/sys/dev/md/md.c,v
% Working file: md.c
% head: 1.124
% ...
% ----------------------------
% revision 1.115
% date: 2004/03/10 20:41:08;  author: phk;  state: Exp;  lines: +5 -3
% Fix a long-standing deadlock issue with vnode backed md(4) devices:
%
% On vnode backed md(4) devices over a certain, currently undetermined
% size relative to the buffer cache our "lemming-syncer" can provoke
% a buffer starvation which puts the md thread to sleep on wdrain.
%
% This generally tends to grind the entire system to a stop because the
% event that is supposed to wake up the thread will not happen until a
% fair bit of the piled up I/O requests in the system finish, and since
% a lot of those are on a md(4) vnode backed device which is currently
% waiting on wdrain until a fair amount of the piled up ... you get the
% picture.
%
% The cure is to issue all VOP_WRITEs on the vnode backing the device
% with IO_SYNC.
%
% In addition to more closely emulating a real disk device with a
% non-lying write-cache, this makes the writes exempt from rate-limiting
% (there to avoid starving the buffer cache) and consequently prevents
% the deadlock.
%
% Unfortunately performance takes a hit.
%
% Add "async" option to give people who know what they are doing the
% old behaviour.
% ----------------------------

Bruce
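P.S.  The noclusterr/noclusterw mount mentioned above would look
something like this (device and mount point again invented for
illustration):

    mount -o noatime,noclusterr,noclusterw /dev/md2 /mnt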