Date:      Tue, 21 Mar 2000 16:59:25 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Richard Wendland <richard@netcraft.com>
Cc:        Paul Richards <paul@originative.co.uk>, Alfred Perlstein <bright@wintelcom.net>, Poul-Henning Kamp <phk@critter.freebsd.dk>, current@FreeBSD.ORG, fs@FreeBSD.ORG
Subject:   Re: FreeBSD random I/O performance issues
Message-ID:  <200003220059.QAA83848@apollo.backplane.com>
References:   <200003220022.AAA28786@ns0.netcraft.com>


:Paul Richards said in "Re: patches for test / review":
:
:> Richard, do you want to post a summary of your tests?
:
:Well I'd best post the working draft of my report on the issues
:I've seen, as I'm not going to have time to work on it in the near
:future, and it raises serious performance issues that are best
:looked at soon.  Note none of these detailed results are from
:current, but Paul Richards has checked that these issues are still
:present in current.
:
: (lots of good stuff)

    Interesting.  The behavior is probably closely related to the
    write-behind methodology that UFS uses.

    A while back, while fixing an O(N^2) degenerate condition in the buffer
    cache queueing code, DG and I had a long discussion about the
    write_behind behavior.  I added a sysctl to 4.x that changes the
    write_behind behavior:

	sysctl vfs.write_behind

	0	Turned off
	1	Normal		(default)
	2	Backed off
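    The same knob can also be read or changed from a program through
    FreeBSD's sysctlbyname(3) interface.  What follows is a minimal sketch.
    It assumes the value is a plain int, as the sysctl(8) output suggests.
    The helper names get_write_behind() and set_write_behind() are
    hypothetical.  On systems other than FreeBSD both helpers simply
    return -1.

```c
/* Hypothetical helpers around the vfs.write_behind sysctl.
 * Assumes the sysctl is a plain int; on non-FreeBSD systems
 * both functions just return -1. */
#include <assert.h>
#include <stddef.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Read the current vfs.write_behind value into *mode.
 * Returns 0 on success, -1 on failure or when not on FreeBSD. */
int get_write_behind(int *mode)
{
    assert(mode != NULL);
#ifdef __FreeBSD__
    size_t len = sizeof(*mode);
    return sysctlbyname("vfs.write_behind", mode, &len, NULL, 0);
#else
    (void)mode;
    return -1;
#endif
}

/* Set vfs.write_behind (0 = off, 1 = normal, 2 = backed off).
 * Needs root.  Out-of-range values are rejected before calling
 * into the kernel.  Returns 0 on success, -1 otherwise. */
int set_write_behind(int mode)
{
    if (mode < 0 || mode > 2)
        return -1;
#ifdef __FreeBSD__
    return sysctlbyname("vfs.write_behind", NULL, NULL, &mode, sizeof(mode));
#else
    return -1;
#endif
}
```

    Calling set_write_behind(0) as root is the programmatic equivalent
    of "sysctl -w vfs.write_behind=0".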

    It would be interesting to see how the benchmark performs with
    write_behind turned off (set to 0).  Note that a setting of 2
    is highly experimental and will probably suffer from the same problem(s)
    as normal mode.  (See below; I ran the benchmark.)

    In general, turning off write behind is *NOT* a good idea, because
    it saturates the buffer cache with dirty blocks and can seriously
    degrade performance on a normal system due to write hogging.  On the
    flip side, this was all before I put in the new buffer cache flushing
    code, so it is possible that 4.x will not degrade as seriously with
    write behind turned off.  I haven't run saturation tests recently with
    write_behind turned off.

    A secondary issue, and actually the reason *why* performance is so
    bad, is that the buffer cache nominally locks the underlying VM pages
    when issuing a write, and this is almost certainly the cause of the
    program stalls.  When a program writes a piece of data (and I/O is
    started immediately) and then reads it back later on, the read may
    stall even though the data is in the cache, because the write has not
    yet completed.  The write operation might also stall if another nearby
    write is in progress (I'm not sure on that last point).
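    The read-after-write stall can be illustrated with a toy model.  This
    is not the kernel's buffer cache code.  It is a hypothetical
    simulation that assumes two things:

    - every write takes a fixed latency;
    - a block's pages stay locked exactly until its write completes.

    The names struct op and read_stall_ms() are invented for
    illustration.  Writes that hit a block whose previous write is
    still in flight are not modelled as stalling, since that point is
    uncertain.

```c
/* Toy model of read-after-write stalls (hypothetical, not kernel code). */
#include <assert.h>
#include <stddef.h>

/* One operation in a hypothetical access trace. */
struct op {
    int is_write;   /* 1 = write, 0 = read */
    int block;      /* block number, 0 <= block < nblocks */
    double gap_ms;  /* CPU time before this op is issued */
};

/* Returns the total time (ms) spent stalled on reads.
 * write_behind != 0: each write starts I/O at once and the block's
 *   pages stay locked until the I/O completes write_ms later.
 * write_behind == 0: writes only dirty the cache; no I/O is started
 *   during the trace, so reads never wait.
 * busy_until[] is caller-provided scratch space of nblocks entries. */
double read_stall_ms(const struct op *ops, size_t nops,
                     double *busy_until, int nblocks,
                     int write_behind, double write_ms)
{
    double now = 0.0, stalled = 0.0;
    assert(ops != NULL && busy_until != NULL && nblocks > 0);
    for (int b = 0; b < nblocks; b++)
        busy_until[b] = 0.0;
    for (size_t i = 0; i < nops; i++) {
        int b = ops[i].block;
        assert(b >= 0 && b < nblocks);
        now += ops[i].gap_ms;
        if (ops[i].is_write) {
            /* Writes themselves are never delayed in this model. */
            if (write_behind)
                busy_until[b] = now + write_ms; /* I/O in flight, pages locked */
        } else if (busy_until[b] > now) {
            stalled += busy_until[b] - now;     /* data cached but locked */
            now = busy_until[b];
        }
    }
    return stalled;
}
```

    Example: take a 10 ms write latency, a write of block 0, then a read
    of block 0 one millisecond later.  The read stalls for 9 ms with
    write_behind=1 and does not stall at all with write_behind=0.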

    Kirk has significantly reduced stalls related to bitmap operations.
    I'm not sure whether softupdates must be turned on to get these
    improvements.  The data blocks can still stall, but part of the
    plan for later this year is to fix that too.

:The benchmark program source code is available and easy to run;
:the bottom of the report has links.
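    The seekreadwrite source is only linked from the report and is not
    reproduced here.  Purely to illustrate the kind of access pattern
    under discussion, the hypothetical C sketch below performs random
    block reads, each followed by a write-back of the same block.  The
    following are all assumptions, not the benchmark's actual code:

    - the 8192-byte block size;
    - the function name seek_read_write();
    - the read-modify-write order.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK 8192  /* assumed block size */

/* Hypothetical random seek/read/write loop over an existing file of
 * nblocks blocks.  Returns the number of iterations completed, or -1
 * if the file cannot be opened. */
long seek_read_write(const char *path, long nblocks, long iters,
                     unsigned int seed)
{
    char buf[BLK];
    long i;
    int fd;

    assert(path != NULL && nblocks > 0 && iters >= 0);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    srand(seed);
    for (i = 0; i < iters; i++) {
        off_t off = (off_t)(rand() % nblocks) * BLK;
        /* Seek to a random block and read it ... */
        if (lseek(fd, off, SEEK_SET) < 0 || read(fd, buf, BLK) != BLK)
            break;
        buf[0] ^= 1;  /* ... modify it ... */
        /* ... then seek back and rewrite the same block. */
        if (lseek(fd, off, SEEK_SET) < 0 || write(fd, buf, BLK) != BLK)
            break;
    }
    close(fd);
    return i;
}
```

    Timing a loop like this under each vfs.write_behind setting is one
    way to reproduce the comparison below.  This sketch has not been
    checked against the original benchmark.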

    test3:/test/tmp# sysctl -w vfs.write_behind=0		(turned off)
    test3:/test/tmp# time ./seekreadwrite xxx 10000
    0.125u 0.807s 0:00.93 98.9%     5+181k 0+0io 0pf+0w

    test3:/test/tmp# sysctl -w vfs.write_behind=1		(normal)
    test3:/test/tmp# time ./seekreadwrite xxx 10000
    0.040u 1.709s 0:32.57 5.3%      4+174k 0+8750io 0pf+0w
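    Here is a rough reading of the two timings above.  They are csh-style
    time output: user, system and elapsed time, CPU percentage, then
    input+output block operations in the io column.  It assumes that the
    elapsed time of the write_behind=1 run not spent on user or system CPU
    was spent waiting on its 8750 output operations:

```latex
\begin{align*}
\frac{32.57\ \text{s}}{0.93\ \text{s}} &\approx 35
  \quad\text{(slowdown with write\_behind=1)}\\
t_{\text{wait}} &= 32.57 - (0.040 + 1.709) = 30.821\ \text{s}\\
\frac{t_{\text{wait}}}{8750\ \text{writes}} &\approx 3.5\ \text{ms per write}
\end{align*}
```

    The 0+0io in the write_behind=0 run is consistent with the written
    blocks simply staying dirty in the buffer cache for the duration of
    the run.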


:I also have a range of results from an ATA (IDE) cheap deskside
:Dell system running FreeBSD 3.3-RELEASE, with a range of wd(4)
:flags.  This system exhibits much better performance than the SCSI
:systems above at this benchmark, perhaps related to better DMA
:ability.
:
:ATA being faster than SCSI on this benchmark is a bit of a side-issue
:to the thrust of this report, but the performance numbers may give
:hints for diagnosing the problem.

    IDE drives sometimes appear to be faster because they fake the
    write-completion response (they return the response before the
    write has actually completed).  It could also simply be that, because
    the file is so small, there is little real mixed I/O, and that kind of
    workload runs slightly faster on an IDE drive.  I wouldn't read much
    into it; where SCSI really shines is in more heavily loaded
    environments.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

:Thanks,
:	Richard
:-
:Richard Wendland				richard@netcraft.com
