From owner-freebsd-hackers@FreeBSD.ORG  Tue Jul  8 22:30:13 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 34DF63C1;
 Tue,  8 Jul 2014 22:30:13 +0000 (UTC)
Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 155CD2925;
 Tue,  8 Jul 2014 22:30:12 +0000 (UTC)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
 by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id s68MU0Dw028257;
 Tue, 8 Jul 2014 15:30:04 -0700 (PDT)
 (envelope-from truckman@FreeBSD.org)
Message-Id: <201407082230.s68MU0Dw028257@gw.catspoiler.org>
Date: Tue, 8 Jul 2014 15:30:00 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: Strange IO performance with UFS
To: kostikbel@gmail.com
In-Reply-To: <20140705195816.GV93733@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: freebsd-fs@FreeBSD.org, sparvu@systemdatarecorder.org,
 freebsd-hackers@FreeBSD.org, roger.pau@citrix.com
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Jul 2014 22:30:13 -0000

On  5 Jul, Konstantin Belousov wrote:
> On Sat, Jul 05, 2014 at 06:18:07PM +0200, Roger Pau Monné wrote:

>> As can be seen from the log above, at first the workload runs fine,
>> and the disk is only performing writes, but at some point (in this
>> case around 40% of completion) it starts performing this
>> read-before-write dance that completely screws up performance.
> 
> I reproduced this locally.  I think my patch is useless for the fio/4k write
> situation.
> 
> What happens is indeed related to the amount of available memory.
> When the file written by fio is larger than memory, the
> system has to recycle the cached pages.  So after some point, doing
> a write requires a read-before-write, and this does not occur at EOF
> (since fio pre-allocated the job file).

I reproduced this locally with dd if=/dev/zero bs=4k conv=notrunc ...
For the small file case, if I flush the file from cache by unmounting
the filesystem where it resides and then remounting the filesystem, then
I see lots of reads right from the start.
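
For anyone who wants to try this without dd, here is a rough Python
sketch of the same access pattern: small in-place writes over an
existing file, with no truncation.  The function name
overwrite_in_place and the decision to stop at the current file size
are my assumptions for illustration; they are not part of the dd
command above.

```python
import os

def overwrite_in_place(path, chunk_size=4096):
    """Overwrite an existing file with zeros, chunk_size bytes at a time.

    Hypothetical helper: mimics small sequential writes with no truncation.
    Stopping at the current file size is an assumption of this sketch.
    """
    size = os.path.getsize(path)
    chunk = bytes(chunk_size)  # a buffer of zero bytes
    written = 0
    # "r+b" opens for update without truncating, similar to conv=notrunc.
    # buffering=0 makes each write() call go straight to the OS.
    with open(path, "r+b", buffering=0) as f:
        while written < size:
            n = min(chunk_size, size - written)
            written += f.write(chunk[:n])
    return written
```

To see the reads from the start, the file has to be out of the cache
first, for example by unmounting and remounting the filesystem as
described above.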

> In fact, I used a 10G file on an 8G machine, but I interrupted fio
> before it finished the job.  The longer the previous job runs, the longer
> the new job goes without issuing reads.  If I allow the job to
> completely fill the cache, then the reads start immediately on the next
> job run.
> 
> I do not see how anything could be changed there, if we want to keep
> user file content on partial block writes, and we do.

About the only thing I can think of that might help is to trigger
readahead when we detect sequential small writes.  We'll still have to
do the reads, but hopefully they will be larger and occupy less time in
the critical path.
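
To make the idea concrete, here is a toy Python model (not kernel
code) that counts how many read operations sequential in-place writes
would need over an uncached file.  Everything in it is an assumption
for illustration: the name count_reads, the readahead_blocks
parameter, the 32k block size used in the example, and a cache that
never evicts anything.

```python
def count_reads(file_size, write_size, block_size, readahead_blocks=1):
    """Count read I/Os for sequential overwrites of an uncached file.

    Toy model: a write that only partially covers an uncached block
    forces a read; each read brings in readahead_blocks blocks.
    """
    cached = set()
    reads = 0
    last_block = (file_size - 1) // block_size
    for start in range(0, file_size, write_size):
        end = min(start + write_size, file_size)
        for blk in range(start // block_size, (end - 1) // block_size + 1):
            blk_start = blk * block_size
            blk_end = min(blk_start + block_size, file_size)
            covers_block = start <= blk_start and end >= blk_end
            if not covers_block and blk not in cached:
                # Partial write to an uncached block: read it, plus readahead.
                reads += 1
                upto = min(blk + readahead_blocks, last_block + 1)
                cached.update(range(blk, upto))
            cached.add(blk)  # the block is in cache after the write
    return reads

MIB = 1024 * 1024
print(count_reads(MIB, 4096, 32768))                      # one read per block
print(count_reads(MIB, 4096, 32768, readahead_blocks=8))  # fewer, larger reads
print(count_reads(MIB, 32768, 32768))                     # full-block writes
```

In this model the total amount of data read is the same with or
without readahead; only the number of separate reads changes.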

Writing a multiple of the filesystem blocksize is still the most
efficient strategy.
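
As a sketch of what that could look like from an application, the
Python below rounds a desired write size up to a multiple of a block
size.  The helper names round_up and write_size_for are hypothetical,
and I am assuming that os.statvfs().f_bsize reflects the filesystem
block size; that should be checked on the target system, since it may
report a preferred I/O size instead.

```python
import os

def round_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return -(-n // multiple) * multiple

def write_size_for(path, wanted=4096):
    """Pick a write size that is a multiple of the block size for path.

    Assumption: f_bsize from statvfs() is the filesystem block size.
    """
    block_size = os.statvfs(path).f_bsize
    return round_up(wanted, block_size)
```

Writes of that size only avoid the read if they also start on a block
boundary, which they will for a sequential writer starting at offset
0.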