Date:      Thu, 29 Jun 2006 20:10:49 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Ivan Voras <ivoras@fer.hr>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Is the fsync() fake on FreeBSD6.1?
Message-ID:  <20060629195739.L77878@delplex.bde.org>
In-Reply-To: <44A2E0FD.6060302@fer.hr>
References:  <a0cd7c070606270032h3a42de6ahf21cd11abedb6400@mail.gmail.com> <44A1B958.4030204@fer.hr> <20060628230439.M75051@delplex.bde.org> <44A2E0FD.6060302@fer.hr>

On Wed, 28 Jun 2006, Ivan Voras wrote:

> Bruce Evans wrote:
>
>>> But I see strange results with iostat. It shows 16KB transactions, ~2900 
>>> tps and 46 MB/s. On the other hand, the program runs for ~36 seconds, 
>>> which gives ~1390 tps (this is a single desktop drive). Since 36 seconds 
>>> of 46MB/s would result in a file 1.6 GB in size, while it's clearly 
>>> 50000*512=25MB, iostat is lying.
>> 
>> This is because you fsync() every 512 bytes.  The file system then writes
>> a 16K inode block and a 16K data block, giving 64 times as much i/o as
>> necessary.
>
> Ok, so you're saying that it actually does 46MB/s, rewriting 16K FS blocks 
> over and over?

Yes.  It's actually surprising that the speed is only 46MB/s if the drive
caches the writes.
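
As a rough check on the numbers, the arithmetic can be sketched as below.
This is only a back-of-the-envelope model, assuming each fsync() costs
exactly one 16K inode block plus one 16K data block; the names are made up
for illustration and the inputs are the figures quoted in this thread.

```python
# Figures quoted in this thread.  The two-blocks-per-fsync model is an
# assumption taken from the explanation above, not a measurement.
RECORD = 512          # bytes written between fsync() calls
BLOCK = 16 * 1024     # file system block size in bytes
RECORDS = 50000       # number of write()+fsync() pairs
RUNTIME = 36          # reported run time in seconds


def fsync_cost(record=RECORD, block=BLOCK, records=RECORDS, runtime=RUNTIME):
    """Estimate the i/o generated by fsync()ing after every small write."""
    per_fsync = 2 * block              # one inode block + one data block
    total = records * per_fsync        # bytes actually sent to the disk
    return {
        "amplification": per_fsync // record,   # i/o per byte of payload
        "total_bytes": total,
        "tps": 2 * records / runtime,           # two transactions per fsync
        "mb_per_s": total / runtime / 1e6,      # decimal megabytes per second
    }


if __name__ == "__main__":
    for name, value in fsync_cost().items():
        print(name, value)
```

Under these assumptions the model gives a factor of 64, about 1.6 GB of
total i/o, roughly 2780 tps and about 45.5 MB/s, which is close to what
iostat reported.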

> In that case, wouldn't all writes to the FS, especially with soft-updates be 
> minimally 16K+16K? It doesn't look like it when I monitor a live server - 
> there are 8KB and 4KB writes... maybe UFS fragments complicate the 
> (ac)counting.

Yes, the minimum i/o size is the fragment size, and the average file size
is still probably smaller than 16K, so 4K and 8K writes are expected on a
live server.  However, for sequential writes to large files, most writes
should be 64K+0K (DFLTPHYS+0K) or 128K+0K (MAXPHYS+0K) depending on how
broken MAXPHYS vs MINPHYS is.  Clustering combines 16K-blocks into either
64K or 128K-blocks, and at least without soft updates and without sync
mounting or fsync(), inode updates are normally delayed longer than data
updates.
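
To illustrate the effect of clustering, here is a small sketch that counts
data writes for the same 25MB file written sequentially with no per-record
fsync().  It is a simplification: it ignores inode and indirect block
updates, and the helper name is hypothetical.

```python
import math

BLOCK = 16 * 1024        # file system block size
DFLTPHYS = 64 * 1024     # cluster size in the 64K case
MAXPHYS = 128 * 1024     # cluster size in the 128K case


def data_writes(file_size, io_size):
    """Number of data writes needed if every write carries io_size bytes."""
    return math.ceil(file_size / io_size)


if __name__ == "__main__":
    size = 50000 * 512   # the 25MB test file from this thread
    for label, io_size in (("16K blocks", BLOCK),
                           ("64K clusters", DFLTPHYS),
                           ("128K clusters", MAXPHYS)):
        print(label, data_writes(size, io_size))
```

Compared with the 100000 block writes implied by fsync()ing every 512
bytes, this model needs only a few hundred clustered writes.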

Bruce


