Date:      Thu, 17 Jan 2013 18:01:27 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: stupid UFS behaviour on random writes
Message-ID:  <103826787.2103620.1358463687244.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <alpine.BSF.2.00.1301171939380.421@wojtek.tensor.gdynia.pl>

Wojciech Puchar wrote:
> Create a 10GB file (on a 2GB RAM machine, with some swap in use to make
> sure little cache would be available for the filesystem):
> 
> dd if=/dev/zero of=file bs=1m count=10k
> 
> The block size is 32kB, the fragment size 4kB.
> 
> 
> now test random read access to it (10 threads)
> 
> randomio test 10 0 0 4096
> 
> A normal result on the not-so-fast disk in my laptop:
> 
> 118.5 | 118.5 5.8 82.3 383.2 85.6 | 0.0 inf nan 0.0 nan
> 138.4 | 138.4 3.9 72.2 499.7 76.1 | 0.0 inf nan 0.0 nan
> 142.9 | 142.9 5.4 69.9 297.7 60.9 | 0.0 inf nan 0.0 nan
> 133.9 | 133.9 4.3 74.1 480.1 75.1 | 0.0 inf nan 0.0 nan
> 138.4 | 138.4 5.1 72.1 380.0 71.3 | 0.0 inf nan 0.0 nan
> 145.9 | 145.9 4.7 68.8 419.3 69.6 | 0.0 inf nan 0.0 nan
> 
> 
> systat shows a 4kB I/O size; all is fine.
> 
> BUT with random 4kB writes:
> 
> randomio test 10 1 0 4096
> 
> total | read: latency (ms) | write: latency (ms)
>    iops |   iops    min    avg     max   sdev |   iops    min    avg     max   sdev
> --------+-----------------------------------+----------------------------------
> 38.5 | 0.0 inf nan 0.0 nan | 38.5 9.0 166.5 1156.8 261.5
> 44.0 | 0.0 inf nan 0.0 nan | 44.0 0.1 251.2 2616.7 492.7
> 44.0 | 0.0 inf nan 0.0 nan | 44.0 7.6 178.3 1895.4 330.0
> 45.0 | 0.0 inf nan 0.0 nan | 45.0 0.0 239.8 3457.4 522.3
> 45.5 | 0.0 inf nan 0.0 nan | 45.5 0.1 249.8 5126.7 621.0
> 
> 
> 
> The results are horrific. systat shows 32kB I/O, and gstat shows that half
> are reads and half are writes.
> 
> Why does UFS need to read the full block, change one 4kB part and then
> write it back, instead of just writing the 4kB part?

Because that's the way the buffer cache works. It writes an entire buffer
cache block (unless at the end of the file), so it must first read the rest of
the block into the buffer; otherwise it would write garbage (the rest of the
block) out to disk.
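
Roughly, in user-space terms, the write path ends up doing something like the
sketch below. This is only an illustration of the idea, not the actual UFS or
buffer cache code; the block size, the function name and the assumption that a
write never spans two blocks are all made up here for brevity.

#include <sys/types.h>
#include <string.h>
#include <unistd.h>

#define BLKSIZE 32768   /* file system block size (32kB in this case) */

/*
 * Write len bytes at offset off through a cache that only handles whole
 * blocks.  If the write does not cover the full block, the rest of the
 * block has to be read in first, so that the later full-block write does
 * not push garbage out to disk.  Assumes the write fits in one block.
 */
ssize_t
blockcache_write(int fd, const void *data, size_t len, off_t off)
{
    char blk[BLKSIZE];
    off_t blkstart = off - (off % BLKSIZE);
    size_t inblk = (size_t)(off - blkstart);

    if (inblk != 0 || len < BLKSIZE) {
        /* Partial block: this is the extra 32kB read you are seeing. */
        if (pread(fd, blk, BLKSIZE, blkstart) == -1)
            return (-1);
    }
    memcpy(blk + inblk, data, len);

    /* The cache always flushes the whole block. */
    return (pwrite(fd, blk, BLKSIZE, blkstart));
}

With a 32kB block and 4kB writes, every write that misses the cache turns into
one 32kB read plus one 32kB write, which matches the half-reads/half-writes
mix gstat reports.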

I'd argue that using an I/O size smaller than the file system block size is
simply sub-optimal, and that most applications don't do random I/O of blocks.
Alternatively, if you had an application that does random I/O of 4kB blocks
(at 4kB offsets), then using a 4K/1K file system would be better.
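
(A 4K/1K file system can be created with something along the lines of
"newfs -b 4096 -f 1024".) If the application is the thing to adapt, it can ask
the file system for its sizes and then issue full, aligned writes. A minimal
sketch, assuming statvfs(3) is sufficient for discovering the sizes:

#include <sys/statvfs.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
    struct statvfs sv;

    if (argc < 2) {
        fprintf(stderr, "usage: iosize <path>\n");
        return (1);
    }
    if (statvfs(argv[1], &sv) == -1) {
        perror("statvfs");
        return (1);
    }
    /* f_bsize is the preferred I/O size, f_frsize the fragment size. */
    printf("write in multiples of %lu bytes (fragment size %lu)\n",
        (unsigned long)sv.f_bsize, (unsigned long)sv.f_frsize);
    return (0);
}

Writing in f_bsize multiples, at f_bsize-aligned offsets, means each cached
block is completely overwritten, so no read-back is needed.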

NFS is the exception, in that it keeps track of a dirty byte range within
a buffer cache block and writes only that byte range. (NFS writes are
byte-granular, unlike a disk.)
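
A much-simplified sketch of that idea (the structure and field names below are
invented for illustration; the real logic lives in the NFS buffer handling):

#include <sys/types.h>
#include <stddef.h>

/* One cached block of a file, with the dirty byte range inside it. */
struct cached_block {
    off_t   blk_offset;     /* block's offset within the file */
    size_t  blk_size;       /* block size, e.g. 32kB */
    size_t  dirty_off;      /* start of dirty range within the block */
    size_t  dirty_end;      /* end of dirty range within the block */
};

/* Grow the dirty range to cover a new write of [off, off + len). */
void
mark_dirty(struct cached_block *bp, size_t off, size_t len)
{
    if (bp->dirty_off == bp->dirty_end) {   /* nothing dirty yet */
        bp->dirty_off = off;
        bp->dirty_end = off + len;
    } else {
        if (off < bp->dirty_off)
            bp->dirty_off = off;
        if (off + len > bp->dirty_end)
            bp->dirty_end = off + len;
    }
}

On write-back, only the bytes in [dirty_off, dirty_end) go over the wire, so
the client never has to read the rest of the block just to write 4kB of it.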


