Date:      Sat, 5 Jul 2014 12:58:31 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Roger Pau Monné <roger.pau@citrix.com>
Cc:        freebsd-fs@freebsd.org, Stefan Parvu <sparvu@systemdatarecorder.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Strange IO performance with UFS
Message-ID:  <20140705095831.GO93733@kib.kiev.ua>
In-Reply-To: <53B7C616.1000702@citrix.com>
References:  <53B691EA.3070108@citrix.com> <53B69C73.7090806@citrix.com> <20140705001938.54a3873dd698080d93d840e2@systemdatarecorder.org> <53B7C616.1000702@citrix.com>

On Sat, Jul 05, 2014 at 11:32:06AM +0200, Roger Pau Monné wrote:
> On 04/07/14 23:19, Stefan Parvu wrote:
> > Hi,
> >
> >>> I'm doing some tests on IO performance using fio, and I've found
> >>> something weird when using UFS and large files. I have the following
> >>> very simple sequential fio workload:
> >
> > System:
> > FreeBSD ox 10.0-RELEASE-p6 FreeBSD 10.0-RELEASE-p6 #0: Tue Jun 24 07:47:37 UTC 2014
> > root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
> >
> >
> > 1. Seq Write to 1 file, 10GB size, single writer, block 4k, UFS2:
> >
> > I tried to write seq using a single writer using an IOSIZE similar to your
> > example, 10 GB to a 14TB Hdw RAID 10 LSI device using fio 2.1.9 under FreeBSD 10.0.
> >
> > Result:
> > Run status group 0 (all jobs):
> >   WRITE: io=10240MB, aggrb=460993KB/s, minb=460993KB/s, maxb=460993KB/s,
> >   mint=22746msec, maxt=22746msec
>
> This looks much better than what I've seen in my benchmarks; how much
> memory does the system have?
>
> In my case I've seen the read issue when trying to write to files that
> were greater than the memory the system has. My box has 6GB of RAM and
> I was using a 10GB file.
>
> >
> >
> > 2. Seq Write to 2500 files, each file 5MB size, multiple writers, UFS2:
> >
> > Result:
> > Run status group 0 (all jobs):
> >   WRITE: io=12500MB, aggrb=167429KB/s, minb=334KB/s, maxb=9968KB/s,
> >   mint=2568msec, maxt=76450msec
> >
> > Questions:
> >
> >  - where are you writing, what storage: hdw / sfw RAID ?
>
> The storage is a simple SATA disk, no RAID:
>
> pass0 at ahcich0 bus 0 scbus0 target 0 lun 0
> pass0: <ST500DM002-1BD142 KC45> ATA-8 SATA 3.x device
> pass0: Serial Number Z3T3FJXL
> pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> pass0: Command Queueing enabled
>
> >  - are you using time based fio tests ?
>
> I'm using the following fio workload, as stated in the first email:
>
> [global]
> rw=write
> size=4g
> bs=4k
>
> [job1]
>
> The problem doesn't seem to be related to the hardware (I've also seen
> this when running inside of a VM), but to UFS itself, which at some point
> (or maybe under certain conditions) starts issuing a lot of reads when
> doing a simple write:
>
>               kernel`g_io_request+0x384
>               kernel`g_part_start+0x2c3
>               kernel`g_io_request+0x384
>               kernel`g_part_start+0x2c3
>               kernel`g_io_request+0x384
>               kernel`ufs_strategy+0x8a
>               kernel`VOP_STRATEGY_APV+0xf5
>               kernel`bufstrategy+0x46
>               kernel`cluster_read+0x5e6
>               kernel`ffs_balloc_ufs2+0x1be2
>               kernel`ffs_write+0x310
>               kernel`VOP_WRITE_APV+0x166
>               kernel`vn_write+0x2eb
>               kernel`vn_io_fault_doio+0x22
>               kernel`vn_io_fault1+0x78
>               kernel`vn_io_fault+0x173
>               kernel`dofilewrite+0x85
>               kernel`kern_writev+0x65
>               kernel`sys_write+0x63
>
> This can also be seen by running iostat in parallel with the fio workload:
>
> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> ada0     243.3 233.7 31053.3 29919.1   31  57.4 100
>
> This clearly shows that even though I was doing a sequential write (the
> fio workload shown above), the disk was actually reading more data than
> it was writing, which makes no sense, and all the reads come from the
> stack trace shown above.

The backtrace above means that BA_CLRBUF was specified for UFS_BALLOC().
In turn, this occurs when the write size is less than the UFS block size.
UFS has to read the block to ensure that a partial write does not corrupt
the rest of the buffer.
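
To illustrate the condition, here is a minimal sketch in C (an
illustration of the rule just described, not the actual ffs_write() or
UFS_BALLOC() code; the names are informal):

#include <sys/types.h>
#include <stdio.h>

/*
 * A write that does not cover a whole filesystem block forces UFS to
 * read the block first, so the untouched part of the buffer survives.
 */
static int
needs_read_before_write(off_t offset, size_t resid, long bsize)
{
	off_t blkoffset = offset % bsize;	/* offset inside the block */
	off_t xfersize = bsize - blkoffset;	/* room left in this block */

	if ((off_t)resid < xfersize)
		xfersize = resid;		/* write is shorter still */

	return (xfersize < bsize);		/* partial block => read it */
}

int
main(void)
{
	/* A 4KB write into a filesystem with 32KB blocks is always partial. */
	printf("4k write at offset 0: read needed = %d\n",
	    needs_read_before_write(0, 4096, 32768));
	return (0);
}

With a 32KB block size and 4KB writes, every write is a partial-block
write, which matches the read traffic seen in the backtrace.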

You can get the block size for a file with stat(2) (the st_blksize field
of struct stat), with statfs(2) (the f_iosize field of struct statfs), or
just by looking at the dumpfs output for your filesystem (the bsize
value).  For modern UFS the typical value is 32KB.
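
For instance, something along these lines (an untested sketch; pass the
path of the file or mount point you are testing) prints both values:

#include <sys/param.h>
#include <sys/mount.h>
#include <sys/stat.h>

#include <err.h>
#include <stdint.h>
#include <stdio.h>

/* Print st_blksize (stat(2)) and f_iosize (statfs(2)) for a path. */
int
main(int argc, char **argv)
{
	struct stat sb;
	struct statfs sfs;

	if (argc != 2)
		errx(1, "usage: %s path", argv[0]);
	if (stat(argv[1], &sb) == -1)
		err(1, "stat(%s)", argv[1]);
	if (statfs(argv[1], &sfs) == -1)
		err(1, "statfs(%s)", argv[1]);

	printf("st_blksize: %ju\n", (uintmax_t)sb.st_blksize);
	printf("f_iosize:   %ju\n", (uintmax_t)sfs.f_iosize);
	return (0);
}

If it reports 32768 while the fio job uses bs=4k, every write is smaller
than the block size, which is exactly the case described above; writing
in block-sized chunks avoids the read-before-write.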
