Date:      Mon, 15 Oct 2012 10:28:44 +0300
From:      Nikolay Denev <ndenev@gmail.com>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.ORG>
Subject:   Bad ZFS - NFS interaction? [ was: NFS server bottlenecks ]
Message-ID:  <65F06188-F333-4961-B3E9-CB8EB8696945@gmail.com>
In-Reply-To: <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com>
References:  <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com>


On Oct 13, 2012, at 6:22 PM, Nikolay Denev <ndenev@gmail.com> wrote:

>
> On Oct 13, 2012, at 5:05 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
>> I wrote:
>>> Oops, I didn't get the "readahead" option description
>>> quite right in the last post. The default read ahead
>>> is 1, which does result in "rsize * 2", since there is
>>> the read + 1 readahead.
>>>
>>> "rsize * 16" would actually be for the option "readahead=15"
>>> and for "readahead=16" the calculation would be "rsize * 17".
>>>=20
>>> However, the example was otherwise ok, I think? rick
>>
>> I've attached the patch drc3.patch (it assumes drc2.patch has already been
>> applied) that replaces the single mutex with one for each hash list
>> for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200.
>>
>> These patches are also at:
>> http://people.freebsd.org/~rmacklem/drc2.patch
>> http://people.freebsd.org/~rmacklem/drc3.patch
>> in case the attachments don't get through.
>>
>> rick
>> ps: I haven't tested drc3.patch a lot, but I think it's ok?
>
> drc3.patch applied and built cleanly, and shows a nice improvement!
>
> I've done a quick benchmark using iozone over the NFS mount from the Linux host.
>
> drc2.patch (but with NFSRVCACHE_HASHSIZE=500)
>
> 	TEST WITH 8K
> 	-------------------------------------------------------------------------------------------------
>        Auto Mode
>        Using Minimum Record Size 8 KB
>        Using Maximum Record Size 8 KB
>        Using minimum file size of 2097152 kilobytes.
>        Using maximum file size of 2097152 kilobytes.
>        O_DIRECT feature enabled
>        SYNC Mode.
>        OPS Mode. Output is in operations per second.
>        Command line used: iozone -a -y 8k -q 8k -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
>        Time Resolution = 0.000001 seconds.
>        Processor cache size set to 1024 Kbytes.
>        Processor cache line size set to 32 bytes.
>        File stride size set to 17 * record size.
>                                                            random  random    bkwd   record   stride
>              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
>         2097152       8    1919    1914     2356     2321    2335    1706
>
> 	TEST WITH 1M
> 	-------------------------------------------------------------------------------------------------
>        Auto Mode
>        Using Minimum Record Size 1024 KB
>        Using Maximum Record Size 1024 KB
>        Using minimum file size of 2097152 kilobytes.
>        Using maximum file size of 2097152 kilobytes.
>        O_DIRECT feature enabled
>        SYNC Mode.
>        OPS Mode. Output is in operations per second.
>        Command line used: iozone -a -y 1m -q 1m -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
>        Time Resolution = 0.000001 seconds.
>        Processor cache size set to 1024 Kbytes.
>        Processor cache line size set to 32 bytes.
>        File stride size set to 17 * record size.
>                                                            random  random    bkwd   record   stride
>              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
>         2097152    1024      73      64      477      486     496      61
>
>
> drc3.patch
>
> 	TEST WITH 8K
> 	-------------------------------------------------------------------------------------------------
>        Auto Mode
>        Using Minimum Record Size 8 KB
>        Using Maximum Record Size 8 KB
>        Using minimum file size of 2097152 kilobytes.
>        Using maximum file size of 2097152 kilobytes.
>        O_DIRECT feature enabled
>        SYNC Mode.
>        OPS Mode. Output is in operations per second.
>        Command line used: iozone -a -y 8k -q 8k -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
>        Time Resolution = 0.000001 seconds.
>        Processor cache size set to 1024 Kbytes.
>        Processor cache line size set to 32 bytes.
>        File stride size set to 17 * record size.
>                                                            random  random    bkwd   record   stride
>              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
>         2097152       8    2108    2397     3001     3013    3010    2389
>
>
> 	TEST WITH 1M
> 	-------------------------------------------------------------------------------------------------
>        Auto Mode
>        Using Minimum Record Size 1024 KB
>        Using Maximum Record Size 1024 KB
>        Using minimum file size of 2097152 kilobytes.
>        Using maximum file size of 2097152 kilobytes.
>        O_DIRECT feature enabled
>        SYNC Mode.
>        OPS Mode. Output is in operations per second.
>        Command line used: iozone -a -y 1m -q 1m -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
>        Time Resolution = 0.000001 seconds.
>        Processor cache size set to 1024 Kbytes.
>        Processor cache line size set to 32 bytes.
>        File stride size set to 17 * record size.
>                                                            random  random    bkwd   record   stride
>              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
>         2097152    1024      80      79      521      536     528      75
>
>
> Also with drc3 the CPU usage on the server is noticeably lower. Most of the time I could see only the geom{g_up}/{g_down} threads
> and a few nfsd threads; before that, the nfsd threads were much more prominent.
>
> I guess under heavier load the performance improvement could be bigger.
>
> I'll run some more tests with heavier loads this week.
>
> Thanks,
> Nikolay
>
>
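
For reference, the relative change between the two quoted runs can be worked out with a few lines of Python. This is only a sketch: the numbers are copied from the iozone output above (operations per second), the names RESULTS, percent_change and compare are made up for illustration, and note that the drc2 run used NFSRVCACHE_HASHSIZE=500, so the two runs differ in more than the locking change.

```python
# iozone results in operations per second, copied from the quoted runs.
# Column order follows the iozone header for tests -i 0 -i 1 -i 2.
COLUMNS = ["write", "rewrite", "read", "reread", "random read", "random write"]

RESULTS = {
    "8K": {
        "drc2": [1919, 1914, 2356, 2321, 2335, 1706],
        "drc3": [2108, 2397, 3001, 3013, 3010, 2389],
    },
    "1M": {
        "drc2": [73, 64, 477, 486, 496, 61],
        "drc3": [80, 79, 521, 536, 528, 75],
    },
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def compare(results):
    """Return {record size: {column: percent change from drc2 to drc3}}."""
    summary = {}
    for reclen, runs in results.items():
        summary[reclen] = {
            col: percent_change(old, new)
            for col, old, new in zip(COLUMNS, runs["drc2"], runs["drc3"])
        }
    return summary


if __name__ == "__main__":
    for reclen, changes in compare(RESULTS).items():
        for col, pct in changes.items():
            print(f"{reclen:>3}  {col:<13} {pct:+6.1f}%")
```

Every column improves with drc3 in both record sizes; these are single runs, so small differences should not be over-interpreted.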

If anyone is interested, here's a FlameGraph [*] generated using DTrace and
Brendan Gregg's tools from https://github.com/brendangregg/FlameGraph :

	https://home.totalterror.net/freebsd/goliath-kernel.svg

It was sampled during an Oracle database restore from the Linux host over the NFS mount.
Currently all I/O on the dataset that the Linux machine writes to is stuck; a simple ls in the directory
hangs for maybe 10-15 minutes and then eventually completes.

Looks like some weird locking issue.

[*] http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
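
For readers unfamiliar with the workflow: between DTrace and the SVG there is a "folding" step, which the FlameGraph repository does with stackcollapse.pl. The Python below is only a rough sketch of that idea, not the real script. It assumes the input contains nothing but DTrace stack aggregations (frames listed leaf first, one per line, followed by a line holding the sample count), and the function name collapse and the sample frames in the usage example are made up for illustration, not taken from the actual trace.

```python
import re
from collections import Counter

# Matches the "+0x1a" style offset DTrace appends to each frame.
OFFSET = re.compile(r"\+0x[0-9a-fA-F]+$")


def collapse(lines):
    """Fold DTrace stack output into {"root;...;leaf": sample count}.

    Assumption: every stack block is a run of frame lines (leaf first)
    terminated by a line that contains only the sample count.
    """
    folded = Counter()
    frames = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.isdigit():
            # Count line: emit the stack root-first, as flame graphs expect.
            if frames:
                folded[";".join(reversed(frames))] += int(line)
            frames = []
        else:
            # Drop the offset so samples in the same function are merged.
            frames.append(OFFSET.sub("", line))
    return folded


if __name__ == "__main__":
    # Made-up sample input, NOT from the real trace.
    sample = """
              kernel`leaf_func+0x1a
              kernel`caller_func+0x2b
                3

              kernel`leaf_func+0x1a
              kernel`caller_func+0x40
                2
    """
    for stack, count in collapse(sample.splitlines()).items():
        print(stack, count)
```

Each output line has the form "frame;frame;frame count", which is the kind of folded input flamegraph.pl turns into the SVG.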

P.S.: The machine runs with drc3.patch for the NFS server.
P.S.2: The nfsd server is configured for vfs.nfsd.maxthreads=200; maybe that's too much?



