Date: Mon, 15 Oct 2012 10:28:44 +0300
From: Nikolay Denev <ndenev@gmail.com>
To: "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.ORG>
Subject: Bad ZFS - NFS interaction? [ was: NFS server bottlenecks ]
Message-ID: <65F06188-F333-4961-B3E9-CB8EB8696945@gmail.com>
In-Reply-To: <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com>
On Oct 13, 2012, at 6:22 PM, Nikolay Denev <ndenev@gmail.com> wrote:

> On Oct 13, 2012, at 5:05 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
>> I wrote:
>>> Oops, I didn't get the "readahead" option description
>>> quite right in the last post. The default read ahead
>>> is 1, which does result in "rsize * 2", since there is
>>> the read + 1 readahead.
>>>
>>> "rsize * 16" would actually be for the option "readahead=15"
>>> and for "readahead=16" the calculation would be "rsize * 17".
>>>
>>> However, the example was otherwise ok, I think? rick
>>
>> I've attached the patch drc3.patch (it assumes drc2.patch has already been
>> applied) that replaces the single mutex with one for each hash list
>> for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200.
>>
>> These patches are also at:
>> http://people.freebsd.org/~rmacklem/drc2.patch
>> http://people.freebsd.org/~rmacklem/drc3.patch
>> in case the attachments don't get through.
>>
>> rick
>> ps: I haven't tested drc3.patch a lot, but I think it's ok?
>
> drc3.patch applied and built cleanly, and shows a nice improvement!
>
> I've done a quick benchmark using iozone over the NFS mount from the Linux host.
>
> drc2.patch (but with NFSRVCACHE_HASHSIZE=500)
>
> TEST WITH 8K
> -------------------------------------------------------------------------------------------------
>     Auto Mode
>     Using Minimum Record Size 8 KB
>     Using Maximum Record Size 8 KB
>     Using minimum file size of 2097152 kilobytes.
>     Using maximum file size of 2097152 kilobytes.
>     O_DIRECT feature enabled
>     SYNC Mode.
>     OPS Mode. Output is in operations per second.
>     Command line used: iozone -a -y 8k -q 8k -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
>     Time Resolution = 0.000001 seconds.
>     Processor cache size set to 1024 Kbytes.
>     Processor cache line size set to 32 bytes.
>     File stride size set to 17 * record size.
>                                             random  random    bkwd  record  stride
>         KB  reclen   write rewrite    read  reread    read   write    read rewrite    read  fwrite frewrite   fread freread
>    2097152       8    1919    1914    2356    2321    2335    1706
>
> TEST WITH 1M
> -------------------------------------------------------------------------------------------------
>     Auto Mode
>     Using Minimum Record Size 1024 KB
>     Using Maximum Record Size 1024 KB
>     Using minimum file size of 2097152 kilobytes.
>     Using maximum file size of 2097152 kilobytes.
>     O_DIRECT feature enabled
>     SYNC Mode.
>     OPS Mode. Output is in operations per second.
>     Command line used: iozone -a -y 1m -q 1m -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
>     Time Resolution = 0.000001 seconds.
>     Processor cache size set to 1024 Kbytes.
>     Processor cache line size set to 32 bytes.
>     File stride size set to 17 * record size.
>                                             random  random    bkwd  record  stride
>         KB  reclen   write rewrite    read  reread    read   write    read rewrite    read  fwrite frewrite   fread freread
>    2097152    1024      73      64     477     486     496      61
>
>
> drc3.patch
>
> TEST WITH 8K
> -------------------------------------------------------------------------------------------------
>     Auto Mode
>     Using Minimum Record Size 8 KB
>     Using Maximum Record Size 8 KB
>     Using minimum file size of 2097152 kilobytes.
>     Using maximum file size of 2097152 kilobytes.
>     O_DIRECT feature enabled
>     SYNC Mode.
>     OPS Mode. Output is in operations per second.
>     Command line used: iozone -a -y 8k -q 8k -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
>     Time Resolution = 0.000001 seconds.
>     Processor cache size set to 1024 Kbytes.
>     Processor cache line size set to 32 bytes.
>     File stride size set to 17 * record size.
>                                             random  random    bkwd  record  stride
>         KB  reclen   write rewrite    read  reread    read   write    read rewrite    read  fwrite frewrite   fread freread
>    2097152       8    2108    2397    3001    3013    3010    2389
>
>
> TEST WITH 1M
> -------------------------------------------------------------------------------------------------
>     Auto Mode
>     Using Minimum Record Size 1024 KB
>     Using Maximum Record Size 1024 KB
>     Using minimum file size of 2097152 kilobytes.
>     Using maximum file size of 2097152 kilobytes.
>     O_DIRECT feature enabled
>     SYNC Mode.
>     OPS Mode. Output is in operations per second.
>     Command line used: iozone -a -y 1m -q 1m -n 2g -g 2g -C -I -o -O -i 0 -i 1 -i 2
>     Time Resolution = 0.000001 seconds.
>     Processor cache size set to 1024 Kbytes.
>     Processor cache line size set to 32 bytes.
>     File stride size set to 17 * record size.
>                                             random  random    bkwd  record  stride
>         KB  reclen   write rewrite    read  reread    read   write    read rewrite    read  fwrite frewrite   fread freread
>    2097152    1024      80      79     521     536     528      75
>
>
> Also with drc3 the CPU usage on the server is noticeably lower. Most of the time I could see only the geom{g_up}/{g_down} threads,
> and a few nfsd threads; before that, the nfsd threads were much more prominent.
>
> I guess under bigger load the performance improvement can be bigger.
>
> I'll run some more tests with heavier loads this week.
>
> Thanks,
> Nikolay
>
>

If anyone is interested, here's a FlameGraph generated using DTrace and Brendan Gregg's tools from https://github.com/brendangregg/FlameGraph :

https://home.totalterror.net/freebsd/goliath-kernel.svg

It was sampled during an Oracle database restore from the Linux host over the NFS mount.
Currently all I/O on the dataset that the Linux machine writes to is stuck; a simple ls in the directory
hangs for maybe 10-15 minutes and then eventually completes.

Looks like some weird locking issue.

[*] http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/

P.S.: The machine runs with drc3.patch for the NFS server.
P.S.2: The nfsd server is configured for vfs.nfsd.maxthreads=200, maybe that's too much?
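
To put a number on the "nice improvement" in the quoted iozone runs, here is a minimal Python sketch that computes the relative change from drc2.patch to drc3.patch for each test. The figures are copied from the tables above; the names COLUMNS, RESULTS, percent_change and compare are made up for this illustration. Keep in mind that each configuration was run only once and that the drc2 run used NFSRVCACHE_HASHSIZE=500, so treat the percentages as a rough indication rather than a controlled comparison.

```python
# iozone results in operations per second, copied from the quoted tables.
# Column order follows the tests selected with -i 0 -i 1 -i 2.
COLUMNS = ["write", "rewrite", "read", "reread", "random read", "random write"]

# Hypothetical layout for this sketch: record size -> patch -> ops/sec per column.
RESULTS = {
    "8k": {
        "drc2": [1919, 1914, 2356, 2321, 2335, 1706],
        "drc3": [2108, 2397, 3001, 3013, 3010, 2389],
    },
    "1m": {
        "drc2": [73, 64, 477, 486, 496, 61],
        "drc3": [80, 79, 521, 536, 528, 75],
    },
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def compare(results):
    """Return {record size: {column: percent change from drc2 to drc3}}."""
    changes = {}
    for record_size, runs in results.items():
        changes[record_size] = {
            column: percent_change(old, new)
            for column, old, new in zip(COLUMNS, runs["drc2"], runs["drc3"])
        }
    return changes


if __name__ == "__main__":
    for record_size, row in compare(RESULTS).items():
        print(f"record size {record_size}:")
        for column, change in row.items():
            print(f"  {column:<13} {change:+6.1f}%")
```

Every column improves with drc3.patch in these runs; the largest gain is on 8 KB random writes, at roughly 40%.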