Date: Sun, 03 Jan 2016 12:45:18 -0500
From: "Mikhail T." <mi+thun@aldan.algebra.com>
To: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>, Tom Curry <thomasrcurry@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: NFS reads vs. writes
Message-ID: <56895E2E.8060405@aldan.algebra.com>
In-Reply-To: <alpine.GSO.2.01.1601031006020.28454@freddy.simplesystems.org>
References: <568880D3.3010402@aldan.algebra.com> <alpine.GSO.2.01.1601031006020.28454@freddy.simplesystems.org>
On 03.01.2016 10:32, Tom Curry wrote:
> What does disk activity (gstat or iostat) look like when this is going on?

I use systat for such observations. Here is a typical snapshot of machine a, when it reads its own /a and writes over NFS to b:/b (CPU: 3,6% Sys, 0,0% Intr, 15,6% User, 0,0% Nice, 80,8% Idle):

Disks   md0  ada0  ada1  ada2  ada3   da0   da1
KB/t   0,00   119 26,42 19,10 21,72  0,00  0,00
tps       0    47    72    64    69     0     0
MB/s   0,00  5,42  1,87  1,19  1,45  0,00  0,00
%busy     0     4    19    11    11     0     0

The ada0 is the SSD hosting both the read-cache and ZIL devices; ada{1,2,3} are the three disks comprising a RAID5 zpool.

Meanwhile, on the b side the following is going on (CPU: 4,2% Sys, 0,0% Intr, 0,0% User, 0,0% Nice, 95,8% Idle):

Disks   md0  ada0   da0   da1   da2   da3   da4
KB/t   0,00  6,50  0,00 80,21 16,00 79,59 68,42
tps       0   594     0    53     2    55    39
MB/s   0,00  3,77  0,00  4,18  0,03  4,29  2,63
%busy     0    95     0    10     1    14     8

Here too ada0 hosts the log device, and it appears to be the bottleneck. There is no read-cache on b, and the zpool consists of da1, da3, and da4 simply striped together (no redundancy).

When, instead of /pushing/ data out of a, I begin /pulling/ it (a different file from the same directory) from b, things change drastically.
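(Before the pull numbers: a quick sanity check on the push snapshots above. systat's MB/s column should roughly equal tps × KB/t ÷ 1024, and the slog figures stand out: ada0 on b absorbs ~600 tiny 6,5 KB synchronous writes per second at 95% busy, while the pool disks do fewer, much larger transfers. A minimal sketch of the arithmetic, with the values copied from the snapshots and decimal commas written as dots:)

```python
# systat reports KB/t (kilobytes per transfer) and tps (transfers per
# second); throughput in MB/s is approximately tps * KB/t / 1024.
def mb_per_s(tps, kb_per_transfer):
    return tps * kb_per_transfer / 1024.0

# Slog device ada0 on b during the push: 594 tps at 6.50 KB/t.
slog_push = mb_per_s(594, 6.50)

# Pool disk da1 on b during the same push: 53 tps at 80.21 KB/t.
pool_push = mb_per_s(53, 80.21)

print(round(slog_push, 2), round(pool_push, 2))  # 3.77 4.15
```

(Both agree with the systat MB/s column to within rounding, which suggests the slog is saturated by the sheer number of small synchronous writes rather than by raw bandwidth.)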
a looks like this:

Disks   md0  ada0  ada1  ada2  ada3   da0   da1
KB/t   0,00 83,00 64,00 64,00 64,00  0,00  0,00
tps       0    27   469   456   472     0     0
MB/s   0,00  2,16 29,32 28,49 29,50  0,00  0,00
%busy     0     1    13    13    13     0     0

and b like this:

Disks   md0  ada0   da0   da1   da2   da3   da4
KB/t   0,00 15,46  0,00   114  0,00   116   112
tps       0    45     0   189     0   192   160
MB/s   0,00  0,68  0,00 20,98  0,00 21,74 17,45
%busy     0    81     0    19     0    37    28

ada0 is no longer the bottleneck, and the copy is over almost instantly.

> What is the average latency between the two machines?

Ping-ing b from a:
round-trip min/avg/max/stddev = 0.137/0.156/0.178/0.015 ms
Ping-ing a from b:
round-trip min/avg/max/stddev = 0.114/0.169/0.220/0.036 ms

On 03.01.2016 11:09, Bob Friesenhahn wrote:
> The most likely issue is a latency problem with synchronous writes on
> 'b'. The main pool disks seem to be working ok. Make sure that the
> SSD you are using for slog is working fine. Maybe it is abnormally slow.

Why would the same ZFS -- with the same slog -- be working faster when written to locally than when over NFS?

-mi
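(One plausible mechanism behind that closing question: a local copy issues asynchronous writes that ZFS batches into transaction groups, while the NFS server must honor the client's stable-write/COMMIT semantics, turning the same stream into many small synchronous writes that all funnel through the slog. The following is a minimal, OS-agnostic sketch of that cost difference, contrasting batched writes with an fsync after every write; the file names and sizes are arbitrary, and absolute timings will vary by system.)

```python
import os
import tempfile
import time

def write_file(path, chunks, chunk_size, sync_each):
    """Write `chunks` blocks of `chunk_size` bytes; optionally fsync each."""
    buf = b"x" * chunk_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    for _ in range(chunks):
        os.write(fd, buf)
        if sync_each:
            os.fsync(fd)  # force to stable storage, like an NFS COMMIT
    os.fsync(fd)  # final flush so both cases end up equally durable
    os.close(fd)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    async_t = write_file(os.path.join(d, "async.bin"), 256, 65536, sync_each=False)
    sync_t = write_file(os.path.join(d, "sync.bin"), 256, 65536, sync_each=True)
    print(f"batched: {async_t:.3f}s  per-write fsync: {sync_t:.3f}s")
```

(On slog-backed storage the per-write-fsync run is typically far slower even though the total bytes are identical; that per-commit latency, not the network round-trip, is usually what separates local ZFS writes from writes arriving over NFS.)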
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?56895E2E.8060405>