Date:      Fri, 27 May 2022 13:30:15 +0200
From:      Jan Bramkamp <crest@rlwinm.de>
To:        freebsd-fs@freebsd.org
Subject:   Re: zfs/nfsd performance limiter
Message-ID:  <cb3ad0a7-12e8-5cd2-3fcb-490344ad6ea1@rlwinm.de>
In-Reply-To: <YQBPR0101MB9742056AFEF03C6CAF2B7F56DDD59@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM>
References:  <CAJwHY9WMOOLy=rb9FNjExQtYej21Zv=Po9Cbg=19gkw1SLFSww@mail.gmail.com> <YonqGfJST09cUV6W@FreeBSD.org> <CAJwHY9W-3eEXR%2BjTw40thcio65Ukjw8qgnp-qPiS3bdeZS0kLw@mail.gmail.com> <YQBPR0101MB97429323AD5F921BE76C613EDDD59@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM> <CAJwHY9WHE4MFScuhry7v9MqRQBSTNY5XYCH5qfO4xEn6Swwtrw@mail.gmail.com> <YQBPR0101MB9742056AFEF03C6CAF2B7F56DDD59@YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM>

On 23.05.22 00:26, Rick Macklem wrote:
> Adam Stylinski <kungfujesus06@gmail.com> wrote:
> [stuff snipped]
>> However, in general, RPC RTT will define how well NFS performs and not
>> the I/O rate for a bulk file read/write.
> Let's take this RPC RTT thing a step further...
> - If I got the math right, at 40Gbps, 1Mbyte takes about 200usec on the wire.
> Without readahead, the protocol looks like this:
> Client                                     Server (time going down the screen)
>          small Read request --->
>          <-- 1Mbyte reply
>          small Read request -->
>          <-- 1Mbyte reply
> The 1Mbyte replies take 200usec on the wire.
>
> Then suppose your ping time is 400usec (I see about 350usec on my little lan).
> - The wire is only transferring data about half of the time, because the small
>    request message takes almost as long as the 1Mbyte reply.
>
> As you can see, readahead (where multiple reads are done concurrently)
> is critical for this case. I have no idea how Linux decides to do readahead.
> (FreeBSD defaults to 1 readahead, with a mount option that can increase
>   that.)
>
> Now, net interfaces normally do interrupt moderation. This is done to
> avoid an interrupt storm during bulk data transfer. However, interrupt
> moderation results in interrupt delay for handling the small Read request
> message.
> --> Interrupt moderation can increase RPC RTT. Turning it off, if possible,
>        might help.
>
> So, ping the server from the client to see what your RTT roughly is.
> Also, you could look at some traffic in wireshark, to see what readahead
> is happening and what the RPC RTT is.
> (You can capture with "tcpdump", but wireshark knows how to decode
>   NFS properly.)
>
> As you can see, RPC traffic is very different from bulk data transfer.
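
The arithmetic in the quoted text can be checked with a rough sketch. It assumes 1Mbyte means 10^6 bytes, and it follows the quoted simplification that each read costs one reply on the wire plus an idle gap of about half the 400usec ping time while the small request travels to the server. The function names and the "outstanding_reads" parameter are made up for illustration; this is a crude model, not a measurement.

```python
def wire_time_us(payload_bytes, link_bps):
    """Time to serialize payload_bytes onto a link of link_bps, in microseconds."""
    return payload_bytes * 8 / link_bps * 1e6


def link_utilization(payload_bytes, link_bps, request_delay_us, outstanding_reads=1):
    """Fraction of time the wire carries reply data (crude model).

    Assumption: each read cycle is one reply on the wire plus request_delay_us
    of idle time. outstanding_reads is the number of reads in flight at once
    (1 means no readahead); overlapping reads fill the idle gap.
    """
    busy = wire_time_us(payload_bytes, link_bps)
    cycle = busy + request_delay_us
    return min(1.0, outstanding_reads * busy / cycle)


if __name__ == "__main__":
    reply_us = wire_time_us(1_000_000, 40e9)
    print(f"1Mbyte at 40Gbps: about {reply_us:.0f} usec on the wire")
    for n in (1, 2, 4):
        u = link_utilization(1_000_000, 40e9, request_delay_us=200, outstanding_reads=n)
        print(f"{n} read(s) in flight: utilization about {u:.2f}")
```

With one read in flight the model gives the "about half of the time" figure from the quote; a larger request delay (for example from interrupt moderation) pushes it lower, which is why concurrent reads matter.
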
Would it make sense to extend nconnect to apply different QoS markings
to the control connection and the bulk connections, so that small(ish)
RPC calls are prioritized over the bulk transfer RPCs? Failing that, is
it possible to connect to the NFS server through different addresses for
small and large RPCs, so that they use different NICs and switch ports?
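
To illustrate what I mean by per-connection QoS markings, here is a userland sketch that sets a different DSCP value on each of two TCP sockets via the standard IP_TOS socket option. This is only an analogy: the NFS client lives in the kernel, and as far as I know nconnect has no such knob today. The helper name and the choice of code points (EF for the control connection, AF11 for bulk) are assumptions for illustration.

```python
import socket

# DSCP code points (the upper six bits of the IPv4 TOS byte).
DSCP_EF = 46    # Expedited Forwarding: assumed here for small, latency-sensitive RPCs
DSCP_AF11 = 10  # Assured Forwarding 11: assumed here for bulk data connections


def make_marked_socket(dscp):
    """Create an unconnected IPv4 TCP socket whose packets carry the given DSCP."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The TOS byte holds the DSCP in its upper six bits; the low two bits are ECN.
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return s


if __name__ == "__main__":
    control = make_marked_socket(DSCP_EF)
    bulk = make_marked_socket(DSCP_AF11)
    for name, s in (("control", control), ("bulk", bulk)):
        tos = s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
        print(f"{name}: TOS byte {tos:#04x}, DSCP {tos >> 2}")
        s.close()
```

The markings only help if the switches and NIC queues between client and server are configured to honor them.
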
