Date: Thu, 28 Jan 2016 15:24:29 -0500
From: Andrew Gallatin <gallatin@netflix.com>
To: transport@freebsd.org
Subject: xmit_more / packet batching
Message-ID: <56AA78FD.1010007@netflix.com>

I brought up packet batching in today's meeting, and mentioned Mellanox asking about a feature like xmit_more in Linux.

To recap, "xmit_more" is a flag on an skb in Linux that the stack can use to tell the driver that more packets are coming immediately, so the driver is allowed to delay writing any doorbells to notify the NIC about the transmission. This offers fairly large savings: it avoids MMIO accesses to the NIC doorbells and (in Mellanox's case) can be used to reduce transmit completions.

There is a more detailed description of how it works in Linux at http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html (as I said .. benchmarks .. )

It looks like they are bulking things up as they come out of their qdisc layer, above the drivers. Note that they have a fairly sophisticated set of qdiscs that are usable with modern, multi-queue drivers, as well as centralized software queuing that all drivers use.

Without properly adding all those layers, the simplest thing to do would be to just have tcp_output (and other protocol output routines) set a similar flag when they are looping and sending mbufs down to ip_output.

The problem with that is simply that (at least for our Internet-facing workloads at Netflix) sending more data on a socket than the TSO max seg size is quite rare (see the appended dtrace output from a machine serving ~90K connections at ~85Gb/s).

What is unclear to me is whether or not Linux would see any benefit from xmit_more in our sort of workload. My guess is that it would not, as the qdiscs will not be delaying things, and they'll still be mostly limited to client ack packing, so will also rarely be sending more than a TSO max seg size.

Drew

c094.ord001.dev# dtrace -s ./xmit_len.d
dtrace: script './xmit_len.d' matched 1 probe
^C

        0
           value  ------------- Distribution ------------- count
          < 1514 |@@@                                      1454289
            1514 |@@@@@@@@                                 4601582
            2514 |@@@@@@                                   3529106
            3514 |@@@@@                                    2902881
            4514 |                                         150322
            5514 |@@@                                      1433990
            6514 |@                                        722206
            7514 |                                         121788
            8514 |@                                        855718
            9514 |@                                        441330
           10514 |                                         86424
           11514 |@                                        742827
           12514 |@                                        405603
           13514 |                                         33753
           14514 |@                                        432492
           15514 |@                                        451240
           16514 |@                                        309429
           17514 |                                         67458
           18514 |                                         243994
           19514 |                                         275286
           20514 |                                         27206
           21514 |                                         198855
           22514 |@                                        291526
           23514 |                                         20636
           24514 |                                         210524
           25514 |                                         175945
           26514 |                                         13526
           27514 |                                         156973
           28514 |                                         174066
           29514 |                                         115377
           30514 |                                         24194
           31514 |                                         138833
           32514 |                                         104053
           33514 |                                         23385
           34514 |                                         131241
           35514 |                                         92529
           36514 |                                         27125
           37514 |                                         88839
           38514 |                                         76915
           39514 |                                         6554
           40514 |                                         86519
           41514 |                                         70078
           42514 |                                         55289
           43514 |                                         16175
           44514 |                                         74943
           45514 |                                         66529
           46514 |                                         12766
           47514 |                                         66808
           48514 |                                         51397
           49514 |                                         14020
           50514 |                                         46155
           51514 |                                         37579
           52514 |                                         9177
           53514 |                                         39342
           54514 |                                         31170
           55514 |                                         8080
           56514 |                                         34876
           57514 |                                         40574
           58514 |                                         32921
           59514 |                                         6768
           60514 |                                         30252
           61514 |                                         25051
           62514 |                                         5609
           63514 |                                         174059
        >= 64514 |@                                        445810

        0            65516

c094.ord001.dev# cat xmit_len.d
fbt::mlx5e_xmit:entry
{
        m = (struct mbuf *)arg1;
        /* @len[0] = quantize(m->M_dat.MH.MH_pkthdr.len); */
        @len[0] = lquantize(m->M_dat.MH.MH_pkthdr.len, 1514, 65000, 1000);
        @m[0] = max(m->M_dat.MH.MH_pkthdr.len);
}
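
The doorbell-deferral idea described in the message can be illustrated with a small toy model in C. This is only a sketch under stated assumptions: the ring structure, the function names (toy_xmit, toy_ring_doorbell, toy_send_burst) and the ring size are all hypothetical, and none of this is the Linux xmit_more implementation or any FreeBSD driver interface. It just counts how many simulated doorbell writes a burst costs when the sender loop passes a "more packets follow" hint versus when it does not.

```c
#include <stdbool.h>

#define TOY_RING_SIZE 64u   /* hypothetical ring depth, chosen arbitrarily */

/* Hypothetical transmit ring; not a real FreeBSD or Linux structure. */
struct toy_txring {
    unsigned prod;          /* descriptors queued by the stack so far */
    unsigned doorbell;      /* producer index last made visible to the "NIC" */
    unsigned mmio_writes;   /* number of simulated doorbell writes */
};

/* Stand-in for the expensive MMIO write that notifies the NIC. */
static void toy_ring_doorbell(struct toy_txring *r)
{
    r->doorbell = r->prod;
    r->mmio_writes++;
}

/*
 * Queue one packet. When 'more' is true the caller promises that another
 * packet follows immediately, so the doorbell write may be deferred.
 * The doorbell is still written if a full ring's worth of descriptors
 * is pending, so the hint can never stall the ring.
 */
static void toy_xmit(struct toy_txring *r, bool more)
{
    r->prod++;
    if (!more || r->prod - r->doorbell >= TOY_RING_SIZE)
        toy_ring_doorbell(r);
}

/*
 * Model of a protocol output loop sending 'npkts' packets back to back.
 * With 'use_hint' set, every packet except the last carries the hint.
 * Returns the number of doorbell writes the burst caused.
 */
static unsigned toy_send_burst(struct toy_txring *r, unsigned npkts,
                               bool use_hint)
{
    unsigned before = r->mmio_writes;

    for (unsigned i = 0; i < npkts; i++) {
        bool more = use_hint && (i + 1 < npkts);
        toy_xmit(r, more);
    }
    return r->mmio_writes - before;
}
```

In this model an 8-packet burst costs eight doorbell writes without the hint and one with it, while a 1-packet burst costs one doorbell write either way. That second case is the one the message argues dominates the measured workload, which is why a flag set only inside the output loop would rarely help there.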