Date: Sat, 7 Jul 2018 10:08:55 -0400 From: Andrew Gallatin <gallatin@cs.duke.edu> To: Rick Macklem <rmacklem@uoguelph.ca> Cc: "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org> Subject: Re: svn commit: r335967 - head/sys/dev/mxge Message-ID: <b58d62c9-bb81-f214-f3d0-8f5a388479db@cs.duke.edu> In-Reply-To: <YTOPR0101MB09538327E9FF2BF025485638DD400@YTOPR0101MB0953.CANPRD01.PROD.OUTLOOK.COM> References: <201807050120.w651KP5K045633@pdx.rh.CN85.dnsmgr.net> <97ae3381-7c25-7b41-9670-84b825722f52@cs.duke.edu> <YTOPR0101MB09538327E9FF2BF025485638DD400@YTOPR0101MB0953.CANPRD01.PROD.OUTLOOK.COM>
next in thread | previous in thread | raw e-mail | index | archive | help
On 07/05/18 17:14, Rick Macklem wrote: > Andrew Gallatin wrote: > On 7/4/18 9:20 PM, Rodney W. Grimes wrote: > [stuff snipped] >>> >>> It is using a magic constant twice, where one has a >>> derived value that is dependent on the value of the other. >>> That is bad and error prone and does not document that >>> one depends on the other. Please fix this. Or at least >>> make 65536 a #define so that it only needs changed one >>> place and clearly shows the interdependence of these >>> values. >> >> To me, 65536 is one of the few cases where the magic number is >> more meaningful than a name. But fine, if you feel that >> strongly about it, I'll change it for you. > > Btw, in general, if_hw_tsomax and if_hw_tsomaxsegsize are not > related or the same value. It just happens that they both appear > to be related to 64K in this case. (I believe this is fairly common, > since the original Microsoft "standard" used 64K as a limit, since > it was stored in 16bits.) Yes; exactly. > if_hw_tsomax is the maximum size of the entire TSO segment, > including MAC level headers (commonly 64K, due to Mircosoft... > but could be larger if the hardware guys chose to do so). Given that we do TSO like Linux, and not like MS (meaning we express the size of the pre-segmented packet using the a 16-bit value in the IPv4/IPv6 header), supporting more than 64K is not possible in FreeBSD, so I'm basically saying "nerf this constraint". MS windows does it better / different; they express the size of the pre-segmented packet in packet metadata, leaving ip->ip_len = 0. This is better, since then the pseudo hdr checksum in the template header can be re-used (with the len added) for every segment by the NIC. If you've ever seen a driver set ip->ip_len = 0, and re-calc the pseudo-hdr checksum, that's why. This is also why MS LSOv2 can support TSO of packets larger than 64K, since they're not constrained by the 16-bit value in the IP{4,6} header. The value of TSO larger than 64K is questionable at best though. Without pacing, you'd just get more packets dropped when talking across the internet.. > if_hw_tsomaxsegsize is the maximum size of contiguous memory > that a "chunk" of the TSO segment can be stored in for handling by > the driver's transmit side. Since higher And this is what I object to. TCP should not care about this. Drivers should use busdma, or otherwise be capable of chopping large contig regions down to chunks that they can handle. If a driver can really only handle 2K, then it should be having busdma give it an s/g list that is 2x as long, not having TCP call m_dupcl() 2x as often on page-sized data generated by sendfile (or more on non-x86 with larger pages). > level code such as NFS (and iSCSI, I think?) uses MCLBYTE clusters, > anything 2K or higher normally works the same. Not sure about > sosend(), but I think it also copies the data into MCLBYTE clusters? > This would change if someday jumbo mbuf clusters become the norm. > (I tried changing the NFS code to use jumbo clusters, but it would > result in fragmentation of the memory used for mbuf cluster allocation, > so I never committed it.) At least for sendfile(), vm pages are wrapped up and attached to mbufs, so you have 4K (and potentially much more on non-x86). Doesn't NFS do something similar when sending data, or do you copy into clusters? I have changes which I have not upstreamed yet which enhance mbufs to carry TLS metadata & vector of physical addresses (which I call unmapped mbufs) for sendfile and kernel TLS. As part of that, sosend (for kTLS) can allocate many pages and attach them to one mbuf. The idea (for kTLS) is that you can keep an entire TLS record (with framing information) in a single unmapped mbuf, which saves a huge amount of CPU which would be lost to cache misses doing pointer-chasing of really long mbuf chains (TLS hdrs and trailers are generally 13 and 16 bytes). The goal was to regain CPU during Netflix's transition to https streaming. However, it is unintentionally quite helpful on i386, since it reduces overhead from having to map/unmap sf_bufs. FWIW, these mbufs have been in production at Netflix for over a year, and carry a large fraction of the worlds internet traffic :) > rick > ps: And I'll admit I don't find 65536 very magic;-) > :) Drew
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?b58d62c9-bb81-f214-f3d0-8f5a388479db>