Date: Fri, 1 Dec 1995 14:02:03 +0100 (MET)
From: Luigi Rizzo <luigi@labinfo.iet.unipi.it>
To: hackers@freebsd.org
Subject: A proposal for selective acks
Message-ID: <199512011302.OAA09790@labinfo.iet.unipi.it>
I posted this recently on comp.protocols.tcp.ip. I believe this is something that could be implemented without much difficulty in FreeBSD (possibly in -current), and with almost no side effects on the rest of the kernel. I would appreciate receiving opinions from you guys.

It appears that selective acks (SACKs) were a popular subject 5-10 years ago, to the point that NETBLT [RFC998] included them, and there was even a proposal for TCP [RFC1072]. I have a proposal for an efficient encoding of the SACK option described in RFC1072. In the following I assume that the reader has read the section of RFC1072 dealing with selective ACKs (which is nice reading, as are most papers by Van Jacobson).

Comments are solicited and welcome.

	Luigi

-----------

The proposal for SACKs by Van Jacobson [RFC1072] is very clear and detailed about a possible implementation, to the point that one wonders why SACKs were not actually included in RFC1323 and implemented.

A possible explanation is that the SACK option, as specified in RFC1072, might suffer from the limited amount of space available for TCP options. This makes it possible to selectively ack only a limited number of out-of-sequence segments, thus reducing the effectiveness of the mechanism. The problem comes from the need to pass both the relative origin and the size of each segment, totalling 4 bytes per segment.

However, consider the typical case in which SACKs would be useful, i.e. the transfer of large blocks of data. In these conditions, the sender should mostly send maximum-sized segments. Now look at the reassembly queue at the receiving station: starting from the first unacked byte, we have a sequence of holes (H) and segments (S), starting with a hole and ending with a segment, as below.

	HHH SSSS HH SS HH SS HHHH SSSS HH SSS

What I expect to see (I have not verified this, though) is a sequence of holes and segments whose sizes are generally multiples of the MSS in use for the connection.
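The receiver-side view described above can be sketched as follows. This is a minimal sketch under stated assumptions, not FreeBSD code: `struct blk` is a hypothetical type holding each out-of-order block of the reassembly queue as byte offsets [start, end) relative to the first unacked byte, and `hole_seg_lengths()` is a hypothetical helper that turns the queue into the alternating hole/segment sizes of the H/S picture.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one out-of-order block in the reassembly queue, as byte
 * offsets [start, end) relative to the first unacked byte (the ACK field). */
struct blk { uint32_t start, end; };

/* Fill len[] (room for 2 * n entries) with the alternating sizes
 * H S H S ... H S. Blocks must be sorted and disjoint, with
 * blk[0].start > 0, so the sequence starts with a hole and ends with
 * a segment. Returns the number of entries written. */
static size_t hole_seg_lengths(const struct blk *blk, size_t n, uint32_t *len)
{
    uint32_t prev_end = 0;   /* the first hole starts at the ACK point */
    size_t i, k = 0;

    for (i = 0; i < n; i++) {
        len[k++] = blk[i].start - prev_end;    /* hole before block i */
        len[k++] = blk[i].end - blk[i].start;  /* block i itself */
        prev_end = blk[i].end;
    }
    return k;
}
```

With an MSS of 512, for instance, a queue holding [100, 1124), [1636, 2148) and [3172, 3472) yields the sizes 100, 1024, 512, 512, 1024, 300: only the first hole and the last segment are not multiples of the MSS.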
The only exceptions are likely to be the first hole (because of some odd-length exchange before the large block of data begins) and the last segment (which may contain less than an MSS because no more data are available).

If the above is true, a SACK option could comprise the following fields:

	h1	the size of the first hole. The hole starts at the point
		indicated by the ACK field in the header.
	s1	the size of the last segment.
	ss	the GCD of the sizes of all the remaining segments/holes.
		This should usually be the MSS in use.
	b[]	a bitmap indicating which segments (of size ss) are present
		after the first hole.
	lb	the number of elements in the bitmap b[]. Actually, only 3
		bits are needed for lb, because the option length can be
		used to derive the size of b[] in bytes.

Following the description of the SACK option in RFC1072, we could represent h1, s1 and ss as 16-bit integers, and use 2 bytes for option kind+len. Since TCP options are limited to 40 bytes, this would leave up to (40-8)=32 bytes for b[]+lb, i.e. room for over 250 segments. Even if some room is taken by other options (e.g. timestamps), there is still sufficient room to deal with large windows.

Note that only the ones in b[] are meaningful, so the receiver can deal with the following situations by sending several SACKs with different values of h1:

  * the send window is very large, and the bitmap is not large enough;
  * for some reason (the sender changes to a new MSS that is relatively prime to the previous one, or the sender is sending very small segments, etc.) ss becomes very small.

As a matter of fact, h1 could be thought of as a "skip" field, so it might be necessary to encode it with a larger number of bits.

Maybe the data in b[] can be compressed further, especially if the losses are not very large (in this case runs of '1' could be encoded efficiently, as in FAX documents). If the losses become large, though, I suspect that there is little chance for compression.
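The proposed encoding can be sketched in C roughly as follows. This is a minimal, hypothetical sketch, not kernel code: `struct blk`, `struct sack_opt`, `sack_encode()` and `sack_decode()` are illustrative names; the bitmap is assumed to be filled LSB-first within each byte; `nbits` is kept as a full count in memory (only its low 3 bits would travel as lb); and the 32-byte bitmap limit assumes the 40-byte TCP option space minus 8 bytes for kind, len, h1, s1 and ss. Wire layout and byte order are not addressed.

```c
#include <stdint.h>
#include <string.h>

/* Assumed limit: 40 bytes of TCP option space minus 8 bytes for
 * kind, len, h1, s1 and ss (16 bits each for the last three). */
#define SACK_BITMAP_BYTES 32
#define SACK_BITMAP_BITS  (SACK_BITMAP_BYTES * 8)

/* Hypothetical: a received block as byte offsets [start, end)
 * relative to the ACK field. */
struct blk { uint32_t start, end; };

/* Hypothetical in-memory form of the proposed option. nbits is the full
 * bit count; on the wire only its low 3 bits (lb) would be needed. */
struct sack_opt {
    uint16_t h1, s1, ss, nbits;
    uint8_t  b[SACK_BITMAP_BYTES];   /* bit u set: unit u after h1 is present */
};

static uint32_t gcd(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Encode n >= 1 sorted, disjoint, non-adjacent blocks with blk[0].start > 0.
 * Returns 0 on success, -1 if a field or the bitmap would overflow. */
static int sack_encode(const struct blk *blk, int n, struct sack_opt *o)
{
    uint32_t ss = 0, nbits, u;
    int i;

    memset(o, 0, sizeof(*o));
    /* GCD of all segments and holes except the first hole and last segment. */
    for (i = 0; i < n - 1; i++) {
        ss = gcd(ss, blk[i].end - blk[i].start);       /* segment i */
        ss = gcd(ss, blk[i + 1].start - blk[i].end);   /* hole after it */
    }
    /* The bitmap spans from the end of the first hole to the last segment. */
    nbits = ss ? (blk[n - 1].start - blk[0].start) / ss : 0;
    if (blk[0].start > UINT16_MAX || ss > UINT16_MAX || nbits > SACK_BITMAP_BITS ||
        blk[n - 1].end - blk[n - 1].start > UINT16_MAX)
        return -1;   /* caller would send several SACKs with different h1 */

    o->h1 = (uint16_t)blk[0].start;
    o->s1 = (uint16_t)(blk[n - 1].end - blk[n - 1].start);
    o->ss = (uint16_t)ss;
    o->nbits = (uint16_t)nbits;
    for (i = 0; i < n - 1; i++)   /* mark the ss-sized units of each block */
        for (u = (blk[i].start - blk[0].start) / ss;
             u < (blk[i].end - blk[0].start) / ss; u++)
            o->b[u / 8] |= (uint8_t)(1u << (u % 8));
    return 0;
}

/* Rebuild the blocks; returns the count, or -1 if out[] (max entries)
 * is too small. Runs of set bits are merged into one block. */
static int sack_decode(const struct sack_opt *o, struct blk *out, int max)
{
    int n = 0;
    uint32_t u;

    for (u = 0; u < o->nbits; u++) {
        uint32_t s;

        if (!(o->b[u / 8] & (1u << (u % 8))))
            continue;   /* a zero bit carries no information */
        s = o->h1 + u * o->ss;
        if (n > 0 && out[n - 1].end == s) {
            out[n - 1].end = s + o->ss;   /* extend the current run */
        } else {
            if (n == max)
                return -1;
            out[n].start = s;
            out[n].end = s + o->ss;
            n++;
        }
    }
    if (n == max)
        return -1;
    out[n].start = o->h1 + (uint32_t)o->nbits * o->ss;   /* last segment */
    out[n].end = out[n].start + o->s1;
    return n + 1;
}
```

For example, encoding the blocks [100, 1124), [1636, 2148) and [3172, 3472) gives h1 = 100, s1 = 300, ss = 512 and a 6-bit bitmap with bits 0, 1 and 3 set; decoding it returns the same three blocks. Because each unit costs one bit instead of 4 bytes per block, the option stays small as long as ss remains close to the MSS.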
---------
====================================================================
Luigi Rizzo                     Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it       Universita' di Pisa
tel: +39-50-568533              via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522              http://www.iet.unipi.it/~luigi/
====================================================================