Date: Fri, 1 Dec 1995 14:02:03 +0100 (MET) From: Luigi Rizzo <luigi@labinfo.iet.unipi.it> To: hackers@freebsd.org Subject: A proposal for selective acks Message-ID: <199512011302.OAA09790@labinfo.iet.unipi.it>
next in thread | raw e-mail | index | archive | help
I posted this recently on comp.protocols.tcp.ip
I believe this is something that could be implemented without much
difficulty in FreeBSD (possibly in -current), and almost without
side effects to the rest of the kernel. I would appreciate receiving
opinions from you guys.
It appears that selective acks (SACKs) were a popular subject 5-10
years ago, to the point that NETBLT [RFC998] included them, and
there was even a proposal for TCP [RFC1072].
I have a proposal for an efficient encoding of the SACK option
described in RFC1072.
In the following I assume that the reader has read the section of
RFC1072 dealing with selective ACKS (which is a nice reading, as
most papers from Van Jacobson).
Comments are solicited and welcome.
Luigi
-----------
The proposal for SACKs by Van Jacobson [RFC1072] is very clear and
detailed about a possible implementation, to the point that one
wonders why they haven't been actually included in RFC1323 and
implemented.
A possible explaination is that the SACK option, as specified in
RFC1072, might suffer from the limited amount of space available
for TCP options. This only enables to selectively ack a limited
number of out-of-sequence segments, thus reducing the effectiveness
of the mechanism. The problem comes from the need of passing both
the relative origin and the size of each segment, totalling 4 bytes
per segment.
However, consider the typical case in which SACKs would be useful,
i.e. the transfer of large blocks of data. In these conditions,
the sender should mostly send maximum-sized segments. Now look at
the reassembly queue at the receiving station: starting from the
first unacked byte, we have a sequence of holes (H) and segments
(S), starting with a hole and ending with a segment, as below.
HHH SSSS HH SS HH SS HHHH SSSS HH SSS
What I expect to see (have not verified, though) is a sequence of
Holes-Segments whose size is generally a multiple of the MSS in
use for the connection. The only exceptions could possibly be first
hole (because of some odd-length communication before the large
block of data begins), and the last segment (which may contain less
than an MSS because no more data are available).
If the above is true, a SACK option could comprise the following fields:
h1 the size of the first hole. The hole starts at the
point indicated by the ACK field in the header.
s1 the size of the last segment
ss the GCD of all the remaining segments/holes. This
should usually represent the MSS in use.
b[] a bitmap indicating which segments (of size ss) are
present or not after the first hole
lb the number of elements in the bitmap b[]. Actually,
only 3 bits are needed for lb because the option length
could be used to derive the size of b[] in bytes.
Following the description of the SACK option in RFC1072, we could
represent h1, s1, ss as 16-bit integers, and use 2 bytes for option
kind+len. This would leave up to (44-8)=36 bytes for b[]+lb, i.e.
room for over 250 segments. Even if some room is taken by other
options (e.g. timestamps), there is still sufficient room to deal
with large windows.
Note that only ones in b[] are meaningful, so the receiver can deal
with the following situations by sending several SACK with different
values for h1:
* the send window is very large, and the bitmap is not large enough;
* for some reason (the sender changes to a new MSS which is prime
wrt the previous one; or the sender is sending very small segments,
etc.) ss becomes very small.
As a matter of fact, h1 could be thought of as a "skip" field, thus
it might be necessary to encode it with a larger number of bits.
Maybe the data in b[] can be compressed further, especially if the
losses are not very large (in this case runs of '1' could be encoded
efficiently, as in FAX documenta). If the losses become large,
though, I suspect that there is little chance for compression.
---------
====================================================================
Luigi Rizzo Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it Universita' di Pisa
tel: +39-50-568533 via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522 http://www.iet.unipi.it/~luigi/
====================================================================
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199512011302.OAA09790>
