Date: Sat, 10 Jul 2010 08:05:29 -0400
From: Sergey Babkin <babkin@verizon.net>
To: hackers@freebsd.org
Subject: TCP over UDP
Message-ID: <4C386208.291D2FB5@verizon.net>
Hi guys,

I've got this idea, and I wonder if anyone has done it already, and if not, then why. The idea is to put the TCP logic on top of UDP. I've done some googling, and all I've found is some academic user-space implementations of TCP that actually try to interoperate with "real" TCP. What I'm thinking about is different: take the TCP-derived logic and turn it into a portable library that does the good flow control, retransmission, delivery confirmation, etc. over UDP.

Basically, every time you use UDP, you've got to reinvent your own retransmission and reliability protocol. And these protocols are typically not good at all, as the story of NFS switching from UDP to TCP and improving performance shows. At the same time, TCP provides very good transport control logic, so why not reuse that logic in a library and solve the UDP issues once and for all?

Then of course, why not just use TCP? The problem with TCP is that it's expensive. It uses kernel memory for its contexts. It also requires a file descriptor per connection. File descriptors are an expensive resource, and besides, even if the limit is raised, there is the issue of the historic select() fd_set allocating only 1024 bits (FD_SETSIZE) and nobody checking for overflow. Even if your own code is carefully designed to avoid select() altogether and/or to create large enough bitmasks, it can always end up pulling in some careless library that doesn't, causing interesting one-bit memory corruptions.

Moving the connection logic to user space makes connections cheap. A hundred bytes or so of per-connection state is no big deal; you can easily create a million such connections in the same process, and all the state stays in pageable user-space memory. Well, all of them sending data at the same time might not work so well, but caching a large number of currently inactive connections becomes cheap. Think of XMLRPC or SOAP or anything else over HTTP reusing the same TCP connection for multiple sequential requests: today there is a painful balance of inactivity timeouts, where making them too long overloads the server and making them too short gets the connections dropped all the time. Cheap connections would allow much longer timeouts.

Then there are other interesting possibilities arising from the easy access to the protocol state. The underlying datagram nature can be exposed at the top level, which immediately gives transactional TCP. Or we could look at the state and find out whether the data has actually been delivered to and acknowledged by the other side. Or we could drop inactive connections at the server without notifying the client; if the client then sends more requests on that connection, the server could semi-transparently re-establish it (OK, this would require an extension beyond plain TCP). Or we could do better keep-alives: not TCP's hour-long ones, but something within a few seconds (that would not work too well with millions of connections, but it's a different use case, one where we want to detect a lost peer fast). Or we could have "sub-channels", each with its own sequence number: if the data gets transferred over 100 parallel logical connections, a few bytes at a time on each, combining the whole bunch into one datagram would be much more efficient than sending 100 datagrams (a rough packing sketch is in the P.S. below).

These are just the ideas off the top of my head; there have got to be more interesting usages.
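To make it concrete, here is a very rough sketch of what such a library could look like. Everything below is invented for illustration: the rudp_* names, the wire header layout, and the choice of state fields are not an existing API, just one possible shape for it.

/*
 * Hypothetical "TCP logic over UDP" library interface (illustration only).
 * All connections are multiplexed over one UDP socket owned by the
 * caller, so a single file descriptor serves any number of them.
 */
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Per-frame wire header: one UDP datagram may carry frames for several
 * logical sub-channels, each with its own sequence space. */
struct rudp_frame_hdr {
    uint32_t channel;   /* logical sub-channel id */
    uint32_t seq;       /* per-channel sequence number */
    uint32_t ack;       /* cumulative ack for the reverse direction */
    uint16_t wnd;       /* receiver window, for flow control */
    uint16_t len;       /* payload bytes following this header */
};

/* Per-connection state kept entirely in pageable user-space memory.
 * It is small, so caching huge numbers of mostly idle connections
 * costs process memory only, not kernel memory or descriptors. */
struct rudp_conn {
    struct sockaddr_storage peer;   /* remote address */
    uint32_t snd_una;    /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;    /* next sequence number to send */
    uint32_t rcv_nxt;    /* next sequence number expected */
    uint32_t cwnd;       /* congestion window, TCP-style */
    uint32_t ssthresh;   /* slow-start threshold */
    uint32_t srtt;       /* smoothed RTT, for the retransmit timeout */
    uint32_t rttvar;     /* RTT variance */
    uint64_t last_rx;    /* last time we heard from the peer (keep-alives) */
    /* ... retransmit queue, timers, etc. */
};

struct rudp_conn *rudp_connect(int udp_fd, const struct sockaddr *peer, socklen_t peerlen);
ssize_t rudp_send(struct rudp_conn *c, uint32_t channel, const void *buf, size_t n);
ssize_t rudp_recv(struct rudp_conn *c, uint32_t *channel, void *buf, size_t n);

/* Feed every datagram read from udp_fd into the protocol state machine. */
void rudp_input(int udp_fd, const void *dgram, size_t n, const struct sockaddr *from);

/* Run retransmit and keep-alive timers; call periodically. */
void rudp_tick(uint64_t now_ms);

/* Because the state is right there in user space, the application can
 * ask directly whether everything sent has been acknowledged. */
int rudp_all_acked(const struct rudp_conn *c);

The application would own the event loop and the socket; the library only keeps the per-connection state and decides what to (re)send and when.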
It all looks like such an obviously good idea that I wonder: why hasn't anyone else tried it before? Or have they tried it and found that it's not such a good idea after all?

-SB
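P.S. To illustrate the sub-channel packing idea, here is a hypothetical helper (it assumes the rudp_frame_hdr layout from the sketch above) that gathers the small pending frames of many logical channels into a single outgoing datagram:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Copy as many (header, payload) frames as fit into one datagram
 * buffer; whatever doesn't fit waits for the next datagram.  The
 * caller then hands the filled buffer to a single sendto(). */
size_t rudp_pack(uint8_t *dgram, size_t cap,
                 const struct rudp_frame_hdr *hdrs,
                 const void *const payloads[], size_t nframes)
{
    size_t off = 0;
    for (size_t i = 0; i < nframes; i++) {
        size_t need = sizeof(hdrs[i]) + hdrs[i].len;
        if (off + need > cap)
            break;              /* datagram is full */
        memcpy(dgram + off, &hdrs[i], sizeof(hdrs[i]));
        off += sizeof(hdrs[i]);
        memcpy(dgram + off, payloads[i], hdrs[i].len);
        off += hdrs[i].len;
    }
    return off;                 /* total bytes for one sendto() call */
}

One sendto() then carries what would otherwise have been many separate little datagrams.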