From owner-freebsd-hackers Sun Apr 14 19:09:46 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id TAA06200 for hackers-outgoing; Sun, 14 Apr 1996 19:09:46 -0700 (PDT)
Received: from whizzo.transsys.com (whizzo.TransSys.COM [144.202.42.10]) by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id TAA06184 for ; Sun, 14 Apr 1996 19:09:38 -0700 (PDT)
Received: from localhost.transsys.com (localhost.transsys.com [127.0.0.1]) by whizzo.transsys.com (8.7.5/8.7.3) with SMTP id WAA06890 for ; Sun, 14 Apr 1996 22:09:33 -0400 (EDT)
Message-Id: <199604150209.WAA06890@whizzo.transsys.com>
X-Authentication-Warning: whizzo.transsys.com: Host localhost.transsys.com [127.0.0.1] didn't use HELO protocol
To: hackers@freebsd.org
From: "Louis A. Mamakos"
Subject: new socket option for timestamps, plus a "bug" fix
Date: Sun, 14 Apr 1996 22:09:32 -0400
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Based on the discussion on the mailing list last week regarding SIGIO
and why you'd use it for different applications, I decided to
reimplement the code I had put into 4.3BSD-tahoe some years ago to add
a timestamp socket option. That is, on a UDP socket, you can enable a
timestamp to be associated with a message as it's queued to the socket
buffer.

The code to do this was rather simple. Here are the diffs against a
recent FreeBSD-current kernel. What this does is retain a timestamp
gathered from microtime(), which the user can retrieve as control
information using the recvmsg() system call. The returned data in the
control info buffer is a struct cmsghdr followed by a struct timeval.

Index: sys/sys/socket.h
===================================================================
RCS file: /usr/local/FreeBSD/cvs/src/sys/sys/socket.h,v
retrieving revision 1.10
diff -u -r1.10 socket.h
--- socket.h	1996/02/07 16:19:02	1.10
+++ socket.h	1996/04/09 03:06:21
@@ -63,6 +63,7 @@
 #define	SO_LINGER	0x0080		/* linger on close if data present */
 #define	SO_OOBINLINE	0x0100		/* leave received OOB data in line */
 #define	SO_REUSEPORT	0x0200		/* allow local address & port reuse */
+#define	SO_TIMESTAMP	0x0400		/* timestamp received dgram traffic */
 
 /*
  * Additional options, not kept in so_options.
@@ -296,6 +297,7 @@
 
 /* "Socket"-level control message types: */
 #define	SCM_RIGHTS	0x01		/* access rights (array of int) */
+#define	SCM_TIMESTAMP	0x02		/* timestamp (struct timeval) */
 
 /*
  * 4.3 compat sockaddr, move to compat file later
Index: sys/netinet/udp_usrreq.c
===================================================================
RCS file: /usr/local/FreeBSD/cvs/src/sys/netinet/udp_usrreq.c,v
retrieving revision 1.21
diff -u -r1.21 udp_usrreq.c
--- udp_usrreq.c	1996/04/04 10:46:44	1.21
+++ udp_usrreq.c	1996/04/09 04:13:28
@@ -95,6 +95,9 @@
 	    struct mbuf *));
 static	void udp_notify __P((struct inpcb *, int));
 static	struct mbuf *udp_saveopt __P((caddr_t, int, int));
+#if defined(SO_TIMESTAMP) && defined(SCM_TIMESTAMP)
+static	struct mbuf *udp_timestamp __P((void));
+#endif
 
 void
 udp_init()
@@ -300,9 +303,20 @@
 		 */
 		udp_in.sin_port = uh->uh_sport;
 		udp_in.sin_addr = ip->ip_src;
-		if (inp->inp_flags & INP_CONTROLOPTS) {
+		if (inp->inp_flags & INP_CONTROLOPTS
+#if defined(SO_TIMESTAMP) && defined(SCM_TIMESTAMP)
+		    || inp->inp_socket->so_options & SO_TIMESTAMP
+#endif
+			) {
 			struct mbuf **mp = &opts;
 
+#if defined(SO_TIMESTAMP) && defined(SCM_TIMESTAMP)
+			if (inp->inp_socket->so_options & SO_TIMESTAMP) {
+				if (*mp = udp_timestamp())
+					mp = &(*mp)->m_next;
+			}
+#endif
+
 			if (inp->inp_flags & INP_RECVDSTADDR) {
 				*mp = udp_saveopt((caddr_t) &ip->ip_dst,
 				    sizeof(struct in_addr), IP_RECVDSTADDR);
@@ -367,6 +381,29 @@
 	cp->cmsg_type = type;
 	return (m);
 }
+
+#if defined(SO_TIMESTAMP) && defined(SCM_TIMESTAMP)
+static struct mbuf *
+udp_timestamp()
+{
+	register struct cmsghdr *cp;
+	struct mbuf *m;
+	struct timeval tv;
+
+	MGET(m, M_DONTWAIT, MT_CONTROL);
+	if (m == 0)
+		return (struct mbuf *) 0;
+
+	microtime(&tv);
+	cp = (struct cmsghdr *) mtod(m, struct cmsghdr *);
+	cp->cmsg_len =
+	    m->m_len = sizeof(*cp) + sizeof(struct timeval);
+	cp->cmsg_level = SOL_SOCKET;
+	cp->cmsg_type = SCM_TIMESTAMP;
+	(void) memcpy(CMSG_DATA(cp), &tv, sizeof(struct timeval));
+	return (m);
+}
+#endif /* defined(SO_TIMESTAMP) && defined(SCM_TIMESTAMP) */
 
 /*
  * Notify a udp user of an asynchronous error;

Just One Ugly Thing:

It's really distasteful to put this socket option implementation into
netinet/udp_usrreq.c; it really is a socket-level option and not
specific only to UDP sockets. Logically, it belongs in
kern/uipc_socket2.c, which is where the data actually gets queued to
the sockbuf. The problem, though, is that the functions there all get
called with a "struct sockbuf *", and the function isn't able to find
the enclosing socket structure to look at the socket options which are
enabled. This feels, somehow, like a layering/API kinda problem, but
it's much less clear how to fix this, if you do at all. There's no
reason why this shouldn't also work on AF_UNIX, er, AF_LOCAL flavored
sockets without having to reimplement the code there.

What I noticed in poking around the code is that it seems to only be
possible to return a single element of control information using the
recvmsg() system call. It seems to have been intended to accumulate a
number of distinct entities; for example, look at netinet/udp_usrreq.c
where multiple mbufs can be queued up depending on which socket options
are turned on. Some of this code is #ifdef'ed out at the moment.

If you look at the code in sys/kern/uipc_socket2.c (in the
sbappendaddr() function, for example), you'll see that you can queue
more than one mbuf of control information. And looking at
sys/kern/uipc_socket.c, multiple mbufs of control information are
almost lovingly extracted from the sockbuf in soreceive() to be
returned to the caller. However, in sys/kern/uipc_syscalls.c, only the
first of these was ever actually extracted and returned to the user;
the rest just get released. This didn't seem to be "correct", so I
whacked on that code to support returning multiple mbufs of control
information into the user's buffer (since they are self-describing in
length, etc.).

Index: sys/kern/uipc_syscalls.c
===================================================================
RCS file: /usr/local/FreeBSD/cvs/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.16
diff -u -r1.16 uipc_syscalls.c
--- uipc_syscalls.c	1996/03/11 15:37:33	1.16
+++ uipc_syscalls.c	1996/04/10 05:29:08
@@ -636,7 +636,8 @@
 	register struct iovec *iov;
 	register int i;
 	int len, error;
-	struct mbuf *from = 0, *control = 0;
+	struct mbuf *m, *from = 0, *control = 0;
+	caddr_t ctlbuf;
 #ifdef KTRACE
 	struct iovec *ktriov = NULL;
 #endif
@@ -735,17 +736,29 @@
 		}
 #endif
 		len = mp->msg_controllen;
-		if (len <= 0 || control == 0)
-			len = 0;
-		else {
-			if (len >= control->m_len)
-				len = control->m_len;
-			else
+		m = control;
+		mp->msg_controllen = 0;
+		ctlbuf = (caddr_t) mp->msg_control;
+
+		while (m && len > 0) {
+			unsigned int tocopy;
+
+			if (len >= m->m_len)
+				tocopy = m->m_len;
+			else {
 				mp->msg_flags |= MSG_CTRUNC;
-			error = copyout((caddr_t)mtod(control, caddr_t),
-			    (caddr_t)mp->msg_control, (unsigned)len);
+				tocopy = len;
+			}
+
+			if (error = copyout((caddr_t)mtod(m, caddr_t),
+			    ctlbuf, tocopy))
+				goto out;
+
+			ctlbuf += tocopy;
+			len -= tocopy;
+			m = m->m_next;
 		}
-		mp->msg_controllen = len;
+		mp->msg_controllen = ctlbuf - mp->msg_control;
 	}
 out:
 	if (from)

Anyway, this code has been running on my box for a few days, and seems
to be pretty happy. I've modified the ntptrace program in a recent
version of the xntp 3.5c distribution as a test case, and intend to
modify xntpd in that version to use this code. So far, so good.

While we can debate the merits of the timestamp socket option, I think
there is a genuine bug (or at least misimplementation) in not being
able to return more than one chunk of control message info with the
recvmsg system call.

Is there any interest in importing this into the source tree? If so,
I'll be happy to pass along further changes to things like xntpd which
take advantage of this code. Actually, I'd like to see a more recent
version of xntp imported as well, but that's another story..

Louis Mamakos