Date: Tue, 25 Jan 2000 21:01:08 -0500 (EST)
From: dwm@caida.org
To: FreeBSD-gnats-submit@freebsd.org
Subject: kern/16360: kernel timestamping of ICMP echo requests and replies
Message-ID: <200001260201.VAA67885@arthur.caida.org>

>Number:         16360
>Category:       kern
>Synopsis:       kernel timestamping of ICMP echo requests and replies
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jan 25 18:10:01 PST 2000
>Closed-Date:
>Last-Modified:
>Originator:     Daniel McRobb
>Release:        FreeBSD 3.4-STABLE i386
>Organization:
CAIDA
>Environment:
>Description:

Patches to put timestamps in the payload of ICMP echo requests and
replies when a new socket option (IP_PINGTIMESTAMP) is enabled.

When the option is set on a socket, the kernel fills the first 12 bytes
of the data portion of an echo packet with timestamps.  The 12 bytes
are divided into three 32-bit microsecond timestamps in network byte
order.  The first is the reception timestamp (the time at which the
echo packet was received).  The second is the transmit timestamp (the
time at which the echo packet was sent).  The third is intended to hold
the transit (round-trip) time, but is currently not filled in by the
kernel.  You can calculate the transit time by subtracting the transmit
timestamp from the reception timestamp.

The main reason for this option is to reduce the timing variance that
context switching and scheduling introduce when round-trip measurements
are timestamped from user space.  The effect is very significant in a
LAN environment (I've seen RTT differences of upwards of 30% between
this option and the 'normal' method of timestamping in user space).  In
a WAN environment, the difference in RTT between this option and user
space timestamping is not very significant for a single measurement,
but it becomes a problem when measuring variance over time-series data
and/or under heavy process load.

There are two reasons it is implemented as a payload modification.
The first is that it doesn't require additional state in applications
(unlike solutions involving auxiliary data in calls to sendmsg() and/or
recvmsg(), such as our current SO_TIMESTAMP option).  This is
significant for applications that use active ICMP measurements to many
destinations (very common in network management systems, where ICMP
pollers may be querying tens of thousands of destinations or more).
The second is that it permits one-way transit time measurements if both
the transmitting and receiving hosts have the option available and have
synchronized clocks.

Side-effects
------------

This is a fairly trivial kernel change, and should not affect
applications which do not enable the socket option (on the receive
side, we timestamp after m_pullup()).  I've been using it for over 2
years now (dating back to 2.2.5-R).

Files Changed
-------------

/sys/netinet/in.h
/sys/netinet/in_pcb.h
/sys/netinet/ip_icmp.h
/sys/netinet/raw_ip.c
/usr/include/netinet/in.h
/usr/include/netinet/in_pcb.h
/usr/include/netinet/ip_icmp.h

More Information
----------------

See:

http://www.caida.org/Tools/Skitter/skping/index.html
http://www.caida.org/Tools/Skitter/index.html

>How-To-Repeat:
>Fix:

Here's a shar of the patches and a Makefile.  These were done against
3.3-RC but work on 3.4-stable as of 1/25/2000.

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	Makefile
#	in.h.patch
#	in_pcb.h.patch
#	ip_icmp.h.patch
#	raw_ip.c.patch
#
echo x - Makefile
sed 's/^X//' >Makefile << 'END-of-Makefile'
XKERNTARGETS = /sys/netinet/in.h /sys/netinet/in_pcb.h \
X	/sys/netinet/ip_icmp.h /sys/netinet/raw_ip.c
XUSERTARGETS = /usr/include/netinet/in.h /usr/include/netinet/in_pcb.h \
X	/usr/include/netinet/ip_icmp.h
X
Xall: ${KERNTARGETS} ${USERTARGETS}
X
X/sys/netinet/in.h::
X	patch /sys/netinet/in.h ./in.h.patch
X
X/sys/netinet/in_pcb.h::
X	patch /sys/netinet/in_pcb.h ./in_pcb.h.patch
X
X/sys/netinet/ip_icmp.h::
X	patch /sys/netinet/ip_icmp.h ./ip_icmp.h.patch
X
X/sys/netinet/raw_ip.c::
X	patch /sys/netinet/raw_ip.c ./raw_ip.c.patch
X
X/usr/include/netinet/in.h::
X	patch /usr/include/netinet/in.h ./in.h.patch
X
X/usr/include/netinet/in_pcb.h::
X	patch /usr/include/netinet/in_pcb.h ./in_pcb.h.patch
X
X/usr/include/netinet/ip_icmp.h::
X	patch /usr/include/netinet/ip_icmp.h ./ip_icmp.h.patch
X
Xundo::
X	if [ -f /sys/netinet/in.h.orig ]; then \
X		mv /sys/netinet/in.h.orig /sys/netinet/in.h ; fi
X	if [ -f /sys/netinet/in_pcb.h.orig ]; then \
X		mv /sys/netinet/in_pcb.h.orig /sys/netinet/in_pcb.h ; fi
X	if [ -f /sys/netinet/ip_icmp.h.orig ]; then \
X		mv /sys/netinet/ip_icmp.h.orig /sys/netinet/ip_icmp.h ; fi
X	if [ -f /sys/netinet/raw_ip.c.orig ]; then \
X		mv /sys/netinet/raw_ip.c.orig /sys/netinet/raw_ip.c ; fi
X	if [ -f /usr/include/netinet/in.h.orig ]; then \
X		mv /usr/include/netinet/in.h.orig /usr/include/netinet/in.h ; fi
X	if [ -f /usr/include/netinet/in_pcb.h.orig ]; then \
X		mv /usr/include/netinet/in_pcb.h.orig /usr/include/netinet/in_pcb.h ; fi
X	if [ -f /usr/include/netinet/ip_icmp.h.orig ]; then \
X		mv /usr/include/netinet/ip_icmp.h.orig /usr/include/netinet/ip_icmp.h ; fi
X
X
END-of-Makefile
echo x - in.h.patch
sed 's/^X//' >in.h.patch << 'END-of-in.h.patch'
X*** in.h.orig	Mon Sep 13 18:19:20 1999
X--- in.h	Mon Sep 13 18:50:44 1999
X***************
X*** 328,337 ****
X--- 328,339 ----
X  #define	IP_DUMMYNET_CONFIGURE	60   /* add/configure a dummynet pipe */
X  #define	IP_DUMMYNET_DEL		61   /* delete a dummynet pipe from chain */
X  #define	IP_DUMMYNET_FLUSH	62   /* flush dummynet */
X  #define	IP_DUMMYNET_GET		64   /* get entire dummynet pipes */
X  
X+ #define	IP_PINGTIMESTAMP	99   /* bool; Add time stamp in echo packet */
X+ 
X  /*
X   * Defaults and limits for options
X   */
X  #define	IP_DEFAULT_MULTICAST_TTL  1	/* normally limit m'casts to 1 hop  */
X  #define	IP_DEFAULT_MULTICAST_LOOP 1	/* normally hear sends if a member  */
END-of-in.h.patch
echo x - in_pcb.h.patch
sed 's/^X//' >in_pcb.h.patch << 'END-of-in_pcb.h.patch'
X*** in_pcb.h.orig	Mon Sep 13 19:26:16 1999
X--- in_pcb.h	Mon Sep 13 19:27:07 1999
X***************
X*** 138,147 ****
X--- 138,148 ----
X  #define	INP_HIGHPORT		0x10	/* user wants "high" port binding */
X  #define	INP_LOWPORT		0x20	/* user wants "low" port binding */
X  #define	INP_ANONPORT		0x40	/* port chosen for user */
X  #define	INP_RECVIF		0x80	/* receive incoming interface */
X  #define	INP_MTUDISC		0x100	/* user can do MTU discovery */
X+ #define	INP_PINGTIMESTAMP	0x1000	/* time stamp ICMP echo packets */
X  #define	INP_CONTROLOPTS		(INP_RECVOPTS|INP_RECVRETOPTS|INP_RECVDSTADDR|\
X  					INP_RECVIF)
X  
X  #define	INPLOOKUP_WILDCARD	1
X  
END-of-in_pcb.h.patch
echo x - ip_icmp.h.patch
sed 's/^X//' >ip_icmp.h.patch << 'END-of-ip_icmp.h.patch'
X*** ip_icmp.h.orig	Mon Sep 13 19:28:56 1999
X--- ip_icmp.h	Mon Sep 13 19:29:32 1999
X***************
X*** 180,189 ****
X--- 180,198 ----
X  	(type) == ICMP_ROUTERADVERT || (type) == ICMP_ROUTERSOLICIT || \
X  	(type) == ICMP_TSTAMP || (type) == ICMP_TSTAMPREPLY || \
X  	(type) == ICMP_IREQ || (type) == ICMP_IREQREPLY || \
X  	(type) == ICMP_MASKREQ || (type) == ICMP_MASKREPLY)
X  
X+ #define ICMP_LEN(t) (((t) == ICMP_TSTAMP || (t) == ICMP_TSTAMPREPLY)? \
X+ 		     ICMP_TSLEN: ICMP_MINLEN)
X+ 
X+ struct ping_ip {
X+ 	n_time	ipping_rcv;
X+ 	n_time	ipping_snd;
X+ 	n_time	ipping_trans;
X+ };
X+ 
X  #ifdef KERNEL
X  void	icmp_error __P((struct mbuf *, int, int, n_long, struct ifnet *));
X  void	icmp_input __P((struct mbuf *, int));
X  #endif
X  
END-of-ip_icmp.h.patch
echo x - raw_ip.c.patch
sed 's/^X//' >raw_ip.c.patch << 'END-of-raw_ip.c.patch'
X*** raw_ip.c.orig	Mon Sep 13 19:31:53 1999
X--- raw_ip.c	Mon Sep 13 19:38:58 1999
X***************
X*** 52,61 ****
X--- 52,62 ----
X  
X  #define _IP_VHL
X  #include <netinet/in.h>
X  #include <netinet/in_systm.h>
X  #include <netinet/ip.h>
X+ #include <netinet/ip_icmp.h>
X  #include <netinet/in_pcb.h>
X  #include <netinet/in_var.h>
X  #include <netinet/ip_var.h>
X  #include <netinet/ip_mroute.h>
X  
X***************
X*** 79,88 ****
X--- 80,131 ----
X   * Nominal space allocated to a raw ip socket.
X   */
X  #define	RIPSNDQ		8192
X  #define	RIPRCVQ		8192
X  
X+ /* Dirty (?) hack to add a time stamp on ICMP echo reply packets.
X+  * PING_TSADD_RCV(struct ip *ip, struct mbuf *m, struct inpcb *last)
X+  * PING_TSADD_SND(struct ip *ip, struct mbuf *m, struct inpcb *inp);
X+  * May null m if m_pullup() fails. (ip)->ip_len does not include the IP header
X+  * when receiving.
X+  */
X+ u_long pingtime();
X+ 
X+ struct mbuf *ping_setsnd();
X+ 
X+ 
X+ #ifdef IP_PINGTIMESTAMP
X+ #define PING_ICMPSIZE(ip) ICMP_LEN(((struct icmp *)((ip) + 1))->icmp_type)
X+ #define PING_SIZE(ip)     (PING_ICMPSIZE(ip) + sizeof(struct ping_ip))
X+ #define PING_TSADD_RCV(ip, m, so) do { \
X+ 	if ((so) && \
X+ 	    ((so)->inp_flags & INP_PINGTIMESTAMP) && \
X+ 	    (ip)->ip_p == IPPROTO_ICMP && (ip)->ip_len >= PING_SIZE(ip) && \
X+ 	    ((m) = m_pullup((m), (((ip)->ip_vhl & 0x0f) << 2) + PING_SIZE(ip)))) { \
X+ 		struct icmp *__icmp; \
X+ 		(ip) = mtod((m), struct ip *); \
X+ 		__icmp = (struct icmp *)(mtod(m, char *) + (((ip)->ip_vhl & 0x0f) << 2)); \
X+ 		if (__icmp->icmp_type == ICMP_ECHOREPLY || \
X+ 		    __icmp->icmp_type == ICMP_ECHO) { \
X+ 			((struct ping_ip *)(mtod(m, char *)+(((ip)->ip_vhl & 0x0f) << 2) \
X+ 			    + PING_ICMPSIZE(ip)))->ipping_rcv = pingtime(); \
X+ 		} \
X+ 	} \
X+ } while(0)
X+ 
X+ #define PING_TSADD_SND(ip, m, inp) do { \
X+ 	if (((inp)->inp_flags & INP_PINGTIMESTAMP) && \
X+ 	    (ip)->ip_p == IPPROTO_ICMP && \
X+ 	    (ip)->ip_len >= sizeof(struct ip) + PING_SIZE(ip)) { \
X+ 		(m) = ping_setsnd((m)); \
X+ 	} \
X+ } while(0)
X+ #else
X+ #define PING_TSADD_RCV(ip, m, so)
X+ #define PING_TSADD_SND(ip, m, inp)
X+ #endif
X+ 
X  /*
X   * Raw interface to IP protocol.
X   */
X  
X  /*
X***************
X*** 131,140 ****
X--- 174,184 ----
X  			    inp->inp_faddr.s_addr != ip->ip_src.s_addr)
X  			continue;
X  		if (last) {
X  			struct mbuf *n = m_copy(m, 0, (int)M_COPYALL);
X  			if (n) {
X+ 				PING_TSADD_RCV(ip, n, last);
X  				if (last->inp_flags & INP_CONTROLOPTS ||
X  				    last->inp_socket->so_options & SO_TIMESTAMP)
X  				    ip_savecontrol(last, &opts, ip, n);
X  				if (sbappendaddr(&last->inp_socket->so_rcv,
X  				    (struct sockaddr *)&ripsrc, n,
X***************
X*** 149,158 ****
X--- 193,203 ----
X  			}
X  		}
X  		last = inp;
X  	}
X  	if (last) {
X+ 		PING_TSADD_RCV(ip, m, last);
X  		if (last->inp_flags & INP_CONTROLOPTS ||
X  		    last->inp_socket->so_options & SO_TIMESTAMP)
X  			ip_savecontrol(last, &opts, ip, m);
X  		if (sbappendaddr(&last->inp_socket->so_rcv,
X  		    (struct sockaddr *)&ripsrc, m, opts) == 0) {
X***************
X*** 219,228 ****
X--- 264,274 ----
X  		ip->ip_id = htons(ip_id++);
X  		/* XXX prevent ip_output from overwriting header fields */
X  		flags |= IP_RAWOUTPUT;
X  		ipstat.ips_rawout++;
X  	}
X+ 	PING_TSADD_SND(ip, m, inp);
X  	return (ip_output(m, inp->inp_options, &inp->inp_route, flags,
X  			  inp->inp_moptions));
X  }
X  
X  /*
X***************
X*** 301,310 ****
X--- 347,369 ----
X  				inp->inp_flags |= INP_HDRINCL;
X  			else
X  				inp->inp_flags &= ~INP_HDRINCL;
X  			break;
X  
X+ #ifdef IP_PINGTIMESTAMP
X+ 		case IP_PINGTIMESTAMP:
X+ 			error = sooptcopyin(sopt, &optval, sizeof optval,
X+ 					    sizeof optval);
X+ 			if (error)
X+ 				break;
X+ 			if (optval)
X+ 				inp->inp_flags |= INP_PINGTIMESTAMP;
X+ 			else
X+ 				inp->inp_flags &= ~INP_PINGTIMESTAMP;
X+ 			break;
X+ #endif
X+ 
X  #ifdef COMPAT_IPFW
X  		case IP_FW_ADD:
X  		case IP_FW_DEL:
X  		case IP_FW_FLUSH:
X  		case IP_FW_ZERO:
X***************
X*** 655,659 ****
X--- 714,768 ----
X  	pru_connect2_notsupp, in_control, rip_detach, rip_disconnect,
X  	pru_listen_notsupp, in_setpeeraddr, pru_rcvd_notsupp,
X  	pru_rcvoob_notsupp, rip_send, pru_sense_null, rip_shutdown,
X  	in_setsockaddr, sosend, soreceive, sopoll
X  };
X+ 
X+ u_long pingtime()
X+ {
X+ 	struct timeval atv;
X+ 	u_long t;
X+ 
X+ 	microtime(&atv);
X+ 	t = atv.tv_sec * 1000000 + atv.tv_usec;
X+ 	return (htonl(t));
X+ }
X+ 
X+ /* Set time stamp in ICMP ping packet. At this stage ip->ip_hl is not set. The
X+  * ip header length is the default one.
X+  */
X+ struct mbuf *
X+ ping_setsnd(m)
X+ 	struct mbuf *m;
X+ {
X+ 	struct icmp *icmp;
X+ 	struct ip *ip = mtod(m, struct ip *);
X+ 
X+ 	/* Put the IP + ICMP + pingTS header in same mbuf
X+ 	 */
X+ 	if (!(m = m_pullup(m, sizeof(struct ip) + PING_SIZE(ip))))
X+ 		return(NULL);
X+ 
X+ 	ip = mtod(m, struct ip *);
X+ 	icmp = (struct icmp *)((char *)ip + sizeof(struct ip));
X+ 
X+ 	/* We want to put the timestamp on ECHO packets only.
X+ 	 */
X+ 	if (icmp->icmp_type != ICMP_ECHOREPLY && icmp->icmp_type != ICMP_ECHO)
X+ 		return(m);
X+ 
X+ 	/* Set time stamp.
X+ 	 */
X+ 	((struct ping_ip *)((char *)icmp + ICMP_LEN(icmp->icmp_type)))->ipping_snd = pingtime();
X+ 
X+ 	/* Recompute the ICMP checksum
X+ 	 */
X+ 	m->m_data += sizeof(struct ip);
X+ 	m->m_len -= sizeof(struct ip);
X+ 	icmp->icmp_cksum = 0;
X+ 	icmp->icmp_cksum = in_cksum(m, ip->ip_len - sizeof(struct ip));
X+ 	m->m_data -= sizeof(struct ip);
X+ 	m->m_len += sizeof(struct ip);
X+ 
X+ 	return(m);
X+ }
X+ 
END-of-raw_ip.c.patch
exit

--UAA67605.948851702/arthur.caida.org--
>Release-Note:
>Audit-Trail:
>Unformatted:


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
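
For anyone wanting to try the option from user space, here is a rough
sketch of how an application might enable it and read the stamps back.
It is not part of the submitted patches.  The helper names
(enable_ping_timestamp, ping_decode, ping_transit_usec) and struct
ping_stamps are made up for illustration; the option value 99 is the
one defined in in.h.patch, and an unpatched kernel should simply reject
the setsockopt() call.  The sketch assumes the stamps are 32-bit
microsecond counters, so they wrap about every 71.6 minutes; unsigned
subtraction still gives the right answer across a single wrap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Value taken from in.h.patch; stock headers do not define it. */
#ifndef IP_PINGTIMESTAMP
#define IP_PINGTIMESTAMP 99
#endif

/* Host-order copy of the 12 bytes written by the kernel (hypothetical type). */
struct ping_stamps {
	uint32_t rcv_usec;	/* reception timestamp */
	uint32_t snd_usec;	/* transmit timestamp */
	uint32_t trans_usec;	/* transit time; not filled in by the kernel */
};

/*
 * Ask the kernel to stamp echo packets sent and received on the raw
 * ICMP socket fd.  Returns 0 on success, -1 with errno set otherwise
 * (for example on a kernel without the patch).
 */
static int
enable_ping_timestamp(int fd)
{
	int on = 1;

	return (setsockopt(fd, IPPROTO_IP, IP_PINGTIMESTAMP, &on, sizeof(on)));
}

/*
 * Decode the three network-order stamps.  payload must point at the
 * first byte of the echo data, i.e. past the IP header and the 8-byte
 * ICMP echo header, and at least 12 bytes must be readable there.
 */
static struct ping_stamps
ping_decode(const unsigned char *payload)
{
	uint32_t raw[3];
	struct ping_stamps s;

	assert(payload != NULL);
	memcpy(raw, payload, sizeof(raw));	/* payload may be unaligned */
	s.rcv_usec = ntohl(raw[0]);
	s.snd_usec = ntohl(raw[1]);
	s.trans_usec = ntohl(raw[2]);
	return (s);
}

/* Transit time in microseconds: reception minus transmit, modulo 2^32. */
static uint32_t
ping_transit_usec(const struct ping_stamps *s)
{
	return ((uint32_t)(s->rcv_usec - s->snd_usec));
}
```

When both stamps come from the same host, the result is a round-trip
time.  When they come from two different hosts, it is a one-way transit
time and is only meaningful if the clocks are synchronized, as noted in
the description.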