Date:      Tue, 25 Jan 2000 21:01:08 -0500 (EST)
From:      dwm@caida.org
To:        FreeBSD-gnats-submit@freebsd.org
Subject:   kern/16360: kernel timestamping of ICMP echo requests and replies
Message-ID:  <200001260201.VAA67885@arthur.caida.org>


>Number:         16360
>Category:       kern
>Synopsis:       kernel timestamping of ICMP echo requests and replies
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jan 25 18:10:01 PST 2000
>Closed-Date:
>Last-Modified:
>Originator:     Daniel McRobb
>Release:        FreeBSD 3.4-STABLE i386
>Organization:
CAIDA
>Environment:

	

>Description:

	Patches to put timestamps in the payload of ICMP echo requests
	and replies when a new socket option (IP_PINGTIMESTAMP) is
	enabled.  With the option set, the kernel fills the first 12
	bytes of the data portion of an echo packet with timestamps.
	The 12 bytes are divided into three 32-bit microsecond
	timestamps in network byte order.  The first is a reception
	timestamp (the time at which an echo request or reply was
	received).  The second is a transmit timestamp (the time at
	which an echo request or reply was sent).  The third is
	intended to contain the round-trip transit time, but is
	currently not filled in by the kernel.  You can calculate the
	transit time by subtracting the transmit timestamp from the
	reception timestamp.
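
	To make the payload layout concrete, here is a minimal user-space
	sketch (not part of the patches) of how a receiver might decode
	the 12-byte block and compute the elapsed time.  The names
	ping_ts, ping_ts_decode and ping_ts_elapsed_us are hypothetical.
	The sketch assumes each timestamp is 32 bits wide and that `data`
	points just past the 8-byte ICMP echo header.

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-order copy of the 12-byte timestamp block (hypothetical name;
 * it mirrors the field order of struct ping_ip in the patch). */
struct ping_ts {
	uint32_t rcv_us;    /* reception time, microseconds */
	uint32_t snd_us;    /* transmit time, microseconds */
	uint32_t trans_us;  /* reserved for transit time, not filled in */
};

/* Decode the timestamp block from the start of the echo data.
 * Returns 0 on success, -1 if fewer than 12 bytes are available. */
int
ping_ts_decode(const unsigned char *data, size_t len, struct ping_ts *out)
{
	uint32_t raw[3];

	assert(data != NULL && out != NULL);
	if (len < sizeof(raw))
		return -1;
	/* memcpy avoids unaligned access into the packet buffer. */
	memcpy(raw, data, sizeof(raw));
	out->rcv_us = ntohl(raw[0]);
	out->snd_us = ntohl(raw[1]);
	out->trans_us = ntohl(raw[2]);
	return 0;
}

/* Elapsed microseconds between transmit and receive.  Unsigned
 * subtraction is modulo 2^32, so one counter wrap is handled. */
uint32_t
ping_ts_elapsed_us(const struct ping_ts *ts)
{
	return ts->rcv_us - ts->snd_us;
}
```

	A 32-bit microsecond counter wraps roughly every 71.6 minutes;
	the unsigned subtraction stays correct across a single wrap,
	which is ample for round-trip times.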

	The main reason for this option is to reduce the timing variance
	that context switching and scheduling introduce when round-trip
	measurements are timestamped from user space.  The effect is
	very significant in a LAN environment (I've seen RTT differences
	of upwards of 30% between this option and the 'normal' method of
	timestamping in user space).  In a WAN environment, the
	difference between this option and user space timestamping is
	not very significant for a single measurement, but it rears its
	ugly head when measuring variance over time-series data and/or
	under heavy process load.

	There are two reasons it's implemented as payload modifications.
	The first is that it doesn't require additional state in
	applications (unlike solutions involving auxiliary data in calls
	to sendmsg() and/or recvmsg(), such as our current SO_TIMESTAMP
	option).  This is significant for applications that make active
	ICMP measurements to many destinations (very common in network
	management systems, where ICMP pollers may be querying tens of
	thousands of destinations or more).  The second is that it
	permits one-way transit time measurements if both the transmit
	and receive hosts have the option available and have
	synchronized clocks.
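
	On the application side, the only change is a one-time
	setsockopt() call.  A minimal sketch, assuming a kernel with
	these patches applied and a raw ICMP socket: the helper name
	enable_ping_timestamp is hypothetical, and the fallback #define
	uses the value 99 from the in.h patch because stock headers do
	not define IP_PINGTIMESTAMP.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Value taken from the in.h patch in this report; unpatched headers
 * do not define it. */
#ifndef IP_PINGTIMESTAMP
#define IP_PINGTIMESTAMP 99
#endif

/* Ask the kernel to timestamp ICMP echo packets on this socket.
 * Returns 0 on success, or -1 with errno set by setsockopt(). */
int
enable_ping_timestamp(int sock)
{
	int on = 1;

	return setsockopt(sock, IPPROTO_IP, IP_PINGTIMESTAMP,
	    &on, sizeof(on));
}
```

	On a kernel without the patches the call is expected to fail
	and set errno, so a caller can fall back to timestamping in user
	space.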

	Side-effects
	------------
	This is a fairly trivial kernel change, and should not affect
	applications which do not enable the socket option (on the
	receive side, we timestamp after m_pullup()).  I've been using it
	for over 2 years now (dating back to 2.2.5-R).

	Files Changed
	-------------
	/sys/netinet/in.h
	/sys/netinet/in_pcb.h
	/sys/netinet/ip_icmp.h
	/sys/netinet/raw_ip.c
	/usr/include/netinet/in.h
	/usr/include/netinet/in_pcb.h
	/usr/include/netinet/ip_icmp.h

	More Information
	----------------
	See:
	  http://www.caida.org/Tools/Skitter/skping/index.html
	  http://www.caida.org/Tools/Skitter/index.html
	
>How-To-Repeat:



>Fix:
	Here's a shar of the patches and a Makefile.  The patches were
	made against 3.3-RC but work on 3.4-STABLE as of 1/25/2000.

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	Makefile
#	in.h.patch
#	in_pcb.h.patch
#	ip_icmp.h.patch
#	raw_ip.c.patch
#
echo x - Makefile
sed 's/^X//' >Makefile << 'END-of-Makefile'
XKERNTARGETS = /sys/netinet/in.h /sys/netinet/in_pcb.h \
X	      /sys/netinet/ip_icmp.h /sys/netinet/raw_ip.c
XUSERTARGETS = /usr/include/netinet/in.h /usr/include/netinet/in_pcb.h \
X	      /usr/include/netinet/ip_icmp.h
X
Xall: ${KERNTARGETS} ${USERTARGETS}
X
X/sys/netinet/in.h::
X	patch /sys/netinet/in.h ./in.h.patch
X
X/sys/netinet/in_pcb.h::
X	patch /sys/netinet/in_pcb.h ./in_pcb.h.patch
X
X/sys/netinet/ip_icmp.h::
X	patch /sys/netinet/ip_icmp.h ./ip_icmp.h.patch
X
X/sys/netinet/raw_ip.c::
X	patch /sys/netinet/raw_ip.c ./raw_ip.c.patch
X
X/usr/include/netinet/in.h::
X	patch /usr/include/netinet/in.h ./in.h.patch
X
X/usr/include/netinet/in_pcb.h::
X	patch /usr/include/netinet/in_pcb.h ./in_pcb.h.patch
X
X/usr/include/netinet/ip_icmp.h::
X	patch /usr/include/netinet/ip_icmp.h ./ip_icmp.h.patch
X
Xundo::
X	if [ -f /sys/netinet/in.h.orig ]; then \
X	  mv /sys/netinet/in.h.orig /sys/netinet/in.h ; fi
X	if [ -f /sys/netinet/in_pcb.h.orig ]; then \
X	  mv /sys/netinet/in_pcb.h.orig /sys/netinet/in_pcb.h ; fi
X	if [ -f /sys/netinet/ip_icmp.h.orig ]; then \
X	  mv /sys/netinet/ip_icmp.h.orig /sys/netinet/ip_icmp.h ; fi
X	if [ -f /sys/netinet/raw_ip.c.orig ]; then \
X	  mv /sys/netinet/raw_ip.c.orig /sys/netinet/raw_ip.c ; fi
X	if [ -f /usr/include/netinet/in.h.orig ]; then \
X	  mv /usr/include/netinet/in.h.orig /usr/include/netinet/in.h ; fi
X	if [ -f /usr/include/netinet/in_pcb.h.orig ]; then \
X	  mv /usr/include/netinet/in_pcb.h.orig /usr/include/netinet/in_pcb.h ; fi
X	if [ -f /usr/include/netinet/ip_icmp.h.orig ]; then \
X	  mv /usr/include/netinet/ip_icmp.h.orig /usr/include/netinet/ip_icmp.h ; fi
X
X
END-of-Makefile
echo x - in.h.patch
sed 's/^X//' >in.h.patch << 'END-of-in.h.patch'
X*** in.h.orig	Mon Sep 13 18:19:20 1999
X--- in.h	Mon Sep 13 18:50:44 1999
X***************
X*** 328,337 ****
X--- 328,339 ----
X  #define	IP_DUMMYNET_CONFIGURE	60   /* add/configure a dummynet pipe */
X  #define	IP_DUMMYNET_DEL		61   /* delete a dummynet pipe from chain */
X  #define	IP_DUMMYNET_FLUSH	62   /* flush dummynet */
X  #define	IP_DUMMYNET_GET		64   /* get entire dummynet pipes */
X  
X+ #define IP_PINGTIMESTAMP	99   /* bool; Add time stamp in echo packet */
X+ 
X  /*
X   * Defaults and limits for options
X   */
X  #define	IP_DEFAULT_MULTICAST_TTL  1	/* normally limit m'casts to 1 hop  */
X  #define	IP_DEFAULT_MULTICAST_LOOP 1	/* normally hear sends if a member  */
END-of-in.h.patch
echo x - in_pcb.h.patch
sed 's/^X//' >in_pcb.h.patch << 'END-of-in_pcb.h.patch'
X*** in_pcb.h.orig	Mon Sep 13 19:26:16 1999
X--- in_pcb.h	Mon Sep 13 19:27:07 1999
X***************
X*** 138,147 ****
X--- 138,148 ----
X  #define	INP_HIGHPORT		0x10	/* user wants "high" port binding */
X  #define	INP_LOWPORT		0x20	/* user wants "low" port binding */
X  #define	INP_ANONPORT		0x40	/* port chosen for user */
X  #define	INP_RECVIF		0x80	/* receive incoming interface */
X  #define	INP_MTUDISC		0x100	/* user can do MTU discovery */
X+ #define INP_PINGTIMESTAMP	0x1000  /* time stamp ICMP echo packets */
X  #define	INP_CONTROLOPTS		(INP_RECVOPTS|INP_RECVRETOPTS|INP_RECVDSTADDR|\
X  					INP_RECVIF)
X  
X  #define	INPLOOKUP_WILDCARD	1
X  
END-of-in_pcb.h.patch
echo x - ip_icmp.h.patch
sed 's/^X//' >ip_icmp.h.patch << 'END-of-ip_icmp.h.patch'
X*** ip_icmp.h.orig	Mon Sep 13 19:28:56 1999
X--- ip_icmp.h	Mon Sep 13 19:29:32 1999
X***************
X*** 180,189 ****
X--- 180,198 ----
X  	(type) == ICMP_ROUTERADVERT || (type) == ICMP_ROUTERSOLICIT || \
X  	(type) == ICMP_TSTAMP || (type) == ICMP_TSTAMPREPLY || \
X  	(type) == ICMP_IREQ || (type) == ICMP_IREQREPLY || \
X  	(type) == ICMP_MASKREQ || (type) == ICMP_MASKREPLY)
X  
X+ #define ICMP_LEN(t)     (((t) == ICMP_TSTAMP || (t) == ICMP_TSTAMPREPLY)? \
X+                          ICMP_TSLEN: ICMP_MINLEN)
X+ 
X+ struct ping_ip {
X+        n_time ipping_rcv;
X+        n_time ipping_snd;
X+        n_time ipping_trans;
X+ };
X+ 
X  #ifdef KERNEL
X  void	icmp_error __P((struct mbuf *, int, int, n_long, struct ifnet *));
X  void	icmp_input __P((struct mbuf *, int));
X  #endif
X  
END-of-ip_icmp.h.patch
echo x - raw_ip.c.patch
sed 's/^X//' >raw_ip.c.patch << 'END-of-raw_ip.c.patch'
X*** raw_ip.c.orig	Mon Sep 13 19:31:53 1999
X--- raw_ip.c	Mon Sep 13 19:38:58 1999
X***************
X*** 52,61 ****
X--- 52,62 ----
X  
X  #define _IP_VHL
X  #include <netinet/in.h>
X  #include <netinet/in_systm.h>
X  #include <netinet/ip.h>
X+ #include <netinet/ip_icmp.h>
X  #include <netinet/in_pcb.h>
X  #include <netinet/in_var.h>
X  #include <netinet/ip_var.h>
X  #include <netinet/ip_mroute.h>
X  
X***************
X*** 79,88 ****
X--- 80,131 ----
X   * Nominal space allocated to a raw ip socket.
X   */
X  #define	RIPSNDQ		8192
X  #define	RIPRCVQ		8192
X  
X+ /* Dirty (?) hack to add a time stamp on ICMP echo reply packets.
X+  * PING_TSADD_RCV(struct ip *ip, struct mbuf *m, struct socket *last)
X+  * PING_TSADD_SND(struct ip *ip, struct mbuf *m, struct inpcb *inp);
X+  * May null m if m_pullup() fails. (ip)->ip_len does not include the IP header
X+  * when receiving.
X+  */
X+ u_long pingtime();
X+ 
X+ struct mbuf *ping_setsnd();
X+ 
X+ 
X+ #ifdef IP_PINGTIMESTAMP
X+   #define PING_ICMPSIZE(ip)    ICMP_LEN(((struct icmp *)((ip) + 1))->icmp_type)
X+   #define PING_SIZE(ip)        (PING_ICMPSIZE(ip) + sizeof(struct ping_ip))
X+   #define PING_TSADD_RCV(ip, m, so) do {                                      \
X+     if ((so) &&                                                               \
X+         ((so)->inp_flags & INP_PINGTIMESTAMP) &&                              \
X+         (ip)->ip_p == IPPROTO_ICMP && (ip)->ip_len >= PING_SIZE(ip) &&        \
X+         ((m) = m_pullup((m), ((ip->ip_vhl & 0x0f) << 2) + PING_SIZE(ip)))) {  \
X+           struct icmp *__icmp;                                                \
X+           (ip) = mtod((m), struct ip *);                                      \
X+           __icmp = (struct icmp *)(mtod(m, char *) + (((ip)->ip_vhl & 0x0f) << 2));     \
X+           if (__icmp->icmp_type == ICMP_ECHOREPLY ||                          \
X+               __icmp->icmp_type == ICMP_ECHO) {                               \
X+             ((struct ping_ip *)(mtod(m, char *)+(((ip)->ip_vhl & 0x0f) << 2)  \
X+                               + PING_ICMPSIZE(ip)))->ipping_rcv = pingtime(); \
X+           }                                                                   \
X+     }                                                                         \
X+   } while(0)
X+ 
X+   #define PING_TSADD_SND(ip, m, inp) do {                                     \
X+           if (((inp)->inp_flags & INP_PINGTIMESTAMP) &&                       \
X+               (ip)->ip_p == IPPROTO_ICMP &&                                   \
X+               (ip)->ip_len >= sizeof(struct ip) + PING_SIZE(ip)) {            \
X+             (m) = ping_setsnd((m));                                           \
X+           }                                                                   \
X+   } while(0)
X+ #else
X+   #define PING_TSADD_RCV(ip, m, so)
X+   #define PING_TSADD_SND(ip, m, inp)
X+ #endif
X+ 
X  /*
X   * Raw interface to IP protocol.
X   */
X  
X  /*
X***************
X*** 131,140 ****
X--- 174,184 ----
X                    inp->inp_faddr.s_addr != ip->ip_src.s_addr)
X  			continue;
X  		if (last) {
X  			struct mbuf *n = m_copy(m, 0, (int)M_COPYALL);
X  			if (n) {
X+ 				PING_TSADD_RCV(ip, n, last);
X  				if (last->inp_flags & INP_CONTROLOPTS ||
X  				    last->inp_socket->so_options & SO_TIMESTAMP)
X  				    ip_savecontrol(last, &opts, ip, n);
X  				if (sbappendaddr(&last->inp_socket->so_rcv,
X  				    (struct sockaddr *)&ripsrc, n,
X***************
X*** 149,158 ****
X--- 193,203 ----
X  			}
X  		}
X  		last = inp;
X  	}
X  	if (last) {
X+ 		PING_TSADD_RCV(ip, m, last);
X  		if (last->inp_flags & INP_CONTROLOPTS ||
X  		    last->inp_socket->so_options & SO_TIMESTAMP)
X  			ip_savecontrol(last, &opts, ip, m);
X  		if (sbappendaddr(&last->inp_socket->so_rcv,
X  		    (struct sockaddr *)&ripsrc, m, opts) == 0) {
X***************
X*** 219,228 ****
X--- 264,274 ----
X  			ip->ip_id = htons(ip_id++);
X  		/* XXX prevent ip_output from overwriting header fields */
X  		flags |= IP_RAWOUTPUT;
X  		ipstat.ips_rawout++;
X  	}
X+ 	PING_TSADD_SND(ip, m, inp);
X  	return (ip_output(m, inp->inp_options, &inp->inp_route, flags,
X  			  inp->inp_moptions));
X  }
X  
X  /*
X***************
X*** 301,310 ****
X--- 347,369 ----
X  				inp->inp_flags |= INP_HDRINCL;
X  			else
X  				inp->inp_flags &= ~INP_HDRINCL;
X  			break;
X  
X+ #ifdef IP_PINGTIMESTAMP
X+               case IP_PINGTIMESTAMP:
X+                       error = sooptcopyin(sopt, &optval, sizeof optval,
X+                                           sizeof optval);
X+                       if (error)
X+                               break;
X+                       if (optval)
X+                               inp->inp_flags |= INP_PINGTIMESTAMP;
X+                       else
X+                               inp->inp_flags &= ~INP_PINGTIMESTAMP;
X+                       break;
X+ #endif
X+ 
X  #ifdef COMPAT_IPFW
X  		case IP_FW_ADD:
X  		case IP_FW_DEL:
X  		case IP_FW_FLUSH:
X  		case IP_FW_ZERO:
X***************
X*** 655,659 ****
X--- 714,768 ----
X  	pru_connect2_notsupp, in_control, rip_detach, rip_disconnect,
X  	pru_listen_notsupp, in_setpeeraddr, pru_rcvd_notsupp,
X  	pru_rcvoob_notsupp, rip_send, pru_sense_null, rip_shutdown, 
X  	in_setsockaddr, sosend, soreceive, sopoll
X  };
X+ 
X+ u_long pingtime()
X+ {
X+         struct timeval atv;
X+         u_long t;
X+ 
X+         microtime(&atv);
X+         t = atv.tv_sec * 1000000 + atv.tv_usec;
X+         return (htonl(t));
X+ }
X+ 
X+ /* Set time stamp in ICMP ping packet. At this stage ip->ip_hl is not set. The
X+  * IP header length is the default one.
X+  */
X+ struct mbuf *
X+ ping_setsnd(m)
X+         struct mbuf *m;
X+ {
X+         struct icmp *icmp;
X+         struct ip *ip = mtod(m, struct ip *);
X+ 
X+         /* Put the IP + ICMP + pingTS header in same mbuf
X+          */
X+         if (!(m = m_pullup(m, sizeof(struct ip) + PING_SIZE(ip))))
X+                 return(NULL);
X+ 
X+         ip = mtod(m, struct ip *);
X+         icmp = (struct icmp *)((char *)ip + sizeof(struct ip));
X+ 
X+         /* We want to put the timestamp on ECHO packets only.
X+          */
X+         if (icmp->icmp_type != ICMP_ECHOREPLY && icmp->icmp_type != ICMP_ECHO)
X+                 return(m);
X+ 
X+         /* Set time stamp.
X+          */
X+         ((struct ping_ip *)((char *)icmp + ICMP_LEN(icmp->icmp_type)))->ipping_snd = pingtime();
X+ 
X+         /* Recompute the ICMP checksum
X+          */
X+         m->m_data += sizeof(struct ip);
X+         m->m_len -= sizeof(struct ip);
X+         icmp->icmp_cksum = 0;
X+         icmp->icmp_cksum = in_cksum(m, ip->ip_len - sizeof(struct ip));
X+         m->m_data -= sizeof(struct ip);
X+         m->m_len += sizeof(struct ip);
X+ 
X+         return(m);
X+ }
X+ 
END-of-raw_ip.c.patch
exit


	



>Release-Note:
>Audit-Trail:
>Unformatted:

