Date: Tue, 25 Jan 2000 21:01:08 -0500 (EST)
From: dwm@caida.org
To: FreeBSD-gnats-submit@freebsd.org
Subject: kern/16360: kernel timestamping of ICMP echo requests and replies
Message-ID: <200001260201.VAA67885@arthur.caida.org>
>Number: 16360
>Category: kern
>Synopsis: kernel timestamping of ICMP echo requests and replies
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: change-request
>Submitter-Id: current-users
>Arrival-Date: Tue Jan 25 18:10:01 PST 2000
>Closed-Date:
>Last-Modified:
>Originator: Daniel McRobb
>Release: FreeBSD 3.4-STABLE i386
>Organization:
CAIDA
>Environment:
>Description:
Patches to put timestamps in the payload of ICMP echo requests
and replies if a new socket option (IP_PINGTIMESTAMP) is enabled.
When the option is enabled, the kernel fills the first 12 bytes
of the data portion of an echo packet with timestamps. The 12
bytes are divided into three 32-bit microsecond timestamps in
network byte order. The first timestamp is the reception time
(the time at which the echo packet was received). The second
is the transmit time (the time at which the echo packet was
sent). The third is intended to contain the transit (round-trip)
time, but is currently not filled in by the kernel. You can
calculate the transit time by subtracting the transmit timestamp
from the reception timestamp.
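As a rough userland illustration of the layout described above, the
sketch below decodes the three timestamps from a received echo packet
and computes the transit time. The names ping_ts, ping_ts_decode and
ping_ts_rtt_us are hypothetical and not part of the patch; the code
assumes the caller has already skipped the IP header and that the ICMP
echo header is the usual 8 bytes. Since each timestamp is a 32-bit
microsecond count, it wraps roughly every 71.6 minutes, so the
subtraction is done in unsigned arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>          /* ntohl(), htonl() */

#define ICMP_ECHO_HDRLEN 8      /* type, code, checksum, id, sequence */

/* Host-order copy of the three payload timestamps (hypothetical type;
 * it mirrors the field order of struct ping_ip in ip_icmp.h.patch). */
struct ping_ts {
    uint32_t rcv_us;            /* reception time, microseconds */
    uint32_t snd_us;            /* transmit time, microseconds */
    uint32_t trans_us;          /* transit time, not filled by the kernel */
};

/* Decode the timestamps from an ICMP echo packet.  'icmp' points at the
 * ICMP header and 'len' is the ICMP length (header plus data).
 * Returns 0 on success, -1 if the packet is too short. */
int
ping_ts_decode(const unsigned char *icmp, size_t len, struct ping_ts *out)
{
    uint32_t raw[3];

    if (len < ICMP_ECHO_HDRLEN + sizeof raw)
        return (-1);
    /* memcpy avoids unaligned access into the receive buffer. */
    memcpy(raw, icmp + ICMP_ECHO_HDRLEN, sizeof raw);
    out->rcv_us = ntohl(raw[0]);
    out->snd_us = ntohl(raw[1]);
    out->trans_us = ntohl(raw[2]);
    return (0);
}

/* Transit time in microseconds.  Unsigned subtraction is modulo 2^32,
 * so a single counter wrap between send and receive is handled. */
uint32_t
ping_ts_rtt_us(const struct ping_ts *ts)
{
    return (ts->rcv_us - ts->snd_us);
}
```

A caller would pass the buffer returned by recvfrom() on the raw ICMP
socket, offset by the IP header length, and then read the transit time
from ping_ts_rtt_us().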
The main reason for this option: reduction of time variance
caused by context switching and scheduling when making
round-trip measurements and timestamping from user space. The
symptoms from these causes are very significant in a LAN
environment (I've seen upwards of 30% differences in RTT using
this option vs. the 'normal' method of timestamping in user
space). In a WAN environment, the difference in RTT between
using this option and using user space timestamping is not
horribly significant for a single measurement but rears its ugly
head when trying to measure variance over time-series data and/or
under heavy process load.
There are two reasons it's implemented as payload modifications.
The first is that it doesn't require additional state in
applications (unlike solutions involving auxiliary data in calls
to sendmsg() and/or recvmsg(), such as our current SO_TIMESTAMP
option). This is significant for applications that use active
ICMP measurements to many destinations (very common in network
management systems, where ICMP pollers may be querying tens of
thousands of destinations or more). The second is that it permits
one-way transit time measurements if both transmit and receive
hosts have the option available and have synchronized clocks.
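To make the "no additional state" point concrete, here is a minimal
sketch of what a sender might do: reserve 12 zeroed bytes at the start
of the echo data and turn the option on. The function names
build_echo_request, inet_cksum and enable_ping_timestamp are
hypothetical. The fallback definition of IP_PINGTIMESTAMP uses the
value 99 from in.h.patch and is only meaningful on a kernel with these
patches applied; on any other system the setsockopt() call is expected
to fail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IP_PINGTIMESTAMP
#define IP_PINGTIMESTAMP 99     /* from in.h.patch; patched kernels only */
#endif

#define ECHO_HDRLEN     8       /* ICMP echo header length */
#define PING_TS_LEN     12      /* three 32-bit timestamps */
#define ICMP_TYPE_ECHO  8       /* echo request */

/* Standard Internet checksum (RFC 1071) over 'len' bytes.  The result
 * is returned as a number whose high byte goes first on the wire. */
uint16_t
inet_cksum(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                    /* odd trailing byte, padded with zero */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)           /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}

/* Build an echo request whose data starts with 12 zeroed bytes for the
 * kernel to fill.  Returns the packet length, or 0 if 'buf' is too small. */
size_t
build_echo_request(unsigned char *buf, size_t buflen, uint16_t id, uint16_t seq)
{
    size_t len = ECHO_HDRLEN + PING_TS_LEN;
    uint16_t ck;

    if (buflen < len)
        return (0);
    memset(buf, 0, len);
    buf[0] = ICMP_TYPE_ECHO;    /* code (buf[1]) stays 0 */
    buf[4] = id >> 8;
    buf[5] = id & 0xff;
    buf[6] = seq >> 8;
    buf[7] = seq & 0xff;
    ck = inet_cksum(buf, len);
    buf[2] = ck >> 8;
    buf[3] = ck & 0xff;
    return (len);
}

/* Ask the kernel to stamp echo packets on this raw ICMP socket. */
int
enable_ping_timestamp(int fd)
{
    int on = 1;

    return (setsockopt(fd, IPPROTO_IP, IP_PINGTIMESTAMP, &on, sizeof on));
}
```

The checksum computed here is only a placeholder on a patched kernel,
since ping_setsnd() in the patch recomputes the ICMP checksum after
writing the transmit timestamp.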
Side-effects
------------
This is a fairly trivial kernel change, and should not affect
applications which do not enable the socket option (on the
receive side, we timestamp after m_pullup()). I've been using it
for over 2 years now (dating back to 2.2.5-R).
Files Changed
-------------
/sys/netinet/in.h
/sys/netinet/in_pcb.h
/sys/netinet/ip_icmp.h
/sys/netinet/raw_ip.c
/usr/include/netinet/in.h
/usr/include/netinet/in_pcb.h
/usr/include/netinet/ip_icmp.h
More Information
----------------
See:
http://www.caida.org/Tools/Skitter/skping/index.html
http://www.caida.org/Tools/Skitter/index.html
>How-To-Repeat:
>Fix:
Here's a shar of the patches and a Makefile. These were done
against 3.3-RC but work on 3.4-STABLE as of 1/25/2000.
# This is a shell archive. Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file". Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
# Makefile
# in.h.patch
# in_pcb.h.patch
# ip_icmp.h.patch
# raw_ip.c.patch
#
echo x - Makefile
sed 's/^X//' >Makefile << 'END-of-Makefile'
XKERNTARGETS = /sys/netinet/in.h /sys/netinet/in_pcb.h \
X /sys/netinet/ip_icmp.h /sys/netinet/raw_ip.c
XUSERTARGETS = /usr/include/netinet/in.h /usr/include/netinet/in_pcb.h \
X /usr/include/netinet/ip_icmp.h
X
Xall: ${KERNTARGETS} ${USERTARGETS}
X
X/sys/netinet/in.h::
X patch /sys/netinet/in.h ./in.h.patch
X
X/sys/netinet/in_pcb.h::
X patch /sys/netinet/in_pcb.h ./in_pcb.h.patch
X
X/sys/netinet/ip_icmp.h::
X patch /sys/netinet/ip_icmp.h ./ip_icmp.h.patch
X
X/sys/netinet/raw_ip.c::
X patch /sys/netinet/raw_ip.c ./raw_ip.c.patch
X
X/usr/include/netinet/in.h::
X patch /usr/include/netinet/in.h ./in.h.patch
X
X/usr/include/netinet/in_pcb.h::
X patch /usr/include/netinet/in_pcb.h ./in_pcb.h.patch
X
X/usr/include/netinet/ip_icmp.h::
X patch /usr/include/netinet/ip_icmp.h ./ip_icmp.h.patch
X
Xundo::
X if [ -f /sys/netinet/in.h.orig ]; then \
X mv /sys/netinet/in.h.orig /sys/netinet/in.h ; fi
X if [ -f /sys/netinet/in_pcb.h.orig ]; then \
X mv /sys/netinet/in_pcb.h.orig /sys/netinet/in_pcb.h ; fi
X if [ -f /sys/netinet/ip_icmp.h.orig ]; then \
X mv /sys/netinet/ip_icmp.h.orig /sys/netinet/ip_icmp.h ; fi
X if [ -f /sys/netinet/raw_ip.c.orig ]; then \
X mv /sys/netinet/raw_ip.c.orig /sys/netinet/raw_ip.c ; fi
X if [ -f /usr/include/netinet/in.h.orig ]; then \
X mv /usr/include/netinet/in.h.orig /usr/include/netinet/in.h ; fi
X if [ -f /usr/include/netinet/in_pcb.h.orig ]; then \
X mv /usr/include/netinet/in_pcb.h.orig /usr/include/netinet/in_pcb.h ; fi
X if [ -f /usr/include/netinet/ip_icmp.h.orig ]; then \
X mv /usr/include/netinet/ip_icmp.h.orig /usr/include/netinet/ip_icmp.h ; fi
X
X
END-of-Makefile
echo x - in.h.patch
sed 's/^X//' >in.h.patch << 'END-of-in.h.patch'
X*** in.h.orig Mon Sep 13 18:19:20 1999
X--- in.h Mon Sep 13 18:50:44 1999
X***************
X*** 328,337 ****
X--- 328,339 ----
X #define IP_DUMMYNET_CONFIGURE 60 /* add/configure a dummynet pipe */
X #define IP_DUMMYNET_DEL 61 /* delete a dummynet pipe from chain */
X #define IP_DUMMYNET_FLUSH 62 /* flush dummynet */
X #define IP_DUMMYNET_GET 64 /* get entire dummynet pipes */
X
X+ #define IP_PINGTIMESTAMP 99 /* bool; Add time stamp in echo packet */
X+
X /*
X * Defaults and limits for options
X */
X #define IP_DEFAULT_MULTICAST_TTL 1 /* normally limit m'casts to 1 hop */
X #define IP_DEFAULT_MULTICAST_LOOP 1 /* normally hear sends if a member */
END-of-in.h.patch
echo x - in_pcb.h.patch
sed 's/^X//' >in_pcb.h.patch << 'END-of-in_pcb.h.patch'
X*** in_pcb.h.orig Mon Sep 13 19:26:16 1999
X--- in_pcb.h Mon Sep 13 19:27:07 1999
X***************
X*** 138,147 ****
X--- 138,148 ----
X #define INP_HIGHPORT 0x10 /* user wants "high" port binding */
X #define INP_LOWPORT 0x20 /* user wants "low" port binding */
X #define INP_ANONPORT 0x40 /* port chosen for user */
X #define INP_RECVIF 0x80 /* receive incoming interface */
X #define INP_MTUDISC 0x100 /* user can do MTU discovery */
X+ #define INP_PINGTIMESTAMP 0x1000 /* time stamp ICMP echo packets */
X #define INP_CONTROLOPTS (INP_RECVOPTS|INP_RECVRETOPTS|INP_RECVDSTADDR|\
X INP_RECVIF)
X
X #define INPLOOKUP_WILDCARD 1
X
END-of-in_pcb.h.patch
echo x - ip_icmp.h.patch
sed 's/^X//' >ip_icmp.h.patch << 'END-of-ip_icmp.h.patch'
X*** ip_icmp.h.orig Mon Sep 13 19:28:56 1999
X--- ip_icmp.h Mon Sep 13 19:29:32 1999
X***************
X*** 180,189 ****
X--- 180,198 ----
X (type) == ICMP_ROUTERADVERT || (type) == ICMP_ROUTERSOLICIT || \
X (type) == ICMP_TSTAMP || (type) == ICMP_TSTAMPREPLY || \
X (type) == ICMP_IREQ || (type) == ICMP_IREQREPLY || \
X (type) == ICMP_MASKREQ || (type) == ICMP_MASKREPLY)
X
X+ #define ICMP_LEN(t) (((t) == ICMP_TSTAMP || (t) == ICMP_TSTAMPREPLY)? \
X+ ICMP_TSLEN: ICMP_MINLEN)
X+
X+ struct ping_ip {
X+ n_time ipping_rcv;
X+ n_time ipping_snd;
X+ n_time ipping_trans;
X+ };
X+
X #ifdef KERNEL
X void icmp_error __P((struct mbuf *, int, int, n_long, struct ifnet *));
X void icmp_input __P((struct mbuf *, int));
X #endif
X
END-of-ip_icmp.h.patch
echo x - raw_ip.c.patch
sed 's/^X//' >raw_ip.c.patch << 'END-of-raw_ip.c.patch'
X*** raw_ip.c.orig Mon Sep 13 19:31:53 1999
X--- raw_ip.c Mon Sep 13 19:38:58 1999
X***************
X*** 52,61 ****
X--- 52,62 ----
X
X #define _IP_VHL
X #include <netinet/in.h>
X #include <netinet/in_systm.h>
X #include <netinet/ip.h>
X+ #include <netinet/ip_icmp.h>
X #include <netinet/in_pcb.h>
X #include <netinet/in_var.h>
X #include <netinet/ip_var.h>
X #include <netinet/ip_mroute.h>
X
X***************
X*** 79,88 ****
X--- 80,131 ----
X * Nominal space allocated to a raw ip socket.
X */
X #define RIPSNDQ 8192
X #define RIPRCVQ 8192
X
X+ /* Dirty (?) hack to add a time stamp on ICMP echo reply packets.
X+ * PING_TSADD_RCV(struct ip *ip, struct mbuf *m, struct socket *last)
X+ * PING_TSADD_SND(struct ip *ip, struct mbuf *m, struct inpcb *inp);
X+ * May null m if m_pullup() fails. (ip)->ip_len does not include the IP header
X+ * when receiving.
X+ */
X+ u_long pingtime();
X+
X+ struct mbuf *ping_setsnd();
X+
X+
X+ #ifdef IP_PINGTIMESTAMP
X+ #define PING_ICMPSIZE(ip) ICMP_LEN(((struct icmp *)((ip) + 1))->icmp_type)
X+ #define PING_SIZE(ip) (PING_ICMPSIZE(ip) + sizeof(struct ping_ip))
X+ #define PING_TSADD_RCV(ip, m, so) do { \
X+ if ((so) && \
X+ ((so)->inp_flags & INP_PINGTIMESTAMP) && \
X+ (ip)->ip_p == IPPROTO_ICMP && (ip)->ip_len >= PING_SIZE(ip) && \
X+ ((m) = m_pullup((m), (((ip)->ip_vhl & 0x0f) << 2) + PING_SIZE(ip)))) { \
X+ struct icmp *__icmp; \
X+ (ip) = mtod((m), struct ip *); \
X+ __icmp = (struct icmp *)(mtod(m, char *) + (((ip)->ip_vhl & 0x0f) << 2)); \
X+ if (__icmp->icmp_type == ICMP_ECHOREPLY || \
X+ __icmp->icmp_type == ICMP_ECHO) { \
X+ ((struct ping_ip *)(mtod(m, char *)+(((ip)->ip_vhl & 0x0f) << 2) \
X+ + PING_ICMPSIZE(ip)))->ipping_rcv = pingtime(); \
X+ } \
X+ } \
X+ } while(0)
X+
X+ #define PING_TSADD_SND(ip, m, inp) do { \
X+ if (((inp)->inp_flags & INP_PINGTIMESTAMP) && \
X+ (ip)->ip_p == IPPROTO_ICMP && \
X+ (ip)->ip_len >= sizeof(struct ip) + PING_SIZE(ip)) { \
X+ (m) = ping_setsnd((m)); \
X+ } \
X+ } while(0)
X+ #else
X+ #define PING_TSADD_RCV(ip, m, so)
X+ #define PING_TSADD_SND(ip, m, inp)
X+ #endif
X+
X /*
X * Raw interface to IP protocol.
X */
X
X /*
X***************
X*** 131,140 ****
X--- 174,184 ----
X inp->inp_faddr.s_addr != ip->ip_src.s_addr)
X continue;
X if (last) {
X struct mbuf *n = m_copy(m, 0, (int)M_COPYALL);
X if (n) {
X+ PING_TSADD_RCV(ip, n, last);
X if (last->inp_flags & INP_CONTROLOPTS ||
X last->inp_socket->so_options & SO_TIMESTAMP)
X ip_savecontrol(last, &opts, ip, n);
X if (sbappendaddr(&last->inp_socket->so_rcv,
X (struct sockaddr *)&ripsrc, n,
X***************
X*** 149,158 ****
X--- 193,203 ----
X }
X }
X last = inp;
X }
X if (last) {
X+ PING_TSADD_RCV(ip, m, last);
X if (last->inp_flags & INP_CONTROLOPTS ||
X last->inp_socket->so_options & SO_TIMESTAMP)
X ip_savecontrol(last, &opts, ip, m);
X if (sbappendaddr(&last->inp_socket->so_rcv,
X (struct sockaddr *)&ripsrc, m, opts) == 0) {
X***************
X*** 219,228 ****
X--- 264,274 ----
X ip->ip_id = htons(ip_id++);
X /* XXX prevent ip_output from overwriting header fields */
X flags |= IP_RAWOUTPUT;
X ipstat.ips_rawout++;
X }
X+ PING_TSADD_SND(ip, m, inp);
X return (ip_output(m, inp->inp_options, &inp->inp_route, flags,
X inp->inp_moptions));
X }
X
X /*
X***************
X*** 301,310 ****
X--- 347,369 ----
X inp->inp_flags |= INP_HDRINCL;
X else
X inp->inp_flags &= ~INP_HDRINCL;
X break;
X
X+ #ifdef IP_PINGTIMESTAMP
X+ case IP_PINGTIMESTAMP:
X+ error = sooptcopyin(sopt, &optval, sizeof optval,
X+ sizeof optval);
X+ if (error)
X+ break;
X+ if (optval)
X+ inp->inp_flags |= INP_PINGTIMESTAMP;
X+ else
X+ inp->inp_flags &= ~INP_PINGTIMESTAMP;
X+ break;
X+ #endif
X+
X #ifdef COMPAT_IPFW
X case IP_FW_ADD:
X case IP_FW_DEL:
X case IP_FW_FLUSH:
X case IP_FW_ZERO:
X***************
X*** 655,659 ****
X--- 714,768 ----
X pru_connect2_notsupp, in_control, rip_detach, rip_disconnect,
X pru_listen_notsupp, in_setpeeraddr, pru_rcvd_notsupp,
X pru_rcvoob_notsupp, rip_send, pru_sense_null, rip_shutdown,
X in_setsockaddr, sosend, soreceive, sopoll
X };
X+
X+ u_long pingtime()
X+ {
X+ struct timeval atv;
X+ u_long t;
X+
X+ microtime(&atv);
X+ t = atv.tv_sec * 1000000 + atv.tv_usec;
X+ return (htonl(t));
X+ }
X+
X+ /* Set time stamp in ICMP ping packet. At this stage ip->ip_hl is not set. The
X+ * IP header length is the default one.
X+ */
X+ struct mbuf *
X+ ping_setsnd(m)
X+ struct mbuf *m;
X+ {
X+ struct icmp *icmp;
X+ struct ip *ip = mtod(m, struct ip *);
X+
X+ /* Put the IP + ICMP + pingTS header in same mbuf
X+ */
X+ if (!(m = m_pullup(m, sizeof(struct ip) + PING_SIZE(ip))))
X+ return(NULL);
X+
X+ ip = mtod(m, struct ip *);
X+ icmp = (struct icmp *)((char *)ip + sizeof(struct ip));
X+
X+ /* We want to put the timestamp on ECHO packets only.
X+ */
X+ if (icmp->icmp_type != ICMP_ECHOREPLY && icmp->icmp_type != ICMP_ECHO)
X+ return(m);
X+
X+ /* Set time stamp.
X+ */
X+ ((struct ping_ip *)((char *)icmp + ICMP_LEN(icmp->icmp_type)))->ipping_snd = pingtime();
X+
X+ /* Recompute the ICMP checksum
X+ */
X+ m->m_data += sizeof(struct ip);
X+ m->m_len -= sizeof(struct ip);
X+ icmp->icmp_cksum = 0;
X+ icmp->icmp_cksum = in_cksum(m, ip->ip_len - sizeof(struct ip));
X+ m->m_data -= sizeof(struct ip);
X+ m->m_len += sizeof(struct ip);
X+
X+ return(m);
X+ }
X+
END-of-raw_ip.c.patch
exit
>Release-Note:
>Audit-Trail:
>Unformatted:
