Date:      Mon, 28 Jul 2014 22:39:11 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Xin LI <delphij@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r269180 - head/sbin/ping6
Message-ID:  <20140728204019.V2446@besplex.bde.org>
In-Reply-To: <201407280822.s6S8M8nZ012194@svn.freebsd.org>
References:  <201407280822.s6S8M8nZ012194@svn.freebsd.org>

On Mon, 28 Jul 2014, Xin LI wrote:

> Log:
>  When interval is set to very small value with limited amount of packets,
>  ping6(8) would quit before the remote side gets a chance to respond.
>
>  Solve this by resetting the itimer when the maximum packet number has
>  been reached, but let the other handling continue.
>
>  PR:		bin/151023
>  Submitted by:	tjmao at tjmao.net
>  MFC after:	2 weeks

ping6 is still using the old signal code which fenner improved in ping
in 1998.  It still uses SIGALRM for the main timeout, and doesn't even
use sigaction() to set up signals.  ping uses select() for the main
timeout.  It never uses setitimer() directly and only uses alarm() for
-t.  ping6 only uses a timeout with select() in the -f case.  Except
it has another misfeature: extra code to optionally use poll() instead
of select(), and this option is hard-configured to always be enabled.
So ping6 uses timeouts with poll() instead of with select(), but only
in the -f case.  Apart from the extra code, poll() also loses by only
having millisecond resolution.  Since the poll() timeout is currently
only used for -f, and since 1/hz is normally 1 millisecond or larger
so that intervals smaller than 1 millisecond are not very useful, this
doesn't matter much.
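
The select()-based scheme looks roughly like this (a sketch of the
general pattern only, with illustrative names; it is not the actual
ping source, and includes are omitted):

	fd_set rfds;
	struct timeval timeout;
	int n;

	for (;;) {
		FD_ZERO(&rfds);
		FD_SET(s, &rfds);	/* s: the raw ICMP socket */
		timeout = intvl;	/* time left until the next send */
		n = select(s + 1, &rfds, NULL, NULL, &timeout);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			err(1, "select");
		}
		if (n == 1) {
			/* a reply is ready: recvmsg() and process it */
		} else {
			/* interval expired: send the next probe */
		}
	}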

Perhaps updating ping6 to the 1998 quality of ping would fix the bug.

> Modified:
>  head/sbin/ping6/ping6.c
>
> Modified: head/sbin/ping6/ping6.c
> ==============================================================================
> --- head/sbin/ping6/ping6.c	Mon Jul 28 07:20:22 2014	(r269179)
> +++ head/sbin/ping6/ping6.c	Mon Jul 28 08:22:08 2014	(r269180)
> @@ -1090,8 +1090,14 @@ main(int argc, char *argv[])
> 		/* signal handling */
> 		if (seenalrm) {
> 			/* last packet sent, timeout reached? */
> -			if (npackets && ntransmitted >= npackets)
> -				break;
> +			if (npackets && ntransmitted >= npackets) {
> +				struct timeval zerotime = {0, 0};
> +				itimer.it_value = zerotime;
> +				itimer.it_interval = zerotime;
> +				(void)setitimer(ITIMER_REAL, &itimer, NULL);
> +				seenalrm = 0;   /* clear flag */
> +				continue;
> +			}

Style bugs:
- nested declaration
- initialization in declaration
- pessimal declaration and initialization.  The variable should be static
   const with default initialization to 0.  The compiler might optimize it
   to much the same thing.
- extra code/initializations.  You can use a static const struct itimerval
   with default initialization to 0 and don't need to laboriously copy
   timevals into it (see the sketch after this list).  Nothing is gained
   from not assuming anything about the internals of struct itimerval,
   since the above initialization assumes the internals of struct timeval.
   It's actually a smaller assumption to use the default initialization of
   a static object -- this initializes all the padding, and the only
   assumption is that all-bits-0 gives values of 0 for all the fields in
   the structure and that there are no fields which don't really have a
   value.
- banal comment.
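
Roughly what the suggested form would look like (an untested sketch; the
name 'disarm' is just illustrative):

			if (npackets && ntransmitted >= npackets) {
				/* all-bits-0 it_value disarms the timer */
				static const struct itimerval disarm;

				(void)setitimer(ITIMER_REAL, &disarm, NULL);
				seenalrm = 0;
				continue;
			}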

> 			retransmit();
> 			seenalrm = 0;
> 			continue;

I forget how this works.  The corresponding code in ping is:

% 			if (!npackets || ntransmitted < npackets)
% 				pinger();
% 			else {
% 				if (almost_done)
% 					break;
% 				almost_done = 1;
% 				intvl.tv_usec = 0;
% 				if (nreceived) {
% 					intvl.tv_sec = 2 * tmax / 1000;
% 					if (!intvl.tv_sec)
% 						intvl.tv_sec = 1;
% 				} else {
% 					intvl.tv_sec = waittime / 1000;
% 					intvl.tv_usec = waittime % 1000 * 1000;
% 				}
% 			}

This seems to do things related to the patch but is more sophisticated.
Canceling the timer completely doesn't seem right.  ping sets a special
nonzero timeout for completion unless it is already set.  waittime used
to be hard-coded to 10 seconds but is now controlled by the -W option
which allows it to be 0.
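
A ping6 adaptation of ping's approach might look roughly like this (an
untested sketch; 'almost_done', 'tmax' and 'waittime' are ping's
variables and ping6 would need its own equivalents):

		if (seenalrm) {
			if (npackets && ntransmitted >= npackets) {
				if (almost_done)
					break;
				almost_done = 1;
				/* arm one final wait instead of canceling */
				itimer.it_interval.tv_sec = 0;
				itimer.it_interval.tv_usec = 0;
				itimer.it_value.tv_usec = 0;
				if (nreceived) {
					itimer.it_value.tv_sec =
					    2 * tmax / 1000;
					if (itimer.it_value.tv_sec == 0)
						itimer.it_value.tv_sec = 1;
				} else {
					itimer.it_value.tv_sec =
					    waittime / 1000;
					itimer.it_value.tv_usec =
					    waittime % 1000 * 1000;
				}
				(void)setitimer(ITIMER_REAL, &itimer, NULL);
			} else
				retransmit();
			seenalrm = 0;
			continue;
		}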

Other bugs related to timeout intervals:
- the hard-coded flood timeout of 10 msec may have been correct for 1 Mbps
   ethernet, but is now nonsense.  You can set a much smaller timeout using
   -i.  This loads the network more than -f but doesn't give other features
   of -f.
- the restriction of small -i intervals to privileged users, and some of
   the -f restrictions, are nonsense.  Anyone can load the network about
   as much, and the local machine much more, using
   "while :; do ping -c1 host; done" with about as many instances as CPUs.
- ping* has mounds of bad code for -i:

ping.c:
% 		case 'i':		/* wait between sending packets */
% 			t = strtod(optarg, &ep) * 1000.0;

The API keeps the timeout in milliseconds, so it is already excessively
restricted in another way and wouldn't be further restricted by using
poll().

% 			if (*ep || ep == optarg || t > (double)INT_MAX)
% 				errx(EX_USAGE, "invalid timing interval: `%s'",
% 				    optarg);

Missing checking for negative intervals and overflow of such.  I like
negative intervals meaning 0, but with no overflow checking and blind
passing of them to functions not expecting them, they are worse than
an up-front error.  I think this timeout is only passed to functions
like select() which consider negative timeouts invalid.  Negative timevals
are valid, but negative timeouts can reasonably be considered invalid.
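
An up-front check along these lines (a sketch against ping's variables,
not code from the tree) would reject both negative and overflowing values
in one place:

			t = strtod(optarg, &ep) * 1000.0;
			if (*ep || ep == optarg || t < 0.0 ||
			    t > (double)INT_MAX)
				errx(EX_USAGE, "invalid timing interval: `%s'",
				    optarg);
			interval = (int)t;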

% 			options |= F_INTERVAL;
% 			interval = (int)t;
% 			if (uid && interval < 1000) {
% 				errno = EPERM;
% 				err(EX_NOPERM, "-i interval too short");
% 			}

Bogus restriction on unprivileged users.

% 			break;

ping6.c:
% 		case 'i':		/* wait between sending packets */
% 			intval = strtod(optarg, &e);

Overflow happens before it can be checked for.  ping assigns to a variable
of the correct type first.

Using strtod() is bogus since ping6 still hasn't caught up with ping's
change to support sub-second intervals, except possibly to change the
function name here.

% 			if (*optarg == '\0' || *e != '\0')
% 				errx(1, "illegal timing interval %s", optarg);

Less canonical error checking than for ping.  Missing overflow checking.
Bad wording "illegal".  There are no lawyers here.  Invalid timing
intervals are not even detected.  Only invalid formats are detected.

% 			if (intval < 1 && getuid()) {
% 				errx(1, "%s: only root may use interval < 1s",
% 				    strerror(EPERM));
% 			}

Same restriction as ping (it's to 1 second instead of 1000 milliseconds).

Better wording giving more details about who is restricted.

Missing the style bug of conversion to sysexits.

% 			interval.tv_sec = (long)intval;

Bogus cast (no effect since intval has type int).  tv_sec has type time_t,
not long.  Casting to that would make some sense to turn off warnings
(we check for overflow later), but in practice time_t is going to be no
smaller than int so again the cast has no effect.

% 			interval.tv_usec =
% 			    (long)((intval - interval.tv_sec) * 1000000);
% 			if (interval.tv_sec < 0)
% 				errx(1, "illegal timing interval %s", optarg);

Negative values are detected here.  A silly way to do it.  The check should
be up-front together with the check that the value fits in intval.  Then
only 1 errx() describes both.  The different error messages for slightly
different cases are a little over-engineered, especially when the checks
to classify the cases are broken.
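
For ping6, a combined up-front check might look roughly like this (an
untested sketch, assuming intval is made a double so the range can be
checked before any narrowing conversion):

			intval = strtod(optarg, &e);
			if (*optarg == '\0' || *e != '\0' ||
			    intval < 0 || intval >= (double)INT_MAX)
				errx(1, "invalid timing interval %s", optarg);
			interval.tv_sec = intval;
			interval.tv_usec =
			    (long)((intval - interval.tv_sec) * 1000000);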

% 			/* less than 1/hz does not make sense */
% 			if (interval.tv_sec == 0 && interval.tv_usec < 1) {
% 				warnx("too small interval, raised to .000001");
% 				interval.tv_usec = 1;
% 			}

This code is nonsense.  All it does is prevent root from using a timeout
of 0, but that works fine in ping (I think it really does mean a timeout
of 0.  This almost makes sense, and gives an effect that cannot be
achieved by root users in another way.  It causes select() to not wait,
so the select() calls just waste time).  For non-root, the timeout is
>= 1 second, so it cannot be < 1/hz.  The restriction to 1/hz is now
wrong, since fine-grained timeouts can do better than that.  However, the
code doesn't implement the 1/hz restriction; it only implements a nonzero
restriction.  So all it does is warn root users about an interval of 0
being too small and give a misleading indication of what it is raised to.
Even fine-grained timeouts will probably raise it to more than 1
microsecond (I think they can only give 10-20 usec on fast x86's).

% 			options |= F_INTERVAL;
% 			break;

Bruce


