Date: Thu, 19 Dec 2013 18:00:55 +0100 From: Stefan Esser <se@freebsd.org> To: Bruce Evans <brde@optusnet.com.au>, Stefan Esser <se@localhost.FreeBSD.org> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r259609 - head/sys/kern Message-ID: <52B32647.2030008@freebsd.org> In-Reply-To: <20131219204903.V24189@besplex.bde.org> References: <201312190901.rBJ91ko3036881@svn.freebsd.org> <20131219204903.V24189@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Am 19.12.2013 11:49, schrieb Bruce Evans: > On Thu, 19 Dec 2013, Stefan Esser wrote: > >> Log: >> Fix overflow for timeout values of more than 68 years, which is the >> maximum >> covered by sbintime (LONG_MAX seconds). > > Not LONG_MAX seconds, but INT32_MAX seconds. LONG_MAX seconds is about > 2**32 > times larger on 64-bit arches. Hi Bruce, yes, you are of course correct ... The limit is 2^31-2^-32 seconds, to be exact ;-) This is represented by LONG_MAX in sbintime, which made me use that wrong constant in the commit message. > sbintimes and their complexity give many more possibilities for overflow. > I thought that the new overflow bugs and some old ones were already fixed. > >> Some programs use timeout values in excess of 1000 years. The conversion >> to sbintime caused wrap-around on overflow, which resulted in short or >> negative timeout values. This caused long delays on sockets opened by >> affected programs (e.g. OpenSSH). >> >> Kernels compiled without -fno-strict-overflow were not affected, >> apparently >> because the compiler tested the sign of the timeout value before >> performing >> the multiplication that lead to overflow. > > I think it just gave a garbage value that was accidentally less harmful. The result of the multiplication was performed modulo 2^64 due to the limited range of the operation. The result was then interpreted as a signed 64 bit 2s-complement number. The factor 2^32/1000 is 0x418937, which multiplied with a round decimal number over 1000*2^32 will result in "random" bit sequence in the resulting sbintime. Half the values will have been positive, and many will have corresponded to substantial timeout values, but some will have been very low, resulting in unexpectedly low timeouts. >> When the -fno-strict-overflow option was added to CFLAGS, this >> optimization >> was disabled and the test was performed on the result of the >> multiplication. >> Negative products were caught and resulted in EINVAL being returned, but >> wrap-around to positive values just shortened the timeout value to the >> residue of the result that could be represented by sbintime. > > This shows one reason why -fno-strict-overflow shouldn't be used. It just > wastes time to give different undefined behaviour with undocumented > details. Well, I thought so when I found the cause of the breakage (long delays opening TCP connections) that were the result of the compilation with -fno-strict-overflow. But I reconsidered, because a real bug in the code has been identied, this way. The bug existed, without being detected (because too short but non-zero timeout values were caught at the application layer, which just re-issued the call with the remaining time as timeout parameter). If applications didn't have to worry about short timeouts for other reasons, then this bug would have led to observable problems, even with -fno-strict-overflow. >> The fix is to cap the timeout values at the maximum that can be >> represented >> by sbintime, which is 2^31 - 1 seconds or more than 68 years. > > 2^31 - 1 is a correct spelling of INT32_MAX, unlike LONG_MAX. Yes, sbintime is defined in units of 2^-32 seconds, which makes the highest representable time exactly (2^31 - 2^-32) seconds. This value is represented by LLONG_MAX units of 2^32 seconds ... >> After this change, the kernel can be compiled with -fno-strict-overflow >> with no ill effects. > >> Modified: head/sys/kern/kern_event.c >> ============================================================================== >> >> --- head/sys/kern/kern_event.c Thu Dec 19 07:33:07 2013 (r259608) >> +++ head/sys/kern/kern_event.c Thu Dec 19 09:01:46 2013 (r259609) >> @@ -526,7 +526,8 @@ knote_fork(struct knlist *list, int pid) >> static __inline sbintime_t >> timer2sbintime(intptr_t data) >> { >> - >> + if (data > LLONG_MAX / SBT_1MS) >> + return LLONG_MAX; >> return (SBT_1MS * data); >> } > > This has the following style bugs: > - it removes the empty line after the (null) declarations > - it is missing parentheses around the return value > - it uses the long long abomination Ughh, I'll fix the style bugs ... It had taken me several hours over the last week to find this bug, and I committed the fix that worked on my system without making it compliant with FreeBSD style. Sorry for that ... > This has the following type errors: > - using the long long abomination is not just a style bug. sbintime_t has > type int64_t, not long long. The overflow check will break if long long > becomes longer than int64_t The definition of sbintime is int64_t on all architectures. If any architecture used a wider integer type for sbintime, the fix would just limit the maximum timeout to 68 years (instead of many magnitudes longer than the universe will last), which is still way beyond any sensible timeout value. And functions will have to check for too short timeouts to be robust, anyway. The functions in kern_timeout.c heavily depend on sbintime having at least 64 bits and perform calculations under the assumption, that no value above INT64_MAX can be passed as sbintime. So, even if there was a wider sbintime, the highest value supported was still represented by INT64_MAX. > - returning LLONG_MAX is usually just a style bug. When long long is > longer > than int64_t, the return value will overflow, but the > implementation-defined > behaviour for this is usually to convert it to the correct value. Yes, it was a silly mistake to use LLONG_MAX, even though it is identical to INT64_MAX on all architectures currently supported by FreeBSD. > INT64_MAX is a dangerous default maximum timeout. You can't even add the > minimum sbintime_t of 2**-32 seconds to this without overflowing. The function timer2sbintime is used in only two places in the kernel, one being filt_timerattach(), which caused the kernel breakage, the other with a constant timeout of 0. No further additions are ever performed on the value, so there is no risk of overflow. The function is declared "static inline", and will therefore not be called from outside of this source file. > Old timeout code used the following method to reduce the problem of > overflowing timeouts: > For alarm() and setitimer(), limit the timeout to 100 million seconds > (3.17 years) and return EINVAL if it is too large. This works with > 32-bit time_t until about 2035. POSIX doesn't allow this. Clamping > the time to 100 million seconds isn't allowed either, since the > residual time can be seen from applications. Note that 64-bit time_t > has little effect on this problem. Uses wishing to overflow the kernel > simply ask for a timeout in a timeval of INT64_MAX seconds plus 999999 > seconds so that adding just 1 microseconds to it overflow. With > seconds in sbintime_t instead of in time_t, overflow occurs much > sooner. The brokenness of the kernel compiled with -fno-strict-overflow was due to EINVAL being returned to callers of filt_timerattach(), which apparently did not check that return code (or at least endlessly retried with the same invalid timeout parameter). > This is quite broken now: > - the limit of 100000000 used to be enforced in itimerfix(), but this was > removed in connection with implementing POSIX realtime timers without > even breaking the documentaion to match or touching PR(s) about the > POSIX non-conformance of the limit. > - setitimer() mostly uses sbintimes now, so the old limit is irrelevant > and enforcing it in itimerfix() would break the new sbintime code and > presumably the not so new realtime timers code. The sbintime code > uses a modified limit that is also not allowed by POSIX. This limit is > of course undocumented. It is INT32_MAX / 2, so that there is plenty > of room for expansion. This is what the above should use, modulo the > conformance bug. EINVAL is returned when this limit is exceeded. Are you sure about the limit of INT32_MAX / 2 (i.e. 2^30 -1) seconds? My reading is that sbintime = INT64_MAX represents (2^31-2^-32) seconds, and that this value is correctly treated in the timeout code. It is possible to pass values of more than 2^31 seconds to setitimer() on architectures where time_t is defined as int64_t. There are no tests for overflow in tvtosbt() in /sys/sys/time.h. > - select() and poll() were more careful and seem to be correct now. > The timeout in poll() is limited to INT_MAX milliseconds, so it fits > in sbintime_t. Except this assumes that int is 32 bits. select() > limits the timeout to INT32_MAX seconds so that it is representable > as an sbintime_t and then does a further overflow check involving > INT64_MAX. When overflow would occur, it sets asbt to -1, which I > think means an infinite timeout. I don't like this. -1 is a tiny > amount less than 0, not near plus infinity. > - nanosleep() used to be less careful than select() and poll(), but seems > to have been fixed. For some reason, it uses a mixture of the above > methods. It limits sleep times to INT32_MAX / 2 seconds but repeats > the sleep as necessary so as not to return an error. The logic for > this is subtle and I haven't checked it. I just looked at nanosleep() and have to admit, that I do not fully understand its semantics. A tv_nsec value < 0 or > 10^9 results in EINVAL, while a negative tv_sec is silently treated as 0. If a timeout of 2^30 seconds or more is requested, the sleep time is limited to 2^30 - 1 seconds, but the difference is returned in the timespec value pointed to by the optional return value pointer (if present). That value may thus be non-zero, even if the function had returned 0 (indicating that the timeout has occured). At least for the case of tv_sec in the range of 2^30 to 2^31-1, the caller will assume that the full time has elapsed, while in fact only 2^30 seconds have passed and the call should be repeated with the remaining sleep time returned in rmtp ... > davide@ has patches related to fixing this, but none seem to have been > committed yet, except some old ones that gave some of the above overflow > checking. Thanks for the detailed information and sorry about the style violation. I'll fix it, if consensus is reached about the correct style. I'd replace the two occurances of LLONG_MAX with INT64_MAX and add the missing empty line: static __inline sbintime_t timer2sbintime(intptr_t data) { if (data > INT64_MAX / SBT_1MS) return INT64_MAX; return (SBT_1MS * data); } If you can show evidence that a limit of INT64_MAX/2 is more appropriate (2^30 seconds or 34 years), the limit could be of course be reduced to that value. I could not find any code that would not tolerate INT64_MAX, though ... Regards, STefan
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?52B32647.2030008>