Date: Sat, 5 Aug 2017 11:28:48 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Colin Percival <cperciva@tarsnap.com>
Cc: Hans Petter Selasky <hps@selasky.org>, cem@freebsd.org, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r321985 - head/sys/ofed/drivers/infiniband/core
Message-ID: <20170805105048.J1055@besplex.bde.org>
In-Reply-To: <0100015dabf98abb-dde34a5d-d5bf-4dab-ae6f-897cd12526dc-000000@email.amazonses.com>
References: <201708030918.v739IPVY034866@repo.freebsd.org> <CAG6CVpVL49nVqRs5atub=d2P39EGOqcNtx_Raa8fWtV=BFZXbw@mail.gmail.com> <bd19c6f8-fa97-c9c5-6318-1778e38dd0a9@selasky.org> <0100015dabf98abb-dde34a5d-d5bf-4dab-ae6f-897cd12526dc-000000@email.amazonses.com>
On Fri, 4 Aug 2017, Colin Percival wrote:

> On 08/03/17 23:28, Hans Petter Selasky wrote:
>> On 08/03/17 16:37, Conrad Meyer wrote:
>>> Is it not important that the subtraction and result are evaluated
>>> without truncation?
>>
>> ticks is a circular counter. Assuming time = 0 and jiffies = -1U, then
>> "delay" becomes a very large negative value, because long is used, and
>> the delay <= 0 check, is no longer working like expected.
>>
>> Casting to "int" or truncating is the right thing to do in this case.
>
> Signed integer overflow is undefined. Using 'int' is liable to cause
> problems after 2^32 ticks.

There is no (new) overflow here. Converting a higher rank type to int has
implementation-defined behaviour. I have yet to see an implementation that
defines its behaviour except in its source code, but all implementations
that I have seen do the right thing of truncating 2's complement integers.
Perverse implementations would produce specific garbage values or perhaps
rand(). They just have to document this.

'int' overflows after 2^31-1 ticks if int is 32 bits. It is using u_int
that is liable to cause problems after 2^32 ticks. Using long and u_long
causes exactly the same problems if long is 32 bits.

>>>> Modified: head/sys/ofed/drivers/infiniband/core/addr.c
>>>> ==============================================================================
>>>> --- head/sys/ofed/drivers/infiniband/core/addr.c	Thu Aug  3 09:14:43 2017	(r321984)
>>>> +++ head/sys/ofed/drivers/infiniband/core/addr.c	Thu Aug  3 09:18:25 2017	(r321985)
>>>> @@ -187,10 +187,10 @@ EXPORT_SYMBOL(rdma_translate_ip);
>>>>
>>>>  static void set_timeout(unsigned long time)
>>>>  {
>>>> -	unsigned long delay;
>>>> +	int delay;	/* under FreeBSD ticks are 32-bit */
>>>>
>>>>  	delay = time - jiffies;

There is no overflow here. The RHS is calculated in u_long arithmetic,
which has truncation instead of overflow. First, jiffies is converted to
u_long in a standards-defined way. If jiffies is non-negative, then its
value is unchanged by the conversion. I think jiffies can't be negative,
but if it is then there are complications -- the value must be changed;
it is changed by adding a multiple of ULONG_MAX + 1, where the multiple
is normally 1, but it is not easy to see that the multiple is always 1
(if that is the case). Then the subtraction can't overflow, but it
produces a garbage result if time < jiffies.

As usual, using unsigned is a bug. Here it breaks calculations of negative
differences.

>>>> -	if ((long)delay <= 0)
>>>> +	if (delay <= 0)
>>>>  		delay = 1;

Then we cast to long in the old code, and assign the RHS to int in the new
code. Both are implementation-defined. The cast does nothing different
from assignment except it might change compiler warnings.

The old code seems to be correct enough. Suppose time is 59 and jiffies is
60. Then time - jiffies is 0xffffffffLU on 32-bit systems and
0xffffffffffffffffLU on 64-bit systems. If the implementation-defined
behaviour were documented, then we would know that the result of casting
either of these to long is -1L. This is < 0, so delay gets fixed up to +1.
Similarly with the new code -- the assignment converts the huge unsigned
values to -1. Problems with the type mismatch might occur later, but I
don't see any.

On 64-bit systems, the difference can be really huge without tripping the
(long)delay <= 0 test. Above INT_MAX, later truncation to int in APIs
using ticks will give garbage (possibly 0 or negative). I guess that is
the problem.
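To make that concrete, here is a minimal userland sketch of the same
arithmetic. The variables only stand in for the kernel's time/jiffies
(nothing here touches the real ticks counter), and the long and int
results assume the usual 2's complement truncation, which is only
implementation-defined:

	#include <limits.h>
	#include <stdio.h>

	int
	main(void)
	{
		unsigned long time = 59;	/* requested expiry */
		unsigned long jiffies = 60;	/* "now" is already past it */

		/* u_long arithmetic: no overflow, the result wraps to ULONG_MAX. */
		unsigned long delay_ul = time - jiffies;

		/* Old code: cast the u_long difference to long; -1 here. */
		long delay_long = (long)delay_ul;

		/* New code: assign the u_long difference to int; also -1 here. */
		int delay_int = delay_ul;

		printf("time - jiffies = %#lx\n", delay_ul);
		printf("(long)delay    = %ld\n", delay_long);
		printf("(int)delay     = %d\n", delay_int);

		/*
		 * The too-large case: a difference above INT_MAX passes the
		 * "(long)delay <= 0" test, but truncating it to int later
		 * comes out negative (INT_MIN on the usual implementations).
		 */
		unsigned long huge = (unsigned long)INT_MAX + 1;
		printf("huge delay %lu truncates to int %d\n", huge, (int)huge);
		return (0);
	}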
But even on 32-bit systems, the difference can be INT_MAX and that seems
too large to be correct. If jiffies are 1/1000 seconds, then it is the
well-known magic number of 24+ days. User code might generate that, but
kernel driver code shouldn't.
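A throwaway check of that magic number, assuming 1000 ticks per second
(not kernel code, just arithmetic):

	#include <limits.h>
	#include <stdio.h>

	/* INT_MAX ticks at 1000 ticks/second, expressed in days. */
	int
	main(void)
	{
		double days = (double)INT_MAX / 1000 / 60 / 60 / 24;

		printf("INT_MAX ticks at 1000 Hz = %.1f days\n", days);	/* ~24.9 */
		return (0);
	}

Bruce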