Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 10 Dec 2012 12:25:12 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Alan Cox <alc@rice.edu>
Subject:   Re: svn commit: r243668 - in head/sys: kern sys
Message-ID:  <20121210111347.L871@besplex.bde.org>
In-Reply-To: <50C513D4.4000306@freebsd.org>
References:  <201211290730.qAT7Uhkv016745@svn.freebsd.org> <50C4F5F7.7080101@rice.edu> <50C513D4.4000306@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 9 Dec 2012, Andre Oppermann wrote:

> On 09.12.2012 21:35, Alan Cox wrote:
>> ...
>> 
>> I believe that this change did not actually correct the overflow
>> problem.  See below for an explanation.
>> 
>> On 11/29/2012 01:30, Andre Oppermann wrote:
>>> ...

Please trim quotes.

>>> 
>>> Log:
>>>    Using a long is the wrong type to represent the realmem and maxmbufmem
>>>    variable as they may overflow on i386/PAE and i386 with > 2GB RAM.
>>> 
>>>    Use 64bit quad_t instead.  It has broader kernel infrastructure support
>>>    with TUNABLE_QUAD_FETCH() and qmin/qmax() than other available types.
>>> 
>>>    Pointed out by:	alc, bde
>>> ...
>>> Modified: head/sys/kern/subr_param.c
>>> ...
>>> @@ -332,10 +332,10 @@ init_param2(long physpages)
>>>   	 * available kernel memory (physical or kmem).
>>>   	 * At most it can be 3/4 of available kernel memory.
>>>   	 */
>>> -	realmem = lmin(physpages * PAGE_SIZE,
>>> +	realmem = qmin(physpages * PAGE_SIZE,
>>>   			VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS);
>> 
>> 
>> "physpages" is a signed long.  Suppose it is 1,000,000.  On i386/PAE,
>> the product of 1,000,000 and PAGE_SIZE will be a negative number.
>> Likewise, quad_t is a signed type.  So, the negative product of
>> 1,000,000 and PAGE_SIZE will be sign extended to a 64-bit signed value
>> when it is passed to qmin(), and qmin() will return a negative number.
>
> Thank you taking a second look it.
>
> To be honest I got a bit confused on which automatic type expansion
> applies here.

This is a vanilla overflow bug.  It has nothing to do with type extension,
except that type extension beyond long doesn't occur for multiplication
of longs.  It has little to do with signedness, except that with signed
longs the overflow occurs at LONG_MAX (2G-1 on i386) instead of at
ULONG_MAX (4G-1 on i386).

The above part of the patch also shows one of the many style bugs in this
and previous commits (random indentation of the continued line -- 2 tabs
instead of 4 spaces).

> qmax() is defined as this in libkern.h:
> static __inline quad_t qmax(quad_t a, quad_t b) { return (a > b ? a : b); }
>
> Wouldn't physpages be expanded to quad_t?  Hmm, no, only the result of the
> computation, which happens in long space, is passed on.  This is a function,
> not a macro.  Dang...

No, that is backwards.  With a macro, the result of the multiplicaton is
passed on (unless the macro is very broken and has type casts which to
break its type-genericness).  With the qmin() function, the result of
the multiplication is converted to quad_t.  But the mmultiplication has
already overflowed, giving an undefined result of type long.  (This
assumes that PAGE_SIZE has type long or smaller.  It is normally a
literal int constant like 4096.  Any type for this constant would
increase bugs in this area.  physpages has type long.  As with all
binary operations, the operands are first converted to a minimal
common type that is at least as large as both and also at least as
large as int.  This type is long.  Multiplication of longs then overflows
mainly in the PAE case.)

> This change will force the computation to be in quad_t space:
>
> -       realmem = qmin(physpages * PAGE_SIZE,
> +       realmem = qmin((quad_t)physpages * PAGE_SIZE,
>                        VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS);

Indeed.

> VM_[MAX|MIN]_KERNEL_ADDRESS is safe as it is of the unsigned vm_offset_t
> type.

This has little to do with its type, except that the type is not larger than
quad_t.  Actually, it is larger on most 64-bit arches including amd6d, so
the above code only works accidentally.  It depends on kva not
covering more than half of the address space.  Now the type errors are:

- there is no such type as 'unsigned vm_offset_t' (unsigned foo_t is a
   syntax error for any typedefed type foo_t).  I think you mean that
   vm_offset_t is an unsigned type.  It is not the plain unsigned type
   on 64-bit arches, so you shouldn't mean that.

- the type of constants like VM_MAX_KERNEL_ADDRESS is unclear.  Most
   constants are defined as literals.  But VM_{MAX,MIN}_KERNEL_ADDRESS
   happen to be defined using macros.  The macros are quite MD.  On
   i386, the macro is VADDR() which casts parameters to vm_offset_t.
   I think most parameters in the macro are literals with type int.
   Also, vm_offset_t is larger than int, so no further promotion occurs
   and the result has type vm_offset_t.  On amd64, the macro is KVADDR()
   which casts parameters to unsigned long.  Again (but more obviously)
   there is no further promotion, and the result has type unsigned long.
   vm_offset_t is unsigned long too, so the result has type unsigned long
   too.

   I discussed casts of parameters in macros with jake@ in connection with
   PAE.  I had added casts to some macros to reduce some popular overflow
   bugs and type errors.  jake preferred not to cast.  It can be difficult
   to find the right cast to use, since the right cast is often different
   for PAE (but many cases could be handled by using a typedef that varies
   with PAE), and the correct cast is context-dependent (sometimes you
   know that a page count is small, so multiplying it by PAGE_SIZE can't
   overflow a signed int).  I now agree with jake -- the caller must cast
   as necessary, the same as if the computation is explicit.

- it is not a type error to do the subtraction in vm_offset_t arithmetic.
   VM_MAX_KERNEL_ADDRESS must be >= VM_MIN_KERNEL_ADDRESS, so the result
   cannot "overflow" to a "negative" value.

- the result of the subtraction has type unsigned or unsigned long on
   most or all arches.  On most 32-bit arches including i386, even
   unsigned long is smaller than quad_t (but vm_offset_t is plain
   unsigned).  On most or all 64-bit arches including amd64, unsigned
   long is larger than quad_t (since both are 64 bits but quad_t is
   signed).  Thus the result of the subtraction is demoted when qmin()
   is called.  Overflow may occur.  Overflow can easily occur, but
   apparently doesn't.  It would occur if the kernel used more than
   half of the 64-bit address space.  This would take a very sparse
   mapping and hardware support for this so that it doesn't take too
   many page tables.  Apparently, amd64 doesn't have this, so overflow
   doesn't occur in practice on at least amd64.

   If overflow occurred, it would normally give a negative value and the
   result of the qmin() would be clamping down to this negative value.
   So the case where kva is "infinite" (more than half of a 64-bit
   address space) is broken.

I like to use signed types and thus qmin().  qmin() would be necessary
if any of the expressions had a subtraction that could give a negative
result.  But it doesn't work for certain unsigned types like 64-bit
vm_offset_t.

>>>   	maxmbufmem = realmem / 2;
>>> -	TUNABLE_LONG_FETCH("kern.maxmbufmem", &maxmbufmem);
>>> +	TUNABLE_QUAD_FETCH("kern.maxmbufmem", &maxmbufmem);
>>>   	if (maxmbufmem > (realmem / 4) * 3)
>>>   		maxmbufmem = (realmem / 4) * 3;

More style bugs (excessive parentheses around division) are visible in
this part of the patch.  Some of the instances of this style bug in
kern now are:

% subr_param.c:	if (maxfiles > (physpages / 4))

It is even more excessive to have parentheses around a simple arithmetic
operand.

% subr_param.c:	if (maxmbufmem > (realmem / 4) * 3)

There are no excessive parentheses around the arithmetic operand here,
but internal ones in it.

% subr_param.c:		maxmbufmem = (realmem / 4) * 3;

% sys_pipe.c:	if (amountpipekva > (3 * maxpipekva) / 4) {

This opposite order for multiplying by 3/4 is safe here because maxpipekva
can't be very large.  It might be safe above too.  It is more natural,
so should be used if possible.

% sys_pipe.c:	if ((amountpipekva > (3 * maxpipekva) / 4) &&

This condition also has excessive parentheses around all of its '&&' terms
and misindentation for all of its continuation lines containing the other
terms.

% uipc_socket.c:	over = (head->so_qlen > 3 * head->so_qlimit / 2);

Correct use of parentheses.  None for the arithmetic expression.  The ones
used are unnecessary too, but now they are useful because the expression
is farily complicated and not idiomatic.  Any parentheses for the idiomatic
inner terms would obfuscate the outer ones.

% vfs_bio.c:	maxbuf = (LONG_MAX / 3) / BKVASIZE;

Excessive.

% vfs_bio.c:	while ((long)hidirtybuffers * BKVASIZE > 3 * hibufspace / 4) {

Correct.  3 * x / 4 may be more readable than x * 3 / 4 too.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121210111347.L871>