Date: Mon, 10 Dec 2012 12:25:12 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Andre Oppermann <andre@freebsd.org> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Alan Cox <alc@rice.edu> Subject: Re: svn commit: r243668 - in head/sys: kern sys Message-ID: <20121210111347.L871@besplex.bde.org> In-Reply-To: <50C513D4.4000306@freebsd.org> References: <201211290730.qAT7Uhkv016745@svn.freebsd.org> <50C4F5F7.7080101@rice.edu> <50C513D4.4000306@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 9 Dec 2012, Andre Oppermann wrote: > On 09.12.2012 21:35, Alan Cox wrote: >> ... >> >> I believe that this change did not actually correct the overflow >> problem. See below for an explanation. >> >> On 11/29/2012 01:30, Andre Oppermann wrote: >>> ... Please trim quotes. >>> >>> Log: >>> Using a long is the wrong type to represent the realmem and maxmbufmem >>> variable as they may overflow on i386/PAE and i386 with > 2GB RAM. >>> >>> Use 64bit quad_t instead. It has broader kernel infrastructure support >>> with TUNABLE_QUAD_FETCH() and qmin/qmax() than other available types. >>> >>> Pointed out by: alc, bde >>> ... >>> Modified: head/sys/kern/subr_param.c >>> ... >>> @@ -332,10 +332,10 @@ init_param2(long physpages) >>> * available kernel memory (physical or kmem). >>> * At most it can be 3/4 of available kernel memory. >>> */ >>> - realmem = lmin(physpages * PAGE_SIZE, >>> + realmem = qmin(physpages * PAGE_SIZE, >>> VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS); >> >> >> "physpages" is a signed long. Suppose it is 1,000,000. On i386/PAE, >> the product of 1,000,000 and PAGE_SIZE will be a negative number. >> Likewise, quad_t is a signed type. So, the negative product of >> 1,000,000 and PAGE_SIZE will be sign extended to a 64-bit signed value >> when it is passed to qmin(), and qmin() will return a negative number. > > Thank you taking a second look it. > > To be honest I got a bit confused on which automatic type expansion > applies here. This is a vanilla overflow bug. It has nothing to do with type extension, except that type extension beyond long doesn't occur for multiplication of longs. It has little to do with signedness, except that with signed longs the overflow occurs at LONG_MAX (2G-1 on i386) instead of at ULONG_MAX (4G-1 on i386). The above part of the patch also shows one of the many style bugs in this and previous commits (random indentation of the continued line -- 2 tabs instead of 4 spaces). > qmax() is defined as this in libkern.h: > static __inline quad_t qmax(quad_t a, quad_t b) { return (a > b ? a : b); } > > Wouldn't physpages be expanded to quad_t? Hmm, no, only the result of the > computation, which happens in long space, is passed on. This is a function, > not a macro. Dang... No, that is backwards. With a macro, the result of the multiplicaton is passed on (unless the macro is very broken and has type casts which to break its type-genericness). With the qmin() function, the result of the multiplication is converted to quad_t. But the mmultiplication has already overflowed, giving an undefined result of type long. (This assumes that PAGE_SIZE has type long or smaller. It is normally a literal int constant like 4096. Any type for this constant would increase bugs in this area. physpages has type long. As with all binary operations, the operands are first converted to a minimal common type that is at least as large as both and also at least as large as int. This type is long. Multiplication of longs then overflows mainly in the PAE case.) > This change will force the computation to be in quad_t space: > > - realmem = qmin(physpages * PAGE_SIZE, > + realmem = qmin((quad_t)physpages * PAGE_SIZE, > VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS); Indeed. > VM_[MAX|MIN]_KERNEL_ADDRESS is safe as it is of the unsigned vm_offset_t > type. This has little to do with its type, except that the type is not larger than quad_t. Actually, it is larger on most 64-bit arches including amd6d, so the above code only works accidentally. It depends on kva not covering more than half of the address space. Now the type errors are: - there is no such type as 'unsigned vm_offset_t' (unsigned foo_t is a syntax error for any typedefed type foo_t). I think you mean that vm_offset_t is an unsigned type. It is not the plain unsigned type on 64-bit arches, so you shouldn't mean that. - the type of constants like VM_MAX_KERNEL_ADDRESS is unclear. Most constants are defined as literals. But VM_{MAX,MIN}_KERNEL_ADDRESS happen to be defined using macros. The macros are quite MD. On i386, the macro is VADDR() which casts parameters to vm_offset_t. I think most parameters in the macro are literals with type int. Also, vm_offset_t is larger than int, so no further promotion occurs and the result has type vm_offset_t. On amd64, the macro is KVADDR() which casts parameters to unsigned long. Again (but more obviously) there is no further promotion, and the result has type unsigned long. vm_offset_t is unsigned long too, so the result has type unsigned long too. I discussed casts of parameters in macros with jake@ in connection with PAE. I had added casts to some macros to reduce some popular overflow bugs and type errors. jake preferred not to cast. It can be difficult to find the right cast to use, since the right cast is often different for PAE (but many cases could be handled by using a typedef that varies with PAE), and the correct cast is context-dependent (sometimes you know that a page count is small, so multiplying it by PAGE_SIZE can't overflow a signed int). I now agree with jake -- the caller must cast as necessary, the same as if the computation is explicit. - it is not a type error to do the subtraction in vm_offset_t arithmetic. VM_MAX_KERNEL_ADDRESS must be >= VM_MIN_KERNEL_ADDRESS, so the result cannot "overflow" to a "negative" value. - the result of the subtraction has type unsigned or unsigned long on most or all arches. On most 32-bit arches including i386, even unsigned long is smaller than quad_t (but vm_offset_t is plain unsigned). On most or all 64-bit arches including amd64, unsigned long is larger than quad_t (since both are 64 bits but quad_t is signed). Thus the result of the subtraction is demoted when qmin() is called. Overflow may occur. Overflow can easily occur, but apparently doesn't. It would occur if the kernel used more than half of the 64-bit address space. This would take a very sparse mapping and hardware support for this so that it doesn't take too many page tables. Apparently, amd64 doesn't have this, so overflow doesn't occur in practice on at least amd64. If overflow occurred, it would normally give a negative value and the result of the qmin() would be clamping down to this negative value. So the case where kva is "infinite" (more than half of a 64-bit address space) is broken. I like to use signed types and thus qmin(). qmin() would be necessary if any of the expressions had a subtraction that could give a negative result. But it doesn't work for certain unsigned types like 64-bit vm_offset_t. >>> maxmbufmem = realmem / 2; >>> - TUNABLE_LONG_FETCH("kern.maxmbufmem", &maxmbufmem); >>> + TUNABLE_QUAD_FETCH("kern.maxmbufmem", &maxmbufmem); >>> if (maxmbufmem > (realmem / 4) * 3) >>> maxmbufmem = (realmem / 4) * 3; More style bugs (excessive parentheses around division) are visible in this part of the patch. Some of the instances of this style bug in kern now are: % subr_param.c: if (maxfiles > (physpages / 4)) It is even more excessive to have parentheses around a simple arithmetic operand. % subr_param.c: if (maxmbufmem > (realmem / 4) * 3) There are no excessive parentheses around the arithmetic operand here, but internal ones in it. % subr_param.c: maxmbufmem = (realmem / 4) * 3; % sys_pipe.c: if (amountpipekva > (3 * maxpipekva) / 4) { This opposite order for multiplying by 3/4 is safe here because maxpipekva can't be very large. It might be safe above too. It is more natural, so should be used if possible. % sys_pipe.c: if ((amountpipekva > (3 * maxpipekva) / 4) && This condition also has excessive parentheses around all of its '&&' terms and misindentation for all of its continuation lines containing the other terms. % uipc_socket.c: over = (head->so_qlen > 3 * head->so_qlimit / 2); Correct use of parentheses. None for the arithmetic expression. The ones used are unnecessary too, but now they are useful because the expression is farily complicated and not idiomatic. Any parentheses for the idiomatic inner terms would obfuscate the outer ones. % vfs_bio.c: maxbuf = (LONG_MAX / 3) / BKVASIZE; Excessive. % vfs_bio.c: while ((long)hidirtybuffers * BKVASIZE > 3 * hibufspace / 4) { Correct. 3 * x / 4 may be more readable than x * 3 / 4 too. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121210111347.L871>