From owner-svn-src-head@FreeBSD.ORG Tue Nov 12 11:13:59 2013 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EC910565; Tue, 12 Nov 2013 11:13:59 +0000 (UTC) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 6B3E52389; Tue, 12 Nov 2013 11:13:59 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 7714DD43887; Tue, 12 Nov 2013 22:13:48 +1100 (EST) Date: Tue, 12 Nov 2013 22:13:46 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov Subject: Re: svn commit: r258039 - in head/sys: kern vm In-Reply-To: <201311120847.rAC8lwi8053235@svn.freebsd.org> Message-ID: <20131112215200.Y1480@besplex.bde.org> References: <201311120847.rAC8lwi8053235@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=rSyRge8hxN4A:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=Nkx1_21gIi4A:10 a=vJLSphRGJJGTM60Ikw8A:9 a=l6HgkuSRgOQPMSJ2:21 a=rQehvd6613DB2eQT:21 a=CjuIK1q_8ugA:10 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Nov 2013 11:14:00 -0000 On Tue, 12 Nov 2013, Konstantin Belousov wrote: > Log: > Avoid overflow for the page counts. > > Reported and tested by: pho > Sponsored by: The FreeBSD Foundation > MFC after: 1 week > Modified: head/sys/kern/vfs_vnops.c > ============================================================================== > --- head/sys/kern/vfs_vnops.c Tue Nov 12 08:32:10 2013 (r258038) > +++ head/sys/kern/vfs_vnops.c Tue Nov 12 08:47:58 2013 (r258039) > @@ -933,8 +933,9 @@ vn_io_fault(struct file *fp, struct uio > void *rl_cookie; > struct mount *mp; > vm_page_t *prev_td_ma; > - int cnt, error, save, saveheld, prev_td_ma_cnt; > + int error, save, saveheld, prev_td_ma_cnt; > vm_offset_t addr, end; > + vm_size_t cnt; int was correct for a count. You can't possibly have the 8TB of physical memory needed to overflow a 32-bit int page count. It is reasonably to assume 32-bit ints. vm mostly uses u_int for page count (starting with cnt.v_page_count for the total number of pages in the system). This will need to be fixed when you have 16TB of physical memory. This is worse than int because it asks for sign extension bugs and for not trapping on overflow. vm_size_t is a very bogus type for a page count. It is the type for (virtual only?) sizes in bytes. Since sizes are in bytes, 32-bit int isn't quite large enough even on 32-bit systems. The signeness change in this asks for sign extension bugs. In this function, cnt is still compared with -1 after first assigning the int result returned by vm_fault_quick_hold_pages. vm_size_t is an unsuitable type for holding this result, but the comparison still works because -1 gets converted to vm_size_t and there are no sign extension bugs in this case. If there is an overflow error, then it is for inadequate conversion of types in expressions like (cnt * PAGE_SIZE). Here the only problem seems to be in the error checking: % addr = (vm_offset_t)uio_clone->uio_iov->iov_base; % end = round_page(addr + len); % cnt = howmany(end - trunc_page(addr), PAGE_SIZE); % /* % * A perfectly misaligned address and length could cause % * both the start and the end of the chunk to use partial % * page. +2 accounts for such a situation. % */ % if (cnt > io_hold_cnt + 2) { If the parameters are untrusted, then howmany() can be almost anything, including negative. Assigning it to "int cnt" overflows it before it can be checked. I would also worry about round_page(addr + len) overflowing. This can overflow for even the valid range: addr = base of highest page in address space len = 1 end of page = end of address space end = 0 (overflow) io_hold_count is the constant 12, so cnt is limited to 14 if it doesn't overflow before checking it. There was no check for negative values. Now there is a bogus one. The int could hold negative values and the range check was only from above, so if a negative value occurred then it caused worse problems later. Now, if howmany() is negative then the negative value is corrupted to a large unsigned one. This exceeds 14, so it is reduced to 12 and doesn't cause further problems, at least from its size. Checking for negative values in the old version and converting them to 12 would have worked much the same. > vm_prot_t prot; > size_t len, resid; > ssize_t adv; > > Modified: head/sys/vm/vm_fault.c > ============================================================================== > --- head/sys/vm/vm_fault.c Tue Nov 12 08:32:10 2013 (r258038) > +++ head/sys/vm/vm_fault.c Tue Nov 12 08:47:58 2013 (r258039) > @@ -1074,7 +1074,7 @@ vm_fault_quick_hold_pages(vm_map_t map, > { > vm_offset_t end, va; > vm_page_t *mp; > - int count; > + vm_size_t count; > boolean_t pmap_failed; > > if (len == 0) > This has similar code, but more robust checking and I think it can almost rule out bad args: % if (len == 0) % return (0); % end = round_page(addr + len); % addr = trunc_page(addr); % % /* % * Check for illegal addresses. % */ % if (addr < vm_map_min(map) || addr > end || end > vm_map_max(map)) % return (-1); % % count = howmany(end - addr, PAGE_SIZE); % if (count > max_count) % panic("vm_fault_quick_hold_pages: count > max_count"); It checks for illegal addresses after allowing round_page() to overflow. I think trunc_page() can't overflow. The check detects overflow in round_page(addr + len). So howmany() can't be negative, and the end > vm_map_max(map) check should prevent it overflowing to 0. It can only be very large if we will panic anyway. But it is technically incorrect to assign it to "int count" before checking that it fits in an int. There are no problems with breaking the type of 'count', since the above range-checks it to fit in an int, and later uses of it just return or use it as an int. This is easy to fix by doing some up-front check that len is not too large (not more than max_count * PAGE_SIZE after adjusting it to cover full pages). We should be careful that max_count * PAGE size doesn't overflow, and it seems better to not do that multiplication. The adjusted length is (end - addr). This is a difference of vm_offset_t's and can be represented in a size_t (like len), but can be left in the expression's result type which is usually vm_offset_t. Since the result it is a multiple of PAGE_SIZE, howmany() is not needed and we can just divide it by PAGE_SIZE. (vm rarely uses howmany(), and this use is just a style bug. vm uses macros like btoc() and atop() to convert byte counts to page counts. There are a lot of style bugs and logic errors in these too. btoc() converts to "clicks" and code that uses it assumes that clicks are pages. The better named btop() is never used in MI vm code. atop() converts addresses from bytes to pages, but is more often abused to convert sizes from bytes to pages. Some code is so uncouth as to not even use any of these macros, but hard-codes them using PAGE_SHIFT or PAGE_SIZE.) So the correct spelling of the conversion to bytes seems to be btop(), and using this spelling we get a fairly short check with repeated code: if (btop(end - addr) > max_count) panic("vm_fault_quick_hold_pages: count > max_count"); count = btop(end - addr); This depends on btop() being a simple macro and/or the compiler only evaluating it once for efficiency. It avoids overflow in the sanity test and having to find a type for the possibly-overflowing value by not assigning the value before it is checked. The code could be rearranged a bit to make this clearer. vm_mmap() and obreak() need to be even more careful since they have to check untrusted application args, but they are too complicated to provide good examples. vm_size_t is already abused for page counts in at least. I have previously objected using u_long for page counts. Grepping for long shows many more errors now. redzone.c uses caddr_t as well as u_long, and doesn't use any vm types. C's type system is too weak to prevent these errors, but some could be detected by temporarily changing the type of vm_size_t, etc., to somthing exotic so that it cannot be used without going through the correct conversion macro. Bruce