From owner-svn-src-head@FreeBSD.ORG Tue Jul 9 05:36:25 2013 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 08AE1C56; Tue, 9 Jul 2013 05:36:25 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 7A97318D2; Tue, 9 Jul 2013 05:36:23 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 9C4BA420985; Tue, 9 Jul 2013 15:36:15 +1000 (EST) Date: Tue, 9 Jul 2013 15:36:13 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: "Pedro F. Giffuni" Subject: Re: svn commit: r253045 - head/sys/fs/ext2fs In-Reply-To: <201307082021.r68KLanT005030@svn.freebsd.org> Message-ID: <20130709131546.K1312@besplex.bde.org> References: <201307082021.r68KLanT005030@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Q6eKePKa c=1 sm=1 a=Ot8oeJroF2sA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=AD96D3NyImQA:10 a=UqKkTzXlbx42FBnIQX8A:9 a=CjuIK1q_8ugA:10 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, kib@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jul 2013 05:36:25 -0000 On Mon, 8 Jul 2013, Pedro F. Giffuni wrote: > Log: > Avoid a panic and return EINVAL instead. > > Merge from UFS r232692: > syscall() fuzzing can trigger this panic. This breaks the assertion without fixing the bug. I don't know what is in the inscrutable reference r232692, but UFS doesn't exist, and ffs still has the assertions. Even ext2fs_write() still has the assertions. > Modified: head/sys/fs/ext2fs/ext2_vnops.c > ============================================================================== > --- head/sys/fs/ext2fs/ext2_vnops.c Mon Jul 8 19:40:50 2013 (r253044) > +++ head/sys/fs/ext2fs/ext2_vnops.c Mon Jul 8 20:21:36 2013 (r253045) > @@ -1598,11 +1598,11 @@ ext2_read(struct vop_read_args *ap) > } else if (vp->v_type != VREG && vp->v_type != VDIR) > panic("%s: type %d", "ext2_read", vp->v_type); > #endif > + if (uio->uio_resid < 0 || uio->uio_offset < 0) > + return (EINVAL); > orig_resid = uio->uio_resid; > - KASSERT(orig_resid >= 0, ("ext2_read: uio->uio_resid < 0")); > if (orig_resid == 0) > return (0); Syscall fuzzing doesn't trigger this panic. One of the many type errors in ext2fs used to be trigger this panic. Now this type error just gives EINVAL for valid syscalls. The type error is that orig_resid stil has type int, but uio->uio_resid has type ssize_t. SSIZE_MAX is is perfectly broken to POSIX spec, so on 64-bit arches ssize_t is int64_t and SSIZE_MAX is 2**63-1. So on 64-bit arches, anyone can easily trigger the panic using invalid syscall args where uio->uio_resid is larger than INT_MAX and this is larger than the buffer; anyone with a large amount of virtual memory and a large data segment size limit can not quite so easily trigger the panic using valid syscall args where uio->uio_resid is larger than INT_MAX and this is not larger than the buffer. Assignment to orig_resid then overflows. The behaviour is implementation-defined and is normally to truncate the value (the FreeBSD implementation is missing documentation to say this). Sometimes the result is negative so the error is detected. Othertimes a truncated positive or zero value is used, and the final result either a too-short i/o (if more than the truncated count could be done) or actually correct (if the i/o is a read() and EOF is reached before the non-truncated part of the count is needed). > - KASSERT(uio->uio_offset >= 0, ("ext2_read: uio->uio_offset < 0")); > fs = ip->i_e2fs; > if (uio->uio_offset < ip->i_size && > uio->uio_offset >= fs->e2fs_maxfilesize) This part of the change is just wrong. uio->uio_offset is not copied to a local variable of either the correct type or a wrong type, and there are no problems with checking it directly. Also, it has been 64 bits on all arches in FreeBSD since FreeBSD-2, so doing a wrong check of it here is presumably not necessary for avoiding worse problems later. This commit doesn't change the KASSERT()s in ext2_write(). Now the truncation bug doesn't affect the first KASSERT(), since uio->uio_resid is not corrupted before checking it. I thought that kib swept the tree to fix all the resid types to ssize_t when he expanded the limit for read() and write(), etc. Grep shows that orig_resid has the correct type in all file systems in /sys/fs except ext2fs. It is spelled the same in ext2fs, msdosfs and nfsclient. udf doesn't need it since its only use in the other file systems is to handle setting of atime at the end, and udf is read-only. But udf has the following bugs handling uio->uio_resid: in udf_read(): - uio->uio_resid < 0 is not checked on entry, either to return EINVAL or panic. Of course, this can't happen and the check doesn't belong in every file system. Perhaps it could be moved to VOP_READ() and VOP_WRITE(), and omitted there more often than most KASSERT()s. - uio->uio_resid == 0 is checked on entry. Most or all file systems do this, but I think this can't happen either. - uio->uio_offset < 0 is checked on entry, but not under a KASSERT() as on most file systems. Putting it under KASSERT() saves space when invariants are not configured. - uio->uio_resid is assigned to a variable n of the correct type. The code using this seems to be OK. - there is the following very bad code: n = min((u_int)(udfmp->bsize - on), uio->uio_resid); Here n has the correct type, but min() corrupts its uio->uio_resid arg to u_int. The corruption gives much the same implementation-defined bugs as truncation by assignment to "int uio_resid", except more than 1 of the misbehaviours may occur in a single syscall as uio->resid mod 2**32 crosses the 0 boundary. This and related code has many style bugs: - bogus cast to u_int. min() converts both its args to u_int. The conversion is normally (too) silent and the cast is even more silent. So the cast can at most break warnings about dubious types. - dubious and wrong types include: ssize_t n long size, on Except for the previous unrelated use of n, these variables are all for the block size and small offsets within a block. They should have type int or u_int. 'on' is short for "old n". It should have the same type as n. I think the type of n was changed to ssize_t so that its other use was correct. But it shouldn't have been used for unrelated variables. - n is bogusly cast to int in the call to uiomove() later. I think this dates from 1980's code that this was cloned from (n had type long for bogus reasons, but fitted on an int, and uiomove() wasn't prototyped so this cast was required to support K&R). Not quite similarly in cd9660_read(). It was just like udf_read() except its n is not aboved for an unrelated purpose and it still has type long. 'long' is accidentally the same as ssize_t on all supported arches, but isn't used to hold uio->uio_resid, and using it for n is bogus too, as above above. cd9660_read() used to use min() in the same place as udf_read(). But someone fixed the corruption of uio->uio_resid in min() by changing min() to MIN(). Using MIN() is a style bug. It was intentionally left out of 4.4BSD in the kernel, but contribed code kept using this deprecated KPI and the KPI was broken to match and then non- contribed code started using it too :-(. MIN() has the disadvantages of being unsafe and ugly. The min() family has the disadantage of not being type-generic; type errors using it are common. Not similarly in devfs. devfs_read() spells orig_resid worse as plain resid, but doesn't have the type error. Not similarly in fdescfs. It uses uio_resid in fdesc_readdir(). It has a lower limit of UIO_MAX instead of 1. Similarly in fuse. Not similarly in nandfs. It doesn't check for uio_resid < 0. It replaces min() by omin(). This gives another type that is harmless in practice. omin() is for off_t's, but is used with an ssize_t resid and an off_t filesize, after previously corrupting the filesize variable from a uint64_t to an off_t. 64 bits signed should be enough for anyone. The logic seems to be broken if uio_resid < 0 actually occurs. Not similarly in pseudofs. pfs_read() checks correctly for uio_resid < 0, but handles this by returning EINVAL instead of KASSERT(). It has a variable resid of type ssize_t, but doesn't do anything with this except assign uio_resid to it so as to get an overflow if the types are broken and the value doesn't fit. This made more sense when resid had type int. It still has an INT_MAX limit on buflen and a bogus overflow check for buflen (check for overflow after overflow has already caused undefined behaviour). The undefined behaviour can still easily occur on 32-bit systems, since exoanding resid and buflen to ssize_t doesn't really change them on 32-bit systems. pfs_readdir() still assigns uio_resid to "int resid". Then it has a lower limit of more than 0 on resid, as in fdesc_readdir() but now the check is broken since uio_resid was corrupted by assignment to resid. Not similarly in tmpfs. tmpfs manages to combine most of the bugs. tmpfs_read() starts with "int resid". Then it has null sanity checking for uio_resid (it does check uio_offset and handles it using EINVAL instead of KASSERT().) Then it corrupts uio_resid by assigning it to resid in the main loop. The loop continuation criterion is that corrupted resid is > 0. Of course it uses the MIN() style bug. It doesn't use the old 'n' and 'on' variable names. tmpfs_write() also uses "int resid", but first does some checks using the uncorrupted uio_resid. The one for EFBIG depends on undefined behaviour to work. Where ffs carefully casts the offset + resid expression to use uoff_t, tmpfs lets it overflow and then converts the result to u_int64_t (sic), provided off_t is no larger than 64 bits. Normally the overflow is benign and off_t is 64 bits, so this works. Many cases where uio_resid < 0 are detected accidentally here, since they cause offset + resid to be negative and converting this this to u_int64_t often overflows to a value > maxfilesize. Bruce