Date: Thu, 31 Dec 2015 14:03:43 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: "Jonathan T. Looney" <jtl@freebsd.org> Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r292955 - head/lib/libmd Message-ID: <20151231115651.R995@besplex.bde.org> In-Reply-To: <201512301804.tBUI4oGp065466@repo.freebsd.org> References: <201512301804.tBUI4oGp065466@repo.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 30 Dec 2015, Jonathan T. Looney wrote: > Log: > Fix a file descriptor leak in mdXhl.c (which is used by numerous hashing > algorithms. This code had amazingly low quality and is still especially broken near the main leak fixed. > Modified: head/lib/libmd/mdXhl.c > ============================================================================== > --- head/lib/libmd/mdXhl.c Wed Dec 30 17:36:34 2015 (r292954) > +++ head/lib/libmd/mdXhl.c Wed Dec 30 18:04:50 2015 (r292955) > @@ -59,14 +59,18 @@ MDXFileChunk(const char *filename, char > f = open(filename, O_RDONLY); > if (f < 0) > return 0; > - if (fstat(f, &stbuf) < 0) > - return 0; > + if (fstat(f, &stbuf) < 0) { > + i = -1; > + goto error; > + } > if (ofs > stbuf.st_size) > ofs = stbuf.st_size; st_size is only valid for regular files. The first bug is using it here. Usually it is 0 when it is invalid. This code sets the offset to 0 then. I think this is just to avoid a false negative in the buggy seek test. > if ((len == 0) || (len > stbuf.st_size - ofs)) > len = stbuf.st_size - ofs; Style bugs (extra parentheses) and second use of the invalid st_size. > - if (lseek(f, ofs, SEEK_SET) < 0) > - return 0; > + if (lseek(f, ofs, SEEK_SET) < 0) { > + i = -1; > + goto error; > + } md5 is very broken by using this function for non-seekable files. The main case of a non-seekable file is a pipe. This case works for at least md5(1) by not using this function -- it uses MDFilter() which uses stdio's fread() which works. (stdio knows better than to use st_size, but it naively trusts st_blksize and uses it to give a pessimized small block size for read() called from fread(). This code handles buffer sizing and allocation worse using its low quality methods: - its buffer used to have size BUFSIZ. This size is not usable for anything except possibly buffering keyboard input and stderr. It is mainly part of the broken setbuf() API. Its value is 1024, perhaps to avoid changing this API. 1024 large enough for more uses in 1980. - its buffer now has size 16*1024 (spelled with a style bug like that). This sometimes matches and sometimes exceeds the size used by stdio. stdio at least attempts to choose the best size, but is defeated by stat() putting useless sizes in st_blksize. - it also tries to pessimize the i/o by asking for a misaligned buffer. Its buffer is just a local char[] variable. Compilers usally mostly foil this attempt by aligning large variables on the stack. Misaligned usr buffers should only pessimize the i/o, but last time I looked they cause DMA errors in ahci. This breaks bsdlabel(8). bsdlabel ask for a misaligned buffer and gets it at least when statically linked because its buffer is a global char [] variable and the neither the compiler nor the linker gives more alignment than requested. The DMA error would break md5 (or just cat) similarly if the buffer were misaligend. But the main bug here breaks md5 on disks without trying any i/o. Back to the main bug. This seek test detects some cases of non-regular files. These files indeed cannot be handled by this code. But the error handling is broken -- it is just to set errno and return 0 (EOF). 0 means no error and errno is no set. So all non-seekable files are treated as empty regular files. E.g., md5 /proc/0/* gives: md5: /proc/0/ctl: Permission denied MD5 (/proc/0/cmdline) = d41d8cd98f00b204e9800998ecf8427e MD5 (/proc/0/etype) = d41d8cd98f00b204e9800998ecf8427e MD5 (/proc/0/rlimit) = d41d8cd98f00b204e9800998ecf8427e MD5 (/proc/0/status) = d41d8cd98f00b204e9800998ecf8427e (/proc gives my favourite examples of irregular regular files. It doesn't work to use S_ISREG(&stbuf) to determine if st_size can be trusted, since too many irregular files claim to be regular: e.g., file /proc/0/* gives: /proc/0/cmdline: empty /proc/0/ctl: empty /proc/0/etype: empty /proc/0/rlimit: empty /proc/0/status: empty wc /proc/0/* works. md5 works like wc using a hack to avoid calling this broken function. E.g., for i in $(ls /proc/0/*); do echo -n "$i: "; md5 <$i; done # gives /proc/0/cmdline: 3c5896b1ac441f4998f052e2126e8d20 /proc/0/ctl: d41d8cd98f00b204e9800998ecf8427e /proc/0/etype: 674441960ca1ba2de08ad4e50c9fde98 /proc/0/rlimit: 67d6ad67b412e1bceb7cb508a3492197 /proc/0/status: 3ccc3067b97c3232ea2dbcb64c458fd4 > n = len; > i = 0; > while (n > 0) { Disk device files are seekable, so the lseek test doesn't result in mishandling them. Instead, they are treated as empty files since their st_size is 0. E.g., md5 /dev/ad* gives: MD5 (/dev/ad0) = d41d8cd98f00b204e9800998ecf8427e MD5 (/dev/ad0s1) = d41d8cd98f00b204e9800998ecf8427e MD5 (/dev/ad0s2) = d41d8cd98f00b204e9800998ecf8427e MD5 (/dev/ad0s2a) = d41d8cd98f00b204e9800998ecf8427e ... The workarounds are much the same as for non-seekable files: for i in /dev/ad0s2a; do echo -n "$i: "; md5 <$i; done # gives /dev/ad0s2a: 3b4c134a41615cbc60ef778acd188ed4 Most disk devices are too large to run md5 on, but this partition has size 64MB. I sometimes run md5 and other utilities like cmp on partitions or USB disks of size up to 10GB and get annoyed by the i/o pessimizations. It is faster to copy everything to a regular file and check that. cp has reasonably good i/o buffer sizes and the disk cache works well. However, copying a disk to a regular file and looking at its partitions there using md(4) works even worse. md(4) defeats the disk cache using IO_DIRECT. > @@ -79,6 +83,7 @@ MDXFileChunk(const char *filename, char > MDXUpdate(&ctx, buffer, i); > n -= i; > } > +error: > e = errno; > close(f); > errno = e; I don't understand the plumbing for MDXFileChunk(). Usually the whole file is wanted. MDXFileChunk() cannot work with a nonzero offset on non-seekable files. But it is preferred to the working MDFilter() even in the usual case of a zero offset. MDFilter() is actually a static function in md5(1). libmd apparently has no working input function, since MDXFileChunk() is its only function that uses *read(). MDXFile() is just a wrapper around MDXFileChunk(). The seek test is also broken for certain offsets. FreeBSD supports kseeking to negative offsets on device files. Such offsets occur for high addresses in /dev/kmem on 64-bit systems. errno should be set to 0 before the call and used instead of the return value to determine if the syscall failed. See dd/dd.c for a seek test that is closer to working. dd checks for EPIPE. It also messes around with S_ISCHR(), S_ISBLK() and FIODTYPE. Its algorithm is: - trust S_ISCHR() and S_ISBLK() and don't do the general lseek test for these cases. These can be trusted more than S_ISREG(). - for device files according to this classification, abort the whole program if FDIOTYPE fails. Trust it for setting ISSEEK and ISTAPE flags if it succeeds. - otherwise, do a general lseek test with an EPIPE. MD shouldn't try as hard as dd since there are probably bugs in the complications. But it should try to fail safe since it is more security-related than dd. That means bailing out on any detected error and not trusting st_size unless it is nonzero even if the file is regular. MDXFileChunk()'s error handling seems to be to return 0 on all errors. This is undocumented, and is ambiguous since 0 is also returned for empty files and for files misclassified as empty. The ambiguity can sometimes be resolved by checking errno, but libmd and md5 never check errno. md5 ends up never checking the error and printing d41d8cd98f00b204e9800998ecf8427e for both empty files and for all errors that occur before any bytes are handled. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20151231115651.R995>