Date: Thu, 16 Feb 2012 07:02:48 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-standards@freebsd.org, Nicolas Bourdaud <nicolas.bourdaud@gmail.com>
Subject: Re: write system call violates POSIX standard
Message-ID: <20120216054457.H3935@besplex.bde.org>
In-Reply-To: <20120215163800.GA3283@deviant.kiev.zoral.com.ua>
References: <4F3BC2DB.6080703@gmail.com> <20120215163800.GA3283@deviant.kiev.zoral.com.ua>

On Wed, 15 Feb 2012, Konstantin Belousov wrote:

> On Wed, Feb 15, 2012 at 03:36:11PM +0100, Nicolas Bourdaud wrote:
>> When a write() cannot transfer as many bytes as requested (because of a
>> file limit), it fails instead of transferring as many bytes as there is
>> room to write.
>>
>> This is a violation of the POSIX standard:
>> http://pubs.opengroup.org/onlinepubs/007904975/functions/write.html
>> ...
> It seems that you are right.

Here is a corresponding test to show the complete brokenness of
RLIMIT_FSIZE for [f]truncate():

%%%
#include <sys/resource.h>
#include <sys/stat.h>

#include <err.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>		/* for ftruncate() */

#define	LIMSIZE	60000

int
main(void)
{
	struct rlimit lim;
	struct stat sb;
	int fd;

	/* Ignore SIGXFSZ so that exceeding the limit gives an error return. */
	if (signal(SIGXFSZ, SIG_IGN) == SIG_ERR)
		err(1, "signal");
	/* Lower only the soft file size limit. */
	if (getrlimit(RLIMIT_FSIZE, &lim) != 0)
		err(1, "getrlimit");
	lim.rlim_cur = LIMSIZE;
	if (setrlimit(RLIMIT_FSIZE, &lim) != 0)
		err(1, "setrlimit");
	fd = open("result.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		err(1, "open");
	if (fstat(fd, &sb) != 0)
		err(1, "first stat");
	if (sb.st_size != 0)
		errx(1, "O_TRUNC failed to truncate the file");
	/* Try to extend the file to twice the limit. */
	if (ftruncate(fd, 2 * LIMSIZE) != 0)
		err(1, "ftruncate");
	if (fstat(fd, &sb) != 0)
		err(1, "stat");
	warnx("size = %jd", (intmax_t)sb.st_size);
	if (sb.st_size == 2 * LIMSIZE)
		errx(1, "ftruncate failed to honour RLIMIT_FSIZE, as expected");
	if (sb.st_size != 0)
		errx(1, "ftruncate worked incorrectly, but not as expected");
	errx(0, "ftruncate worked correctly, but not as expected");
}
%%%

> A solution could be to return an error if uio->uio_offset itself is
> larger them RLIMIT_FSIZE. If it is less then the limit, the function
         ^ or equal to, and the count is not 0 (?)
> could trim the supplied uio at the RLIMIT_FSIZE value instead.
>
> Do you want to work on the patch ?

Only indirectly for me :-).

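The trimming proposed in the quoted text, together with the "or equal
to, and the count is not 0" amendment, can be sketched in user space
roughly as follows.  This is only an illustration under assumptions:
fsize_clamp() and its plain offset/count/limit arguments are
hypothetical stand-ins for the uio fields and the rlimit lookup, it
assumes a non-negative offset, and it is not the kernel's actual limit
check.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not a FreeBSD kernel function.
 *
 * offset: where the write starts (stand-in for uio->uio_offset, >= 0)
 * count:  bytes requested (stand-in for uio->uio_resid)
 * limit:  soft RLIMIT_FSIZE value
 *
 * Returns 0 and stores the possibly trimmed byte count in *allowed,
 * or returns EFBIG if a non-empty write starts at or beyond the limit.
 */
int
fsize_clamp(off_t offset, size_t count, off_t limit, size_t *allowed)
{
	if (count == 0) {
		/* A zero-length write never extends the file. */
		*allowed = 0;
		return (0);
	}
	if (offset >= limit)
		return (EFBIG);
	/* Here limit - offset > 0, so the subtraction cannot underflow. */
	if ((uintmax_t)count > (uintmax_t)(limit - offset))
		*allowed = (size_t)(limit - offset);
	else
		*allowed = count;
	return (0);
}
```

With a limit of 60000, a 5000-byte request at offset 59000 would be
trimmed to 1000 bytes, while a 1-byte request at offset 60000 would get
EFBIG.  Note that this sketch, like the proposal, never looks at the
current file size.
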
Note that both of these are XSI extensions, and FreeBSD doesn't claim to
support XSI, so it doesn't have to duplicate any XSI bugs in these APIs.

But it is clearly a bug to not honor the rlimit at all.  Anyone can try
to create multiple-petabyte files in FreeBSD, and often succeed, and such
files may take a lot of space for metadata although all blocks beyond the
rlimit must be zero.

Note that the error handling is different but simpler for [f]truncate().
The current error checking in vfs is correct for these.  Except, see
below about null changes.

There is another thread (PR or POSIX mail?) about truncate() having
different, broken, semantics than ftruncate() when its effect is null.
POSIX specifies that ftruncate() shall mark times for update on
successful completion, as usual, but for truncate() POSIX specifies that
"if the file size is changed, this function shall mark for update...".
This is a bug in POSIX, but Linux apparently implemented it.  FreeBSD
doesn't implement it, at least in ffs.  In ffs_update(), the
implementation is to check for this case early and do nothing except mark
for update before returning.

POSIX has fuzzy wording for the interaction of these bugs.  Suppose that
the file size is already larger than the rlimit, and we try to truncate
it to its current size.  Is this a null change or an EFBIG error?  POSIX
only says (for [f]truncate) that "if the request _would_ _cause_ the file
size to exceed the soft file limit, [then it is an error]".  I think a
null change "wouldn't cause" the file to exceed the limit in this case,
because the cause of exceeding the limit is that the limit was already
exceeded.  However, it takes a delicate reading of "would cause" to get
this interpretation, and FreeBSD never did it this way in cases where it
actually checks the limit -- for write(), the limit is checked before
even looking at the current file size.
The centralization of the limit checking makes it harder to change this,
because the central function doesn't know the file size.

Truncations that would reduce the file size from beyond the limit to less
beyond the limit are also interesting.  Are these allowed?  Now they
cause something, but they don't cause the file size to exceed the limit,
so a strict reading of "would cause" again allows them.

write() has some very nice, different bugs depending on the
interpretation of the corresponding "would cause" for it.  In FreeBSD,
because the limit checking is done before even looking at the size of the
file, write()s to the middle of a big file are rejected if they would
extend past the limit.  But the POSIX specification is that "if the
request _would_ _cause_ the file size to exceed the soft file limit,
[then as for truncate, except it is not an error if the write starts
before the limit, and bytes shall be written if possible up to the limit
in this case]".  This wording is not very different than that for
ftruncate, but now it seems even harder to blame the write for causing
the limit to be exceeded if the write would be in the middle of the file.
It seems useful to allow writing in the middle of a big file irrespective
of the limit, to allow not-fully-trusted applications to scribble in a
big file that you have reserved for them.  But the above bug in ftruncate
becomes enormous if you allow writing in the middle of a big file that
the bug has allowed creation of.

Just above the paragraph about this, handling related to ENOSPC is
specified as "if a write() requests that more bytes be written than there
is room for (for example, ... the physical end of the medium), only as
many bytes as there is room for shall be written".  This is a little
fuzzy, but it seems to be intended to mean that these bytes shall be
written with no error if possible.  This disallows the ffs treatment of
backing out of the entire write after an ENOSPC error in the same way as
after an EIO error.

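The strict "would cause" reading discussed above needs the current file
size as an input, which is what the centralized check lacks.  A minimal
user-space sketch of that reading, assuming non-negative sizes and
offsets; trunc_check() and write_check() are hypothetical names, not
FreeBSD or POSIX interfaces, and treating a further extension of an
already over-limit file as EFBIG is an assumption of this sketch rather
than something POSIX spells out:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical check for [f]truncate(): only a request that grows the
 * file to a size beyond the limit is blamed.  Null changes and shrinking
 * are allowed even if the file is already over the limit.
 */
int
trunc_check(off_t cur_size, off_t new_size, off_t limit)
{
	if (new_size > limit && new_size > cur_size)
		return (EFBIG);
	return (0);
}

/*
 * Hypothetical check for write(): bytes may land anywhere below the
 * larger of the current size and the limit, so writes in the middle of
 * an over-limit file are allowed but cannot extend it.
 */
int
write_check(off_t cur_size, off_t offset, size_t count, off_t limit,
    size_t *allowed)
{
	off_t bound;

	bound = cur_size > limit ? cur_size : limit;
	if (count == 0) {
		*allowed = 0;
		return (0);
	}
	if (offset >= bound)
		return (EFBIG);
	/* Partial write up to the bound, with no error. */
	if ((uintmax_t)count > (uintmax_t)(bound - offset))
		*allowed = (size_t)(bound - offset);
	else
		*allowed = count;
	return (0);
}
```

Under this reading, with a limit of 60000 and a 120000-byte file,
truncating to 120000 or to 90000 is accepted, a 1000-byte write at offset
70000 is accepted in full, and a 5000-byte write at offset 119000 is
trimmed to 1000 bytes.
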
Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120216054457.H3935>