From: Bruce Evans
Date: Mon, 6 Feb 2012 05:54:50 +1100 (EST)
To: Nicolas Bourdaud
Cc: freebsd-bugs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/164793: 'write' system call violates POSIX standard
Message-ID: <20120206050042.E2728@besplex.bde.org>
In-Reply-To: <201202051142.q15Bgrh6041302@red.freebsd.org>

On Sun, 5 Feb 2012, Nicolas Bourdaud wrote:

>> Description:
> When a write() cannot transfer as many bytes as requested (because of a file
> limit), it fails instead of transferring as many bytes as there is room to
> write.
>
> This is a violation of the POSIX standard:
> http://pubs.opengroup.org/onlinepubs/007904975/functions/write.html

FreeBSD's handling of the maxfilesize limits is similar, so it has the same bug.
This affects many filesystems which copied the buggy code from ffs.
(Both truncate() and write() fail if extending to or writing the full
number of bytes would exceed the limit.  This is correct for truncate(),
but write() is required to creep up on the limit.)

I think this is actually a bug in POSIX (XSI).  Most programs aren't
prepared to deal with short writes, and returning an error, as truncate()
is specified to do, is adequate.

For regular files, most file systems in FreeBSD back out of writes after
an i/o error, using ftruncate() (some truncation is necessary for
security, since the place at which the error occurred is usually not
known precisely), so the following bug in the upper layer rarely matters.
From an old version of sys_generic.c, for writing (reading has a similar
bug):

% 	if ((error = fo_write(fp, &auio, td->td_ucred, flags, td))) {
% 		/* XXX short write botch. */
% 		if (auio.uio_resid != cnt && (error == ERESTART ||
% 		    error == EINTR || error == EWOULDBLOCK))
% 			error = 0;

The XXX comment is only in my version.  Here (auio.uio_resid != cnt) means
that some i/o was done.  In that case, write() is required to return the
amount done, with no error, which is implemented by setting `error' to 0.
But this is only done if `error' is one of ERESTART, EINTR or EWOULDBLOCK.
At least the case of the most common error that is not one of these,
namely EIO, is broken.

The handling of the 3 special errors here is delicate:
- ERESTART: hopefully can't happen, since if it happens then we should
  restart.  This error is a non-error that in most cases means that we
  handled a signal but are not returning with EINTR because SA_RESTART
  says to restart instead of returning.
- EINTR: since we have this and not ERESTART, it is clearly correct to
  return, but if we did some i/o then we must return its amount and there
  is no way to return EINTR.
- EWOULDBLOCK: similar to EINTR for a SIGALRM, but more precise.
  I guess this is here since it is the only other common error, and it is
  not really an error so failing for it would be obviously wrong (except
  when no i/o was done, EWOULDBLOCK = EAGAIN is the standard way to
  indicate this).

The flag that controls backing out of writes is IO_UNIT.  This is always
set for write(2), and probably should be set unconditionally (so it
shouldn't exist), since not setting it mainly asks for security holes and
most cases are write(2) anyway.  IO_UNIT means that the i/o is done as an
"atomic unit".  The semantics of "unit" probably includes doing all of it
or none of it, so it would have to be broken to match the POSIX spec.

> Patch attached with submission follows:
> ...
> int main(void)
> {
> 	struct rlimit lim;
> 	int fd;
> 	ssize_t retc;
> 	size_t count = 0;
> 	const char pattern[PATTSIZE] = "Hello world!";
>
> 	signal(SIGXFSZ, SIG_IGN);
> 	lim.rlim_cur = LIMSIZE;
> 	setrlimit(RLIMIT_FSIZE, &lim);

This is missing initialization of at least lim.rlim_max in lim.  This gave
the bizarre behaviour that when the program was statically linked, it
failed for the first write, because the stack garbage for lim.rlim_max
happened to be 0.

Bruce
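
Editorial sketch (not part of the original message or of the submitted
test program): one way to avoid the uninitialized rlim_max problem noted
above is to fill the struct with getrlimit() first and change only
rlim_cur.  The helper names write_under_fsize_limit() and
fsize_limit_demo(), the 32-byte buffer and the 10-byte limit are all
hypothetical.  The result of the write() depends on the system: per the
discussion above, a kernel that follows POSIX should return a short
count, while the behaviour reported in this PR is -1 with EFBIG.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: lower the soft RLIMIT_FSIZE to `limit', issue a
 * single write(), then restore the previous soft limit.  Returns what
 * write() returned (errno preserved), or -2 if setup failed.
 */
static ssize_t
write_under_fsize_limit(int fd, const void *buf, size_t len, rlim_t limit)
{
	struct rlimit old, lim;
	void (*oldsig)(int);
	ssize_t ret;
	int saved_errno;

	/* Fetch both fields so that rlim_max is never stack garbage. */
	if (getrlimit(RLIMIT_FSIZE, &old) != 0)
		return (-2);
	lim = old;
	lim.rlim_cur = limit;		/* only the soft limit changes */

	/* With the default action, SIGXFSZ would kill the process. */
	oldsig = signal(SIGXFSZ, SIG_IGN);
	if (setrlimit(RLIMIT_FSIZE, &lim) != 0) {
		signal(SIGXFSZ, oldsig);
		return (-2);
	}

	ret = write(fd, buf, len);
	saved_errno = errno;

	/* Put things back so that later writes are not affected. */
	setrlimit(RLIMIT_FSIZE, &old);
	signal(SIGXFSZ, oldsig);
	errno = saved_errno;
	return (ret);
}

/* Demo: a 32-byte write against a 10-byte limit on a fresh temp file. */
static ssize_t
fsize_limit_demo(void)
{
	char buf[32] = "Hello world!";
	FILE *fp;
	ssize_t ret;

	if ((fp = tmpfile()) == NULL)
		return (-2);
	ret = write_under_fsize_limit(fileno(fp), buf, sizeof(buf), 10);
	if (ret >= 0)
		printf("write returned %zd\n", ret);
	else if (ret == -1)
		printf("write failed, errno %d\n", errno);
	fclose(fp);
	return (ret);
}
```

Calling fsize_limit_demo() from main() shows which behaviour a given
system has.  Restoring the old soft limit keeps the experiment from
leaking into the rest of the process; the hard limit is never touched,
so the restore cannot fail for lack of privilege.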