Date: Tue, 21 Mar 2006 14:48:29 -0800 (PST)
From: Matthew Dillon
Message-Id: <200603212248.k2LMmTMj006791@apollo.backplane.com>
To: Mikhail Teterin
References: <200603211607.30372.mi+mx@aldan.algebra.com> <200603212123.k2LLNMhO006344@apollo.backplane.com> <200603211717.34348.mi+mx@aldan.algebra.com>
Cc: alc@freebsd.org, stable@freebsd.org
Subject: Re: weird bugs with mmap-ing via NFS

:
: [Moved from -current to -stable]
:
:On Tuesday 21 March 2006 16:23, Matthew Dillon wrote:
:>     You might be doing just writes to the mmap()'d memory, but the system
:>     doesn't know that.
:
:Actually, it does. The program tells it, that I don't care to read, what's
:currently there, by specifying the PROT_READ flag only.

    That's an architectural flag. Very few architectures actually support
    write-only memory maps. IA32 does not.
    It does not change the fact that the operating system must validate the
    memory underlying the page, nor does it imply that the system shouldn't.

:Sounds like a missed optimization opportunity :-(

    Even on architectures that did support write-only memory maps, the
    system would still have to fault in the rest of the data on the page,
    because the system would have no way of knowing which bytes in the page
    you wrote to (that is, whether you wrote to all the bytes in the page
    or whether you left gaps). The system does not take a fault for every
    write you issue to the page, only for the first one. So, no matter how
    you twist it, the system *MUST* validate the entire page when it takes
    the page fault.

:>     It kinda sounds like the buffer cache is getting blown out, but not
:>     having seen the program I can't really analyze it.
:
:See http://aldan.algebra.com/~mi/mzip.c

    I can't access this URL; it says 'not found'.

:>     It will always be more efficient to write to a file using write() than
:>     using mmap()
:
:I understand, that write() is much better optimized at the moment, but the
:mmap interface carries some advantages, which may allow future OSes to
:optimize their ways. The application can hint at its planned usage of the
:data via madvise, for example.

    Yes, but those advantages are limited by the way memory mapping hardware
    works. There are some things that simply cannot be optimized for lack
    of sufficient information.

    Reading via mmap() is very well optimized. Making modifications via
    mmap() is optimized on the expectation that the data will be read,
    modified, and written back. It is not possible to optimize with the
    expectation that the data would only be written to the mmap, for the
    reasons described above. The hardware simply does not provide
    sufficient information to the operating system to optimize the
    write-only case.

:Unfortunately, my problem, so far, is with it not writing _at all_...

    Not sure what is going on since I can't access the program yet, but I'd
    be happy to take a look at the code.

    The most common mistake people make when trying to write to a file via
    mmap() is that they forget to ftruncate() the file to the proper length
    first. Mapped memory beyond the file's EOF is ignored within the last
    page, and the program will take a page fault if it tries to write to
    mapped pages that are entirely beyond the file's current EOF. Writing
    to mapped memory does *not* extend the size of a file. Only ftruncate()
    or write() can extend the size of a file.

    The second most common mistake is to forget to specify MAP_SHARED in
    the mmap() call.

:Yes, this is an example of how a good implemented mmap can be better than
:write. Without explicit writes by the application and without doubling the
:memory requirements, the data can be written in the most optimal way.
:...
:Thanks for your help. Yours,
:
:	-mi

    I don't think mmap()-based writing will EVER be more efficient than
    write() except in the case where the entire data set fits into memory
    and has been entirely cached by the system. In that one case writing
    via mmap will be faster. In all other cases the system will be taking
    as many VM faults on the pages as it would be taking system call faults
    to execute the write()'s.

    You are making a classic mistake by assuming that the copying overhead
    of a write() into the file's backing store, versus directly mmap()ing
    the file's backing store, represents a large chunk of the overhead for
    the operation. In fact, the copying overhead represents only a small
    chunk of the related overhead. The vast majority of the overhead is
    always going to be the disk I/O itself. I/O must occur even in the
    cached/delayed-write case, so on a busy system it still represents the
    greatest overhead from the point of view of system load.
    On a lightly loaded system nobody is going to care about a few
    milliseconds of improved performance here and there since, by
    definition, the system is lightly loaded and thus has plenty of idle
    cpu and I/O cycles to spare.

					-Matt
					Matthew Dillon
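
    For reference, the two mistakes described above (no ftruncate() before
    writing, and no MAP_SHARED) can be avoided roughly as in the sketch
    below. This is not taken from mzip.c, which was not reachable; the
    helper name write_via_mmap() and its error handling are assumptions
    made purely for illustration, and it maps the whole file at once, which
    only suits small outputs.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from mzip.c): store len bytes from src into
 * the file at path through a shared mapping.
 * Returns 0 on success, -1 on any failure.
 */
int
write_via_mmap(const char *path, const void *src, size_t len)
{
	void *p;
	int fd;
	int rc = -1;

	if (len == 0)
		return (-1);	/* mmap() rejects zero-length mappings */

	/* PROT_WRITE with MAP_SHARED needs a descriptor opened O_RDWR. */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);

	/* Size the file first; stores to the mapping never extend it. */
	if (ftruncate(fd, (off_t)len) == -1)
		goto out;

	/* MAP_SHARED so the stores reach the file, not a private copy. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	memcpy(p, src, len);

	/* Push the dirty pages to the backing store before unmapping. */
	if (msync(p, len, MS_SYNC) == 0)
		rc = 0;
	if (munmap(p, len) == -1)
		rc = -1;
out:
	if (close(fd) == -1)
		rc = -1;
	return (rc);
}
```

    A call such as write_via_mmap("out.bin", buf, buflen) should leave a
    file of exactly buflen bytes. Dropping the ftruncate() call, or using
    MAP_PRIVATE instead of MAP_SHARED, reproduces the "not writing at all"
    symptom on a local filesystem; behaviour over NFS may differ further.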