Date:       Tue, 21 Mar 2006 16:25:48 -0800 (PST)
From:       Matthew Dillon <dillon@apollo.backplane.com>
To:         Mikhail Teterin <mi+mx@aldan.algebra.com>
Cc:         alc@freebsd.org, stable@freebsd.org
Subject:    Re: more weird bugs with mmap-ing via NFS
Message-ID: <200603220025.k2M0PmCt007240@apollo.backplane.com>
References: <200603211607.30372.mi%2Bmx@aldan.algebra.com> <200603211831.54172.mi%2Bmx@aldan.algebra.com> <20060321234856.GA24194@hugo10.ka.punkt.de> <200603211858.02801.mi%2Bmx@aldan.algebra.com>
:I don't specify either, but the default is UDP, isn't it?

    Yes, the default is UDP.

:> Now imagine a client that experiences this problem only
:> sometimes.  Modern hardware, but for some reason (network
:> congestion?) some frames are still lost if sent back-to-back.
:> (Realtek chipset on the receiving side?)
:
:No, both sides have em cards and are only separated by a rather decent,
:large switch.
:
:I'll try the TCP-mount workaround.  If it helps, we can assume our UDP
:NFS is broken for sustained high-bandwidth writes :-(
:
:Thanks!
:
:	-mi

    I can't speak for FreeBSD's current implementation, but it should be
    possible to determine whether there is an issue with packet drops by
    observing the network statistics via netstat -s.  Generally speaking,
    however, I know of no problems with a UDP NFS mount per se, at least
    as long as reasonable values are chosen for the block size.

    The mmap() call in your mzip.c program looks ok to me with the
    exception of the use of PROT_WRITE.  Try using PROT_READ|PROT_WRITE.
    The ftruncate() looks ok as well.

    If the program works over a local filesystem but fails to produce
    data in the output file on an NFS mount (while otherwise completing),
    then there is a bug in NFS somewhere.  If the program stalls and
    never completes, it could be a problem with dropped packets in the
    network stack.  If the program simply runs very inefficiently over
    NFS, using excessive network bandwidth for the amount of data being
    written (as you also reported), this is probably an artifact of
    attempting to use mmap() to write out the data, for the reasons
    previously discussed.

    I would again caution against using mmap() to populate a file in
    this manner.  Even with MADV_SEQUENTIAL there is no guarantee that
    the system will flush the pages to the file on the server
    sequentially, and you could end up with a very badly fragmented
    file.  When a file is truncated to a larger size, the underlying
    filesystem does not allocate backing store on disk for the data hole
    that is created; allocation winds up being based on the order in
    which the operating system flushes the VM pages.  The VM system does
    its best, but it is really designed as a random-access system rather
    than a sequential one.  Pages are flushed based on memory
    availability and a thousand other factors, and may not necessarily
    be flushed to the file in the order you think they should be.

    write() is really a much better way to write out a sequential file
    (on any operating system, not just BSD).

						-Matt
						Matthew Dillon
						<dillon@backplane.com>
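A minimal sketch of the sequential write() approach described above, shown as a
plain input-to-output copy.  The buffer size, function names, and error handling
are illustrative assumptions only; this is not code from mzip.c or from the
original thread.

/*
 * Populate an output file sequentially with write(), instead of
 * ftruncate() + mmap() + memory stores.  Illustrative sketch only.
 */
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

#define BUFSZ	(64 * 1024)	/* assumed buffer size, not from mzip.c */

static void
copy_sequential(int infd, int outfd)
{
	char buf[BUFSZ];
	ssize_t n, off, w;

	/*
	 * Each write() appends in order, so the filesystem (local or
	 * NFS-backed) sees the data arrive sequentially and can allocate
	 * blocks accordingly, rather than in whatever order the VM system
	 * happens to flush dirty pages.
	 */
	while ((n = read(infd, buf, sizeof(buf))) > 0) {
		for (off = 0; off < n; off += w) {
			w = write(outfd, buf + off, n - off);
			if (w < 0)
				err(1, "write");
		}
	}
	if (n < 0)
		err(1, "read");
}

int
main(int argc, char **argv)
{
	int infd, outfd;

	if (argc != 3)
		errx(1, "usage: %s infile outfile", argv[0]);
	infd = open(argv[1], O_RDONLY);
	if (infd < 0)
		err(1, "%s", argv[1]);
	outfd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (outfd < 0)
		err(1, "%s", argv[2]);
	copy_sequential(infd, outfd);
	close(infd);
	close(outfd);
	return (0);
}

Because the data reaches the file in order, this avoids the fragmentation risk
described for the ftruncate()-then-mmap() approach, where page flush order is
decided by memory pressure rather than by the program.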
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200603220025.k2M0PmCt007240>