Date: Sun, 3 Jul 2016 12:30:04 +1000
From: Paul Koch <paul.koch137@gmail.com>
To: Cedric Blancher <cedric.blancher@gmail.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: ZFS ARC and mmap/page cache coherency question
Message-ID: <20160703123004.74a7385a@splash.akips.com>
In-Reply-To: <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>

Is there a "long story", or is mmap() performance on ZFS doomed for the
foreseeable future ?

Paul.

> Short story: ZFS was tacked on the kernel and was never properly
> integrated into the VM page management, which leads to DRAMATIC poor
> performance for anything which uses mmap() for write IO. This was
> solved in Oracle Solaris with the great VM allocator rewrite which
> landed after Opensolaris was made closed source again.
>
> Without a complete rewrite of the VM system this problem is unsolvable.
>
> Ced
>
> On 30 June 2016 at 06:06, Paul Koch <paul.koch137@gmail.com> wrote:
> >
> > Posted this to -stable on the 15th June, but no feedback...
> >
> > We are trying to understand a performance issue when syncing large mmap'ed
> > files on ZFS.
> >
> > Example test box setup:
> >   FreeBSD 10.3-p5
> >   Intel i7-5820K 3.30GHz with 64G RAM
> >   6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe
> >
> > Read performance of a sequentially written large file on the pool is
> > typically around 950Mbytes/sec using dd.
> >
> > Our software mmap's some large database files using MAP_NOSYNC, and we
> > call fsync() every 10 minutes when we know the file system is mostly
> > idle. In our test setup, the database files are 1.1G, 2G, 1.4G, 12G,
> > 4.7G and ~20 small files (under 10M). All of the memory pages in the
> > mmap'ed files are updated every minute with new values, so the entire
> > mmap'ed file needs to be synced to disk, not just fragments.
> >
> > When the 10 minute fsync() occurs, gstat typically shows very little disk
> > reads and very high write speeds, which is what we expect. But, every 80
> > minutes we process the data in the large mmap'ed files and store it in
> > highly compressed blocks of a ~300G file using pread/pwrite (i.e. not
> > mmap'ed). After that, the performance of the next fsync() of the mmap'ed
> > files falls off a cliff. We are assuming it is because the ARC has
> > thrown away the cached data of the mmap'ed files. gstat shows lots of
> > read/write contention and lots of things tend to stall waiting for disk.
> >
> > Is this just a lack of ZFS ARC and page cache coherency ??
> >
> > Is there a way to prime the ARC with the mmap'ed files again before we
> > call fsync() ?
> >
> > We've tried cat and read() on the mmap'ed files but doesn't seem to touch
> > the disk at all and the fsync() performance is still poor, so it looks
> > like the ARC is not being filled. msync() doesn't seem to be much
> > different. mincore() stats show the mmap'ed data is entirely incore and
> > referenced.
> >
> > Paul.
> > _______________________________________________
> > freebsd-hackers@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to
> > "freebsd-hackers-unsubscribe@freebsd.org"
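
For readers following the thread, the access pattern described in the original post (a shared mapping created with MAP_NOSYNC, every page rewritten, then a periodic fsync()) might look roughly like the sketch below. This is not the poster's code: the helper name `dirty_and_sync`, its parameters and the single-file setup are assumptions made for illustration, and MAP_NOSYNC is defined as 0 on systems that lack it so the sketch still compiles outside FreeBSD.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* MAP_NOSYNC is FreeBSD-specific; fall back to a no-op flag elsewhere
 * (assumption made only so this sketch builds on other systems). */
#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0
#endif

/*
 * Hypothetical helper: size `path` to `len` bytes, map it shared with
 * MAP_NOSYNC, write `value` into the first byte of every page so that
 * the whole mapping is dirty, then flush the file with fsync().
 * Returns 0 on success, -1 on any failure.
 */
int dirty_and_sync(const char *path, size_t len, unsigned char value)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_NOSYNC, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Touch one byte per page: enough to mark every page dirty. */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < len; off += page)
        p[off] = value;

    /* The periodic flush from the post: write the dirty pages out. */
    int rc = fsync(fd);

    if (munmap(p, len) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Timing the fsync() call in such a helper before and after a large pread/pwrite workload on the same pool would be one way to reproduce the slowdown described above; the sketch itself makes no claim about ARC behaviour.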