From: Matthew Macy <mmacy@nextbsd.org>
To: Paul Koch
Cc: Cedric Blancher, freebsd-hackers
Date: Sun, 03 Jul 2016 00:45:24 -0700
Subject: Re: ZFS ARC and mmap/page cache coherency question

Cedric greatly overstates the intractability of resolving it.  Nonetheless,
since the initial import very little has been done to improve integration,
and I don't know of anyone up to the task who has taken an interest in it.
Consequently, mmap() performance is likely "doomed" for the foreseeable
future.

-M

---- On Sat, 02 Jul 2016 19:30:04 -0700 Paul Koch wrote ----

 Is there a "long story", or is mmap() performance on ZFS doomed for the
 foreseeable future?

     Paul.

> Short story: ZFS was tacked onto the kernel and was never properly
> integrated into the VM page management, which leads to dramatically poor
> performance for anything which uses mmap() for write I/O.  This was
> solved in Oracle Solaris with the great VM allocator rewrite which
> landed after OpenSolaris was made closed source again.
>
> Without a complete rewrite of the VM system this problem is unsolvable.
>
> Ced
>
> On 30 June 2016 at 06:06, Paul Koch wrote:
> >
> > Posted this to -stable on the 15th June, but no feedback...
> >
> > We are trying to understand a performance issue when syncing large
> > mmap'ed files on ZFS.
> >
> > Example test box setup:
> >   FreeBSD 10.3-p5
> >   Intel i7-5820K 3.30GHz with 64G RAM
> >   6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe
> >
> > Read performance of a sequentially written large file on the pool is
> > typically around 950 Mbytes/sec using dd.
> >
> > Our software mmap's some large database files using MAP_NOSYNC, and we
> > call fsync() every 10 minutes when we know the file system is mostly
> > idle.  In our test setup, the database files are 1.1G, 2G, 1.4G, 12G,
> > 4.7G and ~20 small files (under 10M).  All of the memory pages in the
> > mmap'ed files are updated every minute with new values, so the entire
> > mmap'ed file needs to be synced to disk, not just fragments.
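As an illustration of the MAP_NOSYNC plus periodic fsync() pattern described
above, a minimal sketch might look like the following.  This is not the
reporter's actual code; the file path, the placeholder write, and the timing
constant are assumptions.  The quoted report continues below it.

#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	/* Hypothetical database file; the real files are 1.1G to 12G. */
	const char *path = "/tank/db/example.db";
	int fd = open(path, O_RDWR);
	if (fd == -1)
		err(1, "open");

	struct stat st;
	if (fstat(fd, &st) == -1)
		err(1, "fstat");

	/*
	 * MAP_NOSYNC asks the kernel not to write dirty pages back on its
	 * usual schedule; the application syncs explicitly instead.
	 */
	char *db = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_NOSYNC, fd, 0);
	if (db == MAP_FAILED)
		err(1, "mmap");

	for (;;) {
		/* ... update pages throughout the mapping every minute ... */
		db[0] = 1;		/* placeholder write */

		sleep(600);		/* every 10 minutes, as in the report */
		if (fsync(fd) == -1)	/* flush all dirty pages to the pool */
			warn("fsync");
	}
}

Because the updates touch every page of multi-gigabyte mappings, each
fsync() has to push the whole file out, which is the workload whose
performance is at issue here.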
> >
> > When the 10 minute fsync() occurs, gstat typically shows very little
> > disk reads and very high write speeds, which is what we expect.  But
> > every 80 minutes we process the data in the large mmap'ed files and
> > store it in highly compressed blocks of a ~300G file using pread/pwrite
> > (i.e. not mmap'ed).  After that, the performance of the next fsync() of
> > the mmap'ed files falls off a cliff.  We are assuming it is because the
> > ARC has thrown away the cached data of the mmap'ed files.  gstat shows
> > lots of read/write contention, and lots of things tend to stall waiting
> > for disk.
> >
> > Is this just a lack of ZFS ARC and page cache coherency?
> >
> > Is there a way to prime the ARC with the mmap'ed files again before we
> > call fsync()?
> >
> > We've tried cat and read() on the mmap'ed files, but that doesn't seem
> > to touch the disk at all and the fsync() performance is still poor, so
> > it looks like the ARC is not being filled.  msync() doesn't seem to be
> > much different.  mincore() stats show the mmap'ed data is entirely
> > incore and referenced.
> >
> >     Paul.
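For reference, the mincore() residency check Paul mentions could be done
roughly as below.  This is a sketch only: 'addr' and 'len' stand for a
mapping created elsewhere (e.g. one of the MAP_NOSYNC mappings), and the
function name is made up.

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Report how many pages of an existing mapping are resident, along the
 * lines of the mincore() statistics mentioned above.
 */
static void
report_residency(void *addr, size_t len)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t npages = (len + pagesz - 1) / pagesz;
	char *vec = malloc(npages);
	size_t i, resident = 0;

	if (vec == NULL)
		return;
	if (mincore(addr, len, vec) == -1) {
		perror("mincore");
		free(vec);
		return;
	}
	for (i = 0; i < npages; i++)
		if (vec[i] & MINCORE_INCORE)
			resident++;
	printf("%zu of %zu pages resident\n", resident, npages);
	free(vec);
}

On FreeBSD the per-page flags also include MINCORE_REFERENCED and
MINCORE_MODIFIED, which is presumably where the "incore and referenced"
observation above comes from.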