Date:      Sun, 03 Jul 2016 00:45:24 -0700
From:      Matthew Macy <mmacy@nextbsd.org>
To:        "Paul Koch" <paul.koch137@gmail.com>
Cc:        "Cedric Blancher" <cedric.blancher@gmail.com>,  "freebsd-hackers" <freebsd-hackers@freebsd.org>
Subject:   Re: ZFS ARC and mmap/page cache coherency question
Message-ID:  <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
In-Reply-To: <20160703123004.74a7385a@splash.akips.com>
References:  <20160630140625.3b4aece3@splash.akips.com> <CALXu0UfxRMnaamh%2Bpo5zp=iXdNUNuyj%2B7e_N1z8j46MtJmvyVA@mail.gmail.com> <20160703123004.74a7385a@splash.akips.com>

Cedric greatly overstates the intractability of resolving it. Nonetheless,
since the initial import very little has been done to improve integration,
and I don't know of anyone up to the task who is taking an interest in it.
Consequently, mmap() performance is likely "doomed" for the foreseeable
future.

-M

---- On Sat, 02 Jul 2016 19:30:04 -0700 Paul Koch <paul.koch137@gmail.com> wrote ----

Is there a "long story", or is mmap() performance on ZFS doomed for the
foreseeable future?

    Paul.

> Short story: ZFS was tacked onto the kernel and was never properly
> integrated into the VM page management, which leads to DRAMATICALLY poor
> performance for anything which uses mmap() for write IO. This was
> solved in Oracle Solaris with the great VM allocator rewrite which
> landed after OpenSolaris was made closed source again.
>
> Without a complete rewrite of the VM system this problem is unsolvable.
>
> Ced
>
> On 30 June 2016 at 06:06, Paul Koch <paul.koch137@gmail.com> wrote:
> >
> > Posted this to -stable on the 15th June, but no feedback...
> >
> > We are trying to understand a performance issue when syncing large mmap'ed
> > files on ZFS.
> >
> > Example test box setup:
> >  FreeBSD 10.3-p5
> >  Intel i7-5820K 3.30GHz with 64G RAM
> >  6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe
> >
> > Read performance of a sequentially written large file on the pool is
> > typically around 950 Mbytes/sec using dd.
> >
> > Our software mmap's some large database files using MAP_NOSYNC, and we
> > call fsync() every 10 minutes when we know the file system is mostly
> > idle. In our test setup, the database files are 1.1G, 2G, 1.4G, 12G,
> > 4.7G and ~20 small files (under 10M). All of the memory pages in the
> > mmap'ed files are updated every minute with new values, so the entire
> > mmap'ed file needs to be synced to disk, not just fragments.
> >
> > When the 10 minute fsync() occurs, gstat typically shows very few disk
> > reads and very high write speeds, which is what we expect. But every 80
> > minutes we process the data in the large mmap'ed files and store it in
> > highly compressed blocks of a ~300G file using pread/pwrite (i.e. not
> > mmap'ed). After that, the performance of the next fsync() of the mmap'ed
> > files falls off a cliff.
> > We are assuming it is because the ARC has thrown away the cached data
> > of the mmap'ed files. gstat shows lots of read/write contention and lots
> > of things tend to stall waiting for disk.
> >
> > Is this just a lack of ZFS ARC and page cache coherency?
> >
> > Is there a way to prime the ARC with the mmap'ed files again before we
> > call fsync()?
> >
> > We've tried cat and read() on the mmap'ed files, but they don't seem to
> > touch the disk at all and the fsync() performance is still poor, so it
> > looks like the ARC is not being filled. msync() doesn't seem to be much
> > different. mincore() stats show the mmap'ed data is entirely in core and
> > referenced.
> >
> >         Paul.
> > _______________________________________________
> > freebsd-hackers@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to
> > "freebsd-hackers-unsubscribe@freebsd.org"

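Paul's original message describes a specific workload. Large database files are mapped with MAP_NOSYNC, every page is dirtied each minute, and fsync() is called every 10 minutes. The C sketch below reproduces that cycle on a small scratch file instead of the multi-gigabyte files in the thread. It assumes a few things. The helper names create_db_file() and update_and_sync() are hypothetical. MAP_NOSYNC is a FreeBSD flag, so the sketch defines it as 0 on other systems, and there the mapping is a plain MAP_SHARED one. The sketch shows the access pattern only. It does not model the ZFS ARC behaviour being discussed.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: MAP_NOSYNC exists on FreeBSD only; elsewhere it is defined
 * as 0 so the sketch still builds as a plain MAP_SHARED mapping. */
#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0
#endif

/* Hypothetical helper: create (or truncate) `path` and size it to `len`
 * bytes. Returns 0 on success, -1 on error. */
static int create_db_file(const char *path, size_t len)
{
    assert(path != NULL && len > 0);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, (off_t)len);
    close(fd);
    return rc;
}

/* Hypothetical helper: map `path`, overwrite every byte with `value`,
 * then flush the dirty pages with fsync(). Returns 0 on success. */
static int update_and_sync(const char *path, size_t len, unsigned char value)
{
    assert(path != NULL && len > 0);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_NOSYNC, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Dirty every page, like the per-minute update in the thread. */
    memset(p, value, len);

    /* With MAP_NOSYNC (on FreeBSD), dirty pages are flushed only when
     * necessary rather than periodically by the syncer; fsync() on the
     * descriptor forces them out now. */
    int rc = fsync(fd);

    munmap(p, len);
    close(fd);
    return rc;
}
```

For example, calling create_db_file() and then update_and_sync() on a 1 MiB scratch file runs the same map, dirty and fsync() cycle at a small scale. Reading the file back with pread() afterwards shows the written value.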


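Paul's priming attempt ("cat and read()" on the mapped files) amounts to reading each file sequentially and discarding the data. The sketch below shows that. The helper name read_through() and its 64 KiB buffer size are assumptions for illustration. As the thread reports, such reads apparently did not touch the disk and did not restore fsync() performance. So this shows what was tried, not a fix.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read `path` from start to end in 64 KiB chunks and
 * discard the data, much as `cat file > /dev/null` would. Returns the
 * number of bytes read, or -1 on error (EINTR is not retried). */
static long long read_through(const char *path)
{
    static char buf[1 << 16]; /* 64 KiB chunk size: an arbitrary choice */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long long total = 0;
    for (;;) {
        ssize_t n = pread(fd, buf, sizeof buf, (off_t)total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0) /* end of file */
            break;
        total += n;
    }

    close(fd);
    return total;
}
```

Whether such a pass refills the ARC depends on where the data is served from. Paul's gstat observations suggest it came from memory, not from the pool.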
