From: Andrew Bates
Date: Wed, 29 Jun 2016 23:14:51 -0700
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: Paul Koch
Cc: "freebsd-hackers@freebsd.org"
In-Reply-To: <20160630140625.3b4aece3@splash.akips.com>
List-Id: Technical Discussions relating to FreeBSD

Heya Paul,

How is your ZFS configured ( zfs get all tank0 )?

These certainly aren't absolute, law, or perfect - but if you haven't yet, I
suggest you take a peek at the following:

* http://open-zfs.org/wiki/Performance_tuning
* https://www.joyent.com/blog/bruning-questions-zfs-record-size
* http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

On Wed, Jun 29, 2016 at 9:06 PM, Paul Koch wrote:

> Posted this to -stable on the 15th June, but no feedback...
>
> We are trying to understand a performance issue when syncing large mmap'ed
> files on ZFS.
>
> Example test box setup:
>   FreeBSD 10.3-p5
>   Intel i7-5820K 3.30GHz with 64G RAM
>   6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe
>
> Read performance of a sequentially written large file on the pool is
> typically around 950Mbytes/sec using dd.
>
> Our software mmap's some large database files using MAP_NOSYNC, and we call
> fsync() every 10 minutes when we know the file system is mostly idle.
> In our test setup, the database files are 1.1G, 2G, 1.4G, 12G, 4.7G and ~20
> small files (under 10M). All of the memory pages in the mmap'ed files are
> updated every minute with new values, so the entire mmap'ed file needs to be
> synced to disk, not just fragments.
>
> When the 10 minute fsync() occurs, gstat typically shows very little disk
> reads and very high write speeds, which is what we expect. But, every 80
> minutes we process the data in the large mmap'ed files and store it in highly
> compressed blocks of a ~300G file using pread/pwrite (i.e. not mmap'ed).
> After that, the performance of the next fsync() of the mmap'ed files falls
> off a cliff. We are assuming it is because the ARC has thrown away the
> cached data of the mmap'ed files. gstat shows lots of read/write contention
> and lots of things tend to stall waiting for disk.
>
> Is this just a lack of ZFS ARC and page cache coherency ??
>
> Is there a way to prime the ARC with the mmap'ed files again before we call
> fsync() ?
>
> We've tried cat and read() on the mmap'ed files but doesn't seem to touch the
> disk at all and the fsync() performance is still poor, so it looks like the
> ARC is not being filled. msync() doesn't seem to be much different.
> mincore() stats show the mmap'ed data is entirely incore and referenced.
>
> Paul.

-- 
V/Respectfully,
Andrew M Bates
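
For anyone following the thread without the code in front of them, the access pattern Paul describes (map a file with MAP_NOSYNC, dirty every page, then flush with fsync()) might look roughly like the sketch below. This is not Paul's code: the function name update_and_sync, the memset() stand-in for the once-a-minute update, and the fallback definition of MAP_NOSYNC on systems that lack it are all assumptions made for illustration.

```c
/* Rough sketch of the update-then-fsync pattern from the quoted message.
 * MAP_NOSYNC is FreeBSD-specific; it is defined to 0 elsewhere purely so
 * this sketch compiles (an assumption, not part of the original setup). */
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0
#endif

/* Hypothetical helper: map `len` bytes of `path`, dirty every page,
 * then flush with fsync(). Returns 0 on success, -1 on any failure. */
int update_and_sync(const char *path, size_t len, unsigned char value)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Make sure the file is large enough to back the whole mapping. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_NOSYNC, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Stand-in for the periodic update that touches every page. */
    memset(p, value, len);

    /* Stand-in for the 10-minute flush: the quoted message calls fsync()
     * on the file rather than msync() on the mapping. */
    int rc = fsync(fd);

    munmap(p, len);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Calling update_and_sync("/tmp/example.bin", 1 << 20, 0xAB) would create a 1 MiB file, fill it through the mapping and flush it. Timing the fsync() call before and after a large pread/pwrite pass over another file on the same pool would be one way to reproduce the slowdown described above.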