Date: Sun, 3 Jul 2016 12:30:04 +1000
From: Paul Koch
To: Cedric Blancher
Cc: "freebsd-hackers@freebsd.org"
Subject: Re: ZFS ARC and mmap/page cache coherency question

Is there a "long story", or is mmap() performance on ZFS doomed for the
foreseeable future?

        Paul.

> Short story: ZFS was tacked onto the kernel and was never properly
> integrated into the VM page management, which leads to DRAMATICALLY poor
> performance for anything which uses mmap() for write IO. This was
> solved in Oracle Solaris with the great VM allocator rewrite which
> landed after OpenSolaris was made closed source again.
>
> Without a complete rewrite of the VM system this problem is unsolvable.
>
> Ced
>
> On 30 June 2016 at 06:06, Paul Koch wrote:
> >
> > Posted this to -stable on the 15th June, but no feedback...
> >
> > We are trying to understand a performance issue when syncing large mmap'ed
> > files on ZFS.
> >
> > Example test box setup:
> >   FreeBSD 10.3-p5
> >   Intel i7-5820K 3.30GHz with 64G RAM
> >   6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe
> >
> > Read performance of a sequentially written large file on the pool is
> > typically around 950 Mbytes/sec using dd.
> >
> > Our software mmap's some large database files using MAP_NOSYNC, and we
> > call fsync() every 10 minutes when we know the file system is mostly
> > idle. In our test setup, the database files are 1.1G, 2G, 1.4G, 12G,
> > 4.7G and ~20 small files (under 10M). All of the memory pages in the
> > mmap'ed files are updated every minute with new values, so the entire
> > mmap'ed file needs to be synced to disk, not just fragments.
> >
> > When the 10 minute fsync() occurs, gstat typically shows very few disk
> > reads and very high write speeds, which is what we expect. But, every 80
> > minutes we process the data in the large mmap'ed files and store it in
> > highly compressed blocks of a ~300G file using pread/pwrite (i.e. not
> > mmap'ed). After that, the performance of the next fsync() of the mmap'ed
> > files falls off a cliff. We are assuming it is because the ARC has
> > thrown away the cached data of the mmap'ed files. gstat shows lots of
> > read/write contention and lots of things tend to stall waiting for disk.
> >
> > Is this just a lack of ZFS ARC and page cache coherency?
> >
> > Is there a way to prime the ARC with the mmap'ed files again before we
> > call fsync()?
> >
> > We've tried cat and read() on the mmap'ed files, but that doesn't seem to
> > touch the disk at all and the fsync() performance is still poor, so it
> > looks like the ARC is not being filled. msync() doesn't seem to be much
> > different. mincore() stats show the mmap'ed data is entirely incore and
> > referenced.
> >
> > Paul.
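
The access pattern described in the quoted message can be sketched roughly
as follows. This is a minimal illustration under stated assumptions, not the
actual application code: the helper names map_db(), touch_all_pages(),
resident_pages() and timed_fsync() are hypothetical, and MAP_NOSYNC is
defined to 0 on platforms that lack it so the sketch still compiles there.

```c
#define _DEFAULT_SOURCE 1   /* exposes mincore() on glibc; harmless elsewhere */
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0        /* FreeBSD-specific flag; treated as a no-op elsewhere */
#endif

/* Map an already-sized file shared and read/write, asking the kernel not to
 * flush dirty pages on its own. Returns MAP_FAILED on error. */
static void *map_db(int fd, size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_NOSYNC, fd, 0);
}

/* Dirty every page of the mapping, mimicking the once-a-minute update. */
static void touch_all_pages(unsigned char *base, size_t len, unsigned char value)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < len; off += page)
        base[off] = value;
}

/* Count pages that mincore() reports as resident; returns -1 on error. */
static long resident_pages(void *base, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long n = 0;

    if (vec == NULL)
        return -1;
    /* The vector type differs between systems (char * vs unsigned char *),
     * so pass it through void *. */
    if (mincore(base, len, (void *)vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        if (vec[i] & 1)     /* low bit set: page is in core */
            n++;
    free(vec);
    return n;
}

/* Run fsync() on the file and return the elapsed wall time in seconds,
 * or -1.0 if fsync() fails. */
static double timed_fsync(int fd)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (fsync(fd) != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling touch_all_pages() once a minute and timed_fsync() every 10 minutes,
with a large pread/pwrite workload on another file in between, should be
enough to compare the fsync() times before and after the ARC has been
churned. The resident_pages() count corresponds to the mincore() check
mentioned in the quoted message.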