From: Lionel Cons
Date: Tue, 5 Jul 2016 20:40:30 +0200
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: Karl Denninger
Cc: FreeBSD hackers list

So what Oracle did (based on work done by Sun for OpenSolaris) was to:

1. Modify ZFS to prevent *ANY* double/multi caching [this is considered
   a design defect], and
2. Introduce a new VM subsystem which scales a lot better and provides
   hooks for [1], so there are never two or more copies of the same data
   in the system.

Given that this was a huge, paid, multi-year effort, it's not likely
that the design defects in open-source ZFS will ever go away.
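[The double-caching problem described above can be made concrete with a toy model. This is purely illustrative Python, not ZFS or kernel code; all names in it are hypothetical. It shows why two independent caches holding the same disk blocks roughly double the memory footprint compared with a unified design.]

```python
# Toy model of double caching: every block read ends up resident in both
# a "page cache" and an "ARC" dict, so memory use is roughly doubled.
# A unified design (one shared cache) would need only a single copy.

BLOCK_SIZE = 4096

def read_block(n, page_cache, arc, disk):
    """Simulate a read that populates both caches independently."""
    if n not in page_cache:
        page_cache[n] = disk[n]   # copy 1: VM page cache
    if n not in arc:
        arc[n] = disk[n]          # copy 2: the double cache
    return page_cache[n]

disk = {n: bytes(BLOCK_SIZE) for n in range(8)}
page_cache, arc = {}, {}
for n in range(8):
    read_block(n, page_cache, arc, disk)

double = (len(page_cache) + len(arc)) * BLOCK_SIZE
unified = len(page_cache) * BLOCK_SIZE
print(double, unified)   # 65536 32768 -- twice the footprint for the same data
```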
Lionel

On 5 July 2016 at 19:50, Karl Denninger wrote:
>
> On 7/5/2016 12:19, Matthew Macy wrote:
>>
>> ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger wrote ----
>> >
>> > On 7/4/2016 18:45, Matthew Macy wrote:
>> > >
>> > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ----
>> > > >
>> > > > On 7/3/2016 02:45, Matthew Macy wrote:
>> > > > >
>> > > > > Cedric greatly overstates the intractability of resolving it.
>> > > > > Nonetheless, since the initial import very little has been done
>> > > > > to improve integration, and I don't know of anyone who is up to
>> > > > > the task taking an interest in it. Consequently, mmap()
>> > > > > performance is likely "doomed" for the foreseeable future. -M
>> > > >
>> > > > Wellllll....
>> > > >
>> > > > I've done a fair bit of work here (see
>> > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
>> > > > political issues are at least as bad as the coding ones.
>> > >
>> > > Strictly speaking, the root of the problem is the ARC, not ZFS per
>> > > se. Have you ever tried disabling MFU caching to see how much worse
>> > > LRU-only is? I'm not really convinced the ARC's benefits justify
>> > > its cost.
>> > >
>> > > -M
>> >
>> > The ARC is very useful when it gets a hit, as it avoids an I/O that
>> > would otherwise take place.
>> >
>> > Where it sucks is when the system evicts working set to preserve ARC.
>> > That's always wrong, in that you're trading a speculative I/O (if the
>> > cache is hit later) for a *guaranteed* one (to page out) and maybe
>> > *two* (to page back in).
>>
>> The question wasn't ARC vs. no caching; it was LRU-only vs. LRU + MFU.
>> There are a lot of issues stemming from the fact that ZFS is a
>> transactional object store with a POSIX FS on top. One is that it
>> caches disk blocks as opposed to file blocks. However, if one could
>> resolve that and have the page cache manage these blocks, life would
>> be much, much better. However, you'd lose MFU. Hence my question.
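[The LRU-only vs. LRU + MFU question above can be illustrated with a toy simulation. This is a sketch, not the real ARC algorithm (which tracks recency and frequency lists plus ghost lists); it only shows why a frequency-aware cache survives linear scans that flush a pure LRU.]

```python
from collections import OrderedDict, Counter

# Toy comparison of a pure-LRU cache with a frequency-aware ("MFU-ish")
# cache. A repeatedly-touched "hot" key is interleaved with one-shot
# scan keys; the scan evicts it from pure LRU but not from the
# frequency-aware cache.

CAP = 4

def run(policy, accesses):
    cache = OrderedDict()   # key -> None, insertion/access order = recency
    freq = Counter()        # access counts, including misses
    hits = 0
    for k in accesses:
        freq[k] += 1
        if k in cache:
            hits += 1
            cache.move_to_end(k)             # refresh recency
        else:
            if len(cache) >= CAP:
                if policy == "lru":
                    cache.popitem(last=False)  # evict least recently used
                else:
                    # evict least frequently used resident key
                    victim = min(cache, key=lambda x: freq[x])
                    del cache[victim]
            cache[k] = None
    return hits

# "hot" is touched between scans of cold keys that overflow the cache.
trace = ["hot", "hot", 1, 2, 3, 4, "hot", 5, 6, 7, 8,
         "hot", 9, 10, 11, 12, "hot"]
print(run("lru", trace), run("lfu", trace))   # 1 4
```

Under this trace the pure LRU keeps only the initial back-to-back hit; every scan pushes "hot" out before its next access, while the frequency-aware policy retains it throughout.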
>>
>> -M
>
> I suspect there's an argument to be made there, but the present
> problems make determining the impact of that difficult or impossible,
> as those effects are swamped by the other issues.
>
> I can fairly easily create workloads on the base code where simply
> typing "vi ", making a change and hitting ":w" will result in a stall
> of tens of seconds or more while the cache flush that gets requested
> is run down. I've resolved a good part (but not all instances) of this
> through my work.
>
> My understanding is that 11- has had additional work done to the base
> code, but three underlying issues are, from what I can see in the
> commit logs and discussions, not addressed: the VM system will page
> out working set while leaving the ARC alone; UMA
> reserved-but-not-in-use space is not policed adequately when memory
> pressure exists *before* the pager starts considering evicting working
> set; and the write-back cache is for many machine configurations
> grossly inappropriate and cannot be tuned adequately by hand
> (particularly on a system with vdevs that have materially varying
> performance levels).
>
> I have more-or-less stopped work on the tree on a forward basis, since
> I got to a place with 10.2 that (1) works for my production
> requirements, resolving the problems, and (2) ran into what I deemed
> to be intractable political issues within core on progress toward
> eradicating the root of the problem.
>
> I will probably revisit the situation with 11- at some point, as I'll
> want to roll my production systems forward. However, I don't know when
> that will be -- right now 11- is stable enough for some of my embedded
> work (e.g. on the Raspberry Pi 2) but is not on my server- and
> client-class machines. Indeed, just yesterday I got a lock-order
> reversal panic while doing a shutdown after a kernel update on one of
> my lab boxes running a just-updated 11- codebase.
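[Karl's point that evicting working set instead of cache is "always wrong" follows from a simple cost asymmetry, sketched below. The numbers and function names are illustrative only, not measurements of any real system: a clean cache page costs at most one speculative re-read, while a paged-out working-set page costs a guaranteed write plus a near-certain page-in.]

```python
# Toy cost model: compare evicting a clean cache page against paging out
# working set, counting expected I/O operations. IO_COST is arbitrary.

IO_COST = 1.0

def evict_cache_page(hit_probability):
    """Cost is one future read, and only if the block is needed again."""
    return hit_probability * IO_COST

def page_out_working_set(refault_probability=1.0):
    """One guaranteed write-out now, plus a page-in when the process runs."""
    return IO_COST + refault_probability * IO_COST

# Even a cache page with a 30% chance of a future hit is far cheaper to
# drop than paging out working set that will certainly be touched again.
print(evict_cache_page(0.3))      # 0.3 expected I/Os
print(page_out_working_set())     # 2.0 I/Os
```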
>
> --
> Karl Denninger
> karl@denninger.net
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/

--
Lionel