Date:      Tue, 5 Jul 2016 12:50:16 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-hackers@freebsd.org
Subject:   Re: ZFS ARC and mmap/page cache coherency question
Message-ID:  <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net>
In-Reply-To: <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
References:  <20160630140625.3b4aece3@splash.akips.com> <CALXu0UfxRMnaamh%2Bpo5zp=iXdNUNuyj%2B7e_N1z8j46MtJmvyVA@mail.gmail.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>


On 7/5/2016 12:19, Matthew Macy wrote:
>
>
>  ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <karl@denninger.net> wrote ----
>  >
>  > On 7/4/2016 18:45, Matthew Macy wrote:
>  > >
>  > >  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denninger.net> wrote ----
>  > >  >
>  > >  > On 7/3/2016 02:45, Matthew Macy wrote:
>  > >  > >
>  > >  > > Cedric greatly overstates the intractability of resolving it.
>  > >  > > Nonetheless, since the initial import very little has been
>  > >  > > done to improve integration, and I don't know of anyone who
>  > >  > > is up to the task taking an interest in it. Consequently,
>  > >  > > mmap() performance is likely "doomed" for the foreseeable
>  > >  > > future.
>  > >  > >
>  > >  > > -M
>  > >  >
>  > >  > Wellllll....
>  > >  >
>  > >  > I've done a fair bit of work here (see
>  > >  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
>  > >  > political issues are at least as bad as the coding ones.
>  > >  >
>  > >
>  > > Strictly speaking, the root of the problem is the ARC. Not ZFS per
>  > > se. Have you ever tried disabling MFU caching to see how much worse
>  > > LRU only is? I'm not really convinced the ARC's benefits justify
>  > > its cost.
>  > >
>  > > -M
>  > >
>  >
>  > The ARC is very useful when it gets a hit, as it avoids an I/O that
>  > would otherwise take place.
>  >
>  > Where it sucks is when the system evicts working set to preserve ARC.
>  > That's always wrong in that you're trading a speculative I/O (if the
>  > cache is hit later) for a *guaranteed* one (to page out) and maybe
>  > *two* (to page back in.)
>
> The question wasn't ARC vs. no-caching. It was LRU only vs. LRU + MFU.
> There are a lot of issues stemming from the fact that ZFS is a
> transactional object store with a POSIX FS on top. One is that it
> caches disk blocks as opposed to file blocks. However, if one could
> resolve that and have the page cache manage these blocks, life would
> be much, much better. However, you'd lose MFU. Hence my question.
>
> -M
>
I suspect there's an argument to be made there, but the present problems
make it difficult or impossible to determine the impact of that change,
because its effects are swamped by the other issues.

I can fairly easily create workloads on the base code where simply
typing "vi <some file>", making a change and hitting ":w" results in a
stall of tens of seconds or more while the requested cache flush is run
down.  My work has resolved a good part of this, though not every
instance.
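For anyone who wants to put a number on that kind of stall, here is a
minimal sketch in Python.  It assumes that the editor's save ends in an
fsync(2)-style flush, which is a guess about the mechanism rather than
something established in this thread.  The helper name timed_save and
the scratch file are made up for illustration, and a temporary
directory may not sit on the pool under test, so point it at a path on
the affected dataset to get a meaningful reading.

```python
import os
import tempfile
import time


def timed_save(path: str, data: bytes) -> float:
    """Write data to path, flush it to stable storage, return elapsed seconds."""
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                       # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                      # the flush whose run-down time is of interest
    finally:
        os.close(fd)
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical scratch location; substitute a path on the ZFS dataset under test.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "scratch.txt")
        elapsed = timed_save(target, b"one small edit\n")
        print(f"save + fsync took {elapsed:.3f} s")
```

Running it in a loop while the box is under write load would show
whether the small save is queued behind the larger flush.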

My understanding is that 11- has had additional work done to the base
code, but from what I can see in the commit logs and discussions, three
underlying issues are not addressed: (1) the VM system will page out
working set while leaving ARC alone; (2) UMA reserved-but-not-in-use
space is not policed adequately when memory pressure exists *before*
the pager starts considering evicting working set; and (3) the
write-back cache is grossly inappropriate for many machine
configurations and cannot be tuned adequately by hand (particularly on
a system with vdevs that have materially varying performance levels.)
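The first of those issues rests on the guaranteed-versus-speculative
I/O argument quoted earlier in this message.  A toy model in Python
makes the arithmetic explicit.  It is only a sketch, and its
assumptions go beyond what the thread states: every I/O is counted as
one unit, the evicted working-set page is dirty and so must be written
out, and the evicted ARC buffer is clean and so costs nothing to drop.
Both function names are invented for illustration.

```python
def _check_probability(p: float) -> None:
    """Reject values that are not valid probabilities."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")


def ios_if_working_set_evicted(p_touched_again: float) -> float:
    """Expected I/Os when a dirty working-set page is paged out.

    One write happens for certain (the page-out); one read follows only
    if the page is touched again (the page-in).
    """
    _check_probability(p_touched_again)
    return 1.0 + p_touched_again


def ios_if_arc_buffer_evicted(p_would_have_hit: float) -> float:
    """Expected I/Os when a clean ARC buffer is dropped instead.

    Nothing is written now; one read happens only if the buffer would
    have been hit later.
    """
    _check_probability(p_would_have_hit)
    return p_would_have_hit


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 1.0):
        ws = ios_if_working_set_evicted(p)
        arc = ios_if_arc_buffer_evicted(p)
        print(f"p={p:.2f}  evict working set: {ws:.2f}  evict ARC: {arc:.2f}")
```

Under these assumptions, evicting working set never costs fewer than
one I/O and dropping the ARC buffer never costs more than one, whatever
the probabilities.  A model that weighted I/Os by device speed or
allowed clean working-set pages could come out differently.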

I have more or less stopped forward work on the tree, since with 10.2 I
(1) got to a place that works for my production requirements, resolving
the problems, and (2) ran into what I deemed to be intractable political
issues within core on progress toward eradicating the root of the
problem.

I will probably revisit the situation with 11- at some point, as I'll
want to roll my production systems forward.  However, I don't know when
that will be -- right now 11- is stable enough for some of my embedded
work (e.g. on the Raspberry Pi2) but not for my server- and client-class
machines.  Indeed, just yesterday I got a lock-order reversal panic
while doing a shutdown after a kernel update on one of my lab boxes
running a just-updated 11- codebase.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
