Date:      Sat, 23 Mar 2013 23:10:01 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        arch@freebsd.org
Cc:        David Wolfskill <david@catwhisker.org>
Subject:   VM_BCACHE_SIZE_MAX on i386
Message-ID:  <20130323211001.GN3794@kib.kiev.ua>

The unmapped I/O work makes it possible to avoid mapping vnode pages
into kernel memory for UFS mounts, provided the underlying geoms and
disk drivers accept unmapped BIOs.  Converting all geom classes and
drivers, while not very hard, is quite a big task that requires a
lot of validation on unusual configurations and rare hardware.  I
decided to provide transient remapping for the classes which are
not yet converted, which allowed the work to go into HEAD much
earlier than it otherwise could have, if it could have gone in at all.

When an unmapped BIO is passed through the geom stack and the next
geom is not marked as accepting unmapped BIOs, KVA space is allocated
in the so-called transient map and the pages are mapped there.  On
architectures with ample KVA, creating the transient map is not an
issue, but it is very delicate on architectures with limited
KVA, i.e. mostly 32-bit architectures.
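
The remapping decision described above can be modelled with a small
userland sketch.  This is a toy, not the kernel code: the toy_* types
and function are hypothetical names, and returning -1 when the
transient map is full is an assumption, since this message does not
say what happens in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are not the kernel's bio or geom. */
struct toy_bio {
	bool   unmapped;	/* true: page list only, no KVA mapping */
	size_t npages;		/* number of pages in the request */
};

struct toy_geom {
	bool accepts_unmapped;	/* class/driver already converted */
};

struct toy_transient_map {
	size_t free_pages;	/* KVA pages still free in the map */
};

/*
 * Hand a bio to the next geom.  Returns 0 if the bio can proceed,
 * -1 if it needs remapping but the transient map has no room
 * (assumed behaviour, for illustration only).
 */
static int
toy_pass_down(struct toy_bio *bp, const struct toy_geom *next,
    struct toy_transient_map *tm)
{
	assert(bp != NULL && next != NULL && tm != NULL);
	if (!bp->unmapped || next->accepts_unmapped)
		return (0);		/* no remapping needed */
	if (tm->free_pages < bp->npages)
		return (-1);		/* transient KVA exhausted */
	tm->free_pages -= bp->npages;	/* reserve KVA, "map" the pages */
	bp->unmapped = false;
	return (0);
}
```

The point of the model is that transient KVA is consumed only when an
unmapped BIO meets an unconverted geom, so the size of the transient
map bounds how many such requests can be remapped at once.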

To not disturb the KVA layout and the current balance, I split the
space previously allocated to the buffer map into 90%, which is still
used by the buffer map, and the remaining 10%, dedicated to the
transient mapping.  The rationale for the split is that a typical load
has a 9/1 split between user data and metadata buffers, and almost all
user data buffers are unmapped.

More precisely, the transient map is sized to 10% of the maximum
_theoretical_ buffer map size allowed on the arch.  The real buffer
map is usually smaller, sized proportionally to the available RAM.
The details of the allocation are in
vfs_bio.c:kern_vfs_bio_buffer_alloc().  The function uses the maxbcache
tunable, initialized from VM_BCACHE_SIZE_MAX by default.

But on i386 !PAE, VM_BCACHE_SIZE_MAX is bigger than the maximally
sized buffer cache on a 4GB RAM machine.  The max buffer cache map
size is around 110MB, while VM_BCACHE_SIZE_MAX is 200MB.  This causes
the bio_transient_map to be oversized, eating an additional 90MB of
precious KVA on i386.
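
The 90MB figure is consistent with the difference between the two
sizes quoted above.  A minimal sketch of that arithmetic follows; the
function name is hypothetical, and the idea that the whole unused
remainder lands in the transient map is an assumption made here, not
something this message spells out.

```c
#include <assert.h>

#define MB	(1024UL * 1024UL)

/*
 * KVA left over when a span of max_sz bytes is reserved but the
 * buffer map needs only buf_sz of it.  Assumption: the leftover is
 * what oversizes the transient map.
 */
static unsigned long
toy_transient_excess(unsigned long max_sz, unsigned long buf_sz)
{
	assert(buf_sz <= max_sz);
	return (max_sz - buf_sz);
}
```

With max_sz = 200 * MB and buf_sz = 110 * MB the result is 90 * MB;
if the two sizes are equal, the excess is zero.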

By itself this +90MB of KVA use is not critical, but it starts
conflicting with other KVA hogs, like the nvidia blob, which seemingly
tries to remap the whole aperture (256+ MB) into the KVA.  The issue
was reported by dwh, and appeared to be quite mysterious, since his
machine has no useful way to report panics from failed X.

The resolution I propose is to change VM_BCACHE_SIZE_MAX in the i386
!PAE case, to make it equal to the exact max size of the buffer cache.
Note that maxbcache can be tuned from the loader prompt, so the
change would only affect i386 machines with a tuned buffer
cache.

Also, the patch doubles the size of the transient map to 1/5 of the
max buffer cache.  This gives 180 parallel remapped I/Os in flight,
since I consider the recalculated 90 I/Os too few even for i386.
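
The 90 and 180 figures can be reproduced approximately with a short
calculation.  The sketch below assumes that each remapped I/O
reserves up to 128KB of transient KVA (TOY_MAXPHYS, an assumption not
stated in this message) and takes the "around 110MB" maximum buffer
cache size from above; the function name is hypothetical.

```c
#include <assert.h>

#define KB		1024UL
#define MB		(1024UL * KB)
#define TOY_MAXPHYS	(128UL * KB)	/* assumed size of one I/O */

/*
 * Number of maximally sized I/Os that fit in a transient map sized
 * to 1/denom of the maximum buffer cache.
 */
static unsigned long
toy_transient_maxcnt(unsigned long bcache_max, unsigned long denom)
{
	assert(denom != 0);
	return (bcache_max / denom / TOY_MAXPHYS);
}
```

For bcache_max = 110 * MB, a denominator of 10 yields 88 and a
denominator of 5 yields 176, which matches the rounded 90 and 180
quoted above.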

The patch was tested by dwh; please comment.  I intend to commit it
within several days.

http://people.freebsd.org/~kib/misc/i386_maxbcache.1.patch

