Date: Sat, 23 Mar 2013 23:10:01 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: arch@freebsd.org
Cc: David Wolfskill
Subject: VM_BCACHE_SIZE_MAX on i386
Message-ID: <20130323211001.GN3794@kib.kiev.ua>

The unmapped I/O work makes it possible to avoid mapping vnode pages into
kernel memory for UFS mounts, if the underlying geoms and disk drivers
accept unmapped BIOs.  Converting all geom classes and drivers, while not
very hard, is quite a big task which requires a lot of validation on
unusual configurations and rare hardware.  I decided to provide transient
remapping for the classes which are not yet converted, which allows the
work to be put into HEAD much earlier, if at all.

When an unmapped BIO is passed through the geom stack and the next geom is
not marked as accepting unmapped BIOs, KVA space is allocated in the
so-called transient map and the pages are mapped there.  On architectures
with ample KVA, creating the transient map is not an issue, but it is very
delicate on architectures with limited KVA, i.e. mostly 32-bit
architectures.

To not disturb the KVA layout and the current balance, I split the space
previously allocated to the buffer map: 90% is still used by the buffer
map, and the remaining 10% is dedicated to the transient mapping.  The
rationale for the split is that a typical load has a 9/1 ratio of user
data to metadata buffers, and almost all user data buffers are unmapped.
More precisely, the transient map is sized to 10% of the maximum
_theoretical_ allowed buffer map size on the arch.  The real buffer map is
usually smaller, sized proportionally to the available RAM.  The details
of the allocation are in vfs_bio.c:kern_vfs_bio_buffer_alloc().  The
function uses the maxbcache tunable, initialized from VM_BCACHE_SIZE_MAX
by default.
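
To make the split concrete, here is a minimal userland sketch of the
sizing arithmetic.  It is a simplified model of the sizing described in
this mail, not the actual kern_vfs_bio_buffer_alloc() code, and the 128KB
size of a single remapped i/o (MAXPHYS) is an assumption of the sketch:

#include <stdio.h>

#define	MB(x)		((unsigned long)(x) << 20)
#define	IO_SIZE		(128UL << 10)	/* assumed MAXPHYS-sized remapped i/o */

/*
 * maxbcache is the maximum theoretical buffer map size (VM_BCACHE_SIZE_MAX
 * by default), bufmap is the real buffer map sized from the available RAM.
 * In this model the transient map gets the leftover KVA, but at least 1/10
 * of the theoretical maximum, carved out of the buffer map if needed.
 */
static void
split(unsigned long maxbcache, unsigned long bufmap)
{
	unsigned long transient;

	if (bufmap >= maxbcache / 10 * 9) {
		transient = maxbcache / 10;
		bufmap = maxbcache - transient;
	} else
		transient = maxbcache - bufmap;
	printf("maxbcache %luMB: buffer map %luMB, transient map %luMB, "
	    "%lu parallel remapped i/os\n", maxbcache >> 20, bufmap >> 20,
	    transient >> 20, transient / IO_SIZE);
}

int
main(void)
{
	split(MB(200), MB(200));	/* ample RAM: plain 90%/10% split */
	split(MB(200), MB(110));	/* the i386 !PAE case discussed below */
	return (0);
}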
But on i386 !PAE, VM_BCACHE_SIZE_MAX is bigger than the maximum possible
buffer cache size, even on a 4GB RAM machine.  The maximum buffer cache
map size is around 110MB, while VM_BCACHE_SIZE_MAX is 200MB.  This causes
the bio_transient_map to be oversized, eating an additional 90MB of
precious KVA on i386.  By itself this +90MB of KVA use is not critical,
but it starts conflicting with other KVA hogs, like the nvidia blob, which
seemingly tries to remap the whole aperture (256+ MB) into the KVA.  The
issue was reported by dwh, and it appeared to be quite mysterious, since
his machine has no useful way to report panics from failed X.

The resolution I propose is to change VM_BCACHE_SIZE_MAX for the i386
!PAE case, making it equal to the exact maximum size of the buffer cache.
Note that maxbcache can still be tuned from the loader prompt, so the
change only affects i386 machines where the buffer cache size is left at
the default.  Also, the patch doubles the size of the transient map to
1/5 of the max buffer cache.  This gives 180 parallel remapped i/os in
flight, since I consider the recalculated 90 i/os too small even for
i386.

The patch was tested by dwh.  Please comment; I intend to commit it in
several days.

http://people.freebsd.org/~kib/misc/i386_maxbcache.1.patch
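
P.S.  For reference, a tiny sketch of the arithmetic behind the 90 vs.
180 i/os figures above.  It is not part of the patch; the 128KB size of a
single remapped i/o (MAXPHYS) and the use of ~110MB as a stand-in for the
proposed VM_BCACHE_SIZE_MAX are assumptions of the sketch, so the printed
counts come out slightly below 90 and 180:

#include <stdio.h>

int
main(void)
{
	unsigned long io_size = 128UL << 10;	/* assumed MAXPHYS-sized i/o */
	unsigned long bcache_max = 110UL << 20;	/* ~ proposed VM_BCACHE_SIZE_MAX */

	/* Previous 1/10 split: roughly 90 parallel remapped i/os. */
	printf("1/10: %luMB, %lu i/os\n", (bcache_max / 10) >> 20,
	    bcache_max / 10 / io_size);
	/* Proposed 1/5 split: roughly 180 parallel remapped i/os. */
	printf("1/5:  %luMB, %lu i/os\n", (bcache_max / 5) >> 20,
	    bcache_max / 5 / io_size);
	return (0);
}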