Date: Fri, 17 Sep 2010 11:14:35 +0300
From: Andriy Gapon <avg@freebsd.org>
To: freebsd-hackers@freebsd.org
Cc: Jeff Roberson <jeff@freebsd.org>
Subject: zfs + uma
Message-ID: <4C93236B.4050906@freebsd.org>
This is a multi-part message in MIME format.
--------------030602010507080304070903
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

I've been investigating the interaction between zfs and uma for a while. You might remember that there is noticeable fragmentation in zfs uma zones when uma use is not enabled for actual data/metadata buffers.

I also noticed that when uma use is enabled for data/metadata buffers (zio.use_uma=1), the amount of memory reserved in free items of zfs uma zones becomes really huge. And this is despite the fact that the vast majority of the data/metadata zones have items with sizes that are multiples of the page size. So this couldn't really be because of fragmentation.

Further checks show that the free items are accumulated in per-cpu cache buckets. uz_count for those buckets starts at 1, but over time, during bursts of activity, it grows up to a maximum of 128. The problem with those buckets is that they are not drained under low-memory conditions and uz_count never goes down. So, after a while, I observe about 300 free items (on a mere two-core system) cached in 4 per-cpu buckets for a single zone with 128KB item size. That's 30MB right there. For all data and metadata zones the number goes as high as 500MB on my machine with 4GB of physical RAM. This seems like a bit too much to me.

Although keeping free items around improves performance, it does consume memory too. And the fact that this memory is not freed on a lowmem condition makes the situation worse.

So, I decided to take a look at how they handle this situation in (Open)Solaris. There is this good book:
http://books.google.com/books?id=r_cecYD4AKkC&printsec=frontcover
Please see section 6.2.4.5 on page 225 and table 6-11 on page 226.
And also this code:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/kmem.c#971

It makes sense to me to limit the size of per-cpu buckets depending on item size. I even wrote a slightly hackish patch [attached].
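To make the size-dependent limit concrete, here is a minimal user-space sketch of the clamping idea. The three constants are the values used in the attached uma-uz_count_max.diff; the function names `bucket_count_max` and `bucket_cache_bytes` are hypothetical helpers invented for this illustration (the patch itself does the computation inline in zone_ctor() and stores the result in a new uz_count_max field).

```c
#include <assert.h>
#include <stddef.h>

/* Values as they appear in the attached uma-uz_count_max.diff. */
#define BUCKET_SIZE_THRESHOLD	131072	/* target bytes per full bucket */
#define BUCKET_MAX		128	/* existing upper bound on uz_count */
#define BUCKET_SHIFT		2	/* smallest bucket: 1 << 2 = 4 items */

/*
 * Hypothetical helper: upper limit on items per bucket for a zone with
 * the given item size, clamped to [1 << BUCKET_SHIFT, BUCKET_MAX].
 */
static unsigned
bucket_count_max(size_t item_size)
{
	size_t max;

	assert(item_size > 0);
	max = BUCKET_SIZE_THRESHOLD / item_size;
	if (max > BUCKET_MAX)
		max = BUCKET_MAX;
	else if (max < (1u << BUCKET_SHIFT))
		max = 1u << BUCKET_SHIFT;
	return ((unsigned)max);
}

/*
 * Hypothetical helper: bytes held in free items if `nbuckets` buckets
 * each cache `count` items of `item_size` bytes.
 */
static size_t
bucket_cache_bytes(size_t item_size, unsigned nbuckets, unsigned count)
{
	return (item_size * nbuckets * count);
}
```

Under these assumptions a zone with 128KB items gets a limit of 4, so four full buckets hold at most 2MB, compared with 64MB if all four were allowed to grow to 128 items.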
But I didn't go as far as they did in Solaris, so the minimum bucket size limit is 4. But perhaps it would make sense not to use the cache at all starting from a certain item size.

Another attached hack removes zio zones whose items are larger than the page size but not a multiple of it. Internally such items would still consume a whole number of pages each, so we can potentially have two zones that use the same number of pages per item, but with different item sizes. With the patch they are collapsed into a single zone.

--
Andriy Gapon

--------------030602010507080304070903
Content-Type: text/plain; name="uma-uz_count_max.diff"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="uma-uz_count_max.diff"

ZGlmZiAtLWdpdCBhL3N5cy92bS91bWFfY29yZS5jIGIvc3lzL3ZtL3VtYV9jb3JlLmMKaW5k
ZXggM2ZjNWI4YS4uM2I4Mzg0YiAxMDA2NDQKLS0tIGEvc3lzL3ZtL3VtYV9jb3JlLmMKKysr
IGIvc3lzL3ZtL3VtYV9jb3JlLmMKQEAgLTE3OSw5ICsxNzksMTIgQEAgc3RydWN0IHVtYV9i
dWNrZXRfem9uZSB7CiAJaW50CQl1YnpfZW50cmllczsKIH07CiAKLSNkZWZpbmUJQlVDS0VU
X01BWAkxMjgKKyNkZWZpbmUJQlVDS0VUX1NJWkVfVEhSRVNIT0xECTEzMTA3MgorI2RlZmlu
ZQlCVUNLRVRfTUFYCQkxMjgKIAogc3RydWN0IHVtYV9idWNrZXRfem9uZSBidWNrZXRfem9u
ZXNbXSA9IHsKKwl7IE5VTEwsICI0IEJ1Y2tldCIsIDQgfSwKKwl7IE5VTEwsICI4IEJ1Y2tl
dCIsIDggfSwKIAl7IE5VTEwsICIxNiBCdWNrZXQiLCAxNiB9LAogCXsgTlVMTCwgIjMyIEJ1
Y2tldCIsIDMyIH0sCiAJeyBOVUxMLCAiNjQgQnVja2V0IiwgNjQgfSwKQEAgLTE4OSw3ICsx
OTIsNyBAQCBzdHJ1Y3QgdW1hX2J1Y2tldF96b25lIGJ1Y2tldF96b25lc1tdID0gewogCXsg
TlVMTCwgTlVMTCwgMH0KIH07CiAKLSNkZWZpbmUJQlVDS0VUX1NISUZUCTQKKyNkZWZpbmUJ
QlVDS0VUX1NISUZUCTIKICNkZWZpbmUJQlVDS0VUX1pPTkVTCSgoQlVDS0VUX01BWCA+PiBC
VUNLRVRfU0hJRlQpICsgMSkKIAogLyoKQEAgLTE0NjMsNiArMTQ2NiwxMyBAQCB6b25lX2N0
b3Iodm9pZCAqbWVtLCBpbnQgc2l6ZSwgdm9pZCAqdWRhdGEsIGludCBmbGFncykKIAkJem9u
ZS0+dXpfY291bnQgPSBrZWctPnVrX2lwZXJzOwogCWVsc2UKIAkJem9uZS0+dXpfY291bnQg
PSBCVUNLRVRfTUFYOworCisJem9uZS0+dXpfY291bnRfbWF4ID0gQlVDS0VUX1NJWkVfVEhS
RVNIT0xEIC8gem9uZS0+dXpfc2l6ZTsKKwlpZiAoem9uZS0+dXpfY291bnRfbWF4ID4gQlVD
S0VUX01BWCkKKwkJem9uZS0+dXpfY291bnRfbWF4ID0gQlVDS0VUX01BWDsKKwllbHNlIGlm
ICh6b25lLT51el9jb3VudF9tYXggPCAoMSA8PCBCVUNLRVRfU0hJRlQpKQorCQl6b25lLT51
el9jb3VudF9tYXggPSAxIDw8IEJVQ0tFVF9TSElGVDsKKwogCXJldHVybiAoMCk7CiB9CiAK
QEAgLTIwNzYsNyArMjA4Niw3IEBAIHphbGxvY19zdGFydDoKIAljcml0aWNhbF9leGl0KCk7
CiAKIAkvKiBCdW1wIHVwIG91ciB1el9jb3VudCBzbyB3ZSBnZXQgaGVyZSBsZXNzICovCi0J
aWYgKHpvbmUtPnV6X2NvdW50IDwgQlVDS0VUX01BWCkKKwlpZiAoem9uZS0+dXpfY291bnQg
PCB6b25lLT51el9jb3VudF9tYXgpCiAJCXpvbmUtPnV6X2NvdW50Kys7CiAKIAkvKgpkaWZm
IC0tZ2l0IGEvc3lzL3ZtL3VtYV9pbnQuaCBiL3N5cy92bS91bWFfaW50LmgKaW5kZXggNzcx
MzU5My4uNmQ4MWUzZCAxMDA2NDQKLS0tIGEvc3lzL3ZtL3VtYV9pbnQuaAorKysgYi9zeXMv
dm0vdW1hX2ludC5oCkBAIC0zMzAsNiArMzMwLDcgQEAgc3RydWN0IHVtYV96b25lIHsKIAl1
X2ludDY0X3QJdXpfc2xlZXBzOwkvKiBUb3RhbCBudW1iZXIgb2YgYWxsb2Mgc2xlZXBzICov
CiAJdWludDE2X3QJdXpfZmlsbHM7CS8qIE91dHN0YW5kaW5nIGJ1Y2tldCBmaWxscyAqLwog
CXVpbnQxNl90CXV6X2NvdW50OwkvKiBIaWdoZXN0IHZhbHVlIHViX3B0ciBjYW4gaGF2ZSAq
LworCXVpbnQxNl90CXV6X2NvdW50X21heDsJLyogSGlnaGVzdCB2YWx1ZSB1el9jb3VudCBj
YW4gaGF2ZSAqLwogCiAJLyoKIAkgKiBUaGlzIEhBUyB0byBiZSB0aGUgbGFzdCBpdGVtIGJl
Y2F1c2Ugd2UgYWRqdXN0IHRoZSB6b25lIHNpemUK
--------------030602010507080304070903
Content-Type: text/plain; name="zfs-zio-zones.diff"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="zfs-zio-zones.diff"

ZGlmZiAtLWdpdCBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRzL2NvbW1vbi9m
cy96ZnMvemlvLmMgYi9zeXMvY2RkbC9jb250cmliL29wZW5zb2xhcmlzL3V0cy9jb21tb24v
ZnMvemZzL3ppby5jCmluZGV4IDhkZGY3Y2QuLjM0MGY2NzYgMTAwNjQ0Ci0tLSBhL3N5cy9j
ZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRzL2NvbW1vbi9mcy96ZnMvemlvLmMKKysrIGIv
c3lzL2NkZGwvY29udHJpYi9vcGVuc29sYXJpcy91dHMvY29tbW9uL2ZzL3pmcy96aW8uYwpA
QCAtMTIxLDEwICsxMjEsMTEgQEAgemlvX2luaXQodm9pZCkKIAkJCWFsaWduID0gU1BBX01J
TkJMT0NLU0laRTsKIAkJfSBlbHNlIGlmIChQMlBIQVNFKHNpemUsIFBBR0VTSVpFKSA9PSAw
KSB7CiAJCQlhbGlnbiA9IFBBR0VTSVpFOworI2lmIDAKIAkJfSBlbHNlIGlmIChQMlBIQVNF
KHNpemUsIHAyID4+IDIpID09IDApIHsKIAkJCWFsaWduID0gcDIgPj4gMjsKKyNlbmRpZgog
CQl9Ci0KIAkJaWYgKGFsaWduICE9IDApIHsKIAkJCWNoYXIgbmFtZVszNl07CiAJCQkodm9p
ZCkgc3ByaW50ZihuYW1lLCAiemlvX2J1Zl8lbHUiLCAodWxvbmdfdClzaXplKTsK
--------------030602010507080304070903--
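The reasoning behind the zio zone hack in the message body can be illustrated with a little arithmetic. The sketch below is only an illustration, not code from either patch: `PAGE_BYTES` assumes a 4096-byte page, both helper names are hypothetical, and it assumes that a request for a removed size ends up being served from the zone of the next page-multiple size.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES	4096	/* assumed page size (PAGESIZE in the kernel) */

/* Hypothetical helper: whole pages needed to back one item of this size. */
static size_t
pages_per_item(size_t item_size)
{
	assert(item_size > 0);
	return ((item_size + PAGE_BYTES - 1) / PAGE_BYTES);
}

/*
 * Hypothetical helper: item size of the page-multiple zone that would
 * serve this request once the in-between zones are removed.
 */
static size_t
collapsed_item_size(size_t item_size)
{
	return (pages_per_item(item_size) * PAGE_BYTES);
}
```

For example, items of 5120, 6144 and 7168 bytes each occupy two pages, exactly like an 8192-byte item, so under these assumptions separate zones for those sizes save no memory compared with a single 8192-byte zone.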