Date: Sat, 2 Feb 2008 23:54:50 +0100 (CET)
From: "Max Laier" <max@love2party.net>
To: "Alexander Motin" <mav@FreeBSD.org>
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org,
    Robert Watson <rwatson@freebsd.org>, Julian Elischer <julian@elischer.org>
Subject: Re: Memory allocation performance
Message-ID: <55964.192.168.4.151.1201992890.squirrel@router.laiers.local>
In-Reply-To: <47A4E934.1050207@FreeBSD.org>
References: <47A25412.3010301@FreeBSD.org> <47A25A0D.2080508@elischer.org>
    <47A2C2A2.5040109@FreeBSD.org> <20080201185435.X88034@fledge.watson.org>
    <47A43873.40801@FreeBSD.org> <20080202095658.R63379@fledge.watson.org>
    <47A4E934.1050207@FreeBSD.org>
On Sat, 2 Feb 2008, 23:05, Alexander Motin wrote:
> Robert Watson wrote:
>> Hence my request for drilling down a bit on profiling -- the question
>> I'm asking is whether profiling shows things running or taking time
>> that shouldn't be.
>
> I have not yet understood why it happens, but hwpmc shows a huge
> amount of "p4-resource-stall"s in UMA functions:
>
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  45.2    2303.00   2303.00        0  100.00%          uma_zfree_arg [1]
>  41.2    4402.00   2099.00        0  100.00%          uma_zalloc_arg [2]
>   1.4    4472.00     70.00        0  100.00%          uma_zone_exhausted_nolock [3]
>   0.9    4520.00     48.00        0  100.00%          ng_snd_item [4]
>   0.8    4562.00     42.00        0  100.00%          __qdivrem [5]
>   0.8    4603.00     41.00        0  100.00%          ether_input [6]
>   0.6    4633.00     30.00        0  100.00%          ng_ppp_prepend [7]
>
> That probably explains why "p4-global-power-events" shows many hits in
> them:
>
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  20.0   37984.00  37984.00        0  100.00%          uma_zfree_arg [1]
>  17.8   71818.00  33834.00        0  100.00%          uma_zalloc_arg [2]
>   4.0   79483.00   7665.00        0  100.00%          ng_snd_item [3]
>   3.0   85256.00   5773.00        0  100.00%          __mcount [4]
>   2.3   89677.00   4421.00        0  100.00%          bcmp [5]
>   2.2   93853.00   4176.00        0  100.00%          generic_bcopy [6]
>
> while "p4-instr-retired" does not:
>
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  11.1    5351.00   5351.00        0  100.00%          ng_apply_item [1]
>   7.9    9178.00   3827.00        0  100.00%          legacy_pcib_alloc_msi [2]
>   4.1   11182.00   2004.00        0  100.00%          init386 [3]
>   4.0   13108.00   1926.00        0  100.00%          rn_match [4]
>   3.5   14811.00   1703.00        0  100.00%          uma_zalloc_arg [5]
>   2.6   16046.00   1235.00        0  100.00%          SHA256_Transform [6]
>   2.2   17130.00   1084.00        0  100.00%          ng_add_hook [7]
>   2.0   18111.00    981.00        0  100.00%          ng_rmhook_self [8]
>   2.0   19054.00    943.00        0  100.00%          em_encap [9]
>
> So far I have come up with two possible explanations. One is that, due
> to UMA's cyclic block allocation order, it does not fit the CPU caches;
> the other is that it is somehow related to critical_exit(), which can
> possibly cause a context switch. Does anybody have a better explanation
> for how a function that is so small and simple in this part can cause
> such results?

I didn't see bzero accounted for in any of the traces in this thread -
which makes me wonder whether it is counted within uma_zalloc?  Maybe we
are calling it twice by accident?  I wasn't quite able to figure out the
logic of M_ZERO vs. UMA_ZONE_MALLOC etc. ... just a crazy idea.

--
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
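For readers following the two questions above (where the M_ZERO zeroing is
charged, and what critical_exit() can do), a minimal userland sketch may
help. It assumes a UMA-like allocator; every name here (sketch_zalloc,
sketch_zone, the stub helpers and the M_ZERO flag value) is hypothetical
and this is not the actual sys/vm/uma_core.c code. The point is that if the
zeroing happens inside the allocation function, a flat profile attributes
the bzero() cycles and the stalls on the freshly touched item to the
allocator's own symbol rather than to a separate bzero entry.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>
    #include <strings.h>                    /* bzero() */

    #define M_ZERO  0x0100                  /* hypothetical flag value */

    struct sketch_zone {
            size_t  sz_size;                /* item size for this zone */
            /* per-CPU caches, slab lists, locks ... elided */
    };

    /* Userland stand-ins for the per-CPU cache / slab layers and for the
     * kernel's critical-section primitives -- all hypothetical. */
    static void *sketch_cache_alloc(struct sketch_zone *z) { return malloc(z->sz_size); }
    static void *sketch_slab_alloc(struct sketch_zone *z)  { return malloc(z->sz_size); }
    static void  sketch_critical_enter(void) { }
    static void  sketch_critical_exit(void)  { }

    /*
     * Simplified allocation path.  Two things it illustrates:
     *  - the per-CPU fast path is bracketed by critical_enter()/critical_exit(),
     *    and any preemption deferred during the critical section may only run
     *    once the section is left;
     *  - if M_ZERO is handled inside the allocator, the bzero() cycles (and any
     *    cache-line stalls on the returned item) are charged to this symbol in
     *    a flat profile, which could explain the missing bzero entry.
     */
    static void *
    sketch_zalloc(struct sketch_zone *zone, int flags)
    {
            void *item;

            sketch_critical_enter();                /* stay on this CPU */
            item = sketch_cache_alloc(zone);        /* fast path: per-CPU bucket */
            sketch_critical_exit();                 /* deferred preemption may run */

            if (item == NULL)
                    item = sketch_slab_alloc(zone); /* slow path: refill from slab */

            if (item != NULL && (flags & M_ZERO))
                    bzero(item, zone->sz_size);     /* zeroing billed here */
            return (item);
    }

    int
    main(void)
    {
            struct sketch_zone zone = { .sz_size = 256 };
            char *p = sketch_zalloc(&zone, M_ZERO);

            printf("first byte after M_ZERO alloc: %d\n", p ? p[0] : -1);
            free(p);
            return (0);
    }

Under these assumptions there is no double zeroing: the item is cleared in
exactly one place, but a sampling profiler with no call-graph attribution
will never show bzero as its own line for that work.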