Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 03 Feb 2008 00:05:40 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Julian Elischer <julian@elischer.org>
Subject:   Re: Memory allocation performance
Message-ID:  <47A4E934.1050207@FreeBSD.org>
In-Reply-To: <20080202095658.R63379@fledge.watson.org>
References:  <47A25412.3010301@FreeBSD.org> <47A25A0D.2080508@elischer.org> <47A2C2A2.5040109@FreeBSD.org> <20080201185435.X88034@fledge.watson.org> <47A43873.40801@FreeBSD.org> <20080202095658.R63379@fledge.watson.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Robert Watson wrote:
> Hence my request for drilling down a bit on profiling -- the question 
> I'm asking is whether profiling shows things running or taking time that 
> shouldn't be.

I have not yet understood why does it happend, but hwpmc shows huge 
amount of "p4-resource-stall"s in UMA functions:
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  45.2    2303.00  2303.00        0  100.00%           uma_zfree_arg [1]
  41.2    4402.00  2099.00        0  100.00%           uma_zalloc_arg [2]
   1.4    4472.00    70.00        0  100.00% 
uma_zone_exhausted_nolock [3]
   0.9    4520.00    48.00        0  100.00%           ng_snd_item [4]
   0.8    4562.00    42.00        0  100.00%           __qdivrem [5]
   0.8    4603.00    41.00        0  100.00%           ether_input [6]
   0.6    4633.00    30.00        0  100.00%           ng_ppp_prepend [7]

Probably it explains why "p4-global-power-events" shows many hits into them
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  20.0   37984.00 37984.00        0  100.00%           uma_zfree_arg [1]
  17.8   71818.00 33834.00        0  100.00%           uma_zalloc_arg [2]
   4.0   79483.00  7665.00        0  100.00%           ng_snd_item [3]
   3.0   85256.00  5773.00        0  100.00%           __mcount [4]
   2.3   89677.00  4421.00        0  100.00%           bcmp [5]
   2.2   93853.00  4176.00        0  100.00%           generic_bcopy [6]

, while "p4-instr-retired" does not.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  11.1    5351.00  5351.00        0  100.00%           ng_apply_item [1]
   7.9    9178.00  3827.00        0  100.00% 
legacy_pcib_alloc_msi [2]
   4.1   11182.00  2004.00        0  100.00%           init386 [3]
   4.0   13108.00  1926.00        0  100.00%           rn_match [4]
   3.5   14811.00  1703.00        0  100.00%           uma_zalloc_arg [5]
   2.6   16046.00  1235.00        0  100.00%           SHA256_Transform [6]
   2.2   17130.00  1084.00        0  100.00%           ng_add_hook [7]
   2.0   18111.00   981.00        0  100.00%           ng_rmhook_self [8]
   2.0   19054.00   943.00        0  100.00%           em_encap [9]

For this moment I have invent two possible explanation. One is that due 
to UMA's cyclic block allocation order it does not fits CPU caches and 
another that it is somehow related to critical_exit(), which possibly 
can cause context switch. Does anybody have better explanation how such 
small and simple in this part function can cause such results?

-- 
Alexander Motin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?47A4E934.1050207>