Date: Thu, 21 Aug 2014 10:57:36 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Alan Cox <alc@rice.edu>
Cc: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>, "Polyack, Steve" <Steve.Polyack@intermedix.com>
Subject: Re: vmdaemon CPU usage and poor performance in 10.0-RELEASE
Message-ID: <20140821075736.GD2737@kib.kiev.ua>
In-Reply-To: <53F59AF6.6080204@rice.edu>
References: <4D557EC7CC2A544AA7C1A3B9CBA2B3672609BBA64F@exchange03.epbs.com> <53F2790C.20703@rice.edu> <4D557EC7CC2A544AA7C1A3B9CBA2B3672609CF28E5@exchange03.epbs.com> <4D557EC7CC2A544AA7C1A3B9CBA2B3672609CF2F8F@exchange03.epbs.com> <4D557EC7CC2A544AA7C1A3B9CBA2B3672609CF31F0@exchange03.epbs.com> <53F4C4C2.1030109@rice.edu> <4D557EC7CC2A544AA7C1A3B9CBA2B3672609CF335D@exchange03.epbs.com> <53F4C82E.5000900@rice.edu> <4D557EC7CC2A544AA7C1A3B9CBA2B3672609CF33F6@exchange03.epbs.com> <53F59AF6.6080204@rice.edu>
On Thu, Aug 21, 2014 at 02:08:38AM -0500, Alan Cox wrote:
> On 08/20/2014 11:22, Polyack, Steve wrote:
> >> -----Original Message-----
> >> From: Alan Cox [mailto:alc@rice.edu]
> >> Sent: Wednesday, August 20, 2014 12:09 PM
> >> To: Polyack, Steve; freebsd-stable@freebsd.org
> >> Subject: Re: vmdaemon CPU usage and poor performance in 10.0-RELEASE
> >>
> >> On 08/20/2014 10:56, Polyack, Steve wrote:
> >>>> -----Original Message-----
> >>>> From: Alan Cox [mailto:alc@rice.edu]
> >>>> Sent: Wednesday, August 20, 2014 11:55 AM
> >>>> To: Polyack, Steve; freebsd-stable@freebsd.org
> >>>> Subject: Re: vmdaemon CPU usage and poor performance in 10.0-RELEASE
> >>>>
> >>>> On 08/20/2014 09:55, Polyack, Steve wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Polyack, Steve
> >>>>>> Sent: Wednesday, August 20, 2014 9:14 AM
> >>>>>> To: Polyack, Steve; Alan Cox; freebsd-stable@freebsd.org
> >>>>>> Subject: RE: vmdaemon CPU usage and poor performance in 10.0-RELEASE
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: owner-freebsd-stable@freebsd.org [mailto:owner-freebsd-
> >>>>>>> stable@freebsd.org] On Behalf Of Polyack, Steve
> >>>>>>> Sent: Tuesday, August 19, 2014 12:37 PM
> >>>>>>> To: Alan Cox; freebsd-stable@freebsd.org
> >>>>>>> Subject: RE: vmdaemon CPU usage and poor performance in 10.0-RELEASE
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: owner-freebsd-stable@freebsd.org [mailto:owner-freebsd-
> >>>>>>>> stable@freebsd.org] On Behalf Of Alan Cox
> >>>>>>>> Sent: Monday, August 18, 2014 6:07 PM
> >>>>>>>> To: freebsd-stable@freebsd.org
> >>>>>>>> Subject: Re: vmdaemon CPU usage and poor performance in 10.0-RELEASE
> >>>>>>>> On 08/18/2014 16:29, Polyack, Steve wrote:
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: owner-freebsd-stable@freebsd.org [mailto:owner-freebsd-
> >>>>>>>>>> stable@freebsd.org] On Behalf Of Alan Cox
> >>>>>>>>>> Sent: Monday, August 18, 2014 3:05 PM
> >>>>>>>>>> To: freebsd-stable@freebsd.org
> >>>>>>>>>> Subject: Re: vmdaemon CPU usage and poor performance in 10.0-RELEASE
> >>>>>>>>>> On 08/18/2014 13:42, Polyack, Steve wrote:
> >>>>>>>>>>> Excuse my poorly formatted reply at the moment, but this seems to
> >>>>>>>>>>> have fixed our problems. I'm going to update the bug report with a
> >>>>>>>>>>> note.
> >>>>>>>>>>> Thanks Alan!
> >>>>>>>>>> You're welcome. And, thanks for letting me know of the outcome.
> >>>>>>>>>>
> >>>>>>>>> Actually, I may have spoken too soon, as it looks like we're seeing
> >>>>>>>>> vmdaemon tying up the system again:
> >>>>>>>>> root 6 100.0 0.0 0 16 - DL Wed04PM 4:37.95 [vmdaemon]
> >>>>>>>>> Is there anything I can check to help narrow down what may be the
> >>>>>>>>> problem? KTrace/truss on the "process" doesn't give any information,
> >>>>>>>>> I suppose because it's actually a kernel thread.
> >>>>>>>>
> >>>>>>>> Can you provide the full output of top? Is there anything unusual
> >>>>>>>> about the hardware or software configuration?
> >>>>>>> This may have just been a fluke (maybe NFS caching the old
> >>>>>>> vm_pageout.c during the first source build). We've rebuilt and are
> >>>>>>> monitoring it now.
> >>>>>>> The hardware consists of a few Dell PowerEdge R720xd servers with
> >>>>>>> 256GB of RAM and an array of SSDs (no ZFS). 64GB is dedicated to
> >>>>>>> postgres shared_buffers right now. FreeBSD 10, PostgreSQL 9.3,
> >>>>>>> Slony-I v2.2.2, and redis-2.8.11 are all in use here. I can't say
> >>>>>>> that anything is unusual about the configuration.
> >>>>>>>
> >>>>>> We are still seeing the issue. It seems to manifest once the "Free"
> >>>>>> memory gets under 10GB (of 256GB on the system), even though ~200GB of
> >>>>>> this is classified as Inactive. For us, this was about 7 hours of
> >>>>>> database activity (initial replication w/ slony). Right now vmdaemon
> >>>>>> is consuming 100% CPU and shows 671:34 CPU time when it showed 0:00 up
> >>>>>> until the problem manifested. The full top output (that fits on my
> >>>>>> screen) is below:
> >>>>>>
> >>>>>> last pid: 62309;  load averages: 4.05, 4.24, 4.10  up 0+22:34:31  09:08:43
> >>>>>> 159 processes: 8 running, 145 sleeping, 1 waiting, 5 lock
> >>>>>> CPU: 14.5% user, 0.0% nice, 4.9% system, 0.0% interrupt, 80.5% idle
> >>>>>> Mem: 26G Active, 216G Inact, 4122M Wired, 1178M Cache, 1632M Buf, 2136M Free
> >>>>>> Swap: 32G Total, 32G Free
> >>>>>>
> >>>>>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> >>>>>> 11 root 32 155 ki31 0K 512K CPU31 31 669.6H 2934.23% idle
> >>>>>> 6 root 1 -16 - 0K 16K CPU19 19 678:57 100.00% vmdaemon
> >>>>>> 1963 pgsql 1 45 0 67538M 208M CPU0 0 121:46 17.38% postgres
> >>>>>> 2037 pgsql 1 77 0 67536M 2200K *vm ob 14 6:24 15.97% postgres
> >>>>>> 1864 pgsql 1 31 0 67536M 1290M semwai 4 174:41 15.19% postgres
> >>>>>> 1996 pgsql 1 38 0 67538M 202M semwai 16 120:27 15.09% postgres
> >>>>>> 1959 pgsql 1 39 0 67538M 204M CPU27 27 117:30 15.09% postgres
> >>>>>> 1849 pgsql 1 32 0 67536M 1272M semwai 23 126:22 13.96% postgres
> >>>>>> 1997 pgsql 1 31 0 67538M 206M CPU30 30 122:26 11.77% postgres
> >>>>>> 2002 pgsql 1 34 0 67538M 182M sbwait 11 55:20 11.28% postgres
> >>>>>> 1961 pgsql 1 32 0 67538M 206M CPU12 12 121:47 10.99% postgres
> >>>>>> 1964 pgsql 1 30 0 67538M 206M semwai 28 122:08 9.86% postgres
> >>>>>> 1962 pgsql 1 29 0 67538M 1286M sbwait 2 45:49 7.18% postgres
> >>>>>> 1752 root 1 22 0 78356K 8688K CPU2 2 175:46 6.88% snmpd
> >>>>>> 1965 pgsql 1 25 0 67538M 207M semwai 9 120:55 6.59% postgres
> >>>>>> 1960 pgsql 1 23 0 67538M 177M semwai 6 52:42 4.88% postgres
> >>>>>> 1863 pgsql 1 25 0 67542M 388M semwai 25 9:12 2.20% postgres
> >>>>>> 1859 pgsql 1 22 0 67538M 1453M *vm ob 20 6:13 2.10% postgres
> >>>>>> 1860 pgsql 1 22 0 67538M 1454M sbwait 8 6:08 1.95% postgres
> >>>>>> 1848 pgsql 1 21 0 67586M 66676M *vm ob 30 517:07 1.66% postgres
> >>>>>> 1856 pgsql 1 22 0 67538M 290M *vm ob 15 5:39 1.66% postgres
> >>>>>> 1846 pgsql 1 21 0 67538M 163M sbwait 15 5:46 1.46% postgres
> >>>>>> 1853 pgsql 1 21 0 67538M 110M sbwait 30 8:54 1.17% postgres
> >>>>>> 1989 pgsql 1 23 0 67536M 5180K sbwait 18 1:41 0.98% postgres
> >>>>>> 5 root 1 -16 - 0K 16K psleep 6 9:33 0.78% pagedaemon
> >>>>>> 1854 pgsql 1 20 0 67538M 338M sbwait 22 5:38 0.78% postgres
> >>>>>> 1861 pgsql 1 20 0 67538M 286M sbwait 15 6:13 0.68% postgres
> >>>>>> 1857 pgsql 1 20 0 67538M 1454M semwai 10 6:19 0.49% postgres
> >>>>>> 1999 pgsql 1 36 0 67538M 156M *vm ob 28 120:56 0.39% postgres
> >>>>>> 1851 pgsql 1 20 0 67538M 136M sbwait 22 5:48 0.39% postgres
> >>>>>> 1975 pgsql 1 20 0 67536M 5688K sbwait 25 1:40 0.29% postgres
> >>>>>> 1858 pgsql 1 20 0 67538M 417M sbwait 3 5:55 0.20% postgres
> >>>>>> 2031 pgsql 1 20 0 67536M 5664K sbwait 5 3:26 0.10% postgres
> >>>>>> 1834 root 12 20 0 71892K 12848K select 20 34:05 0.00% slon
> >>>>>> 12 root 78 -76 - 0K 1248K WAIT 0 25:47 0.00% intr
> >>>>>> 2041 pgsql 1 20 0 67536M 5932K sbwait 14 12:50 0.00% postgres
> >>>>>> 2039 pgsql 1 20 0 67536M 5960K sbwait 17 9:59 0.00% postgres
> >>>>>> 2038 pgsql 1 20 0 67536M 5956K sbwait 6 8:21 0.00% postgres
> >>>>>> 2040 pgsql 1 20 0 67536M 5996K sbwait 7 8:20 0.00% postgres
> >>>>>> 2032 pgsql 1 20 0 67536M 5800K sbwait 22 7:03 0.00% postgres
> >>>>>> 2036 pgsql 1 20 0 67536M 5748K sbwait 23 6:38 0.00% postgres
> >>>>>> 1812 pgsql 1 20 0 67538M 59185M select 1 5:46 0.00% postgres
> >>>>>> 2005 pgsql 1 20 0 67536M 5788K sbwait 23 5:14 0.00% postgres
> >>>>>> 2035 pgsql 1 20 0 67536M 4892K sbwait 18 4:52 0.00% <postgres>
> >>>>>> 1852 pgsql 1 21 0 67536M 1230M semwai 7 4:47 0.00% postgres
> >>>>>> 13 root 3 -8 - 0K 48K - 28 4:46 0.00% geom
> >>>>>>
> >>>>>>
> >>>>> Another thing I've noticed is that this sysctl vm.stats counter is
> >>>>> increasing fairly rapidly:
> >>>>> # sysctl vm.stats.vm.v_pdpages && sleep 1 && sysctl vm.stats.vm.v_pdpages
> >>>>> vm.stats.vm.v_pdpages: 3455264541
> >>>>> vm.stats.vm.v_pdpages: 3662158383
> >>>> I'm not sure what that tells us, because both the page daemon and the
> >>>> vm ("swap") daemon increment this counter.
> >>>>
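As an aside, the same one-second sample can be taken from a small C program
via sysctlbyname(3) rather than sysctl(8); a minimal sketch (the interval and
the output format are arbitrary choices here):

/*
 * Sketch: read vm.stats.vm.v_pdpages twice, one second apart, and print
 * the delta.  The counter is exported as an unsigned 32-bit value, so
 * the subtraction stays correct even if it wraps.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	u_int before, after;
	size_t len;

	len = sizeof(before);
	if (sysctlbyname("vm.stats.vm.v_pdpages", &before, &len, NULL, 0) == -1)
		err(1, "sysctlbyname");
	sleep(1);
	len = sizeof(after);
	if (sysctlbyname("vm.stats.vm.v_pdpages", &after, &len, NULL, 0) == -1)
		err(1, "sysctlbyname");
	printf("v_pdpages delta over ~1s: %u\n", after - before);
	return (0);
}

For the two values quoted above, the delta works out to roughly 207 million
pages scanned in one second.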
> >>>>> Also, to demonstrate what kind of problems this seems to cause:
> >>>>> # time sleep 1
> >>>>>
> >>>>> real 0m18.288s
> >>>>> user 0m0.001s
> >>>>> sys 0m0.004s
> >>>> If you change the sysctl vm.swap_enabled to 0, how does your system
> >>>> behave?
> >>>>
> >>> Setting vm.swap_enabled to 0 made the problem clear up almost instantly.
> >>> vmdaemon is back to 0.00% CPU usage and the system is responsive once
> >>> again.
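For reference, the same knob can be read and flipped from C in a single
sysctlbyname(3) call (root required); a minimal sketch, equivalent to running
"sysctl vm.swap_enabled=0":

/*
 * Sketch: fetch the current vm.swap_enabled value and set it to 0 in
 * one call.  Must be run as root.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
	int old, new = 0;
	size_t oldlen = sizeof(old);

	if (sysctlbyname("vm.swap_enabled", &old, &oldlen,
	    &new, sizeof(new)) == -1)
		err(1, "sysctlbyname(vm.swap_enabled)");
	printf("vm.swap_enabled: %d -> %d\n", old, new);
	return (0);
}

To keep the setting across reboots, vm.swap_enabled=0 can also be added to
/etc/sysctl.conf.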
> >>>
> >> I doubt that you need whole process swapping. The page daemon is
> >> probably sufficient. See how things go for a few days and let me know.
> >>
> >> There is still a bug here that needs diagnosing and fixing. So, I will
> >> likely send you a debugging patch in the near future, and ask you to
> >> reenable swapping under that patch.
> >>
> > If it helps at all - setting vm.swap_enabled=0 seems to fix the problem even without the aforementioned patch to vm_pageout.c.
> >
>
>
> I have a couple hypotheses for what is causing your problem. The
> attached patch addresses one of them. Please apply this patch and then
> reset vm.swap_enabled back to 1.
>
>
> Index: vm/vm_pageout.c
> ===================================================================
> --- vm/vm_pageout.c     (revision 270258)
> +++ vm/vm_pageout.c     (working copy)
> @@ -1309,6 +1309,20 @@ relock_queues:
>          vm_pagequeue_unlock(pq);
> 
>          /*
> +         * If we didn't get enough free pages, and we have skipped a vnode
> +         * in a writeable object, wakeup the sync daemon.  And kick swapout
> +         * if we did not get enough free pages.
> +         */
> +        if (page_shortage > 0) {
> +                if (vnodes_skipped && vm_page_count_min())
> +                        (void) speedup_syncer();
> +#if !defined(NO_SWAPPING)
> +                if (vm_swap_enabled)
> +                        vm_req_vmdaemon(VM_SWAP_NORMAL);
> +#endif
> +        }
> +
> +        /*
>           * Compute the number of pages we want to try to move from the
>           * active queue to the inactive queue.
>           */
> @@ -1418,20 +1432,6 @@ relock_queues:
>                  }
>          }
>  #endif
> -
> -        /*
> -         * If we didn't get enough free pages, and we have skipped a vnode
> -         * in a writeable object, wakeup the sync daemon.  And kick swapout
> -         * if we did not get enough free pages.
> -         */
> -        if (vm_paging_target() > 0) {
> -                if (vnodes_skipped && vm_page_count_min())
> -                        (void) speedup_syncer();
> -#if !defined(NO_SWAPPING)
> -                if (vm_swap_enabled && vm_page_count_target())
> -                        vm_req_vmdaemon(VM_SWAP_NORMAL);
> -#endif
> -        }
> 
>          /*
>           * If we are critically low on one of RAM or swap and low on
I suspect that what happens is contention on the vm object lock
between the page fault handler and the pageout scan.  If faults are
frequent enough, the pagedaemon would be unable to try-lock the object
backing the postgresql shared area, which is large enough to constitute
a significant part of the queues.

The fault fast path does not help there, since the object lock is
still taken on each fault, even a read-only one, which conflicts with
the pagedaemon's write-lock mode.

This is one of the reasons why I added the MAP_SHARED_PHYS flag.
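To make that interaction concrete, here is a rough userland model
(illustrative only, nothing below is kernel code): several threads play the
part of fault handlers read-locking one big object, while a single thread
plays the scan, which only ever try-locks for write and skips the object
whenever the try fails.

/*
 * Illustrative model: "fault" threads take a shared lock on one big
 * object's lock; the "pageout" thread only try-locks it exclusively and
 * gives up on failure, mirroring a scan that skips an object whose lock
 * it cannot take without sleeping.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t obj_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_ulong faults, scans, skips;

static void *
fault_thread(void *arg)
{
	(void)arg;
	for (;;) {
		/* Fault fast path: the object lock is taken even read-only. */
		pthread_rwlock_rdlock(&obj_lock);
		atomic_fetch_add(&faults, 1);
		usleep(10);			/* "install the page" */
		pthread_rwlock_unlock(&obj_lock);
	}
}

static void *
pageout_thread(void *arg)
{
	(void)arg;
	for (;;) {
		/* Scan: never sleep on the lock, just skip on failure. */
		if (pthread_rwlock_trywrlock(&obj_lock) == 0) {
			atomic_fetch_add(&scans, 1);
			usleep(10);		/* "scan some pages" */
			pthread_rwlock_unlock(&obj_lock);
		} else
			atomic_fetch_add(&skips, 1);
	}
}

int
main(void)
{
	pthread_t tid;
	int i;

	for (i = 0; i < 8; i++)
		pthread_create(&tid, NULL, fault_thread, NULL);
	pthread_create(&tid, NULL, pageout_thread, NULL);
	sleep(5);
	printf("faults %lu, successful scans %lu, skipped try-locks %lu\n",
	    atomic_load(&faults), atomic_load(&scans), atomic_load(&skips));
	return (0);
}

With enough "fault" traffic the try-lock essentially never succeeds, so the
scanning thread spins at 100% CPU without making progress, which is roughly
the symptom reported above.  Build with something like "cc -pthread model.c".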
