Date:      Wed, 21 Sep 2016 09:35:03 +0200
From:      Michael Schuster <michaelsprivate@gmail.com>
To:        Ståle Bordal Kristoffersen <chiller@putsch.kolbu.ws>
Cc:        FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject:   Re: Server gets a high load, but no CPU use, and then later stops respond on the network
Message-ID:  <CADqw_gLiL=RDmdfOpr5Y-eWqzjDJmvhAvfTR8mc9bWQa8Kungg@mail.gmail.com>
In-Reply-To: <20160913232351.GA36091@putsch.kolbu.ws>
References:  <20160913232351.GA36091@putsch.kolbu.ws>

Hi,

While I'm not very familiar with FreeBSD internals, I'd like to point out
two things that I think may be relevant:

1) Note that '[idle]' seems to be the only thread/process doing significant
work - at a guess, I'd say that's the kernel doing work that cannot be
ascribed to anything else ... housekeeping? (Someone who knows FreeBSD
better will have to answer that.)

On Wed, Sep 14, 2016 at 1:23 AM, Ståle Bordal Kristoffersen <
chiller@putsch.kolbu.ws> wrote:

>   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>    11 root         24 155 ki31     0K   384K CPU23  23 1206.3 2396.63% [idle]
>     5 root          1 -16    -     0K    16K ipmire 14 100:17   0.00% [ipmi0: kcs]
>     0 root        407  -8    -     0K  6512K -      22  56:20   0.00% [kernel]
>     7 root          2 -16    -     0K    32K umarcl  3   6:21   0.00% [pagedaemon]
>    18 root          1  16    -     0K    16K syncer 14   3:37   0.00% [syncer]
>    12 root         38 -76    -     0K   608K WAIT   255   3:04   0.00% [intr]
>     2 root          6 -16    -     0K    96K -       0   2:41   0.00% [cam]
>    14 root          1 -16    -     0K    16K -      16   1:40   0.00% [rand_harvestq]
>     3 root          9  -8    -     0K   176K tx->tx 20   1:13   0.00% [zfskern]
>    17 root          1 -16    -     0K    16K vlruwt 13   1:10   0.00% [vnlru]
>   762 root          1  20    0 50040K 15212K select 18   0:10   0.00% /usr/local/bin/perl -wT /usr/local/sbin/munin-node
>   620 root          1  20    0 14520K  2044K select 20   0:06   0.00% /usr/sbin/syslogd -s
>    15 root         40 -68    -     0K   640K -       0   0:05   0.00% [usb]
>   686 root          1  20    0 26128K 18044K select 15   0:05   0.00% /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
>   823 root          1  20    0 24156K  5420K select 13   0:02   0.00% sendmail: accepting connections (sendmail)
>     6 root          1 -16    -     0K    16K idle   16   0:02   0.00% [enc_daemon0]
>    16 root          1 -16    -     0K    16K psleep 19   0:00   0.00% [bufdaemon]
>   830 root          1  20    0 16624K   712K nanslp 16   0:00   0.00% /usr/sbin/cron -s
>   826 smmsp         1  20    0 24156K  1056K pause  23   0:00   0.00% sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
> 52778 chiller       1  20    0 31060K  5356K pause  23   0:00   0.00% -zsh (zsh)
>   800 root          1  20    0 61316K  5164K select 17   0:00   0.00% /usr/sbin/sshd
>     1 root          1  20    0  9492K   460K wait   22   0:00   0.00% [init]
> 54395 chiller       1  23    0 31060K  5388K pause  23   0:00   0.00% -zsh (zsh)
> 52777 chiller       1  20    0 86584K  7576K select 23   0:00   0.00% sshd: chiller@pts/0 (sshd)
> 54394 chiller       1  20    0 86584K  7616K select 19   0:00   0.00% sshd: chiller@pts/1 (sshd)
>   473 root          1  20    0 13628K  4504K select 22   0:00   0.00% /sbin/devd
> 54441 root          1  20    0 24392K  4064K pause  15   0:00   0.00% -su (zsh)
> 52774 root          1  20    0 86584K  7532K select 19   0:00   0.00% sshd: chiller [priv] (sshd)
> 54050 root          1  20    0 24392K  4064K ttyin  20   0:00   0.00% -su (zsh)
>    13 root          3  -8    -     0K    48K -       4   0:00   0.00% [geom]
> 54389 root          1  20    0 86584K  7568K select 13   0:00   0.00% sshd: chiller [priv] (sshd)
>
> [...]
>
>
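A quick way to double-check point 1 is to filter the zero-CPU rows (and [idle] itself) out of the process list and see whether anything is left. A minimal sketch, assuming the column layout shown above (WCPU is field 11); here a captured sample is piped through awk, and on the live box you could apply the same filter to `top -Sb` output:

```shell
# Keep only rows whose WCPU (field 11) is non-zero, excluding the
# [idle] pseudo-process itself. NR > 1 skips the header line.
printf '%s\n' \
  '  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND' \
  '   11 root         24 155 ki31     0K   384K CPU23  23 1206.3 2396.63% [idle]' \
  '    0 root        407  -8    -     0K  6512K -      22  56:20   0.00% [kernel]' \
  '  762 root          1  20    0 50040K 15212K select 18   0:10   0.00% perl' \
| awk 'NR > 1 && $11 != "0.00%" && $12 != "[idle]"'
# No output here: everything except [idle] is at 0.00% WCPU.
```

An empty result matches what your snapshot shows - only [idle] is accumulating CPU time.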
2) Look at 'sr' (using a fixed-width font probably helps). In Solaris
(which is where I come from ... a long time ago ;-)) this is the "scan
rate", i.e. the number of pages per second the paging mechanism is
examining. On Solaris, a persistently non-zero value would mean that your
system is under some kind of fairly constant memory pressure - from where,
I cannot even guess, and given the "avm" and "fre" columns this does look
very strange ... but that's where I'd continue my investigation.

> pusen# vmstat 1
>  procs      memory      page                    disks     faults      cpu
>  r b w     avm    fre   flt  re  pi  po    fr  sr da0 da1   in   sy   cs us sy id
>  0 0 0    858M  1449M   335   0   0   1   355 4954   0   0 1917 4403 5302  0  0 99
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    1  120   80  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    0  124   81  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    1  120   68  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    1  127   92  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    0  120   91  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    1  121   82  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    0  120   75  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    2  121   96  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 265   0   0    1  126   83  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 217   0   0    1  121   68  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 215   0   0    1  120   88  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 215   0   0    0  121   92  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 215   0   0    0  120   83  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 215   0   0    1  127   90  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 196   0   0    5  120   94  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 196   0   0    1  121   80  0  0 100
>  0 0 0    858M  1449M     0   0   0   0     0 196   0   0    0  123   79  0  0 100
>  1 0 0    858M  1449M     0   0   0   0     0 196   0   0    2  121   76  0  0 100
>  1 0 0    858M  1449M     0   0   0   0     0 196   0   0    4  118  106  0  0 100
>  1 0 0    858M  1449M     0   0   0   0     0 196   0   0    0  112   87  0  0 100
>
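If you want to watch 'sr' over time without eyeballing the whole table, you can pull just that column out. A minimal sketch, assuming the vmstat layout above (sr is the 11th whitespace-separated field); a captured sample is piped through awk here, and on the live system the same filter works on `vmstat 1` directly:

```shell
# Extract the scan-rate ('sr') column, field 11 in the layout above.
# NR > 2 skips the two header lines. Live use:
#   vmstat 1 | awk 'NR > 2 { print $11 }'
printf '%s\n' \
  ' procs      memory      page                    disks     faults      cpu' \
  ' r b w     avm    fre   flt  re  pi  po    fr  sr da0 da1   in   sy   cs us sy id' \
  ' 0 0 0    858M  1449M     0   0   0   0     0 265   0   0    1  120   80  0  0 100' \
  ' 0 0 0    858M  1449M     0   0   0   0     0 215   0   0    0  121   92  0  0 100' \
| awk 'NR > 2 { print $11 }'
# Prints 265 then 215 - a persistently non-zero scan rate.
```

A steady stream of non-zero values, as in your output, is what I'd expect to see under the constant memory pressure described above.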

HTH
Michael
-- 
Michael Schuster
http://recursiveramblings.wordpress.com/
recursion, n: see 'recursion'


