Date:      Thu, 5 Jan 2012 22:31:18 +0200
From:      Evgeniy Konkov <kes-kes@yandex.ru>
To:        freebsd-questions@freebsd.org
Cc:        freebsd-bugs@FreeBSD.org
Subject:   Re[8]: high load system do not take all CPU time
Message-ID:  <1863129064.20120105223118@yandex.ru>
In-Reply-To: <201112262154.pBQLsxJt038471@mail.r-bonomi.com>
References:  <9110154891.20111226214424@yandex.ru> <201112262154.pBQLsxJt038471@mail.r-bonomi.com>


I've got it!!

When one of the netisr threads takes 100% of a CPU, the other netisr
threads do not pick up the free CPU time.

http://piccy.info/view3/2444937/25e978a34d1da6b62e4e4602dee53d8b/
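One experiment would be pinning each netisr thread to its own CPU, so a
saturated thread cannot crowd the others. A minimal sketch of the
boot-time tunables (set in /boot/loader.conf; untested on this box):

# /boot/loader.conf
net.isr.bindthreads=1   # pin each netisr thread to its own CPU
net.isr.maxthreads=4    # one thread per core, as configured now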

In this case the network works without any problems:

last pid: 23632;  load averages:  5.53,  5.76,  5.72                up 6+00:09:50  20:37:43
292 processes: 12 running, 265 sleeping, 15 waiting
CPU 0:  9.4% user,  0.0% nice, 18.4% system, 36.1% interrupt, 36.1% idle
CPU 1:  2.4% user,  0.0% nice, 12.2% system, 62.4% interrupt, 23.1% idle
CPU 2:  1.6% user,  0.0% nice,  4.3% system, 85.1% interrupt,  9.0% idle
CPU 3:  1.6% user,  0.0% nice,  3.9% system, 86.3% interrupt,  8.2% idle
Mem: 613M Active, 2788M Inact, 315M Wired, 122M Cache, 112M Buf, 59M Free
Swap: 4096M Total, 30M Used, 4065M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   12 root       -72    -     0K   160K CPU3    3  43.4H 100.00% {swi1: netisr 3}
   12 root       -72    -     0K   160K CPU2    2  28.6H 93.60% {swi1: netisr 1}
   12 root       -72    -     0K   160K CPU1    1 954:56 50.68% {swi1: netisr 2}
   11 root       155 ki31     0K    32K RUN     0  96.7H 36.91% {idle: cpu0}
   12 root       -72    -     0K   160K RUN     0 757:29 31.10% {swi1: netisr 0}
   11 root       155 ki31     0K    32K RUN     1  98.4H 21.44% {idle: cpu1}
   12 root       -92    -     0K   160K WAIT    0  17.8H 12.94% {irq256: re0}
   11 root       155 ki31     0K    32K RUN     2  97.7H 11.08% {idle: cpu2}
   11 root       155 ki31     0K    32K RUN     3  98.6H 10.16% {idle: cpu3}
   13 root       -16    -     0K    32K sleep   0 411:30  4.25% {ng_queue0}
   13 root       -16    -     0K    32K sleep   0 411:54  4.20% {ng_queue3}
   13 root       -16    -     0K    32K RUN     0 411:05  4.10% {ng_queue1}
   13 root       -16    -     0K    32K RUN     2 411:16  3.81% {ng_queue2}
 5588 root        21    0   222M   106M select  0 116:59  0.93% {mpd5}
 5588 root        21    0   222M   106M select  1   0:00  0.93% {mpd5}
 5588 root        21    0   222M   106M select  3   0:00  0.83% {mpd5}
 5588 root        21    0   222M   106M select  0   0:00  0.83% {mpd5}
 5588 root        21    0   222M   106M select  3   0:00  0.83% {mpd5}
 5588 root        21    0   222M   106M select  3   0:00  0.83% {mpd5}
 5588 root        21    0   222M   106M select  3   0:00  0.83% {mpd5}
32882 root        21    0 15392K  5492K select  1 313:35  0.63% snmpd
 5588 root        21    0   222M   106M select  3   0:00  0.54% {mpd5}
 5588 root        20    0   222M   106M select  0   0:00  0.15% {mpd5}
 5588 root        20    0   222M   106M select  2   0:00  0.15% {mpd5}
 5588 root        20    0   222M   106M select  3   0:00  0.15% {mpd5}


# netstat -W 1 re0
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     96245     0     0   65271414     115828     0   80031246     0
    104903     0     0   70367758     121943     0   85634456     0
    102693     0     0   69663018     118800     0   83847075     0
    108654     0     0   73776089     125368     0   88487518     0
    100216     0     0   68186983     118522     0   80985757     0
     94819     0     0   63001720     107334     0   73020011     0
    108428     0     0   73849974     127976     0   88674709     0


# vmstat -i
interrupt                          total       rate
irq14: ata0                     13841687         26
irq16: ehci0                      782489          1
irq23: ehci1                     1046847          2
cpu0:timer                    2140530447       4122
irq256: re0                    374422787        721
cpu1:timer                    2132118859       4106
cpu3:timer                    2108526888       4061
cpu2:timer                    2131292574       4105
Total                         8902562578      17147


# systat -v
    1 users    Load  5.87  5.62  5.65                  Jan  5 20:41

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  737696   12284  3154940    38552  186192  count
All  946508   19628  5403252    88672          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt    760 cow   38031 total
            227      128k 7771  14k  21k  13k 3996   1624 zfod     18 ata0 14
                                                      189 ozfod     1 ehci0 16
 7.8%Sys  68.3%Intr  4.2%User  0.0%Nice 19.7%Idle      11%ozfod     2 ehci1 23
|    |    |    |    |    |    |    |    |    |    |       daefr  4123 cpu0:timer
====++++++++++++++++++++++++++++++++++>>             1439 prcfr 21603 re0 256
                                       184 dtbuf     3275 totfr  4102 cpu1:timer
Namei     Name-cache   Dir-cache    142271 desvn          react  4083 cpu3:timer
   Calls    hits   %    hits   %     46214 numvn          pdwak  4099 cpu2:timer
   26140   24882  95                 33321 frevn          pdpgs
                                                          intrn
Disks   ad0   da0 pass0                            324800 wire
KB/t  17.24  0.00  0.00                            633588 act
tps      18     0     0                           2846896 inact
MB/s   0.30  0.00  0.00                            119168 cache
%busy     1     0     0                             67024 free
                                                   114976 buf

In this situation of "100%" load I tried to run iperf:

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   288 MBytes   241 Mbits/sec
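(The exact iperf invocation isn't shown above; it was something like the
following, where the server address is a placeholder:)

iperf -s                     # on the remote test box
iperf -c 192.0.2.10 -t 10    # on this router, 10-second TCP test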


# netstat -W 1 re0 (continued, around the iperf run):
    111903     0     0   74200867     126836     0   89449352     0
    110664     0     0   74473447     128409     0   90114178     0
    113265     0     0   67425404     127877     0   96282769     0
    136307     0     0   69979865     147500     0  130264219     0
    138705     0     0   68701690     154450     0  139226405     0
    133391     0     0   66548136     147004     0  133131877     0
    140963     0     0   66963418     156790     0  143433241     0
    145366     0     0   68466136     159704     0  148596768     0
    128926     0     0   59496288     142304     0  132538052     0
    138049     0     0   62331085     149111     0  140148372     0
    150960     0     0   67406436     161604     0  156142937     0
    144312     0     0   62348179     160260     0  155727547     0
    105767     0     0   60168774     121313     0   95353466     0
    112040     0     0   75337443     135858     0   96321971     0
    103266     0     0   69406528     123384     0   86397398     0
    102423     0     0   69873997     121462     0   84718299     0
    112413     0     0   75917095     133604     0   94757254     0
    107780     0     0   72207347     126182     0   89811690     0

As you can see, re0 has reserve capacity to serve more traffic, despite
having reached the "mystery limit" on the graph. The reserve is about
20 kpps and ~30 MBytes/s.

When clients try to connect to this VPN server they get 'error 718:....'
Does mpd5 have poor performance now?

last pid: 43807;  load averages:  6.91,  5.84,  5.43                     up 6+00:41:40  21:09:33
310 processes: 10 running, 281 sleeping, 18 waiting, 1 lock
CPU 0: 31.0% user,  0.0% nice, 16.5% system, 45.5% interrupt,  7.1% idle
CPU 1: 21.6% user,  0.0% nice,  7.5% system, 65.5% interrupt,  5.5% idle
CPU 2: 15.3% user,  0.0% nice,  7.1% system, 71.8% interrupt,  5.9% idle
CPU 3: 15.7% user,  0.0% nice,  4.7% system, 71.4% interrupt,  8.2% idle
Mem: 543M Active, 2616M Inact, 318M Wired, 123M Cache, 112M Buf, 298M Free
Swap: 4096M Total, 31M Used, 4064M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   12 root       -72    -     0K   160K CPU0    0  43.8H 97.85% {swi1: netisr 3}
   12 root       -72    -     0K   160K *per-i  1  29.0H 87.01% {swi1: netisr 1}
42359 root        92    0 13704K  8356K CPU2    2   1:36 65.82% bzip2
   12 root       -72    -     0K   160K WAIT    3 969:10 37.79% {swi1: netisr 2}
   12 root       -72    -     0K   160K WAIT    2 767:04 32.71% {swi1: netisr 0}
   12 root       -92    -     0K   160K WAIT    0  17.9H 14.01% {irq256: re0}
   11 root       155 ki31     0K    32K RUN     0  96.8H  8.59% {idle: cpu0}
   11 root       155 ki31     0K    32K RUN     2  97.8H  7.23% {idle: cpu2}
   11 root       155 ki31     0K    32K RUN     3  98.7H  7.18% {idle: cpu3}
   11 root       155 ki31     0K    32K RUN     1  98.4H  7.03% {idle: cpu1}
   13 root       -16    -     0K    32K sleep   3 413:33  4.49% {ng_queue3}
   13 root       -16    -     0K    32K RUN     0 412:56  4.30% {ng_queue2}
   13 root       -16    -     0K    32K sleep   2 413:10  4.15% {ng_queue0}
   13 root       -16    -     0K    32K RUN     2 412:43  4.05% {ng_queue1}
 5588 root        20  -20   222M   105M select  0 117:48  1.61% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  1.22% {mpd5}
32882 root        22    0 15392K  5492K select  3 314:39  1.12% snmpd
 5588 root        20  -20   222M   105M select  0   0:00  1.12% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  1.12% {mpd5}
 5351 root        20    0  9616K  1224K select  0  35:12  0.24% syslogd
 5588 root        20  -20   222M   105M select  0   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  3   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  3   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  0   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  0   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  2   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  0   0:00  0.24% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  2   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  2   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  1   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  2   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  3   0:00  0.15% {mpd5}
 5588 root        20  -20   222M   105M select  0   0:00  0.15% {mpd5}
   12 root       -60    -     0K   160K WAIT    1 108:19  0.10% {swi4: clock}
14567 root        20    0  9536K   736K nanslp  1  14:28  0.05% monitord
 5588 root        20  -20   222M   105M select  0   0:00  0.05% {mpd5}
 5588 root        20  -20   222M   105M select  0   0:00  0.05% {mpd5}
 5588 root        20  -20   222M   105M select  2   0:00  0.05% {mpd5}
 5588 root        20  -20   222M   105M select  3   0:00  0.05% {mpd5}
43807 cacti       72    0 12000K  2900K CPU3    3   0:00  0.00% snmpget

I also ran bzip2 to see whether the remaining CPU time gets utilised:
http://piccy.info/view3/2445242/802b93a3db1055a20a98aed7e61dfc9a/
Yes, it is utilised.

We see many mpd5 threads.


But the mpd5 log says (only suspicious messages shown):
Jan  5 21:00:01 flux mpd: [L98] RADIUS: rad_send_request for user '10153' failed: No valid RADIUS responses received
Jan  5 21:00:01 flux mpd: [L98] ACCT: Thread finished normally
Jan  5 21:00:01 flux mpd: [L194] RADIUS: Sending request for user 'aviator'
Jan  5 21:00:01 flux mpd: [L194] RADIUS: rad_send_request for user 'aviator' failed: No valid RADIUS responses received
Jan  5 21:00:01 flux mpd: [L194] ACCT: Thread finished normally

Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_READ LinkNgDataEvent() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires()
Jan  5 21:01:01 flux mpd: EVENT: Processing timer "FsmKeepAlive" FsmEchoTimeout()
Jan  5 21:01:01 flux mpd: EVENT: Processing timer "FsmKeepAlive" FsmEchoTimeout() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires()
Jan  5 21:01:01 flux mpd: EVENT: Processing timer "FsmKeepAlive" FsmEchoTimeout()
Jan  5 21:01:01 flux mpd: EVENT: Processing timer "FsmKeepAlive" FsmEchoTimeout() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires()

Jan  5 21:01:44 flux mpd: [L36] LCP: phase shift TERMINATE --> ESTABLISH
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "LCP" FsmTimeout() at fsm.c:190
Jan  5 21:01:44 flux mpd: [L36] LCP: LayerFinish
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "AuthTimer" AuthTimeout() at auth.c:682
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "(null)" (null)() at pap.c:64
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "ChalTimer" ChapChalTimeout() at chap.c:93
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "(null)" (null)() at chap.c:94
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "(null)" (null)() at eap.c:146
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "(null)" (null)() at eap.c:147
Jan  5 21:01:44 flux mpd: EVENT: Message 2 to PhysMsg() sent
Jan  5 21:01:44 flux mpd: EVENT: Processing timer "LCP" FsmTimeout() done
Jan  5 21:01:44 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires() done
Jan  5 21:01:44 flux mpd: EVENT: Processing event EVENT_READ MsgEvent()
Jan  5 21:01:44 flux mpd: EVENT: Message 2 to PhysMsg() received
Jan  5 21:01:44 flux mpd: [L36] device: CLOSE event
Jan  5 21:01:44 flux mpd: pptp38-0: clearing call
Jan  5 21:01:44 flux mpd: pptp38: send CallDiscNotify msg
Jan  5 21:01:44 flux mpd:   len=0x94 msgType=1 magic=0x1a2b3c4d type=13
Jan  5 21:01:44 flux mpd:   cid=0xdecd result=3 err=0 cause=0 stats=""
Jan  5 21:01:44 flux mpd: pptp38: wrote 148 bytes ctrl data
Jan  5 21:01:44 flux mpd:     00 94 00 01 1a 2b 3c 4d 00 0d 00 00 de cd 03 00  .....+<M........
Jan  5 21:01:44 flux mpd:     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Jan  5 21:01:44 flux mpd: last message repeated 7 times
Jan  5 21:01:44 flux mpd:     00 00 00 00                                      ....
Jan  5 21:01:44 flux mpd: pptp38-0: killing channel
Jan  5 21:01:44 flux mpd: [L36] PPTP call terminated
Jan  5 21:01:44 flux mpd: [L36] device: DOWN event
Jan  5 21:01:44 flux mpd: [L36] Link: DOWN event
Jan  5 21:01:44 flux mpd: [L36] LCP: Close event
Jan  5 21:01:44 flux mpd: [L36] LCP: state change Stopped --> Closed
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "LCP" FsmTimeout() at fsm.c:190


# top
last pid: 45804;  load averages:  6.66,  6.35,  5.74                     up 6+00:44:49  21:12:42
241 processes: 2 running, 239 sleeping
CPU: 17.0% user,  0.0% nice,  7.3% system, 62.9% interrupt, 12.9% idle
Mem: 537M Active, 2679M Inact, 320M Wired, 117M Cache, 112M Buf, 246M Free
Swap: 4096M Total, 31M Used, 4064M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
42359 root          1  91    0 13704K  8356K RUN     3   3:32 62.21% bzip2
 5588 root         52  20  -20   222M   106M select  0 117:53 28.61% mpd5
 3641 root          1  26    0 10460K  3996K select  0 116:52  2.10% zebra
32882 root          1  21    0 15392K  5492K select  2 314:46  1.42% snmpd
14567 root          1  20    0  9536K   736K nanslp  0  14:29  0.15% monitord
45792 cacti         1  27    0 12000K  3096K select  0   0:00  0.15% snmpwalk
34587 freeradius    2  52  -20 23652K 16420K usem    0   0:12  0.10% radiusd
45730 cacti         1  20    0 12000K  3096K select  2   0:00  0.10% snmpwalk
 5455 bind          7  20    0   136M   104M kqread  2  56:12  0.00% named
 5351 root          1  20    0  9616K  1224K select  2  35:13  0.00% syslogd
10440 dhcpd         1  20    0 15032K  5300K select  0   4:37  0.00% dhcpd
17155 firebird      8  20  -10 88652K 24128K usem    2   4:01  0.00% fb_smp_server
 3647 root          1  20    0 14660K  5196K select  2   1:57  0.00% bgpd
15683 mysql        34  20    0   407M   280M sbwait  0   1:24  0.00% mysqld
13446 postfix       1  20    0 11876K   948K kqread  2   0:12  0.00% qmgr
26875 root          1  20    0 33812K 11212K select  1   0:12  0.00% httpd
13336 root          1  20    0 11876K   664K kqread  2   0:06  0.00% master
29201 root          1  20    0 13068K  3012K select  2   0:05  0.00% sshd
15176 vmail         1  20    0 11324K  1492K kqread  0   0:04  0.00% auth
29733 root          1  22    0  9648K   544K nanslp  2   0:02  0.00% cron
11459 root          1  20    0 11136K   936K kqread  2   0:02  0.00% log
64159 postfix       1  20    0 11876K   692K kqread  0   0:02  0.00% anvil
33804 www           1  20    0 42004K 24028K lockf   2   0:01  0.00% httpd
11454 root          1  20    0 11136K   932K kqread  0   0:01  0.00% dovecot
35662 root          1  20    0 13236K  3928K select  0   0:01  0.00% mc
37631 www           1  20    0 37908K 19780K lockf   0   0:01  0.00% httpd
36086 www           1  20    0 37908K 20120K lockf   2   0:01  0.00% httpd
42143 www           1  20    0 37908K 20396K lockf   0   0:01  0.00% httpd

Here the suspicious thing is the 'usem' state for fb_smp_server and
radiusd (threads waiting on user-space semaphores?).
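To see what the 'usem' threads are actually blocked on, their kernel
stacks can be dumped (PIDs taken from the top output above):

# procstat -kk 34587    # radiusd
# procstat -kk 17155    # fb_smp_server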

I restarted 'radiusd' in non-debug mode. The number of active semaphores
for firebird rose from 5 to 13.

mpd (radiusd?) started to process auth and acct messages:
http://piccy.info/view3/2445292/150cf955f2903404dd8a3751de7f2dd1/
slowly, but it does it!
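A quick way to check whether radiusd answers at all, independent of
mpd5 (radtest ships with freeradius; the user, password and secret
below are placeholders):

# radtest testuser testpass 127.0.0.1 0 testing123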

# ipcs -s
Semaphores:
T           ID          KEY MODE        OWNER    GROUP
s       131072      4396057 --rw-r----- firebird firebird
s       458753     21173263 --rw-r----- firebird firebird
s       327682     37950480 --rw-r----- firebird firebird
s       262147     54727696 --rw-r----- firebird firebird
s       131076     71504912 --rw-r----- firebird firebird
s       131077     88282128 --rw-r----- firebird firebird
s        65542    105059344 --rw-r----- firebird firebird
s        65543    121836560 --rw-r----- firebird firebird
s        65544    138613776 --rw-r----- firebird firebird
s        65545    155390992 --rw-r----- firebird firebird
s        65546    172168208 --rw-r----- firebird firebird
s        65547    188945424 --rw-r----- firebird firebird
s        65548    205722640 --rw-r----- firebird firebird


From ipcs(1):
     -T      Display system information about shared memory, message queues
             and semaphores.

# ipcs -T
msginfo:
        msgmax:        16384    (max characters in a message)
        msgmni:           40    (# of message queues)
        msgmnb:         2048    (max characters in a message queue)
        msgtql:           40    (max # of messages in system)
        msgssz:            8    (size of a message segment)
        msgseg:         2048    (# of message segments in system)

shminfo:
        shmmax:    134217728    (max shared memory segment size)
        shmmin:            1    (min shared memory segment size)
        shmmni:          192    (max number of shared memory identifiers)
        shmseg:          128    (max shared memory segments per process)
        shmall:        32768    (max amount of shared memory in pages)

seminfo:
        semmap:          256    (# of entries in semaphore map)
        semmni:          256    (# of semaphore identifiers)
        semmns:          512    (# of semaphores in system)
        semmnu:          256    (# of undo structures in system)
        semmsl:          340    (max # of semaphores per id)
        semopm:          100    (max # of operations per semop call)
        semume:           50    (max # of undo entries per process)
        semusz:          616    (size in bytes of undo structure)
        semvmx:        32767    (semaphore maximum value)
        semaem:        16384    (adjust on exit max value)
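The 13 firebird semaphore sets fit well inside semmni=256/semmns=512,
but if those limits were ever hit they can be raised from
/boot/loader.conf (a sketch; takes effect after reboot):

# /boot/loader.conf
kern.ipc.semmni=512
kern.ipc.semmns=1024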

# sysctl -a kern.ipc
kern.ipc.maxsockbuf: 16777216
kern.ipc.sockbuf_waste_factor: 8
kern.ipc.somaxconn: 32768
kern.ipc.max_linkhdr: 16
kern.ipc.max_protohdr: 60
kern.ipc.max_hdr: 76
kern.ipc.max_datalen: 124
kern.ipc.nmbjumbo16: 3200
kern.ipc.nmbjumbo9: 6400
kern.ipc.nmbjumbop: 12800
kern.ipc.nmbclusters: 262144
kern.ipc.piperesizeallowed: 1
kern.ipc.piperesizefail: 0
kern.ipc.pipeallocfail: 0
kern.ipc.pipefragretry: 0
kern.ipc.pipekva: 2097152
kern.ipc.maxpipekva: 16810176
kern.ipc.msgseg: 2048
kern.ipc.msgssz: 8
kern.ipc.msgtql: 40
kern.ipc.msgmnb: 2048
kern.ipc.msgmni: 40
kern.ipc.msgmax: 16384
kern.ipc.semaem: 16384
kern.ipc.semvmx: 32767
kern.ipc.semusz: 616
kern.ipc.semume: 50
kern.ipc.semopm: 100
kern.ipc.semmsl: 340
kern.ipc.semmnu: 256
kern.ipc.semmns: 512
kern.ipc.semmni: 256
kern.ipc.semmap: 256
kern.ipc.shm_allow_removed: 0
kern.ipc.shm_use_phys: 0
kern.ipc.shmall: 32768
kern.ipc.shmseg: 128
kern.ipc.shmmni: 192
kern.ipc.shmmin: 1
kern.ipc.shmmax: 134217728
kern.ipc.maxsockets: 204800
kern.ipc.numopensockets: 1150
kern.ipc.nsfbufsused: 1
kern.ipc.nsfbufspeak: 80
kern.ipc.nsfbufs: 6656

# sysctl -a net.graph
net.graph.msg_version: 8
net.graph.abi_version: 12
net.graph.maxdata: 32768
net.graph.maxalloc: 32768
net.graph.threads: 4
net.graph.control.proto: 2
net.graph.data.proto: 1
net.graph.family: 32
net.graph.recvspace: 524288
net.graph.maxdgram: 262144

# vmstat -z|egrep 'ITEM|NetGraph'
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
NetGraph items:          36,  32804,      71,     755,8711030020,   0,   0
NetGraph data items:     36,  32804,       0,    1003,16369746867,   0,   0
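FAIL is 0 for both NetGraph zones, so netgraph is not dropping
allocations; a trivial loop to keep an eye on it under load:

# while :; do vmstat -z | egrep 'NetGraph'; sleep 5; done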

In the '# vmstat -z' output there are FAILs only for:
128 Bucket:             524,      0,     540,     223,  296830,75013,   0


Now (21:45) the state of the system is:
http://piccy.info/view3/2445350/6b7c5bfffc977ab5a1fe934172aa33fd/
As you can see, there are now:
  *fewer* kernel interrupts on the graph,
  *more* interrupts from re0 (see below),
  *more* pps on the re0 interface (see below)

than there were when this overload situation occurred!!

So (maybe??) the problem is not with the 're' interface???


# systat -v
    2 users    Load  6.01  5.75  5.72                  Jan  5 21:46

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  961448   13896  3541824    41356  180912  count
All 1247360   22052  5889300   103760          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt    713 cow   39492 total
  7   1     236      166k 7248  45k  23k  20k 2994   1031 zfod     26 ata0 14
                                                      112 ozfod     2 ehci0 16
13.0%Sys  62.7%Intr  4.1%User  0.0%Nice 20.2%Idle      10%ozfod     2 ehci1 23
|    |    |    |    |    |    |    |    |    |    |       daefr  4121 cpu0:timer
======++++++++++++++++++++++++++++++++>>             1087 prcfr 23015 re0 256
                                       111 dtbuf     3213 totfr  4110 cpu1:timer
Namei     Name-cache   Dir-cache    142271 desvn          react  4106 cpu3:timer
   Calls    hits   %    hits   %     47569 numvn          pdwak  4110 cpu2:timer
   18464   18449 100                 34142 frevn          pdpgs
                                                          intrn
Disks   ad0   da0 pass0                            330212 wire
KB/t  44.00  0.00  0.00                            748860 act
tps      25     0     0                           2731784 inact
MB/s   1.08  0.00  0.00                            159568 cache
%busy     1     0     0                             21024 free
                                                   114976 buf

# netstat -W 1 re0
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    137567     0     0   97669861     156923     0  114329168     0
    153711     0     0  108860571     178608     0  129023839     0
    132672     0     0   92702405     159798     0  113270233     0
    141540     0     0   99766858     169112     0  120816262     0
    143670     0     0  101820412     173753     0  124918609     0
    138409     0     0   97944357     162214     0  117144054     0
    139900     0     0   99422132     163406     0  118104685     0
    151955     0     0  108602727     180131     0  131715416     0
    138995     0     0   99993704     167247     0  121914361     0
    144477     0     0  104308909     171671     0  125640285     0
    136133     0     0   95776095     163066     0  117071802     0
    135196     0     0   95555349     163899     0  118434548     0
    135067     0     0   96838717     163829     0  119398350     0
    138786     0     0  100931422     162668     0  119753322     0
    136027     0     0   95524321     162876     0  117032548     0



And now at 22:00 (NOTICE: a significant drop in CPU utilisation)
there are far fewer kernel interrupts, but no significant change in the
traffic flow: same interrupts, same pps, same Mbps...
http://piccy.info/view3/2445486/f3f0a124d8f080ddd181e5db8f24dd82/orig/
and vpn online users
http://piccy.info/view3/2445500/b3812ae75e8dcf7c3778bf1330c055c2/



    2 users    Load  3.89  4.67  5.04                  Jan  5 22:03

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  967716   14036  3339656    41580  227852  count          30
All 1245848   26492  5715024   119292          pages          31
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt   1892 cow   38624 total
  2         235      201k 5722  10k  22k  32k 4020    649 zfod     50 ata0 14
                                                      130 ozfod     1 ehci0 16
17.2%Sys  42.5%Intr  1.3%User  0.0%Nice 39.0%Idle      20%ozfod     3 ehci1 23
|    |    |    |    |    |    |    |    |    |    |       daefr  4125 cpu0:timer
=========+++++++++++++++++++++>                       967 prcfr 22069 re0 256
                                       152 dtbuf     5557 totfr  4125 cpu1:timer
Namei     Name-cache   Dir-cache    142271 desvn          react  4126 cpu3:timer
   Calls    hits   %    hits   %     47597 numvn          pdwak  4125 cpu2:timer
    3489    3483 100                 34132 frevn          pdpgs
                                                          intrn
Disks   ad0   da0 pass0                            304952 wire
KB/t  16.74  0.00  0.00                            688352 act
tps      50     0     0                           2770104 inact
MB/s   0.82  0.00  0.00                            111144 cache
%busy     4     0     0                            116904 free
                                                   114976 buf

# netstat -W 1 re0
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    127166     0     0   88929973     143992     0  106883020     0
    138535     0     0   98464838     158725     0  119737496     0
    133694     0     0   95349794     157750     0  117821954     0
    129177     0     0   90732865     146634     0  108781223     0
    143138     0     0  101357842     162761     0  119955682     0
    142492     0     0  102453893     166022     0  124863890     0
    132476     0     0   93756889     149644     0  113221417     0
    136080     0     0   96119177     154514     0  116135341     0
    143103     0     0  100681447     160290     0  119572568     0
    124777     0     0   86943492     140187     0  102704963     0
    131743     0     0   92982527     152259     0  113233013     0

NOTICE again: pps and interrupts are now higher than they were when the
overload occurred!


I think this issue may be related to the following subsystems:
1. netgraph
2. netisr

# sysctl net.isr
net.isr.numthreads: 4
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 0
net.isr.maxthreads: 4
net.isr.direct: 0
net.isr.direct_force: 0
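With net.isr.direct=0 every packet is queued to the netisr threads; one
experiment would be direct dispatch in the interrupt context (a runtime
sysctl, easy to revert; how it behaves on this 9.0-CURRENT build is my
assumption, not something I have verified):

# sysctl net.isr.direct=1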

# uname -a
FreeBSD 9.0-CURRENT FreeBSD 9.0-CURRENT #4: Fri Jun 10 01:30:12 UTC 2011     @:/usr/obj/usr/src/sys/PAE_KES  i386

PS. While reading, note the system time in the outputs.

-- 
Best regards,
 Konkov                           mailto:kes-kes@yandex.ru



