Date: Thu, 5 Jan 2012 22:31:18 +0200
From: Коньков Евгений <kes-kes@yandex.ru>
To: freebsd-questions@freebsd.org
Cc: freebsd-bugs@FreeBSD.org
Subject: Re[8]: high load system do not take all CPU time
Message-ID: <1863129064.20120105223118@yandex.ru>
In-Reply-To: <201112262154.pBQLsxJt038471@mail.r-bonomi.com>
References: <9110154891.20111226214424@yandex.ru> <201112262154.pBQLsxJt038471@mail.r-bonomi.com>
I get it!! When one of the netisr threads takes 100% of a CPU, the other
netisr threads do not pick up the remaining free CPU time.
http://piccy.info/view3/2444937/25e978a34d1da6b62e4e4602dee53d8b/
In this case the network works without any problems.

last pid: 23632;  load averages:  5.53,  5.76,  5.72   up 6+00:09:50  20:37:43
292 processes: 12 running, 265 sleeping, 15 waiting
CPU 0:  9.4% user,  0.0% nice, 18.4% system, 36.1% interrupt, 36.1% idle
CPU 1:  2.4% user,  0.0% nice, 12.2% system, 62.4% interrupt, 23.1% idle
CPU 2:  1.6% user,  0.0% nice,  4.3% system, 85.1% interrupt,  9.0% idle
CPU 3:  1.6% user,  0.0% nice,  3.9% system, 86.3% interrupt,  8.2% idle
Mem: 613M Active, 2788M Inact, 315M Wired, 122M Cache, 112M Buf, 59M Free
Swap: 4096M Total, 30M Used, 4065M Free

  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   12 root     -72    -     0K   160K CPU3    3  43.4H 100.00% {swi1: netisr 3}
   12 root     -72    -     0K   160K CPU2    2  28.6H  93.60% {swi1: netisr 1}
   12 root     -72    -     0K   160K CPU1    1 954:56  50.68% {swi1: netisr 2}
   11 root     155 ki31    0K    32K RUN     0  96.7H  36.91% {idle: cpu0}
   12 root     -72    -     0K   160K RUN     0 757:29  31.10% {swi1: netisr 0}
   11 root     155 ki31    0K    32K RUN     1  98.4H  21.44% {idle: cpu1}
   12 root     -92    -     0K   160K WAIT    0  17.8H  12.94% {irq256: re0}
   11 root     155 ki31    0K    32K RUN     2  97.7H  11.08% {idle: cpu2}
   11 root     155 ki31    0K    32K RUN     3  98.6H  10.16% {idle: cpu3}
   13 root     -16    -     0K    32K sleep   0 411:30   4.25% {ng_queue0}
   13 root     -16    -     0K    32K sleep   0 411:54   4.20% {ng_queue3}
   13 root     -16    -     0K    32K RUN     0 411:05   4.10% {ng_queue1}
   13 root     -16    -     0K    32K RUN     2 411:16   3.81% {ng_queue2}
 5588 root      21    0   222M   106M select  0 116:59   0.93% {mpd5}
 5588 root      21    0   222M   106M select  1   0:00   0.93% {mpd5}
 5588 root      21    0   222M   106M select  3   0:00   0.83% {mpd5}
 5588 root      21    0   222M   106M select  0   0:00   0.83% {mpd5}
 5588 root      21    0   222M   106M select  3   0:00   0.83% {mpd5}
 5588 root      21    0   222M   106M select  3   0:00   0.83% {mpd5}
 5588 root      21    0   222M   106M select  3   0:00   0.83% {mpd5}
32882 root      21    0 15392K  5492K select  1 313:35   0.63% snmpd
 5588 root      21    0   222M   106M select  3   0:00   0.54% {mpd5}
 5588 root      20    0   222M   106M select  0   0:00   0.15% {mpd5}
 5588 root      20    0   222M   106M select  2   0:00   0.15% {mpd5}
 5588 root      20    0   222M   106M select  3   0:00   0.15% {mpd5}

# netstat -W 1 re0
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     96245     0      0   65271414     115828     0   80031246     0
    104903     0      0   70367758     121943     0   85634456     0
    102693     0      0   69663018     118800     0   83847075     0
    108654     0      0   73776089     125368     0   88487518     0
    100216     0      0   68186983     118522     0   80985757     0
     94819     0      0   63001720     107334     0   73020011     0
    108428     0      0   73849974     127976     0   88674709     0

# vmstat -i
interrupt                          total       rate
irq14: ata0                     13841687         26
irq16: ehci0                      782489          1
irq23: ehci1                     1046847          2
cpu0:timer                    2140530447       4122
irq256: re0                    374422787        721
cpu1:timer                    2132118859       4106
cpu3:timer                    2108526888       4061
cpu2:timer                    2131292574       4105
Total                         8902562578      17147

    1 users    Load  5.87  5.62  5.65                  Jan  5 20:41

Mem:KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free          in   out     in   out
Act  737696   12284  3154940    38552  186192  count
All  946508   19628  5403252    88672          pages
Proc:                                                          Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt   760 cow    38031 total
             227       128k 7771  14k  21k  13k 3996 1624 zfod     18 ata0 14
                                                      189 ozfod     1 ehci0 16
 7.8%Sys  68.3%Intr  4.2%User  0.0%Nice 19.7%Idle     11%ozfod      2 ehci1 23
|    |    |    |    |    |    |    |    |    |            daefr  4123 cpu0:timer
====++++++++++++++++++++++++++++++++++>>            1439 prcfr 21603 re0 256
                          184 dtbuf                  3275 totfr  4102 cpu1:timer
Namei     Name-cache   Dir-cache   142271 desvn           react  4083 cpu3:timer
   Calls    hits   %    hits   %    46214 numvn           pdwak  4099 cpu2:timer
   26140   24882  95                33321 frevn           pdpgs
                                                           intrn
Disks   ad0   da0 pass0            324800 wire
KB/t  17.24  0.00  0.00            633588 act
tps      18     0     0           2846896 inact
MB/s   0.30  0.00  0.00            119168 cache
%busy     1     0     0             67024 free
                                   114976 buf
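One thing these snapshots do not show is whether the netisr queues themselves
are queueing or dropping packets while one thread sits at 100%. If this
netstat build supports the -Q option (I have not tried it on this box yet, so
this is only an idea), next time it happens I will also capture:

# netstat -Q

which should list the netisr workstreams per protocol with their queued,
handled and dropped counters.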
In this situation of "100%" load I tried running iperf:

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   288 MBytes   241 Mbits/sec

    111903     0      0   74200867     126836     0   89449352     0
    110664     0      0   74473447     128409     0   90114178     0
    113265     0      0   67425404     127877     0   96282769     0
    136307     0      0   69979865     147500     0  130264219     0
    138705     0      0   68701690     154450     0  139226405     0
    133391     0      0   66548136     147004     0  133131877     0
    140963     0      0   66963418     156790     0  143433241     0
    145366     0      0   68466136     159704     0  148596768     0
    128926     0      0   59496288     142304     0  132538052     0
    138049     0      0   62331085     149111     0  140148372     0
    150960     0      0   67406436     161604     0  156142937     0
    144312     0      0   62348179     160260     0  155727547     0
    105767     0      0   60168774     121313     0   95353466     0
    112040     0      0   75337443     135858     0   96321971     0
    103266     0      0   69406528     123384     0   86397398     0
    102423     0      0   69873997     121462     0   84718299     0
    112413     0      0   75917095     133604     0   94757254     0
    107780     0      0   72207347     126182     0   89811690     0

As you can see, re0 has a reserve to serve more traffic, despite having
reached the "mystery limit" on the graph. The reserve is about 20 kpps and
~30 MBytes/s. When trying to connect to this VPN server, the client says
'error 718: ...'. Does mpd5 have poor performance now?

last pid: 43807;  load averages:  6.91,  5.84,  5.43   up 6+00:41:40  21:09:33
310 processes: 10 running, 281 sleeping, 18 waiting, 1 lock
CPU 0: 31.0% user,  0.0% nice, 16.5% system, 45.5% interrupt,  7.1% idle
CPU 1: 21.6% user,  0.0% nice,  7.5% system, 65.5% interrupt,  5.5% idle
CPU 2: 15.3% user,  0.0% nice,  7.1% system, 71.8% interrupt,  5.9% idle
CPU 3: 15.7% user,  0.0% nice,  4.7% system, 71.4% interrupt,  8.2% idle
Mem: 543M Active, 2616M Inact, 318M Wired, 123M Cache, 112M Buf, 298M Free
Swap: 4096M Total, 31M Used, 4064M Free

  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   12 root     -72    -     0K   160K CPU0    0  43.8H  97.85% {swi1: netisr 3}
   12 root     -72    -     0K   160K *per-i  1  29.0H  87.01% {swi1: netisr 1}
42359 root      92    0 13704K  8356K CPU2    2   1:36  65.82% bzip2
   12 root     -72    -     0K   160K WAIT    3 969:10  37.79% {swi1: netisr 2}
   12 root     -72    -     0K   160K WAIT    2 767:04  32.71% {swi1: netisr 0}
   12 root     -92    -     0K   160K WAIT    0  17.9H  14.01% {irq256: re0}
   11 root     155 ki31    0K    32K RUN     0  96.8H   8.59% {idle: cpu0}
   11 root     155 ki31    0K    32K RUN     2  97.8H   7.23% {idle: cpu2}
   11 root     155 ki31    0K    32K RUN     3  98.7H   7.18% {idle: cpu3}
   11 root     155 ki31    0K    32K RUN     1  98.4H   7.03% {idle: cpu1}
   13 root     -16    -     0K    32K sleep   3 413:33   4.49% {ng_queue3}
   13 root     -16    -     0K    32K RUN     0 412:56   4.30% {ng_queue2}
   13 root     -16    -     0K    32K sleep   2 413:10   4.15% {ng_queue0}
   13 root     -16    -     0K    32K RUN     2 412:43   4.05% {ng_queue1}
 5588 root      20  -20   222M   105M select  0 117:48   1.61% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   1.22% {mpd5}
32882 root      22    0 15392K  5492K select  3 314:39   1.12% snmpd
 5588 root      20  -20   222M   105M select  0   0:00   1.12% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   1.12% {mpd5}
 5351 root      20    0  9616K  1224K select  0  35:12   0.24% syslogd
 5588 root      20  -20   222M   105M select  0   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  3   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  3   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  0   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  0   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  2   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  0   0:00   0.24% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  2   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  2   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  1   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  2   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  3   0:00   0.15% {mpd5}
 5588 root      20  -20   222M   105M select  0   0:00   0.15% {mpd5}
   12 root     -60    -     0K   160K WAIT    1 108:19   0.10% {swi4: clock}
14567 root      20    0  9536K   736K nanslp  1  14:28   0.05% monitord
 5588 root      20  -20   222M   105M select  0   0:00   0.05% {mpd5}
 5588 root      20  -20   222M   105M select  0   0:00   0.05% {mpd5}
 5588 root      20  -20   222M   105M select  2   0:00   0.05% {mpd5}
 5588 root      20  -20   222M   105M select  3   0:00   0.05% {mpd5}
43807 cacti     72    0 12000K  2900K CPU3    3   0:00   0.00% snmpget
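The "*per-i" STATE of the netisr 1 thread above looks like it is blocked on a
lock whose name top truncates. If I catch it in that state again, it is
probably worth dumping the kernel stacks of the interrupt threads to see where
they are stuck, for example:

# procstat -kk 12

(pid 12 is the kernel intr process that owns the {swi1: netisr X} threads
shown above; I have not collected this yet.)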
I also ran bzip2 to see whether the remaining CPU time gets used:
http://piccy.info/view3/2445242/802b93a3db1055a20a98aed7e61dfc9a/
Yes, it is used. We see many mpd5 threads, but the mpd5 log says (only
suspicious messages shown):

Jan  5 21:00:01 flux mpd: [L98] RADIUS: rad_send_request for user '10153' failed: No valid RADIUS responses received
Jan  5 21:00:01 flux mpd: [L98] ACCT: Thread finished normally
Jan  5 21:00:01 flux mpd: [L194] RADIUS: Sending request for user 'aviator'
Jan  5 21:00:01 flux mpd: [L194] RADIUS: rad_send_request for user 'aviator' failed: No valid RADIUS responses received
Jan  5 21:00:01 flux mpd: [L194] ACCT: Thread finished normally
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_READ LinkNgDataEvent() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires()
Jan  5 21:01:01 flux mpd: EVENT: Processing timer "FsmKeepAlive" FsmEchoTimeout()
Jan  5 21:01:01 flux mpd: EVENT: Processing timer "FsmKeepAlive" FsmEchoTimeout() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires()
Jan  5 21:01:01 flux mpd: EVENT: Processing timer "FsmKeepAlive" FsmEchoTimeout()
Jan  5 21:01:01 flux mpd: EVENT: Processing timer "FsmKeepAlive" FsmEchoTimeout() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires() done
Jan  5 21:01:01 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires()
Jan  5 21:01:44 flux mpd: [L36] LCP: phase shift TERMINATE --> ESTABLISH
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "LCP" FsmTimeout() at fsm.c:190
Jan  5 21:01:44 flux mpd: [L36] LCP: LayerFinish
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "AuthTimer" AuthTimeout() at auth.c:682
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "(null)" (null)() at pap.c:64
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "ChalTimer" ChapChalTimeout() at chap.c:93
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "(null)" (null)() at chap.c:94
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "(null)" (null)() at eap.c:146
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "(null)" (null)() at eap.c:147
Jan  5 21:01:44 flux mpd: EVENT: Message 2 to PhysMsg() sent
Jan  5 21:01:44 flux mpd: EVENT: Processing timer "LCP" FsmTimeout() done
Jan  5 21:01:44 flux mpd: EVENT: Processing event EVENT_TIMEOUT TimerExpires() done
Jan  5 21:01:44 flux mpd: EVENT: Processing event EVENT_READ MsgEvent()
Jan  5 21:01:44 flux mpd: EVENT: Message 2 to PhysMsg() received
Jan  5 21:01:44 flux mpd: [L36] device: CLOSE event
Jan  5 21:01:44 flux mpd: pptp38-0: clearing call
Jan  5 21:01:44 flux mpd: pptp38: send CallDiscNotify msg
Jan  5 21:01:44 flux mpd: len=0x94 msgType=1 magic=0x1a2b3c4d type=13
Jan  5 21:01:44 flux mpd: cid=0xdecd result=3 err=0 cause=0 stats=""
Jan  5 21:01:44 flux mpd: pptp38: wrote 148 bytes ctrl data
Jan  5 21:01:44 flux mpd: 00 94 00 01 1a 2b 3c 4d 00 0d 00 00 de cd 03 00 .....+<M........
Jan  5 21:01:44 flux mpd: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Jan  5 21:01:44 flux last message repeated 7 times
Jan  5 21:01:44 flux mpd: 00 00 00 00 ....
Jan  5 21:01:44 flux mpd: pptp38-0: killing channel
Jan  5 21:01:44 flux mpd: [L36] PPTP call terminated
Jan  5 21:01:44 flux mpd: [L36] device: DOWN event
Jan  5 21:01:44 flux mpd: [L36] Link: DOWN event
Jan  5 21:01:44 flux mpd: [L36] LCP: Close event
Jan  5 21:01:44 flux mpd: [L36] LCP: state change Stopped --> Closed
Jan  5 21:01:44 flux mpd: EVENT: Stopping timer "LCP" FsmTimeout() at fsm.c:190
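"No valid RADIUS responses received" means mpd never got an answer from the
RADIUS server before giving up. As far as I know, error 718 on the Windows
client is exactly a PPP timeout waiting for the server's reply, so these two
symptoms would fit together. To measure how long the RADIUS server takes to
answer, independently of mpd, something like this could be run from the same
box (user, password, server and secret below are only placeholders, not the
real values):

# time radtest testuser testpass 127.0.0.1 0 testing123

radtest comes with FreeRADIUS, which is already running here as radiusd. If
that also hangs for seconds, the problem is on the RADIUS/database side rather
than in mpd5 itself.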
#top
last pid: 45804;  load averages:  6.66,  6.35,  5.74   up 6+00:44:49  21:12:42
241 processes: 2 running, 239 sleeping
CPU: 17.0% user,  0.0% nice,  7.3% system, 62.9% interrupt, 12.9% idle
Mem: 537M Active, 2679M Inact, 320M Wired, 117M Cache, 112M Buf, 246M Free
Swap: 4096M Total, 31M Used, 4064M Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
42359 root         1  91    0 13704K  8356K RUN     3   3:32 62.21% bzip2
 5588 root        52  20  -20   222M   106M select  0 117:53 28.61% mpd5
 3641 root         1  26    0 10460K  3996K select  0 116:52  2.10% zebra
32882 root         1  21    0 15392K  5492K select  2 314:46  1.42% snmpd
14567 root         1  20    0  9536K   736K nanslp  0  14:29  0.15% monitord
45792 cacti        1  27    0 12000K  3096K select  0   0:00  0.15% snmpwalk
34587 freeradius   2  52  -20 23652K 16420K usem    0   0:12  0.10% radiusd
45730 cacti        1  20    0 12000K  3096K select  2   0:00  0.10% snmpwalk
 5455 bind         7  20    0   136M   104M kqread  2  56:12  0.00% named
 5351 root         1  20    0  9616K  1224K select  2  35:13  0.00% syslogd
10440 dhcpd        1  20    0 15032K  5300K select  0   4:37  0.00% dhcpd
17155 firebird     8  20  -10 88652K 24128K usem    2   4:01  0.00% fb_smp_server
 3647 root         1  20    0 14660K  5196K select  2   1:57  0.00% bgpd
15683 mysql       34  20    0   407M   280M sbwait  0   1:24  0.00% mysqld
13446 postfix      1  20    0 11876K   948K kqread  2   0:12  0.00% qmgr
26875 root         1  20    0 33812K 11212K select  1   0:12  0.00% httpd
13336 root         1  20    0 11876K   664K kqread  2   0:06  0.00% master
29201 root         1  20    0 13068K  3012K select  2   0:05  0.00% sshd
15176 vmail        1  20    0 11324K  1492K kqread  0   0:04  0.00% auth
29733 root         1  22    0  9648K   544K nanslp  2   0:02  0.00% cron
11459 root         1  20    0 11136K   936K kqread  2   0:02  0.00% log
64159 postfix      1  20    0 11876K   692K kqread  0   0:02  0.00% anvil
33804 www          1  20    0 42004K 24028K lockf   2   0:01  0.00% httpd
11454 root         1  20    0 11136K   932K kqread  0   0:01  0.00% dovecot
35662 root         1  20    0 13236K  3928K select  0   0:01  0.00% mc
37631 www          1  20    0 37908K 19780K lockf   0   0:01  0.00% httpd
36086 www          1  20    0 37908K 20120K lockf   2   0:01  0.00% httpd
42143 www          1  20    0 37908K 20396K lockf   0   0:01  0.00% httpd

What looks suspicious here is the 'usem' state of fb_smp_server and radiusd
(waiting on semaphores?). I restarted radiusd with debug mode off. The number
of active semaphores for firebird rose from 5 to 13, and mpd (radiusd?)
started to process auth and acct messages again:
http://piccy.info/view3/2445292/150cf955f2903404dd8a3751de7f2dd1/
Slowly, but it does it!
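To see what exactly radiusd and fb_smp_server are sleeping on while they sit
in 'usem', it should be possible to dump their thread stacks with procstat,
for example (PIDs taken from the top output above):

# procstat -kk 34587 17155

I have not captured that yet; if radiusd turns out to be stuck waiting on the
Firebird client, that would explain why RADIUS answers come back so slowly.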
# ipcs -s
Semaphores:
T           ID          KEY MODE        OWNER    GROUP
s       131072      4396057 --rw-r----- firebird firebird
s       458753     21173263 --rw-r----- firebird firebird
s       327682     37950480 --rw-r----- firebird firebird
s       262147     54727696 --rw-r----- firebird firebird
s       131076     71504912 --rw-r----- firebird firebird
s       131077     88282128 --rw-r----- firebird firebird
s        65542    105059344 --rw-r----- firebird firebird
s        65543    121836560 --rw-r----- firebird firebird
s        65544    138613776 --rw-r----- firebird firebird
s        65545    155390992 --rw-r----- firebird firebird
s        65546    172168208 --rw-r----- firebird firebird
s        65547    188945424 --rw-r----- firebird firebird
s        65548    205722640 --rw-r----- firebird firebird

From ipcs(1):
     -T      Display system information about shared memory, message queues
             and semaphores.

# ipcs -T
msginfo:
        msgmax:       16384   (max characters in a message)
        msgmni:          40   (# of message queues)
        msgmnb:        2048   (max characters in a message queue)
        msgtql:          40   (max # of messages in system)
        msgssz:           8   (size of a message segment)
        msgseg:        2048   (# of message segments in system)

shminfo:
        shmmax:   134217728   (max shared memory segment size)
        shmmin:           1   (min shared memory segment size)
        shmmni:         192   (max number of shared memory identifiers)
        shmseg:         128   (max shared memory segments per process)
        shmall:       32768   (max amount of shared memory in pages)

seminfo:
        semmap:         256   (# of entries in semaphore map)
        semmni:         256   (# of semaphore identifiers)
        semmns:         512   (# of semaphores in system)
        semmnu:         256   (# of undo structures in system)
        semmsl:         340   (max # of semaphores per id)
        semopm:         100   (max # of operations per semop call)
        semume:          50   (max # of undo entries per process)
        semusz:         616   (size in bytes of undo structure)
        semvmx:       32767   (semaphore maximum value)
        semaem:       16384   (adjust on exit max value)

# sysctl -a kern.ipc
kern.ipc.maxsockbuf: 16777216
kern.ipc.sockbuf_waste_factor: 8
kern.ipc.somaxconn: 32768
kern.ipc.max_linkhdr: 16
kern.ipc.max_protohdr: 60
kern.ipc.max_hdr: 76
kern.ipc.max_datalen: 124
kern.ipc.nmbjumbo16: 3200
kern.ipc.nmbjumbo9: 6400
kern.ipc.nmbjumbop: 12800
kern.ipc.nmbclusters: 262144
kern.ipc.piperesizeallowed: 1
kern.ipc.piperesizefail: 0
kern.ipc.pipeallocfail: 0
kern.ipc.pipefragretry: 0
kern.ipc.pipekva: 2097152
kern.ipc.maxpipekva: 16810176
kern.ipc.msgseg: 2048
kern.ipc.msgssz: 8
kern.ipc.msgtql: 40
kern.ipc.msgmnb: 2048
kern.ipc.msgmni: 40
kern.ipc.msgmax: 16384
kern.ipc.semaem: 16384
kern.ipc.semvmx: 32767
kern.ipc.semusz: 616
kern.ipc.semume: 50
kern.ipc.semopm: 100
kern.ipc.semmsl: 340
kern.ipc.semmnu: 256
kern.ipc.semmns: 512
kern.ipc.semmni: 256
kern.ipc.semmap: 256
kern.ipc.shm_allow_removed: 0
kern.ipc.shm_use_phys: 0
kern.ipc.shmall: 32768
kern.ipc.shmseg: 128
kern.ipc.shmmni: 192
kern.ipc.shmmin: 1
kern.ipc.shmmax: 134217728
kern.ipc.maxsockets: 204800
kern.ipc.numopensockets: 1150
kern.ipc.nsfbufsused: 1
kern.ipc.nsfbufspeak: 80
kern.ipc.nsfbufs: 6656

# sysctl -a net.graph
net.graph.msg_version: 8
net.graph.abi_version: 12
net.graph.maxdata: 32768
net.graph.maxalloc: 32768
net.graph.threads: 4
net.graph.control.proto: 2
net.graph.data.proto: 1
net.graph.family: 32
net.graph.recvspace: 524288
net.graph.maxdgram: 262144

# vmstat -z | egrep 'ITEM|NetGraph'
ITEM                   SIZE   LIMIT    USED    FREE          REQ   FAIL  SLEEP
NetGraph items:          36,  32804,     71,    755,  8711030020,     0,     0
NetGraph data items:     36,  32804,      0,   1003, 16369746867,     0,     0

In the full '# vmstat -z' output the only zone with FAILs is:
128 Bucket:             524,      0,    540,    223,       296830, 75013,     0
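The NetGraph zones show no FAILs, so netgraph itself does not seem to be
running out of items. One counter not captured above is mbuf/cluster usage;
if clusters were being denied, forwarding would stall in exactly this kind of
way. Next time the "limit" is hit I will also grab:

# netstat -m

and look at the "requests for mbufs denied" / "requests for mbuf clusters
denied" lines (wording from memory, the exact text may differ).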
Now (at 21:45) the state of the system is:
http://piccy.info/view3/2445350/6b7c5bfffc977ab5a1fe934172aa33fd/
As you can see, there are now *fewer* kernel interrupts on the graph, *more*
interrupts from re0 (see below) and *more* pps on the re0 interface (see
below) than there were when this overload situation occurred!! So (maybe??)
the problem is not with the 're' interface???

# systat -v
    2 users    Load  6.01  5.75  5.72                  Jan  5 21:46

Mem:KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free          in   out     in   out
Act  961448   13896  3541824    41356  180912  count
All 1247360   22052  5889300   103760          pages
Proc:                                                          Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt   713 cow    39492 total
  7   1     236       166k 7248  45k  23k  20k 2994 1031 zfod      26 ata0 14
                                                      112 ozfod     2 ehci0 16
13.0%Sys  62.7%Intr  4.1%User  0.0%Nice 20.2%Idle     10%ozfod      2 ehci1 23
|    |    |    |    |    |    |    |    |    |            daefr  4121 cpu0:timer
======++++++++++++++++++++++++++++++++>>            1087 prcfr 23015 re0 256
                          111 dtbuf                  3213 totfr  4110 cpu1:timer
Namei     Name-cache   Dir-cache   142271 desvn           react  4106 cpu3:timer
   Calls    hits   %    hits   %    47569 numvn           pdwak  4110 cpu2:timer
   18464   18449 100                34142 frevn           pdpgs
                                                           intrn
Disks   ad0   da0 pass0            330212 wire
KB/t  44.00  0.00  0.00            748860 act
tps      25     0     0           2731784 inact
MB/s   1.08  0.00  0.00            159568 cache
%busy     1     0     0             21024 free
                                   114976 buf

# netstat -W 1 re0
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    137567     0      0   97669861     156923     0  114329168     0
    153711     0      0  108860571     178608     0  129023839     0
    132672     0      0   92702405     159798     0  113270233     0
    141540     0      0   99766858     169112     0  120816262     0
    143670     0      0  101820412     173753     0  124918609     0
    138409     0      0   97944357     162214     0  117144054     0
    139900     0      0   99422132     163406     0  118104685     0
    151955     0      0  108602727     180131     0  131715416     0
    138995     0      0   99993704     167247     0  121914361     0
    144477     0      0  104308909     171671     0  125640285     0
    136133     0      0   95776095     163066     0  117071802     0
    135196     0      0   95555349     163899     0  118434548     0
    135067     0      0   96838717     163829     0  119398350     0
    138786     0      0  100931422     162668     0  119753322     0
    136027     0      0   95524321     162876     0  117032548     0

And now at 22:00 (NOTICE: a significant drop in CPU utilisation) there are far
fewer kernel interrupts, but no significant change in the traffic flow: the
same interrupts, the same pps and the same Mbit/s...
http://piccy.info/view3/2445486/f3f0a124d8f080ddd181e5db8f24dd82/orig/
and the VPN online users:
http://piccy.info/view3/2445500/b3812ae75e8dcf7c3778bf1330c055c2/
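Since I keep comparing snapshots taken at different moments, it is probably
easier to log the interesting counters once a minute and correlate them with
the graphs afterwards. Something along these lines (the log file name is just
an example):

while sleep 60; do
    date
    vmstat -i | grep re0
    netstat -m | head -5
done >> /var/tmp/re0-watch.log

Then when the "mystery limit" is hit again, the history of interrupt rates and
mbuf usage is already there.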
    2 users    Load  3.89  4.67  5.04                  Jan  5 22:03

Mem:KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free          in   out     in   out
Act  967716   14036  3339656    41580  227852  count    30
All 1245848   26492  5715024   119292          pages    31
Proc:                                                          Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt  1892 cow    38624 total
  2         235       201k 5722  10k  22k  32k 4020  649 zfod      50 ata0 14
                                                      130 ozfod     1 ehci0 16
17.2%Sys  42.5%Intr  1.3%User  0.0%Nice 39.0%Idle     20%ozfod      3 ehci1 23
|    |    |    |    |    |    |    |    |    |            daefr  4125 cpu0:timer
=========+++++++++++++++++++++>                       967 prcfr 22069 re0 256
                          152 dtbuf                  5557 totfr  4125 cpu1:timer
Namei     Name-cache   Dir-cache   142271 desvn           react  4126 cpu3:timer
   Calls    hits   %    hits   %    47597 numvn           pdwak  4125 cpu2:timer
    3489    3483 100                34132 frevn           pdpgs
                                                           intrn
Disks   ad0   da0 pass0            304952 wire
KB/t  16.74  0.00  0.00            688352 act
tps      50     0     0           2770104 inact
MB/s   0.82  0.00  0.00            111144 cache
%busy     4     0     0            116904 free
                                   114976 buf

# netstat -W 1 re0
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    127166     0      0   88929973     143992     0  106883020     0
    138535     0      0   98464838     158725     0  119737496     0
    133694     0      0   95349794     157750     0  117821954     0
    129177     0      0   90732865     146634     0  108781223     0
    143138     0      0  101357842     162761     0  119955682     0
    142492     0      0  102453893     166022     0  124863890     0
    132476     0      0   93756889     149644     0  113221417     0
    136080     0      0   96119177     154514     0  116135341     0
    143103     0      0  100681447     160290     0  119572568     0
    124777     0      0   86943492     140187     0  102704963     0
    131743     0      0   92982527     152259     0  113233013     0

NOTICE again: pps and interrupts are now *higher* than they were when the
overload occurred! I think this issue may be related to the following
subsystems:
1. netgraph
2. netisr

# sysctl net.isr
net.isr.numthreads: 4
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 0
net.isr.maxthreads: 4
net.isr.direct: 0
net.isr.direct_force: 0

# uname -a
FreeBSD 9.0-CURRENT FreeBSD 9.0-CURRENT #4: Fri Jun 10 01:30:12 UTC 2011
@:/usr/obj/usr/src/sys/PAE_KES i386

PS. While reading, note the system time in each output.
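If it really is netisr, one experiment I can try is to pin the netisr threads
to their CPUs and give them larger queues. As far as I understand these are
boot-time tunables, so they would go into /boot/loader.conf and need a reboot;
the qlimit value below is only a guess:

net.isr.bindthreads=1
net.isr.defaultqlimit=1024

I have not tried this yet, so I cannot say whether it helps.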
--
Best regards,
 Коньков                          mailto:kes-kes@yandex.ru