Date: Thu, 10 Apr 2014 19:17:52 -0700
From: hiren panchasara <hiren.panchasara@gmail.com>
To: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: netisr observations
Message-ID: <CALCpEUHhUkZ9b=2ynaN5-MkxOObs+O4RTsUhmhcMeC-WDnAxKg@mail.gmail.com>
(Note: This may seem more like a rant than an actual problem report.)

I am on a stable-10ish box with igb0. The workload is mainly inbound nfs traffic, with about 2K connections at any point in time.

device igb # Intel PRO/1000 PCIE Server Gigabit Family

hw.igb.rxd: 4096
hw.igb.txd: 4096
hw.igb.enable_aim: 1
hw.igb.enable_msix: 1
hw.igb.max_interrupt_rate: 32768
hw.igb.buf_ring_size: 4096
hw.igb.header_split: 0
hw.igb.num_queues: 0
hw.igb.rx_process_limit: 100
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x103c subdevice=0x323f class=0x020000

-bash-4.2$ netstat -I igb0 -i 1
            input          igb0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     18332     0     0   19096474      22946     0   18211000     0
     19074     0     0   11408912      28280     0   29741195     0
     15753     0     0   15499238      21234     0   16779695     0
     12914     0     0    9583719      17945     0   14599603     0
     13677     0     0   10818359      19050     0   15069889     0

-bash-4.2$ sysctl net.isr
net.isr.dispatch: direct
net.isr.maxthreads: 8
net.isr.bindthreads: 0
net.isr.maxqlimit: 10240
net.isr.defaultqlimit: 256
net.isr.maxprot: 16
net.isr.numthreads: 8

-bash-4.2$ sysctl -a | grep igb.0 | grep rx_bytes
dev.igb.0.queue0.rx_bytes: 65473003127
dev.igb.0.queue1.rx_bytes: 73982776038
dev.igb.0.queue2.rx_bytes: 57669494795
dev.igb.0.queue3.rx_bytes: 57830053867
dev.igb.0.queue4.rx_bytes: 75087429774
dev.igb.0.queue5.rx_bytes: 69252615374
dev.igb.0.queue6.rx_bytes: 70565370833
dev.igb.0.queue7.rx_bytes: 90210083223

I am seeing something interesting in "top":

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   12 root       68 -72    -     0K  1088K WAIT    0 279:36  65.77% intr

I see "intr" in one of the top 3 slots almost all the time. Turning on -H (thread view) shows me:

   12 root          -72    -     0K  1088K WAIT    2  69:04  20.36% intr{swi1: netisr 3}

(Does this mean netisr has an swi (software interrupt) thread for CPU 3?) I also see this thread jumping around to all the different CPUs, so it is not sticking to one CPU.

-bash-4.2$ vmstat -i
interrupt                          total       rate
irq4: uart0                         1538          0
cpu0:timer                      23865486       1108
irq256: igb0:que 0              46111948       2140
irq257: igb0:que 1              49820986       2313
irq258: igb0:que 2              41914519       1945
irq259: igb0:que 3              40926921       1900
irq260: igb0:que 4              49549124       2300
irq261: igb0:que 5              47066777       2185
irq262: igb0:que 6              50945395       2365
irq263: igb0:que 7              47147662       2188
irq264: igb0:link                      2          0
irq274: ahci0:ch0                 196869          9
cpu1:timer                      23866170       1108
cpu10:timer                     23805794       1105
cpu4:timer                      23870757       1108
cpu11:timer                     23806733       1105
cpu13:timer                     23806644       1105
cpu2:timer                      23858811       1107
cpu3:timer                      23862250       1107
cpu15:timer                     23805634       1105
cpu7:timer                      23863865       1107
cpu9:timer                      23810503       1105
cpu5:timer                      23864136       1107
cpu12:timer                     23808397       1105
cpu8:timer                      23806059       1105
cpu6:timer                      23874612       1108
cpu14:timer                     23807698       1105
Total                          755065290      35055

So it seems all queues are being used uniformly.
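As an aside, to confirm where those netisr swi threads actually run (rather than eyeballing top -H), I believe the following would work; pid 12 is the intr kernel process on this box, and <tid> is a placeholder for a thread id taken from the procstat output:

-bash-4.2$ procstat -t 12 | grep netisr   # per-thread view: last CPU, priority, state
-bash-4.2$ cpuset -g -p 12                # affinity mask of the whole intr process
-bash-4.2$ cpuset -g -t <tid>             # affinity mask of one netisr thread

With net.isr.bindthreads set to 0 I would expect those masks to span all CPUs, which matches the threads bouncing around in top.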
-bash-4.2$ netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         8            8
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs         disabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1   1024   flow  default   ---
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        7    256 source  default   ---
ether      9    256 source   direct   ---
ip6       10    256   flow  default   ---

But the *interesting* part of it, looking at just "ip" in the workstreams:

-bash-4.2$ netstat -Q | grep "ip "
Workstreams:
 WSID CPU   Name     Len WMark   Disp'd  HDisp'd  QDrops     Queued   Handled
    0   0   ip         0     0 73815267        0       0          0  73815267
    1   1   ip         0     0 68975084        0       0          0  68975084
    2   2   ip         0     0 48943960        0       0          0  48943960
    3   3   ip         0    67 59306618        0       0  203888563 263168729
    4   4   ip         0     0 77025108        0       0          0  77025108
    5   5   ip         0     0 58537310        0       0          0  58537310
    6   6   ip         0     0 81896427        0       0          0  81896427
    7   7   ip         0     0 69535857        0       0          0  69535857

So it looks like only cpu3 is doing any queuing. But that CPU does not look like it is getting hammered or anything:

last pid: 75181;  load averages: 27.81, 27.08, 26.93   up 0+06:12:37  19:04:33
508 processes: 23 running, 476 sleeping, 1 waiting, 8 lock
CPU 0:  71.8% user,  0.0% nice, 13.7% system, 14.5% interrupt,  0.0% idle
CPU 1:  80.9% user,  0.0% nice, 14.5% system,  4.6% interrupt,  0.0% idle
CPU 2:  77.1% user,  0.0% nice, 17.6% system,  5.3% interrupt,  0.0% idle
CPU 3:  88.5% user,  0.0% nice,  9.2% system,  2.3% interrupt,  0.0% idle
CPU 4:  80.2% user,  0.0% nice, 14.5% system,  5.3% interrupt,  0.0% idle
CPU 5:  79.4% user,  0.0% nice, 16.8% system,  3.1% interrupt,  0.8% idle
CPU 6:  83.2% user,  0.0% nice, 11.5% system,  4.6% interrupt,  0.8% idle
CPU 7:  68.7% user,  0.0% nice, 18.3% system, 13.0% interrupt,  0.0% idle
CPU 8:  88.5% user,  0.0% nice, 11.5% system,  0.0% interrupt,  0.0% idle
CPU 9:  87.8% user,  0.0% nice, 10.7% system,  0.0% interrupt,  1.5% idle
CPU 10: 87.0% user,  0.0% nice, 10.7% system,  2.3% interrupt,  0.0% idle
CPU 11: 80.9% user,  0.0% nice, 16.8% system,  2.3% interrupt,  0.0% idle
CPU 12: 86.3% user,  0.0% nice, 11.5% system,  2.3% interrupt,  0.0% idle
CPU 13: 84.7% user,  0.0% nice, 14.5% system,  0.8% interrupt,  0.0% idle
CPU 14: 87.0% user,  0.0% nice, 12.2% system,  0.8% interrupt,  0.0% idle
CPU 15: 87.8% user,  0.0% nice,  9.9% system,  2.3% interrupt,  0.0% idle
Mem: 17G Active, 47G Inact, 3712M Wired, 674M Cache, 1655M Buf, 1300M Free
Swap: 8192M Total, 638M Used, 7554M Free, 7% Inuse, 4K In

My conclusion, after looking at this a number of times, is that all CPUs are doing roughly equal amounts of work (if we believe the top -P stats).

Finally, the questions: why is cpu3 doing all the queuing, and what does that actually mean? Can I improve performance or reduce CPU load some other way? Should I change anything in my netisr settings?

cheers,
Hiren
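P.S. In case I end up experimenting, these are the netisr knobs I am aware of; the values below are only examples of what I might try, not settings I have validated on this workload:

# /boot/loader.conf -- loader tunables, take effect after a reboot
# (currently maxthreads=8, bindthreads=0, defaultqlimit=256)
net.isr.maxthreads="16"
net.isr.bindthreads="1"
net.isr.defaultqlimit="2048"

# Run-time, as root -- the dispatch policy can be changed without a reboot
sysctl net.isr.dispatch=hybrid    # or "deferred"; it is "direct" today

My understanding is that with "direct" dispatch most input is processed in the context that received it, and the Queued column in netstat -Q only grows for packets that actually get handed off to a netisr thread, which is why that one busy workstream stands out.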