Date: Mon, 19 Nov 2007 15:08:32 GMT From: Dennis Yusupoff <dyr@homelink.ru> To: freebsd-gnats-submit@FreeBSD.org Subject: misc/118128: Dummynet cause kernel trap or system freeze Message-ID: <200711191508.lAJF8WA1090821@www.freebsd.org> Resent-Message-ID: <200711191510.lAJFA1oA053778@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
>Number: 118128 >Category: misc >Synopsis: Dummynet cause kernel trap or system freeze >Confidential: no >Severity: critical >Priority: medium >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Mon Nov 19 15:10:01 UTC 2007 >Closed-Date: >Last-Modified: >Originator: Dennis Yusupoff >Release: 7.0-BETA2 >Organization: Ozerki.Net ISP >Environment: FreeBSD router-nat1.ozerki.net 7.0-BETA2 FreeBSD 7.0-BETA2 #0: Thu Nov 15 14:02:32 UTC 2007 root@:/usr/obj/usr/src/sys/ROUTER-NAT1 i386 >Description: There are two identical routers, which route traffic of "public" and "private" ip(NAT via ng_nat) addresses into Internet, and shape it (via dummynet). For traffic accounting they use ng_netflow. Traffic is about 50-120 Mbit/sec in and same out, Packets rate about 30 pps: === router-nat1# netstat -w 1 input (Total) output packets errs bytes packets errs bytes colls 28649 0 19394537 28441 0 19252958 0 28176 0 18598912 28016 0 18511845 0 28737 0 19055484 28297 0 18909975 0 27841 0 18981432 27708 0 18938807 0 28269 0 19045253 28021 0 18840864 0 27787 0 19299610 27535 0 19181573 0 27556 0 18373238 27499 0 18447451 0 27411 0 18515690 27299 0 18514113 0 28260 0 19817378 27946 0 19584214 0 28063 0 18933588 27653 0 18776208 0 29126 0 20290956 28793 0 20033618 0 27964 0 19401665 27795 0 19382463 0 27939 0 18776063 27634 0 18615120 0 29226 0 20208282 28788 0 19890549 0 28020 0 19202448 27865 0 19191244 0 27303 0 18666019 26909 0 18421229 0 27895 0 19033813 27895 0 19083944 0 27475 0 18298819 27101 0 18065517 0 27040 0 18467203 26809 0 18367452 0 28198 0 18950534 28080 0 19014581 0 === The problem: Sometimes, "top -S" shows 100% load of dummynet process, which cause, from time to time: - Network activity freeze - System freeze - Kernel trap Than network activity freezes, no packets (incl. ARP) transmitted or recieved at interfaces at all, netstat -w 1 shows: === input (Total) output packets errs bytes packets errs bytes colls 0 805 0 1 0 42 0 0 863 0 0 0 0 0 input (Total) output packets errs bytes packets errs bytes colls 0 741 0 0 0 0 0 0 785 0 1 0 42 0 0 739 0 0 0 0 0 0 741 0 0 0 0 0 === ps -awwux (head part) === USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND root 11 100.0 0.0 0 8 ?? RL 10:10AM 212:53.29 [idle: cpu0] root 27 99.7 0.0 0 8 ?? RL 10:10AM 159:41.81 [dummynet] root 12 0.7 0.0 0 8 ?? WL 10:10AM 5:58.27 [swi4: clock] root 0 0.0 0.0 0 0 ?? WLs 10:10AM 0:00.01 [swapper] root 1 0.0 0.0 1888 436 ?? ILs 10:10AM 0:00.12 /sbin/init -- root 2 0.0 0.0 0 8 ?? DL 10:10AM 0:00.57 [g_event] root 3 0.0 0.0 0 8 ?? DL 10:10AM 0:01.88 [g_up] root 4 0.0 0.0 0 8 ?? DL 10:10AM 0:01.76 [g_down] root 5 0.0 0.0 0 8 ?? DL 10:10AM 0:00.00 [acpi_task_0] root 6 0.0 0.0 0 8 ?? DL 10:10AM 0:00.00 [acpi_task_1] root 7 0.0 0.0 0 8 ?? DL 10:10AM 0:00.00 [acpi_task_2] root 8 0.0 0.0 0 8 ?? DL 10:10AM 0:00.00 [kqueue taskq] root 9 0.0 0.0 0 8 ?? DL 10:10AM 0:00.00 [thread taskq] root 10 0.0 0.0 0 8 ?? RL 10:10AM 132:25.53 [idle: cpu1] root 13 0.0 0.0 0 8 ?? WL 10:10AM 0:00.00 [swi3: vm] root 14 0.0 0.0 0 8 ?? WL 10:10AM 97:16.30 [swi1: net] root 15 0.0 0.0 0 8 ?? DL 10:10AM 0:47.66 [yarrow] root 16 0.0 0.0 0 8 ?? WL 10:10AM 0:00.00 [swi6: task queue] root 17 0.0 0.0 0 8 ?? WL 10:10AM 0:00.00 [swi5: +] root 18 0.0 0.0 0 8 ?? WL 10:10AM 0:00.00 [swi6: Giant taskq] root 19 0.0 0.0 0 8 ?? WL 10:10AM 0:00.00 [irq9: acpi0] root 20 0.0 0.0 0 8 ?? RL 10:10AM 72:52.05 [irq256: bge0] root 21 0.0 0.0 0 8 ?? WL 10:10AM 26:57.17 [irq257: bge1] root 22 0.0 0.0 0 8 ?? WL 10:10AM 0:00.00 [irq14: ata0] root 23 0.0 0.0 0 8 ?? WL 10:10AM 0:00.00 [irq15: ata1] root 24 0.0 0.0 0 8 ?? WL 10:10AM 0:01.51 [irq18: atapci1] root 25 0.0 0.0 0 8 ?? WL 10:10AM 0:00.49 [irq1: atkbd0] === top -SH (head) === last pid: 17107; load averages: 1.00, 1.03, 1.05 up 0+06:18:02 16:28:28 79 processes: 5 running, 58 sleeping, 13 waiting, 3 lock Mem: 15M Active, 16M Inact, 106M Wired, 228K Cache, 112M Buf, 863M Free Swap: 1024M Total, 1024M Free PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND 11 root 171 ki31 0K 8K RUN 0 211:26 100.00% idle: cpu0 27 root -68 - 0K 8K CPU1 1 159:42 100.00% dummynet 12 root -32 - 0K 8K WAIT 0 5:57 0.68% swi4: clock 25 root -60 - 0K 8K WAIT 0 0:00 0.20% irq1: atkbd0 10 root 171 ki31 0K 8K RUN 1 132:26 0.00% idle: cpu1 14 root -44 - 0K 8K WAIT 0 97:16 0.00% swi1: net 20 root -68 - 0K 8K RUN 0 72:52 0.00% irq256: bge0 21 root -68 - 0K 8K WAIT 0 26:57 0.00% irq257: bge1 === ipfw show: === 00010 56 2800 allow ip from any to any via lo* 00020 848615 1083923444 allow ip from 192.168.254.0/24 to 192.168.254.0/24 00040 0 0 deny ip from 127.0.0.0/8,192.168.254.0/24 to any 00050 7449 582742 deny ip from any to any dst-port 135,137,138,139,145,445 00055 3861 193044 count log logamount 29999 tcp from any to any dst-port 25 setup in via bge1 00060 0 0 allow tcp from 85.249.167.249,85.249.167.6,85.249.175.6,85.249.162.81,85.249.162.75,85.249.160.7 0,85.249.75.16/28,85.249.170.17,85.249.171.26,80.93.176.45,10.255.255.18 to me dst-port 26 in 00060 0 0 allow tcp from me 26 to 85.249.167.249,85.249.167.6,85.249.175.6,85.249.162.81,85.249.162.75,85. 249.160.70,85.249.75.16/28,85.249.170.17,85.249.171.26,80.93.176.45,10.255.255.18 out 00070 7300 474890 allow ip from any to me dst-port 53 in 00070 6518 280649 allow ip from me 53 to any out 00080 0 0 allow tcp from 192.168.0.0/16 to me dst-port 1723 in 00080 0 0 allow tcp from me 1723 to 192.168.0.0/16 out 00085 0 0 allow gre from 192.168.0.0/16 to me in 00085 0 0 allow gre from me to 192.168.0.0/16 out 00090 546948646 404049496275 skipto 100 ip from table(1) to any 00095 1084064 59330703 deny ip from 192.168.0.0/16 to not me in 00251 61016102 39527922688 netgraph 251 ip from any to me via bge1 in // Inside NAT 00271 551973919 332595728463 netgraph 271 ip from any to any via bge1 // Netflow 00350 0 0 deny icmp from any to any iplen 1100-1400 00355 128 13785 allow icmp from any to me icmptypes 0,3,4,8,11 00360 2050 121213 allow icmp from me to any icmptypes 0,3,4,8,11 00370 48 3029 allow ip from me to any dst-port 53 00370 39 2837 allow ip from any 53 to me 00375 968 38720 allow ip from me to any dst-port 25 00375 1116 68332 allow ip from any 25 to me 00390 1210379 78191102 deny ip from any to me in 00400 18945 3676920 pipe 1000 ip from any to table(100) via bge1 00400 7016 1032471 pipe 2000 ip from table(100) to any via bge1 00400 0 0 pipe 1001 ip from any to table(101) via bge1 00400 0 0 pipe 2001 ip from table(101) to any via bge1 00400 23176 3733337 pipe 1002 ip from any to table(102) via bge1 00400 12755 9606631 pipe 2002 ip from table(102) to any via bge1 00400 5408 1767725 pipe 1003 ip from any to table(103) via bge1 00400 3726 1833816 pipe 2003 ip from table(103) to any via bge1 00400 52786 10847967 pipe 1004 ip from any to table(104) via bge1 00400 42214 21943239 pipe 2004 ip from table(104) to any via bge1 00400 26945 6792557 pipe 1005 ip from any to table(105) via bge1 00400 17318 8998862 pipe 2005 ip from table(105) to any via bge1 00400 0 0 pipe 1006 ip from any to table(106) via bge1 00400 0 0 pipe 2006 ip from table(106) to any via bge1 00400 52696 10276819 pipe 1007 ip from any to table(107) via bge1 00400 45249 22189681 pipe 2007 ip from table(107) to any via bge1 00400 1516 92256 pipe 1008 ip from any to table(108) via bge1 00400 534 32007 pipe 2008 ip from table(108) to any via bge1 00400 392 19174 pipe 1009 ip from any to table(109) via bge1 00400 145 21261 pipe 2009 ip from table(109) to any via bge1 00400 0 0 pipe 1010 ip from any to table(110) via bge1 00400 0 0 pipe 2010 ip from table(110) to any via bge1 00400 29916 7665658 pipe 1011 ip from any to table(111) via bge1 00400 28121 12202299 pipe 2011 ip from table(111) to any via bge1 00400 5685 1821467 pipe 1012 ip from any to table(112) via bge1 00400 12274 1893996 pipe 2012 ip from table(112) to any via bge1 00400 0 0 pipe 1013 ip from any to table(113) via bge1 00400 0 0 pipe 2013 ip from table(113) to any via bge1 00400 7846 2683721 pipe 1014 ip from any to table(114) via bge1 00400 11550 2461408 pipe 2014 ip from table(114) to any via bge1 00400 5178 992091 pipe 1015 ip from any to table(115) via bge1 00400 5128 982621 pipe 2015 ip from table(115) to any via bge1 00400 3409 1046512 pipe 1016 ip from any to table(116) via bge1 00400 3491 278703 pipe 2016 ip from table(116) to any via bge1 00400 606 209521 pipe 1017 ip from any to table(117) via bge1 00400 1156 80156 pipe 2017 ip from table(117) to any via bge1 00400 161 7984 pipe 1018 ip from any to table(118) via bge1 00400 277 24665 pipe 2018 ip from table(118) to any via bge1 00400 110 5561 pipe 1019 ip from any to table(119) via bge1 00400 0 0 pipe 2019 ip from table(119) to any via bge1 00501 67966152 54550611510 netgraph 501 ip from 192.168.0.0/17 to any via bge1 out // Outside NAT 00700 67963229 54550620016 allow ip from me to any out via bge1 00710 0 0 allow ip from any to 172.16.0.0/17 in via bge1 00730 0 0 allow ip from any to 172.16.0.0/17 out 00999 476270755 347321385488 allow ip from table(1) to any 00999 549654075 257173525109 allow ip from any to table(1) 65530 151626 10726463 deny ip from any to any 65535 0 0 allow ip from any to any === Sometimes it could looking like this "/var/log/all.log": === ... Nov 19 14:30:01 router-nat1 /usr/sbin/cron[99868]: (root) CMD (/usr/libexec/atrun) Nov 19 14:36:21 router-nat1 syslogd: restart Nov 19 14:36:21 router-nat1 syslogd: kernel boot file is /boot/kernel/kernel Nov 19 14:36:21 router-nat1 kernel: ipfw: ouch!, skip past end of rules, denying packet Nov 19 14:36:21 router-nat1 last message repeated 13 times Nov 19 14:36:21 router-nat1 kernel: Nov 19 14:36:21 router-nat1 kernel: Nov 19 14:36:21 router-nat1 kernel: Fatal trap 12: page fault while in kernel mode Nov 19 14:36:21 router-nat1 kernel: cpuid = 0; apic id = 00 Nov 19 14:36:21 router-nat1 kernel: fault virtual address = 0xf Nov 19 14:36:21 router-nat1 kernel: fault code = supervisor read, page not present Nov 19 14:36:21 router-nat1 kernel: instruction pointer = 0x20:0xc05cf04f Nov 19 14:36:21 router-nat1 kernel: stack pointer = 0x28:0xe3d667fc Nov 19 14:36:21 router-nat1 kernel: frame pointer = 0x28:0xe3d66a9c Nov 19 14:36:21 router-nat1 kernel: code segment = base 0x0, limit 0xfffff, type 0x1b Nov 19 14:36:21 router-nat1 kernel: = DPL 0, pres 1, def32 1, gran 1 Nov 19 14:36:21 router-nat1 kernel: processor eflags = interrupt enabled, resume, IOPL = 0 Nov 19 14:36:21 router-nat1 kernel: current process = 27 (dummynet) Nov 19 14:36:21 router-nat1 kernel: trap number = 12 Nov 19 14:36:21 router-nat1 kernel: panic: page fault Nov 19 14:36:21 router-nat1 kernel: cpuid = 0 Nov 19 14:36:21 router-nat1 kernel: Uptime: 6h21m32s Nov 19 14:36:21 router-nat1 kernel: Physical memory: 1019 MB Nov 19 14:36:21 router-nat1 kernel: Dumping 82 MB: Nov 19 14:36:21 router-nat1 kernel: ipfw: 55 Count TCP 216.64.55.194:35266 85.249.174.7:25 in via bge1 Nov 19 14:36:21 router-nat1 kernel: ipfw: 55 Count TCP 83.25.100.191:54690 85.249.169.24:25 in via bge1 Nov 19 14:36:21 router-nat1 kernel: ipfw: 55 Count TCP 70.77.152.190:61067 85.249.166.101:25 in via bge1 Nov 19 14:36:21 router-nat1 kernel: ipfw: 55 Count TCP 70.77.152.190:61067 85.249.166.101:25 in via bge1 Nov 19 14:36:21 router-nat1 kernel: ipfw: 55 Count TCP 196.206.211.4:3795 85.249.166.101:25 in via bge1 Nov 19 14:36:21 router-nat1 last message repeated 2 times .. === CONFIGS: /etc/make.conf === NO_SENDMAIL= true # do not build sendmail and related programs NO_ATM= true NO_GAMES= true # do not build games (games/ subdir) NO_INET6= true # do not build IPv6 related programs and libraries NO_BIND9= true NO_PROFILE= true # Avoid compiling profiled libraries MAKE_IDEA= YES # IDEA (128 bit symmetric encryption) BOOTWAIT= 2 DOC_LANG= en_US.ISO8859-1 ru_RU.KOI8-R MAKE_KERBEROS5= no ENABLE_SUID_K5SU= no CFLAGS=-O2 -pipe #COPTFLAGS=-O -pipe CPUTYPE?=core2 WITH_OPENSSL_PORT=yes CVS_UPDATE= true PERL_VER=5.8.8 PERL_VERSION=5.8.8 WITHOUT_X11=yes NO_GUI="true" NO_X="true" X11BASE=${LOCALBASE} WITHOUT_MODULES= 3dfx 3dfx_linux aac aac_linux amr amr_linux apm apm_saver atapifd ataraid bktr bktr_mem bridgestp i915 wlan fireware sound digi hptmv ath if_bridge if_gif if_gre if_ndis if_tap === /usr/src/sys/i386/conf/ROUTER-NAT1 === router-nat1# grep -v ^# /root/kernel/ROUTER-NAT1 | grep -v ^$ machine i386 cpu I686_CPU ident ROUTER-NAT1 maxusers 0 options SMP # Symmetric MultiProcessor Kernel device apic # I/O APIC options SCHED_ULE options PREEMPTION # Enable kernel thread preemption options IPI_PREEMPTION options UFS_GJOURNAL options INET # InterNETworking options FFS # Berkeley Fast Filesystem options SOFTUPDATES # Enable FFS soft updates support options UFS_DIRHASH # Improve performance on big directories options MD_ROOT # MD is a potential root device options MSDOSFS # MSDOS Filesystem options CD9660 # ISO 9660 Filesystem options PROCFS # Process filesystem (requires PSEUDOFS) options PSEUDOFS # Pseudo-filesystem framework options GEOM_PART_GPT # GUID Partition Tables. options GEOM_LABEL # Provides labelization options COMPAT_43TTY # BSD 4.3 TTY compat [KEEP THIS!] options KTRACE # ktrace(1) support options SYSVSHM # SYSV-style shared memory options SYSVMSG # SYSV-style message queues options SYSVSEM # SYSV-style semaphores options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions options KBD_INSTALL_CDEV # install a CDEV entry in /dev # output. Adds ~128k to driver. # output. Adds ~215k to driver. options ADAPTIVE_GIANT # Giant mutex is adaptive. options STOP_NMI # Stop CPUS using NMI instead of IPI options VESA options SC_PIXEL_MODE options SC_DISABLE_REBOOT options SC_HISTORY_SIZE=2000 device cpufreq device pci device ata device atadisk # ATA disk drives options ATA_STATIC_ID # Static device numbering device atkbdc # AT keyboard controller device atkbd # AT keyboard device vga # VGA video card driver device sc device pmtimer # Adjust system timer at wakeup time device miibus # MII bus support device bge # Broadcom BCM570xx Gigabit Ethernet device loop # Network loopback device random # Entropy device device ether # Ethernet support device pty # Pseudo-ttys (telnet etc) device md # Memory "disks" device vlan device coretemp device bpf # Berkeley packet filter options IPFIREWALL options IPFIREWALL_VERBOSE options IPFIREWALL_VERBOSE_LIMIT=400 options IPFIREWALL_DEFAULT_TO_ACCEPT options IPFIREWALL_FORWARD #enable transparent proxy support options IPFIREWALL_NAT options IPDIVERT options IPSTEALTH options DUMMYNET options LIBALIAS device pf device pflog device pfsync device carp options DEVICE_POLLING options HZ=1000 === /etc/sysctl.conf: === router-nat1# cat /etc/sysctl.conf |grep -v ^# |grep -v ^$ net.link.ether.inet.log_arp_wrong_iface=0 # For clear console net.inet.carp.preempt=1 net.inet.carp.arpbalance=1 net.inet.carp.log=2 kern.maxfiles=20480 # For PPP kern.ipc.somaxconn=1024 net.inet.udp.maxdgram=57344 kern.ipc.maxsockbuf=2097152 net.inet.tcp.recvspace=65228 net.inet.tcp.sendspace=65228 net.inet.tcp.drop_synfin=1 net.inet.tcp.syncookies=1 net.inet.tcp.delayed_ack=0 net.inet.ip.fastforwarding=1 net.inet.ip.fw.debug=0 net.inet.ip.fw.dyn_max=16384 net.inet.ip.intr_queue_maxlen=3000 net.inet.ip.redirect=0 net.inet.ip.random_id=1 router-nat1# === sysctl (dummynet) === router-nat1# sysctl net.inet.ip.dummynet net.inet.ip.dummynet.debug: 0 net.inet.ip.dummynet.tick_lost: 0 net.inet.ip.dummynet.tick_diff: 90516 net.inet.ip.dummynet.tick_adjustment: 60508 net.inet.ip.dummynet.tick_delta_sum: 431 net.inet.ip.dummynet.tick_delta: 1 net.inet.ip.dummynet.red_max_pkt_size: 1500 net.inet.ip.dummynet.red_avg_pkt_size: 512 net.inet.ip.dummynet.red_lookup_depth: 256 net.inet.ip.dummynet.max_chain_len: 16 net.inet.ip.dummynet.expire: 1 net.inet.ip.dummynet.search_steps: 31451018 net.inet.ip.dummynet.searches: 30779341 net.inet.ip.dummynet.extract_heap: 0 net.inet.ip.dummynet.ready_heap: 176 net.inet.ip.dummynet.curr_time: 1912510 net.inet.ip.dummynet.hash_size: 64 === >How-To-Repeat: Wait for a no more then 12 hours and it happens =( >Fix: No idea >Release-Note: >Audit-Trail: >Unformatted:
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200711191508.lAJF8WA1090821>