Date: Fri, 15 Jun 2007 07:40:14 GMT
From: Cristian KLEIN <cristi@net.utcluj.ro>
To: freebsd-net@FreeBSD.org
Subject: Re: kern/113548: [dummynet] [patch] system hangs with dummynet queues
Message-ID: <200706150740.l5F7eEl1077089@freefall.freebsd.org>
The following reply was made to PR kern/113548; it has been noted by GNATS.

From: Cristian KLEIN <cristi@net.utcluj.ro>
To: Alexey Illarionov <littlesavage@orionet.ru>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/113548: [dummynet] [patch] system hangs with dummynet queues
Date: Fri, 15 Jun 2007 10:30:43 +0300

Alexey Illarionov wrote:
> Cristian KLEIN wrote:
>
>> I think the problem occurs because you use ipfw tags. As far as I know,
>> ipfw tags are stored as mbuf_tags(9). Dummynet uses mbuf tags too to
>> mark it's own packets. However, I suspect that in dn_tag_get(), dummynet
>> incorrectly assumes it is the only one using mbuf_tags(9).
>
>> Could you please apply the following patch? Also, could you test whether
>> removing "tag 1" from ipfw rules has any impact?
>
> Thanks for a fast reply and for the patch. It seems that panics have
> really been caused by ipfw tags. When I apply this patch, there were no
> panics for several days, but I have got the following dump today:
>
> kgdb: kvm_nlist(_stopped_cpus):
> kgdb: kvm_nlist(_stoppcbs):
> [GDB will not be able to debug user-mode threads:
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0xec221d87
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0xc05dafc6
> stack pointer = 0x28:0xde7b0c24
> frame pointer = 0x28:0xde7b0c28
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 30 (dummynet)
> trap number = 12
> panic: page fault
> KDB: stack backtrace:
> kdb_backtrace(100,c52ad480,28,de7b0be4,c,...) at kdb_backtrace+0x29
> panic(c078df19,c07d4928,0,fffff,c09b,...) at panic+0xa4
> trap_fatal(de7b0be4,ec221d87,c52ad480,c104b000,ec221000,...) at
> trap_fatal+0x2b7
> trap_pfault(de7b0be4,0,ec221d87) at trap_pfault+0x16b
> trap(8,28,28,1,0,...) at trap+0x331
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc05dafc6, esp = 0xde7b0c24, ebp = 0xde7b0c28 ---
> m_tag_locate(c55df900,0,f,0) at m_tag_locate+0x36
> dn_tag_get(c55df900,2ffbd300,1,c05c3e7e,c088e858,...) at dn_tag_get+0x1d
> ready_event_wfq(c57b0800,de7b0cac,de7b0cb0) at ready_event_wfq+0x50b
> dummynet_task(0,1) at dummynet_task+0x24c
> taskqueue_run(c5562a00) at taskqueue_run+0xd1
> taskqueue_thread_loop(c08ce950,de7b0d38,c08ce950,c05c01e0,0,...) at
> taskqueue_thread_loop+0x4a
> fork_exit(c05c01e0,c08ce950,de7b0d38) at fork_exit+0xa8
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xde7b0d6c, ebp = 0 ---
> Uptime: 50m0s
> Dumping 511 MB (2 chunks)
> chunk 0: 1MB (156 pages) ... ok
> chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351
> 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47
> 31 15
>
> #0 doadump () at pcpu.h:165
> 165 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) bt
> #0 doadump () at pcpu.h:165
> #1 0xc059f2a6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
> #2 0xc059f57b in panic (fmt=0xc078df19 "%s") at
> /usr/src/sys/kern/kern_shutdown.c:565
> #3 0xc076c1f7 in trap_fatal (frame=0xde7b0be4, eva=3961658759) at
> /usr/src/sys/i386/i386/trap.c:837
> #4 0xc076bf0b in trap_pfault (frame=0xde7b0be4, usermode=0,
> eva=3961658759) at /usr/src/sys/i386/i386/trap.c:745
> #5 0xc076bb71 in trap (frame=
> {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 1, tf_esi = 0, tf_ebp
> = -562361304, tf_isp = -562361328, tf_ebx = 15, tf_edx = -333308545,
> tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip =
> -1067601978, tf_cs = 32, tf_eflags = 66178, tf_esp = 22, tf_ss =
> -562361280}) at /usr/src/sys/i386/i386/trap.c:435
> #6 0xc0758bca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #7 0xc05dafc6 in m_tag_locate (m=0xec221d7f, cookie=0, type=15, t=0x0)
> at /usr/src/sys/kern/uipc_mbuf2.c:392
> #8 0xc06279ad in dn_tag_get (m=0xec221d7f) at mbuf.h:881
> #9 0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
> tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
> #10 0xc06284cc in dummynet_task (context=0x0, pending=0) at
> /usr/src/sys/netinet/ip_dummynet.c:805
> #11 0xc05bfe71 in taskqueue_run (queue=0xc5562a00) at
> /usr/src/sys/kern/subr_taskqueue.c:257
> #12 0xc05c022a in taskqueue_thread_loop (arg=0x0) at
> /usr/src/sys/kern/subr_taskqueue.c:376
> #13 0xc05897b8 in fork_exit (callout=0xc05c01e0 <taskqueue_thread_loop>,
> arg=0xc08ce950, frame=0xde7b0d38)
> at /usr/src/sys/kern/kern_fork.c:821
> #14 0xc0758c2c in fork_trampoline () at
> /usr/src/sys/i386/i386/exception.s:208
> (kgdb) up 9
> #9 0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
> tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
> 705 dn_tag_get(p->tail)->output_time += t ;
> (kgdb) p *p
> $1 = {next = {sle_next = 0xc6713600}, pipe_nr = 1700, bandwidth =
> 50000000, delay = 0, head = 0x0, tail = 0xc55df900,
> scheduler_heap = {size = 16, elements = 1, offset = 0, p =
> 0xc57b2800}, not_eligible_heap = {size = 16, elements = 0,
> offset = 0, p = 0xc57ac700}, idle_heap = {size = 16, elements = 0,
> offset = 124, p = 0xc56a2800}, V = 9830400,
> sum = 10, numbytes = -1090027776, sched_time = 2997985, if_name = '\0'
> <repeats 15 times>, ifp = 0x0, ready = 0, fs = {
> next = {sle_next = 0x0}, fs_nr = 0, flags_fs = 0, pipe = 0xc57b0800,
> parent_nr = 0, weight = 0, qsize = 50, plr = 0,
> flow_mask = {dst_ip = 0, src_ip = 0, dst_port = 0, src_port = 0,
> proto = 0 '\0', flags = 0 '\0', addr_type = 0 '\0',
> dst_ip6 = {__u6_addr = {__u6_addr8 = '\0' <repeats 15 times>,
> __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {
> 0, 0, 0, 0}}}, src_ip6 = {__u6_addr = {__u6_addr8 = '\0'
> <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0,
> 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, flow_id6 = 0, frag_id6
> = 0}, rq_size = 1, rq_elements = 0,
> rq = 0xc55791b0, last_expired = 0, backlogged = 0, w_q = 0, max_th =
> 0, min_th = 0, max_p = 0, c_1 = 0, c_2 = 0,
> c_3 = 0, c_4 = 0, w_q_lookup = 0x0, lookup_depth = 0, lookup_step =
> 0, lookup_weight = 0, avg_pkt_size = 0,
> max_pkt_size = 0}}
>
>
> When I remove "tag 1" the kernel stopped panick, but deadlocks didn't
> pass away. When I managed to enter DDB using serial console I found
> dummynet_task() looped on the following code:
>
>     h = heaps[i];
>     while (h->elements > 0 && DN_KEY_LEQ(h->p[0].key, curr_time)) {
>         ...
>         ready_event_wfq(p, &head, &tail);
>         ...
>     }
> It seems to me that problem is in ready_event_wfq() in the following code:
>     if (p->bandwidth > 0)
>         t = (p->bandwidth -1 - p->numbytes) / p->bandwidth ;
>
> Since p->bandwidth and p->numbytes are signed integers, the result can
> be negative (i have p->bandwidth=50000000 and p->numbytes=-2147483647)
>
> Now i test attached patch. I hope it will help. :)

Could you please be so kind and test whether SMP has any effect on the bug. I.e.
does an unpatched ip_dummynet without SMP cause panics? I ask this because I was unable to reproduce this bug on a non-SMP machine.

Also, I see you have "dummynet_task" in your dumps. Are you using RELENG_6 or revision 1.93.2.6 of ip_dummynet.c?
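
The arithmetic quoted above from ready_event_wfq() can be checked outside the kernel. What follows is only a sketch, not the dummynet code and not the attached patch: ticks_wide() and ticks_wrapped() are hypothetical helper names, and it assumes that p->bandwidth and p->numbytes are 32-bit signed ints, as they would be on the i386 machine in the dump. The overflow is reproduced through unsigned arithmetic, because overflowing a signed int directly is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the same expression as in the quoted code,
 * but evaluated in 64 bits so the numerator cannot overflow.
 */
static int64_t
ticks_wide(int32_t bandwidth, int32_t numbytes)
{
	assert(bandwidth > 0);	/* mirrors the "if (p->bandwidth > 0)" guard */
	return (((int64_t)bandwidth - 1 - (int64_t)numbytes) / bandwidth);
}

/*
 * Hypothetical helper: models what 32-bit wraparound would produce.
 * The subtraction is done on uint32_t (well defined, modulo 2^32) and
 * the result is converted back to int32_t. That conversion is
 * implementation-defined; common compilers wrap to two's complement.
 */
static int32_t
ticks_wrapped(int32_t bandwidth, int32_t numbytes)
{
	uint32_t num;

	assert(bandwidth > 0);
	num = (uint32_t)bandwidth - 1u - (uint32_t)numbytes;
	return ((int32_t)num / bandwidth);
}
```

With the reported values, ticks_wide(50000000, -2147483647) gives 43, while ticks_wrapped(50000000, -2147483647) gives -41 on a compiler that wraps the conversion, because the true numerator 2197483646 does not fit in 32 signed bits. For a small deficit such as numbytes = -1000 both helpers return 1. A negative t would be consistent with the report above, although the sketch does not show how the kernel reacts to it.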