From owner-freebsd-net@FreeBSD.ORG Fri Jun 15 07:40:10 2007
Date: Fri, 15 Jun 2007 07:40:09 GMT
Message-Id: <200706150740.l5F7e9N1077036@freefall.freebsd.org>
To: freebsd-net@FreeBSD.org
From: Alexey Illarionov
Subject: Re: kern/113548: [dummynet] [patch] system hangs with dummynet queues
Reply-To: Alexey Illarionov
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Fri, 15 Jun 2007 07:40:10 -0000

The following reply was made to PR kern/113548; it has been noted by GNATS.

From: Alexey Illarionov
To: Cristian KLEIN
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/113548: [dummynet] [patch] system hangs with dummynet queues
Date: Fri, 15 Jun 2007 11:11:39 +0400

This is a multi-part message in MIME format.
--------------040704070900010000020204
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Cristian KLEIN wrote:
> I think the problem occurs because you use ipfw tags.
> As far as I know, ipfw tags are stored as mbuf_tags(9). Dummynet uses
> mbuf tags too, to mark its own packets. However, I suspect that in
> dn_tag_get(), dummynet incorrectly assumes it is the only one using
> mbuf_tags(9).
>
> Could you please apply the following patch? Also, could you test whether
> removing "tag 1" from ipfw rules has any impact?

Thanks for the fast reply and for the patch. It seems that the panics
really were caused by ipfw tags. After I applied this patch there were no
panics for several days, but today I got the following dump:

kgdb: kvm_nlist(_stopped_cpus):
kgdb: kvm_nlist(_stoppcbs):
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xec221d87
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc05dafc6
stack pointer           = 0x28:0xde7b0c24
frame pointer           = 0x28:0xde7b0c28
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 30 (dummynet)
trap number             = 12
panic: page fault
KDB: stack backtrace:
kdb_backtrace(100,c52ad480,28,de7b0be4,c,...) at kdb_backtrace+0x29
panic(c078df19,c07d4928,0,fffff,c09b,...) at panic+0xa4
trap_fatal(de7b0be4,ec221d87,c52ad480,c104b000,ec221000,...) at trap_fatal+0x2b7
trap_pfault(de7b0be4,0,ec221d87) at trap_pfault+0x16b
trap(8,28,28,1,0,...)
at trap+0x331
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc05dafc6, esp = 0xde7b0c24, ebp = 0xde7b0c28 ---
m_tag_locate(c55df900,0,f,0) at m_tag_locate+0x36
dn_tag_get(c55df900,2ffbd300,1,c05c3e7e,c088e858,...) at dn_tag_get+0x1d
ready_event_wfq(c57b0800,de7b0cac,de7b0cb0) at ready_event_wfq+0x50b
dummynet_task(0,1) at dummynet_task+0x24c
taskqueue_run(c5562a00) at taskqueue_run+0xd1
taskqueue_thread_loop(c08ce950,de7b0d38,c08ce950,c05c01e0,0,...) at taskqueue_thread_loop+0x4a
fork_exit(c05c01e0,c08ce950,de7b0d38) at fork_exit+0xa8
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xde7b0d6c, ebp = 0 ---
Uptime: 50m0s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc059f2a6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc059f57b in panic (fmt=0xc078df19 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc076c1f7 in trap_fatal (frame=0xde7b0be4, eva=3961658759)
    at /usr/src/sys/i386/i386/trap.c:837
#4  0xc076bf0b in trap_pfault (frame=0xde7b0be4, usermode=0, eva=3961658759)
    at /usr/src/sys/i386/i386/trap.c:745
#5  0xc076bb71 in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 1, tf_esi = 0, tf_ebp = -562361304, tf_isp = -562361328, tf_ebx = 15, tf_edx = -333308545, tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1067601978, tf_cs = 32, tf_eflags = 66178, tf_esp = 22, tf_ss = -562361280})
    at /usr/src/sys/i386/i386/trap.c:435
#6  0xc0758bca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc05dafc6 in m_tag_locate (m=0xec221d7f, cookie=0, type=15, t=0x0)
    at /usr/src/sys/kern/uipc_mbuf2.c:392
#8  0xc06279ad in dn_tag_get (m=0xec221d7f) at mbuf.h:881
#9  0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
#10 0xc06284cc in dummynet_task (context=0x0, pending=0)
    at /usr/src/sys/netinet/ip_dummynet.c:805
#11 0xc05bfe71 in taskqueue_run (queue=0xc5562a00)
    at /usr/src/sys/kern/subr_taskqueue.c:257
#12 0xc05c022a in taskqueue_thread_loop (arg=0x0)
    at /usr/src/sys/kern/subr_taskqueue.c:376
#13 0xc05897b8 in fork_exit (callout=0xc05c01e0 , arg=0xc08ce950,
    frame=0xde7b0d38) at /usr/src/sys/kern/kern_fork.c:821
#14 0xc0758c2c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
(kgdb) up 9
#9  0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
    tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
705         dn_tag_get(p->tail)->output_time += t ;
(kgdb) p *p
$1 = {next = {sle_next = 0xc6713600}, pipe_nr = 1700, bandwidth = 50000000,
  delay = 0, head = 0x0, tail = 0xc55df900, scheduler_heap = {size = 16,
    elements = 1, offset = 0, p = 0xc57b2800}, not_eligible_heap = {size = 16,
    elements = 0, offset = 0, p = 0xc57ac700}, idle_heap = {size = 16,
    elements = 0, offset = 124, p = 0xc56a2800}, V = 9830400, sum = 10,
  numbytes = -1090027776, sched_time = 2997985, if_name = '\0' , ifp = 0x0,
  ready = 0, fs = {next = {sle_next = 0x0}, fs_nr = 0, flags_fs = 0,
    pipe = 0xc57b0800, parent_nr = 0, weight = 0, qsize = 50, plr = 0,
    flow_mask = {dst_ip = 0, src_ip = 0, dst_port = 0, src_port = 0,
      proto = 0 '\0', flags = 0 '\0', addr_type = 0 '\0', dst_ip6 = {
        __u6_addr = {__u6_addr8 = '\0' , __u6_addr16 = {0, 0, 0, 0, 0, 0, 0,
            0}, __u6_addr32 = {0, 0, 0, 0}}}, src_ip6 = {__u6_addr = {
          __u6_addr8 = '\0' , __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
          __u6_addr32 = {0, 0, 0, 0}}}, flow_id6 = 0, frag_id6 = 0},
    rq_size = 1, rq_elements = 0, rq = 0xc55791b0, last_expired = 0,
    backlogged = 0, w_q = 0, max_th = 0, min_th = 0, max_p = 0, c_1 = 0,
    c_2 = 0, c_3 = 0, c_4 = 0, w_q_lookup = 0x0, lookup_depth = 0,
    lookup_step = 0, lookup_weight = 0, avg_pkt_size = 0, max_pkt_size = 0}}

When I removed "tag 1", the kernel stopped panicking, but
the deadlocks did not go away. When I managed to enter DDB over the serial
console, I found dummynet_task() looping in the following code:

    h = heaps[i];
    while (h->elements > 0 && DN_KEY_LEQ(h->p[0].key, curr_time)) {
        ...
        ready_event_wfq(p, &head, &tail);
        ...
    }

It seems to me that the problem is in ready_event_wfq(), in the following
code:

    if (p->bandwidth > 0)
        t = (p->bandwidth -1 - p->numbytes) / p->bandwidth ;

Since p->bandwidth and p->numbytes are signed integers, the result can be
negative (I have p->bandwidth=50000000 and p->numbytes=-2147483647).

I am now testing the attached patch. I hope it will help. :)

--------------040704070900010000020204
Content-Type: text/x-patch; name="ip_dummynet.c.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="ip_dummynet.c.patch"

--- ip_dummynet.c_orig	Sun Jun 10 20:19:33 2007
+++ ip_dummynet.c	Fri Jun 15 07:37:46 2007
@@ -433,7 +433,7 @@
 static struct dn_pkt_tag *
 dn_tag_get(struct mbuf *m)
 {
-    struct m_tag *mtag = m_tag_first(m);
+    struct m_tag *mtag = m_tag_find(m, PACKET_TAG_DUMMYNET, NULL);
     KASSERT(mtag != NULL &&
         mtag->m_tag_cookie == MTAG_ABI_COMPAT &&
         mtag->m_tag_id == PACKET_TAG_DUMMYNET,
@@ -698,8 +698,10 @@
     if (p->if_name[0]==0 && p->numbytes < 0) { /* this implies bandwidth >0 */
         dn_key t=0 ; /* number of ticks i have to wait */
 
-        if (p->bandwidth > 0)
-            t = ( p->bandwidth -1 - p->numbytes) / p->bandwidth ;
+        if (p->bandwidth > 0)
+            t = ( (u_int64_t)p->bandwidth -1 - p->numbytes) / p->bandwidth ;
+
+        KASSERT( (curr_time + t) >= curr_time, ("wfq overflow"));
         dn_tag_get(p->tail)->output_time += t ;
         p->sched_time = curr_time ;
         heap_insert(&wfq_ready_heap, curr_time + t, (void *)p);
--------------040704070900010000020204--
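
For reference, the arithmetic behind the second hunk can be checked outside the kernel. The following is only a stand-alone sketch, not part of the patch: it assumes that bandwidth and numbytes are 32-bit signed ints and that dn_key is a 64-bit unsigned type, and the helper names ticks_int32() and ticks_uint64() are hypothetical. Because signed overflow is undefined in C, the first helper emulates the two's-complement wraparound with unsigned arithmetic instead of overflowing an int directly.

```c
#include <stdint.h>

/*
 * Original expression, (bandwidth - 1 - numbytes) / bandwidth, evaluated
 * in 32-bit arithmetic. The subtraction is done on uint32_t so that the
 * wraparound is well defined; converting the result back to int32_t is
 * implementation-defined in C, and is assumed here to wrap as it does on
 * common two's-complement compilers.
 */
static int32_t
ticks_int32(int32_t bandwidth, int32_t numbytes)
{
	uint32_t wrapped = (uint32_t)bandwidth - 1u - (uint32_t)numbytes;

	return ((int32_t)wrapped / bandwidth);
}

/*
 * Patched expression: the cast widens the whole computation to 64 bits,
 * so the sum no longer wraps and the tick count stays non-negative.
 */
static uint64_t
ticks_uint64(int32_t bandwidth, int32_t numbytes)
{
	return (((uint64_t)bandwidth - 1 - numbytes) / bandwidth);
}
```

With the values quoted in the mail (bandwidth=50000000, numbytes=-2147483647), ticks_int32() returns -41 and ticks_uint64() returns 43. Stored into an unsigned 64-bit t, the value -41 becomes a very large number, so curr_time + t wraps to a key slightly below curr_time. That matches the behaviour described above: the pipe is reinserted with a key that is already due, and the while loop in dummynet_task() never drains the heap.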