From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 20 02:03:05 2005
Date: Tue, 19 Jul 2005 22:03:02 -0400
From: Edwin
To: freebsd-hackers@freebsd.org
Cc: edwin@verolan.com
Subject: Re: help w/panic under heavy load - 5.4
Message-ID: <20050720020302.GA24474@asx01.verolan.com>
References: <20050719034215.GB20752@asx01.verolan.com> <200507191120.37526.jhb@FreeBSD.org>
In-Reply-To: <200507191120.37526.jhb@FreeBSD.org>

Hi John,

Updated the kernel; same crash under load. It looks like m is NULL, you're right.
Not quite sure where to go from here.
I'm happy to do the footwork; I'm just still really hazy on the BSD kernel side of things.

Thanks for the help!
/Edwin

Results from KDB/DDB/INVARIANTS/INVARIANT_SUPPORT - same crash (ddb and kgdb output):

panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 27 tid 100021 ]
Stopped at      kdb_enter+0x2b: nop
db> where
Tracing pid 27 tid 100021 td 0xc0ed0180
kdb_enter(c0821a6a) at kdb_enter+0x2b
panic(c0826049,0,c076b79c,c102d600,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c123180e,c76d1c38,5dc,0,1) at ip_fragment+0x214
ip_fastforward(c11fee00) at ip_fastforward+0x6ed
ether_demux(c0f90000,c11fee00,52,c0f8aad0,1f) at ether_demux+0x259
ether_input(c0f90000,c11fee00,c0f902d0,0,c08336ab) at ether_input+0x25d
sis_rxeof(c0f90000,1,5,c08e5500,c76d1ce0) at sis_rxeof+0x1ab
sis_poll(c0f90000,0,5) at sis_poll+0x7f
netisr_poll(0) at netisr_poll+0x188
swi_net(0) at swi_net+0x81
ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db> reset

mbsd05# kgdb kernel.debug /tmp/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) where
#0  doadump () at pcpu.h:159
#1  0xc04611f6 in db_fncall (dummy1=0, dummy2=0, dummy3=43, dummy4=0xc76d19c0 "�\031m�")
    at /usr/src/sys/ddb/db_command.c:531
#2  0xc0461004 in db_command (last_cmdp=0xc08c9264, cmd_table=0x0, aux_cmd_tablep=0xc08483b8,
    aux_cmd_tablep_end=0xc08483d4) at /usr/src/sys/ddb/db_command.c:349
#3  0xc04610cc in db_command_loop () at /usr/src/sys/ddb/db_command.c:455
#4  0xc0462c51 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc0627af2 in kdb_trap (type=3, code=0, tf=0xc76d1afc) at /usr/src/sys/kern/subr_kdb.c:468
#6  0xc07b6394 in trap (frame=
      {tf_fs = -949157864, tf_es = -1067319280, tf_ds = -1065222128, tf_edi = 1,
       tf_esi = -1065197495, tf_ebp = -949150916, tf_isp = -949150936, tf_ebx = -949150872,
       tf_edx = 0, tf_ecx = -1060921344, tf_eax = 18, tf_trapno = 3, tf_err = 0,
       tf_eip = -1067288461, tf_cs = -1065222136, tf_eflags = 646, tf_esp = -949150884,
       tf_ss = -1067376657}) at /usr/src/sys/i386/i386/trap.c:584
#7  0xc07a69ca in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#8  0xc76d0018 in ?? ()
#9  0xc0620010 in schedcpu () at /usr/src/sys/kern/sched_4bsd.c:461
#10 0xc0611fef in panic (fmt=0xc0820008 "default") at /usr/src/sys/kern/kern_shutdown.c:550
#11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:385
#12 0xc069b694 in ip_fragment (ip=0xc123180e, m_frag=0xc76d1c38, mtu=-1056778752,
    if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
#13 0xc06933c1 in ip_fastforward (m=0xc11fee00) at /usr/src/sys/netinet/ip_fastfwd.c:572
#14 0xc0672a59 in ether_demux (ifp=0xc0f90000, m=0xc11fee00) at /usr/src/sys/net/if_ethersubr.c:770
#15 0xc06727f5 in ether_input (ifp=0xc0f90000, m=0xc11fee00) at /usr/src/sys/net/if_ethersubr.c:631
#16 0xc0713507 in sis_rxeof (sc=0xc0f90000) at /usr/src/sys/pci/if_sis.c:1636
#17 0xc07137cf in sis_poll (ifp=0xc0f90000, cmd=POLL_ONLY, count=0) at /usr/src/sys/pci/if_sis.c:1769
#18 0xc05f8280 in netisr_poll () at /usr/src/sys/kern/kern_poll.c:384
#19 0xc0679985 in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:338
#20 0xc0600430 in ithread_loop (arg=0xc0ec6580) at /usr/src/sys/kern/kern_intr.c:547
#21 0xc05ff8a4 in fork_exit (callout=0xc060030c, arg=0xc0ec6580, frame=0xc76d1d48)
    at /usr/src/sys/kern/kern_fork.c:791
#22 0xc07a6a2c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209

(kgdb) f 12
#12 0xc069b694 in ip_fragment (ip=0xc123180e, m_frag=0xc76d1c38, mtu=-1056778752,
    if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
967                     m->m_next = m_copy(m0, off, len);

(kgdb) f 11
#11 0xc0641a2c in m_copym (m=0x0, off0=1500, len=1480, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:385
385                     KASSERT(m != NULL, ("m_copym, offset > size of mbuf chain"));
(kgdb) l
380             KASSERT(len >= 0, ("m_copym, negative len %d", len));
381             MBUF_CHECKSLEEP(wait);
382             if (off == 0 && m->m_flags & M_PKTHDR)
383                     copyhdr = 1;
384             while (off > 0) {
385                     KASSERT(m != NULL, ("m_copym, offset > size of mbuf chain"));
386                     if (off < m->m_len)
387                             break;
388                     off -= m->m_len;
389                     m = m->m_next;
(kgdb) i loc
n = (struct mbuf *) 0xc102d600
np = (struct mbuf **) 0xc102d654
off = 1432
top = (struct mbuf *) 0x1
copyhdr = 0
(kgdb) p m
$14 = (struct mbuf *) 0x0

(kgdb) f 12
#12 0xc069b694 in ip_fragment (ip=0xc123180e, m_frag=0xc76d1c38, mtu=-1056778752,
    if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
967                     m->m_next = m_copy(m0, off, len);
(kgdb) l
962                             len = ip->ip_len - off;
963                             m->m_flags |= M_LASTFRAG;
964                     } else
965                             mhip->ip_off |= IP_MF;
966                     mhip->ip_len = htons((u_short)(len + mhlen));
967                     m->m_next = m_copy(m0, off, len);
968                     if (m->m_next == NULL) {        /* copy failed */
969                             m_free(m);
970                             error = ENOBUFS;        /* ??? */
971                             ipstat.ips_odropped++;
(kgdb) i loc
mhip = (struct ip *) 0xc102d640
m = (struct mbuf *) 0xc102d600
mhlen = 20
error = 0
hlen = 20
len = 1480
off = 1500
m0 = (struct mbuf *) 0xc11fee00
firstlen = 1480
mnext = (struct mbuf **) 0xc11fee04
nfrags = 1
(kgdb) p *m0
$13 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xc123180e "E", mh_len = 68,
    mh_flags = 3, mh_type = 1},
  M_dat = {MH = {MH_pkthdr = {rcvif = 0xc0f90000, len = 68, header = 0x0, csum_flags = 769,
        csum_data = 0, tags = {slh_first = 0x0}},
      MH_dat = {MH_ext = {ext_buf = 0xc1231800 "", ext_free = 0, ext_args = 0x0,
          ext_size = 2048, ref_cnt = 0x0, ext_type = 3},
        MH_databuf = "\000\030#\301\000\000\000\000\000\000\000\000\000\b\000\000\000\000\000\000\003", '\0' }},
    M_databuf = "\000\000\371\300D\000\000\000\000\000\000\000\001\003", '\0' , , "\030#\301\000\000\000\000\000\000\000\000\000\b\000\000\000\000\000\000\003", '\0' }}
(kgdb)

John Baldwin (jhb@FreeBSD.org) wrote:
> On Monday 18 July 2005 11:42 pm, Edwin wrote:
> > Hi,
> >
> > I have a recurring (re-producible) panic on the 5.3/5.4 kernels and I would
> > like to ask for some help in tracking it down.
> > :) - it could be some
> > misconfig on my part - but i have tried several different configs of the
> > kernel - ultimately w/ polling on/off, ipfw on/off, ipfastforwarding on/off
> > - although with ipff off - the box still crashes but in a different
> > location - it will even crash w/ GENERIC kernel under heavy load.
> >
> > I'm not quite sure where to look past the below (ie. what variables/etc to
> > present to the list).
>
> Try turning INVARIANTS and INVARIANT_SUPPORT on in your kernel and see if you
> can reproduce this.  Also, try to get a traceback in ddb if possible as
> sometimes ddb gives more reliable stack traces.  It looks like your m is
> NULL, in which case the KASSERT() on the previous line should fire if
> INVARIANTS is on.
>
> --
> John Baldwin <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
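
P.S. To convince myself of what the numbers in the dump mean, here is a rough
user-space sketch of the offset walk from the uipc_mbuf.c listing (lines
384-389 in the kgdb output). It is not kernel code: struct fake_mbuf and
skip_to_offset() are names made up for illustration, the struct carries only
the two fields the loop touches, and the off >= 0 check is my own assumption
about the precondition. Where the kernel's KASSERT would fire, the sketch
returns NULL instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf: only the fields the walk uses. */
struct fake_mbuf {
	struct fake_mbuf *m_next;	/* next mbuf in the chain, or NULL */
	int m_len;			/* bytes of data in this mbuf */
};

/*
 * Skip *off bytes into the chain, the way the loop in the listing does.
 * Returns the mbuf holding the requested offset, with *off reduced to the
 * offset inside that mbuf. Returns NULL when the chain ends while *off is
 * still positive, which is the case that trips
 * KASSERT(m != NULL, ("m_copym, offset > size of mbuf chain")).
 */
struct fake_mbuf *
skip_to_offset(struct fake_mbuf *m, int *off)
{
	assert(*off >= 0);	/* assumed precondition, not from the listing */
	while (*off > 0) {
		if (m == NULL)
			return (NULL);	/* offset > size of mbuf chain */
		if (*off < m->m_len)
			break;
		*off -= m->m_len;
		m = m->m_next;
	}
	return (m);
}
```

Feeding it the values from the dump (a single mbuf with m_len = 68 and
m_next = NULL, offset 1500) returns NULL with 1432 left over, which lines up
with off = 1432 and m = 0x0 in frame 11. So, as far as I can tell, ip_fragment
is asking for data at offset 1500 in a chain that only holds 68 bytes.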