From owner-freebsd-hackers@FreeBSD.ORG Sun May 28 20:34:50 2006
Return-Path:
X-Original-To: freebsd-hackers@freebsd.org
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 490EE16CA55
    for ; Sun, 28 May 2006 20:29:36 +0000 (UTC)
    (envelope-from lists-freebsd@silverwraith.com)
Received: from pear.silverwraith.com (pear.silverwraith.com [69.12.167.160])
    by mx1.FreeBSD.org (Postfix) with ESMTP id A2AEA43D68
    for ; Sun, 28 May 2006 20:29:29 +0000 (GMT)
    (envelope-from lists-freebsd@silverwraith.com)
Received: from avleen by pear.silverwraith.com with local (Exim 4.61 (FreeBSD))
    (envelope-from ) id 1FkRtw-0002lD-3e
    for freebsd-hackers@freebsd.org; Sun, 28 May 2006 13:30:16 -0700
Date: Sun, 28 May 2006 13:30:16 -0700
From: Avleen Vig
To: freebsd-hackers@freebsd.org
Message-ID: <20060528203015.GA8791@silverwraith.com>
References: <20060512220019.GA1911@silverwraith.com>
    <20060512223919.GA21382@fonon.realnet>
    <20060513014020.GE1911@silverwraith.com>
    <20060513074033.GA1236@fonon.realnet>
    <20060515175802.GA727@silverwraith.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20060515175802.GA727@silverwraith.com>
User-Agent: Mutt/1.5.11
Subject: 6.1 crash data (was: Re: no core file handler recognizes format)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 28 May 2006 20:34:53 -0000

On Mon, May 15, 2006 at 10:58:02AM -0700, Avleen Vig wrote:
> On Sat, May 13, 2006 at 11:40:33AM +0400, Stanislav Sedov wrote:
> > Rebuild your kernel with INVARIANTS enabled and debug info. It will
> > provide more information in case the crash happens again.

Ok, I finally got a core file with the crash :-)
Here's some of what kgdb tells me.
All I can tell is that the bug happened somewhere around trying to set a
TOS value for an outbound network packet. Help please?


[root@gooseberry] ~ # kgdb -c /var/crash/vmcore.0 /usr/obj/usr/src/sys/GOOSEBERRY/kernel.debug
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x58
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc05efa9a
stack pointer           = 0x28:0xd6cb7ae0
frame pointer           = 0x28:0xd6cb7b10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 20115 (python)
trap number             = 12
panic: page fault
Uptime: 10d6h22m19s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc0553492 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:402
#2  0xc05537ac in panic (fmt=0xc071873f "%s")
    at /usr/src/sys/kern/kern_shutdown.c:558
#3  0xc06fc00c in trap_fatal (frame=0xd6cb7aa0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:836
#4  0xc06fbd17 in trap_pfault (frame=0xd6cb7aa0, usermode=0, eva=88)
    at /usr/src/sys/i386/i386/trap.c:744
#5  0xc06fb94d in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 0, tf_esi = -691307388, tf_ebp = -691307760, tf_isp = -691307828, tf_ebx = 0, tf_edx = -691307120, tf_ecx = 0, tf_eax = 8, tf_trapno = 12, tf_err = 2, tf_eip = -1067517286, tf_cs = 32, tf_eflags = 66183, tf_esp = -691307388, tf_ss = -691307784})
    at /usr/src/sys/i386/i386/trap.c:434
#6  0xc06e994a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc05efa9a in ip_ctloutput (so=0x8, sopt=0xd6cb7c84)
    at /usr/src/sys/netinet/ip_output.c:1210
    at /usr/src/sys/netinet/ip_output.c:1210
#8  0xc0601ad1 in tcp_ctloutput (so=0xc57aede8, sopt=0xd6cb7c84)
    at /usr/src/sys/netinet/tcp_usrreq.c:1038
#9  0xc05971a7 in sosetopt (so=0xc57aede8, sopt=0xd6cb7c84)
    at /usr/src/sys/kern/uipc_socket.c:1560
#10 0xc059cec9 in kern_setsockopt (td=0xc4b03900, s=8, level=8, name=8,
    val=0xbfbfab68, valseg=UIO_USERSPACE, valsize=0)
    at /usr/src/sys/kern/uipc_syscalls.c:1351
#11 0xc059cdee in setsockopt (td=0x8, uap=0xd6cb7d90)
    at /usr/src/sys/kern/uipc_syscalls.c:1307
#12 0xc06fc322 in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = -1077957792, tf_esi = -1077957784, tf_ebp = -1077957768, tf_isp = -691307164, tf_ebx = 708028888, tf_edx = 170620760, tf_ecx = -1077958488, tf_eax = 105, tf_trapno = 22, tf_err = 2, tf_eip = 673659967, tf_cs = 51, tf_eflags = 662, tf_esp = -1077957844, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:981
#13 0xc06e999f in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:200
#14 0x00000033 in ?? ()
(kgdb) up 7
#7  0xc05efa9a in ip_ctloutput (so=0x8, sopt=0xd6cb7c84)
    at /usr/src/sys/netinet/ip_output.c:1210
1210                            inp->inp_ip_tos = optval;
(kgdb) p optval
$1 = 8
(kgdb) p inp
$2 = (struct inpcb *) 0x0
(kgdb) p inp->inp_ip_tos
There is no member named inp_ip_tos.
(kgdb) p inp->inp_depend4.inp4_ip_tos
Cannot access memory at address 0x58


**** Here I went up one more, to #8:

(kgdb) up 1
#8  0xc0601ad1 in tcp_ctloutput (so=0xc57aede8, sopt=0xd6cb7c84)
    at /usr/src/sys/netinet/tcp_usrreq.c:1038
1038                    error = ip_ctloutput(so, sopt);
(kgdb) p *so
$14 = {so_count = 1, so_type = 1, so_options = 4, so_linger = 0,
  so_state = 8448, so_qstate = 0, so_pcb = 0x0, so_proto = 0xc076e588,
  so_head = 0x0, so_incomp = {tqh_first = 0x0, tqh_last = 0x0}, so_comp = {
    tqh_first = 0x0, tqh_last = 0x0}, so_list = {tqe_next = 0x0,
    tqe_prev = 0xc3baa5b4}, so_qlen = 0, so_incqlen = 0, so_qlimit = 0,
  so_timeo = 0, so_error = 54, so_sigio = 0x0, so_oobmark = 0, so_aiojobq = {
    tqh_first = 0x0, tqh_last = 0xc57aee30}, so_rcv = {sb_sel = {si_thrlist = {
        tqe_next = 0x0, tqe_prev = 0x0}, si_thread = 0x0, si_note = {
        kl_list = {slh_first = 0x0}, kl_lock = 0xc0535980,
        kl_unlock = 0xc05359b0,
        kl_locked = 0xc05359e0, kl_lockarg = 0xc57aee5c},
      si_flags = 0}, sb_mtx = {mtx_object = {lo_class = 0xc0764584,
        lo_name = 0xc07312d1 "so_rcv", lo_type = 0xc07312d1 "so_rcv",
        lo_flags = 196608, lo_list = {tqe_next = 0x0, tqe_prev = 0x0},
        lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, sb_state = 32,
    sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_cc = 0,
    sb_hiwat = 65700, sb_mbcnt = 0, sb_mbmax = 525600, sb_ctl = 0,
    sb_lowat = 1, sb_timeo = 0, sb_flags = 0}, so_snd = {sb_sel = {
      si_thrlist = {tqe_next = 0x0, tqe_prev = 0x0}, si_thread = 0x0,
      si_note = {kl_list = {slh_first = 0x0}, kl_lock = 0xc0535980,
        kl_unlock = 0xc05359b0,
        kl_locked = 0xc05359e0, kl_lockarg = 0xc57aeed4},
      si_flags = 0}, sb_mtx = {mtx_object = {lo_class = 0xc0764584,
        lo_name = 0xc07312ca "so_snd", lo_type = 0xc07312ca "so_snd",
        lo_flags = 196608, lo_list = {tqe_next = 0x0, tqe_prev = 0x0},
        lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, sb_state = 16,
    sb_mb = 0x0, sb_mbtail = 0x0, sb_lastrecord = 0x0, sb_cc = 0,
    sb_hiwat = 33580, sb_mbcnt = 0, sb_mbmax = 268640, sb_ctl = 0,
    sb_lowat = 2048, sb_timeo = 0, sb_flags = 0}, so_upcall = 0,
  so_upcallarg = 0x0, so_cred = 0xc54fc880, so_label = 0x0,
  so_peerlabel = 0x0, so_gencnt = 1765445, so_emuldata = 0x0, so_accf = 0x0}
(kgdb) p *sopt
$15 = {sopt_dir = SOPT_SET, sopt_level = 0, sopt_name = 3,
  sopt_val = 0xbfbfab68, sopt_valsize = 4, sopt_td = 0xc4b03900}


That's about all I was able to find out with my limited debugging skills
(and from reading Michael Lucas's OnLamp.com article on kernel
debugging).
Every time I've seen the panic, it's been while some python process was
running, which seems like more than a coincidence.
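
For anyone cross-checking the raw numbers in the kgdb session, here is a small Python sketch that converts them into more readable forms. The values are copied from the output above. The constant names attached to them (IPPROTO_IP, IP_TOS, IPTOS_THROUGHPUT, ECONNRESET) are assumptions based on FreeBSD's usual numeric definitions, and they are hard-coded because Python's own socket and errno modules report the values of whatever OS the script runs on. The helper names to_u32 and decode_dump are made up for this illustration.

```python
"""Cross-check a few raw values from the kgdb session (a sketch, not a diagnosis)."""

# Values copied from the kgdb output in this message.
TF_EIP = -1067517286   # tf_eip in frame #5 (printed by gdb as a signed int)
EVA = 88               # eva argument of trap_pfault in frame #4
SOPT_LEVEL = 0         # sopt_level from "p *sopt"
SOPT_NAME = 3          # sopt_name from "p *sopt"
OPTVAL = 8             # optval in frame #7
SO_ERROR = 54          # so_error from "p *so"

# ASSUMPTION: FreeBSD's numeric values for these constants. They are
# hard-coded because the host running this script may define them differently.
FREEBSD_LEVELS = {0: "IPPROTO_IP"}
FREEBSD_IP_OPTIONS = {3: "IP_TOS"}
FREEBSD_TOS_VALUES = {0x08: "IPTOS_THROUGHPUT"}
FREEBSD_ERRNOS = {54: "ECONNRESET"}


def to_u32(value):
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xFFFFFFFF


def decode_dump():
    """Return the raw dump values in a more readable form."""
    return {
        "faulting_eip": hex(to_u32(TF_EIP)),
        "fault_address": hex(EVA),
        "level": FREEBSD_LEVELS.get(SOPT_LEVEL, "unknown"),
        "option": FREEBSD_IP_OPTIONS.get(SOPT_NAME, "unknown"),
        "tos": FREEBSD_TOS_VALUES.get(OPTVAL, "unknown"),
        "so_error": FREEBSD_ERRNOS.get(SO_ERROR, "unknown"),
    }


for key, value in decode_dump().items():
    print(f"{key:14} {value}")
```

The unsigned tf_eip matches the instruction pointer in the trap message, and eva matches the fault virtual address. If the assumed constants are right, the request was an IP_TOS set at the IPPROTO_IP level on a socket whose so_pcb was already 0x0. That fits the NULL inp in frame #7, but it is a reading of the dump rather than a confirmed cause.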