Date: Wed, 21 Aug 2002 22:57:31 -0400
From: Bosko Milekic
To: Gavin Atkinson
Cc: Ian Dowse, freebsd-stable@FreeBSD.ORG, dillon@FreeBSD.ORG
Subject: Re: mbuf usage - how do i track it down?
Message-ID: <20020821225731.A32832@unixdaemons.com>
References: <200208211849.aa43591@salmon.maths.tcd.ie>
In-Reply-To: ; from gavin@ury.york.ac.uk on Wed, Aug 21, 2002 at 11:01:34PM +0100

Wow. Is this something that only started happening recently? How
recent a -STABLE are you running?

On Wed, Aug 21, 2002 at 11:01:34PM +0100, Gavin Atkinson wrote:
> On Wed, 21 Aug 2002, Ian Dowse wrote:
> > In message , Gavin Atkinson writes:
> > >So how do I find out what is actually allocating these mbufs. Something
> > >seems to be leaking them.
> > http://www.maths.tcd.ie/~iedowse/FreeBSD/minfo/
> > It does a pile of consistency checks and can dump mbuf contents
> > with the -x flag. Run it redirected to a file so that it gets
> > a reasonably consistent snapshot of the system, and then examine
> > the file.
>
> OK, this is interesting. It looks like i end up with a loop in an mbuf
> chain.
>
> Chain 0xc0cee900
> 0xc0cee900 len 41 flags 0x002 type 1 next 0xc0e3ba00 prev 0x0
> 0xc0e3ba00 len 181 flags 0x002 type 1 next 0xc0cc9a00 prev 0xc0cee900
> 0xc0cc9a00 len 135 flags 0x002 type 1 next 0xc0dacc00 prev 0xc0e3ba00
> 0xc0dacc00 len 185 flags 0x002 type 1 next 0xc0dd9c00 prev 0xc0cc9a00
>
> 0xc0e50a00 len 144 flags 0x002 type 1 next 0xc0d16400 prev 0xc0e10900
> 0xc0d16400 len 236 flags 0x000 type 0 next 0xc0e45700 prev 0xc0e50a00
> 0xc0e45700 len 567 flags 0x003 type 0 next 0xc0e4f900 prev 0xc0da4f00
> 0xc0e4f900 len 16 flags 0x000 type 0 next 0xc0dd2b00 prev 0xc0e45700
> 0xc0dd2b00 len 40 flags 0x002 type 0 next 0xc0bd7600 prev 0xc0e4f900
> 0xc0bd7600 len 16 flags 0x000 type 8 next 0xc0dd2b00 prev 0xc0dd2b00
> 0xc0dd2b00 len 40 flags 0x002 type 0 next 0xc0bd7600 prev 0xc0e4f900
> 0xc0bd7600 len 16 flags 0x000 type 8 next 0xc0dd2b00 prev 0xc0dd2b00
>
> Running it again a while later, gets stuck in a different loop
> 0xc0f8ac00 len 203 flags 0x002 type 0 next 0xc0decb00 prev 0xc0f98e00
> 0xc0decb00 len 209 flags 0x002 type 1 next 0xc0f8ac00 prev 0xc0f8ac00
> 0xc0f8ac00 len 203 flags 0x002 type 0 next 0xc0decb00 prev 0xc0f98e00
> 0xc0decb00 len 209 flags 0x002 type 1 next 0xc0f8ac00 prev 0xc0f8ac00
> 0xc0f8ac00 len 203 flags 0x002 type 0 next 0xc0decb00 prev 0xc0f98e00
> 0xc0decb00 len 209 flags 0x002 type 1 next 0xc0f8ac00 prev 0xc0f8ac00
> 0xc0f8ac00 len 203 flags 0x002 type 0 next 0xc0decb00 prev 0xc0f98e00
> 0xc0decb00 len 209 flags 0x002 type 1 next 0xc0f8ac00 prev 0xc0f8ac00
>
> Running it again, i see other suspicious results
>
> 0xc107cf00 len 150 flags 0x002 type 1 next 0xc0fe3d00 prev 0xc104a400
> 0xc107d000 len 0 flags 0x000 type 0 next 0x0 prev 0xc107d100
> 0xc107d100 len 876824626 flags 0x5757 type 22839 next 0xc107d000 prev 0xc107d200
> 0xc107d200 len 0 flags 0x77e7 type 0 next 0xc107d100 prev 0xc107d300
> 0xc107d300 len 539915361 flags 0x6369 type 27713 next 0xc107d200 prev 0xc107d400
> 0xc107d400 len 13315 flags 0x000 type 0 next 0xc107d300 prev 0xc107d500
> 0xc107d500 len -4014831 flags 0x1fe7 type -12313 next 0xc107d400 prev 0xc107d600
> 0xc107d600 len 1852994816 flags 0x000 type 58 next 0xc107d500 prev 0xc0e57e00
> 0xc107d700 len 208 flags 0x002 type 0 next 0xc0fee700 prev 0xc107c100
> 0xc107d800 len 37 flags 0x002 type 0 next 0xc0e72c00 prev 0xc1045f00
> 0xc107d900 len 16 flags 0x000 type 0 next 0xc104d300 prev 0xc0fd1200
>
> So i'm at a loss as to where to go from here. Assuming minfo works, i may
> be seeing mbuf chain corruption.
>
> I have also seen the following panic, during times of low mbufs:
> panic messages:
> ---
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0xc
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc0196db8
> stack pointer = 0x10:0xc97f0bf0
> frame pointer = 0x10:0xc97f0bfc
> code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 167 (natd)
> interrupt mask =
> trap number = 12
> panic: page fault
>
> #5 0xc02bd553 in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi = 1,
>     tf_esi = 0, tf_ebp = -914420740, tf_isp = -914420772,
>     tf_ebx = -1058123776, tf_edx = -1, tf_ecx = -925641856, tf_eax = -1,
>     tf_trapno = 12, tf_err = 0, tf_eip = -1072075336, tf_cs = 8,
>     tf_eflags = 66050, tf_esp = -1058123776, tf_ss = 1})
>     at /usr/src/sys/i386/i386/trap.c:466
> #6 0xc0196db8 in m_copydata (m=0x0, off=-1, len=1,
>     cp=0xc0ee5070 ":sha1:UPA6SGRR2Z7Y3YSND2J3JYQTFPMKK5JI")
>     at /usr/src/sys/kern/uipc_mbuf.c:863
> #7 0xc01e438c in tcp_output (tp=0xc8e11140)
>     at /usr/src/sys/netinet/tcp_output.c:607
> #8 0xc01e325b in tcp_input (m=0xc0ee5000, off0=20, proto=6)
>     at /usr/src/sys/netinet/tcp_input.c:2158
> #9 0xc01dcc73 in ip_input (m=0xc0ee5000)
>     at /usr/src/sys/netinet/ip_input.c:821
> #10 0xc01d605b in div_output (so=0xc8d3ce40, m=0xc0ee5000, sin=0xc1ac2650,
>     control=0x0) at /usr/src/sys/netinet/ip_divert.c:327
> #11 0xc01d61fb in div_send (so=0xc8d3ce40, flags=0, m=0xc0ee5000,
>     nam=0xc1ac2650, control=0x0, p=0xc874b380)
>     at /usr/src/sys/netinet/ip_divert.c:440
> #12 0xc01990db in sosend (so=0xc8d3ce40, addr=0xc1ac2650, uio=0xc97f0ecc,
>     top=0xc0ee5000, control=0x0, flags=0, p=0xc874b380)
>     at /usr/src/sys/kern/uipc_socket.c:609
> #13 0xc019c49b in sendit (p=0xc874b380, s=3, mp=0xc97f0f0c, flags=0)
>     at /usr/src/sys/kern/uipc_syscalls.c:585
> #14 0xc019c59e in sendto (p=0xc874b380, uap=0xc97f0f80)
>     at /usr/src/sys/kern/uipc_syscalls.c:638
> #15 0xc02bdfe1 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
>     tf_edi = -1078002552, tf_esi = 1, tf_ebp = -1077937016,
>     tf_isp = -914419756, tf_ebx = 60, tf_edx = 3, tf_ecx = 1, tf_eax = 133,
>     tf_trapno = 7, tf_err = 2, tf_eip = 134551364, tf_cs = 31,
>     tf_eflags = 643, tf_esp = -1078002724, tf_ss = 47})
>     at /usr/src/sys/i386/i386/trap.c:1175
> #16 0xc02aef45 in Xint0x80_syscall ()
>
> (kgdb) f 6
> #6 0xc0196db8 in m_copydata (m=0x0, off=-1, len=1,
>     cp=0xc0ee5070 ":sha1:UPA6SGRR2Z7Y3YSND2J3JYQTFPMKK5JI")
>     at /usr/src/sys/kern/uipc_mbuf.c:863
> 863 while (len > 0) {
>
> (kgdb) f 7
> #7 0xc01e438c in tcp_output (tp=0xc8e11140)
>     at /usr/src/sys/netinet/tcp_output.c:607
> 607 m_copydata(so->so_snd.sb_mb, off, (int) len,
> (kgdb)
> (kgdb) p so
> $1 = (struct socket *) 0xc8d3d380
> (kgdb) p *so
> $2 = {so_type = 1, so_options = 4, so_linger = 0, so_state = 258,
>   so_pcb = 0xc8e11080 "\003X{`k(", so_proto = 0xc0332be8,
>   so_head = 0x0, so_incomp = {tqh_first = 0x0, tqh_last = 0xc8d3d394},
>   so_comp = {tqh_first = 0x0, tqh_last = 0xc8d3d39c}, so_list = {
>   tqe_next = 0x0, tqe_prev = 0x0}, so_qlen = 0, so_incqlen = 0,
>   so_qlimit = 0, so_timeo = 0, so_error = 0, so_sigio = 0x0, so_oobmark = 0,
>   so_aiojobq = {tqh_first = 0x0, tqh_last = 0xc8d3d3c0}, so_rcv = {sb_cc = 0,
>   sb_hiwat = 57920, sb_mbcnt = 0, sb_mbmax = 262144, sb_lowat = 1,
>   sb_mb = 0x0, sb_sel = {si_pid = 0, si_note = {slh_first = 0x0},
>   si_flags = 0}, sb_flags = 0, sb_timeo = 0}, so_snd = {sb_cc = 0,
>   sb_hiwat = 33304, sb_mbcnt = 0, sb_mbmax = 262144, sb_lowat = 2048,
>   sb_mb = 0x0, sb_sel = {si_pid = 0, si_note = {slh_first = 0x0},
>   si_flags = 0}, sb_flags = 0, sb_timeo = 0}, so_upcall = 0,
>   so_upcallarg = 0x0, so_cred = 0xc1a0a900, so_gencnt = 72140,
>   so_emuldata = 0x0, so_accf = 0x0}
> (kgdb)
>
> So it looks like there is also a use-mbuf-alloc-without-checking
> bug somewhere.
>
> Gavin
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
>

-- 
Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org
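
For reference, the kind of loop reported in the minfo output above (two mbufs whose next pointers refer to each other) can be detected without extra memory by walking the chain with two pointers at different speeds. What follows is only a minimal userland sketch under stated assumptions: `struct node` and `chain_has_loop` are hypothetical names invented for illustration, not the kernel's `struct mbuf` and not how minfo itself performs its checks.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for an mbuf header.
 * Only the forward link matters for loop detection.
 */
struct node {
    struct node *next;
};

/*
 * Floyd's two-pointer cycle check.
 * Returns 1 if following next pointers from head revisits a node,
 * 0 if the chain terminates in NULL.
 */
static int chain_has_loop(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;        /* advance one link */
        fast = fast->next->next;  /* advance two links */
        if (slow == fast)
            return 1;             /* pointers met inside a cycle */
    }
    return 0;                     /* reached the end of the chain */
}
```

A check like this only says that a loop exists; it does not say which code path created it, so it helps confirm corruption rather than locate its source.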