Message-ID: <479A0B9D.5020607@xiplink.com>
Date: Fri, 25 Jan 2008 11:17:33 -0500
From: Karim Fodil-Lemelin
To: freebsd-hackers@freebsd.org
Subject: vm_zone corruption 4.x

Good day,

I have stumbled into a strange problem where my FreeBSD 4.x box keeps crashing under network traffic load. I have enabled INVARIANTS and debugging and was able to gather a trace. The context is that a listening connection created a syncache entry, sent a SYN-ACK, and is now processing the ACK it got back.
Everything seems fine until it tries to create a new socket from the listening one; as it is about to allocate another TCP control block, the kernel dies :(

(kgdb) bt
#0  Debugger (msg=0xc02bd93b "panic") at ../../i386/i386/db_interface.c:321
#1  0xc016b080 in panic (fmt=0xc02e6cd9 "zone: entry not free") at ../../kern/kern_shutdown.c:593
#2  0xc025046b in zerror () at ../../vm/vm_zone.c:547
#3  0xc02500ab in zalloci (z=0xce703180) at ../../vm/vm_zone.c:76
#4  0xc01c1809 in in_pcballoc (so=0xeef12fe0, pcbinfo=0xc034ba40, p=0x0) at ../../netinet/in_pcb.c:167
#5  0xc01df8f0 in tcp_attach (so=0xeef12fe0, p=0x0) at ../../netinet/tcp_usrreq.c:1603
#6  0xc01ddbc9 in tcp_usr_attach (so=0xeef12fe0, proto=0, p=0x0) at ../../netinet/tcp_usrreq.c:175
#7  0xc018cd1d in sonewconn3 (head=0xeedfb7c0, connstatus=2, p=0x0) at ../../kern/uipc_socket2.c:223
#8  0xc018cc54 in sonewconn (head=0xeedfb7c0, connstatus=2) at ../../kern/uipc_socket2.c:196
#9  0xc01dbc40 in syncache_socket (sc=0xf0f0ac80, lso=0xeedfb7c0) at ../../netinet/tcp_syncache.c:594
#10 0xc01dc290 in syncache_expand (inc=0xf585ac50, th=0xc61d0034, sop=0xf585ac48, m=0xc3774200) at ../../netinet/tcp_syncache.c:946
#11 0xc01d2ce7 in tcp_input (m=0xc3774200, off0=20, proto=6) at ../../netinet/tcp_input.c:1058
#12 0xc01ca93f in ip_input (m=0xc3774200) at ../../netinet/ip_input.c:1279
#13 0xc01ca9a3 in ipintr () at ../../netinet/ip_input.c:1300
#14 0xc027e5b9 in swi_net_next ()
#15 0xc016de61 in tsleep (ident=0xce7e9700, priority=280, wmesg=0xc02bb3b8 "kqread", timo=3) at ../../kern/kern_synch.c:479
#16 0xc01616e3 in kqueue_scan (fp=0xce7f7040, maxevents=65535, ulistp=0x80a2000, tsp=0xf585af2c, p=0xed3c3d80) at ../../kern/kern_event.c:645
#17 0xc0161211 in kevent (p=0xed3c3d80, uap=0xf585af80) at ../../kern/kern_event.c:454
#18 0xc028c33e in syscall2 (frame={tf_fs = 47, tf_es = -562495441, tf_ds = -1078001617, tf_edi = 60,
      tf_esi = 134881340, tf_ebp = -1077937120, tf_isp = -175788076, tf_ebx = 134852608, tf_edx = 1, tf_ecx =
-1077937128, tf_eax = 363, tf_trapno = 7, tf_err = 2, tf_eip = 134690428, tf_cs = 31,
      tf_eflags = 663, tf_esp = -1077937180, tf_ss = 47}) at ../../i386/i386/trap.c:1175
#19 0xc027d155 in Xint0x80_syscall ()

(kgdb) p *z
$2 = {zlock = {lock_data = 0}, zitems = 0x0, zfreecnt = 13945, zfreemin = 6,
  znalloc = 253356, zkva = 4021252096, zpagecount = 3687, zpagemax = 5120,
  zmax = 32768, ztotal = 23596, zsize = 640, zalloc = 1, zflags = 1,
  zallocflag = 1, zobj = 0xc0341c80, zname = 0xc02cb489 "tcpcb",
  znext = 0xce703200}

Now there are a couple of strange things here, and maybe someone with more experience with the VM code can shed some light on them.

1) I find it unusual that zitems is NULL while zfreecnt is 13945 ...
2) The sum of zfreecnt + ztotal is bigger than zmax ...
3) If we are in zalloci(), why is the zlock not held (lock_data = 0)?

What else should I be looking for here? The crash only happens after a certain number of items are in use (>20k so far).

Thanks,

Karim.
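
For reference, the counter arithmetic behind point 2 can be checked with a small userland sketch. It only reuses the numbers printed by `p *z` above. The struct and helper names are hypothetical (they are not the kernel's), and the 4096-byte page size is an assumption for i386. The second helper additionally assumes that ztotal counts every item carved into the zone, free ones included; that reading has not been verified against vm_zone.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the counters shown in the kgdb dump. */
struct zone_counters {
    uint32_t zfreecnt;    /* free-list count as printed by kgdb */
    uint32_t ztotal;      /* total count as printed by kgdb */
    uint32_t zmax;        /* configured maximum number of items */
    uint32_t zpagecount;  /* pages currently backing the zone */
    uint32_t zsize;       /* bytes per item */
};

/* Values copied verbatim from the "tcpcb" zone dump in the post. */
static const struct zone_counters tcpcb_dump = {
    .zfreecnt = 13945,
    .ztotal = 23596,
    .zmax = 32768,
    .zpagecount = 3687,
    .zsize = 640,
};

/* ASSUMPTION: 4096-byte pages, as on i386. */
enum { ASSUMED_PAGE_SIZE = 4096 };

/* How many items fit in the pages the zone currently owns. */
static uint32_t
items_backed_by_pages(const struct zone_counters *z)
{
    assert(z->zsize != 0);
    return (uint32_t)(((uint64_t)z->zpagecount * ASSUMED_PAGE_SIZE) / z->zsize);
}

/*
 * ASSUMPTION: ztotal includes free items. Under that reading the number
 * of items in use is ztotal - zfreecnt, not ztotal + zfreecnt.
 */
static int64_t
items_in_use_if_total_includes_free(const struct zone_counters *z)
{
    return (int64_t)z->ztotal - (int64_t)z->zfreecnt;
}
```

With the dumped values, zfreecnt + ztotal is 37541, which does exceed zmax (32768). The page-backed capacity works out to 23596, the same value as ztotal, which would fit the reading that ztotal already includes the free items. If that holds, the sum in point 2 double-counts and the in-use figure would be 9651. That still leaves zitems = 0x0 with a nonzero zfreecnt unexplained.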