From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan 25 16:35:29 2008
Message-ID: <479A0B9D.5020607@xiplink.com>
Date: Fri, 25 Jan 2008 11:17:33 -0500
From: Karim Fodil-Lemelin <kfl@xiplink.com>
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Mailman-Approved-At: Fri, 25 Jan 2008 16:41:51 +0000
Subject: vm_zone corruption 4.x

Good day,

    I have run into a strange problem where my FreeBSD 4.x box keeps
crashing under network traffic load. I enabled INVARIANTS and
debugging and was able to gather a trace. The context is that a
listening connection created a syncache entry, sent a SYN-ACK, and is now
processing the ACK it got back. Everything seems fine until it tries to
create a new socket from the listening one; as it is about to allocate
another TCP control block, the kernel dies :(

(kgdb) bt
#0  Debugger (msg=0xc02bd93b "panic") at ../../i386/i386/db_interface.c:321
#1  0xc016b080 in panic (fmt=0xc02e6cd9 "zone: entry not free") at ../../kern/kern_shutdown.c:593
#2  0xc025046b in zerror () at ../../vm/vm_zone.c:547
#3  0xc02500ab in zalloci (z=0xce703180) at ../../vm/vm_zone.c:76
#4  0xc01c1809 in in_pcballoc (so=0xeef12fe0, pcbinfo=0xc034ba40, p=0x0)
    at ../../netinet/in_pcb.c:167
#5  0xc01df8f0 in tcp_attach (so=0xeef12fe0, p=0x0) at ../../netinet/tcp_usrreq.c:1603
#6  0xc01ddbc9 in tcp_usr_attach (so=0xeef12fe0, proto=0, p=0x0) at ../../netinet/tcp_usrreq.c:175
#7  0xc018cd1d in sonewconn3 (head=0xeedfb7c0, connstatus=2, p=0x0) at ../../kern/uipc_socket2.c:223
#8  0xc018cc54 in sonewconn (head=0xeedfb7c0, connstatus=2) at ../../kern/uipc_socket2.c:196
#9  0xc01dbc40 in syncache_socket (sc=0xf0f0ac80, lso=0xeedfb7c0)
    at ../../netinet/tcp_syncache.c:594
#10 0xc01dc290 in syncache_expand (inc=0xf585ac50, th=0xc61d0034, sop=0xf585ac48, m=0xc3774200)
    at ../../netinet/tcp_syncache.c:946
#11 0xc01d2ce7 in tcp_input (m=0xc3774200, off0=20, proto=6) at ../../netinet/tcp_input.c:1058
#12 0xc01ca93f in ip_input (m=0xc3774200) at ../../netinet/ip_input.c:1279
#13 0xc01ca9a3 in ipintr () at ../../netinet/ip_input.c:1300
#14 0xc027e5b9 in swi_net_next ()
#15 0xc016de61 in tsleep (ident=0xce7e9700, priority=280, wmesg=0xc02bb3b8 "kqread", timo=3)
    at ../../kern/kern_synch.c:479
#16 0xc01616e3 in kqueue_scan (fp=0xce7f7040, maxevents=65535, ulistp=0x80a2000, tsp=0xf585af2c,
    p=0xed3c3d80) at ../../kern/kern_event.c:645
#17 0xc0161211 in kevent (p=0xed3c3d80, uap=0xf585af80) at ../../kern/kern_event.c:454
#18 0xc028c33e in syscall2 (frame={tf_fs = 47, tf_es = -562495441, tf_ds = -1078001617,
      tf_edi = 60, tf_esi = 134881340, tf_ebp = -1077937120, tf_isp = -175788076,
      tf_ebx = 134852608, tf_edx = 1, tf_ecx = -1077937128, tf_eax = 363, tf_trapno = 7,
      tf_err = 2, tf_eip = 134690428, tf_cs = 31, tf_eflags = 663, tf_esp = -1077937180,
      tf_ss = 47}) at ../../i386/i386/trap.c:1175
#19 0xc027d155 in Xint0x80_syscall ()
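
For readers unfamiliar with the check behind "zone: entry not free", here is a
minimal user-space sketch of a tagged free list. It is a simplified model, not
the vm_zone.c code: every name (model_zone, model_free, model_alloc,
MODEL_ENTRY_FREE) and the tag value are assumptions made for illustration. It
assumes the allocator stores the next pointer in word 0 of a free item, a
"free" tag in word 1, and advances the list head before checking the tag. Under
those assumptions, a stale write to a freed item trips the check and can leave
the head NULL while the free count is still nonzero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tag written into a free item; illustrative value only. */
#define MODEL_ENTRY_FREE ((void *)(uintptr_t)0x12342378)

#define MODEL_OK        0
#define MODEL_EMPTY    (-1)
#define MODEL_NOT_FREE (-2)  /* stands in for panic("zone: entry not free") */

/* Hypothetical stand-in for the two zone fields of interest. */
struct model_zone {
    void *items;    /* head of the free list, playing the role of zitems */
    int   freecnt;  /* items on the free list, playing the role of zfreecnt */
};

/* Push an item: word 0 links to the old head, word 1 is tagged as free. */
void model_free(struct model_zone *z, void **item)
{
    assert(z != NULL && item != NULL);
    item[0] = z->items;
    item[1] = MODEL_ENTRY_FREE;
    z->items = item;
    z->freecnt++;
}

/* Pop an item. The head is advanced before the tag is verified. */
int model_alloc(struct model_zone *z, void **out)
{
    void **item = z->items;

    if (item == NULL)
        return MODEL_EMPTY;
    z->items = item[0];              /* trusts the item's link word */
    if (item[1] != MODEL_ENTRY_FREE)
        return MODEL_NOT_FREE;       /* tag was overwritten while free */
    item[1] = NULL;
    z->freecnt--;
    *out = item;
    return MODEL_OK;
}

/* Scenario: something writes through a stale pointer to a freed item. */
int model_stale_write(int *head_is_null, int *freecnt)
{
    void *a[2], *b[2];               /* two items, two words each */
    struct model_zone z = { NULL, 0 };
    void *got = NULL;
    int rc;

    model_free(&z, a);
    model_free(&z, b);
    b[0] = NULL;                     /* stale write clobbers the link ... */
    b[1] = NULL;                     /* ... and the free tag */
    rc = model_alloc(&z, &got);
    *head_is_null = (z.items == NULL);
    *freecnt = z.freecnt;
    return rc;
}
```

In this model the failed allocation reports MODEL_NOT_FREE with a NULL head and
a free count of 2, which resembles a NULL zitems next to a large zfreecnt.
Whether the real allocator behaves the same way needs checking against the 4.x
source.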

(kgdb) p *z
$2 = {zlock = {lock_data = 0}, zitems = 0x0, zfreecnt = 13945, zfreemin = 6, znalloc = 253356,
  zkva = 4021252096, zpagecount = 3687, zpagemax = 5120, zmax = 32768, ztotal = 23596, zsize = 640,
  zalloc = 1, zflags = 1, zallocflag = 1, zobj = 0xc0341c80, zname = 0xc02cb489 "tcpcb",
  znext = 0xce703200}

Now there are a couple of strange things here, and maybe someone with more
experience with the VM can shed some light on them.
1) I can't help but find it unusual that zitems is NULL ...
2) The sum of zfreecnt + ztotal (13945 + 23596 = 37541) is bigger than zmax (32768) ...
3) If we are in zalloci(), why is the zlock not held (0)?
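
The arithmetic behind points 1 and 2 can be checked mechanically. The sketch
below is a user-space helper, not kernel code; struct zone_counters and the
function names are hypothetical. It assumes that ztotal counts every item
carved out for the zone so far, free or in use, which would make the expected
relation zfreecnt <= ztotal <= zmax rather than zfreecnt + ztotal <= zmax. That
reading should be confirmed against vm_zone.h for the tree in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the fields used below, copied from the kgdb dump; not struct vm_zone. */
struct zone_counters {
    int  zfreecnt;        /* items currently on the free list */
    int  ztotal;          /* assumed: all items carved out, free or in use */
    int  zmax;            /* upper bound on items for the zone */
    bool zitems_is_null;  /* true if the free-list head was 0x0 in the dump */
};

#define ZC_FREE_GT_TOTAL 0x1  /* more free items than items in total */
#define ZC_TOTAL_GT_MAX  0x2  /* zone grew past its configured maximum */
#define ZC_NULL_HEAD     0x4  /* free count is nonzero but the list is empty */

/* Return a bitmask of the relations that do not hold. */
int zone_counters_check(const struct zone_counters *c)
{
    int bad = 0;

    assert(c != NULL);
    if (c->zfreecnt > c->ztotal)
        bad |= ZC_FREE_GT_TOTAL;
    if (c->ztotal > c->zmax)
        bad |= ZC_TOTAL_GT_MAX;
    if (c->zitems_is_null && c->zfreecnt > 0)
        bad |= ZC_NULL_HEAD;
    return bad;
}

/* Items in use, under the same assumption about ztotal. */
int zone_counters_in_use(const struct zone_counters *c)
{
    assert(c != NULL);
    return c->ztotal - c->zfreecnt;
}

/* The values printed by "p *z" in the trace. */
struct zone_counters zone_counters_from_dump(void)
{
    struct zone_counters c = { 13945, 23596, 32768, true };
    return c;
}
```

With the dumped values, only ZC_NULL_HEAD is flagged: 13945 <= 23596 <= 32768
holds, and 23596 - 13945 = 9651 items would be in use. If the assumption about
ztotal is right, point 2 is not an inconsistency, while the NULL zitems next to
a nonzero zfreecnt still is.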

What else should I be looking for here? The crash only happens after a
certain number of items are in use (>20k so far).

Thanks,

Karim.