From owner-freebsd-bugs@FreeBSD.ORG Mon Dec 4 15:00:38 2006 Return-Path: X-Original-To: freebsd-bugs@hub.freebsd.org Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3832916A407 for ; Mon, 4 Dec 2006 15:00:38 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [69.147.83.40]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5711543CC5 for ; Mon, 4 Dec 2006 14:59:49 +0000 (GMT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id kB4F0MTx069465 for ; Mon, 4 Dec 2006 15:00:22 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.13.4/8.13.4/Submit) id kB4F0M7K069464; Mon, 4 Dec 2006 15:00:22 GMT (envelope-from gnats) Resent-Date: Mon, 4 Dec 2006 15:00:22 GMT Resent-Message-Id: <200612041500.kB4F0M7K069464@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Nikolay Pavlov Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C835216A403 for ; Mon, 4 Dec 2006 14:55:25 +0000 (UTC) (envelope-from quetzal@zone3000.net) Received: from mx1.sitevalley.com (sitevalley.com [209.67.60.43]) by mx1.FreeBSD.org (Postfix) with SMTP id 2B96643CA2 for ; Mon, 4 Dec 2006 14:54:51 +0000 (GMT) (envelope-from quetzal@zone3000.net) Received: from unknown (HELO viking.zone3000.net) (217.144.69.37) by 209.67.61.254 with SMTP; 4 Dec 2006 14:55:23 -0000 Received: from viking.zone3000.net (localhost [127.0.0.1]) by viking.zone3000.net (8.13.6/8.13.6) with ESMTP id kB4EsrlP003210 for ; Mon, 4 Dec 2006 16:54:56 +0200 (EET) (envelope-from quetzal@viking.zone3000.net) Received: (from quetzal@localhost) by viking.zone3000.net (8.13.6/8.13.6/Submit) id kB4EsrZn003209; Mon, 4 Dec 2006 16:54:53 +0200 (EET) (envelope-from quetzal) Message-Id: <200612041454.kB4EsrZn003209@viking.zone3000.net> Date: Mon, 4 Dec 2006 16:54:53 +0200 (EET) From: Nikolay Pavlov To: FreeBSD-gnats-submit@FreeBSD.org X-Send-Pr-Version: 3.113 Cc: Subject: kern/106317: deadlock in "zoneli" state X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Nikolay Pavlov List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Dec 2006 15:00:38 -0000 >Number: 106317 >Category: kern >Synopsis: deadlock in "zoneli" state >Confidential: no >Severity: critical >Priority: high >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Mon Dec 04 15:00:21 GMT 2006 >Closed-Date: >Last-Modified: >Originator: Nikolay Pavlov >Release: FreeBSD 6.2-PRERELEASE i386 >Organization: zone3000.net >Environment: System: FreeBSD accel1 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #3: Sat Nov 25 04:39:42 EST 2006 root@accel1:/usr/obj/usr/src/sys/ACCEL i386 >Description: I have a deadlock on 6.2-PRERELEASE and I can easily trigger it with a high rate of requests. The box is running squid server in accelerator mode. As you can see above the squid proccess is locked in some "zoneli" state. It's unlikely that the problem is with the hardware since I transferred everything completely to a different machine except for RAID PCI card and HDD drives. But i want to notice that in all my tests i've used em cards, Xeon CPU's and the same models of Motherboards. I can provide root access if someone intrested. Here is also original thread about this problem: http://www.freebsd.org/cgi/getmsg.cgi?fetch=395155+403056+/usr/local/www/db/text/2006/freebsd-stable/20061126.freebsd-stable Here is detailed information: last pid: 1197; load averages: 0.00, 0.00, 0.00 up 0+01:54:58 14:46:40 31 processes: 1 running, 29 sleeping, 1 zombie CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle Mem: 704M Active, 629M Inact, 447M Wired, 12K Cache, 112M Buf, 2109M Free Swap: 4070M Total, 4070M Free PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND 671 squid 1 -16 0 688M 688M zoneli 6:32 0.00% squid ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 680 root 1 96 0 6628K 4760K select 0:02 0.00% snmpd 1170 root 1 96 0 2332K 1588K RUN 0:00 0.00% top 698 root 1 -8 0 7768K 7288K piperd 0:00 0.00% perl5.8.8 634 root 1 96 0 2984K 1808K select 0:00 0.00% ntpd 362 _pflogd 1 -58 0 1600K 1144K bpf 0:00 0.00% pflogd 1097 quetzal 1 96 0 6220K 3220K select 0:00 0.00% sshd 709 root 1 96 0 3464K 2796K select 0:00 0.00% sendmail 1100 root 1 20 0 5036K 3064K pause 0:00 0.00% tcsh 551 root 1 96 0 1352K 996K select 0:00 0.00% syslogd 1085 root 1 4 0 6232K 3204K sbwait 0:00 0.00% sshd 1095 root 1 4 0 6232K 3204K sbwait 0:00 0.00% sshd 1088 quetzal 1 6 0 4724K 2952K ttywai 0:00 0.00% tcsh 719 root 1 8 0 1364K 1060K nanslp 0:00 0.00% cron 1098 quetzal 1 20 0 4704K 2932K pause 0:00 0.00% tcsh root@accel1:~# netstat -h 1 input (Total) output packets errs bytes packets errs bytes colls 1.6K 0 1.3M 1.5K 0 1.6M 0 1.8K 0 1.6M 1.7K 0 1.6M 0 1.3K 0 1.0M 1.4K 0 1.4M 0 1.5K 0 1.3M 1.5K 0 1.4M 0 1.6K 0 1.4M 1.6K 0 1.5M 0 1.7K 0 1.5M 1.6K 0 1.5M 0 1.3K 0 830K 1.4K 0 1.5M 0 1.1K 0 679K 1.3K 0 1.4M 0 812 0 501K 912 0 971K 0 1.2K 0 1.1M 1.2K 0 1.1M 0 617 0 325K 742 0 806K 0 634 0 312K 769 0 818K 0 1.8K 0 1.7M 1.5K 0 1.1M 0 11K 0 13M 7.5K 0 3.8M 0 10K 0 12M 8.0K 0 5.2M 0 9.7K 0 9.9M 8.2K 0 6.3M 0 513 1.7K 666K 328 0 151K 0 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Here goes load... 1.0K 543 782K 434 0 247K 0 0 2.3K 0 0 0 0 0 2 605 1.5K 2 0 132 0 input (Total) output packets errs bytes packets errs bytes colls 0 334 0 0 0 0 0 0 286 0 0 0 0 0 0 288 0 0 0 0 0 819 204 689K 328 0 122K 0 0 1.7K 0 0 0 0 0 866 1.2K 719K 375 0 141K 0 144 1.5K 175K 111 0 55K 0 0 1.3K 0 0 0 0 0 687 182 426K 304 0 73K 0 0 3.2K 0 0 0 0 0 1.0K 0 723K 405 0 126K 0 17 1.8K 25K 11 0 2.2K 0 598 990 409K 163 0 32K 0 Another intresting thing - i see that mbufs deniend counter is growing. All my tests showed that errors appeared exactly the same moment when the quantity of "mbufs in use" reached the magic number 65550. I am absolutely sure about this fact. root@accel1:~# netstat -m 65485/65/65550 mbufs in use (current/cache/total) ^^^^^^^ 65284/418/65702/131072 mbuf clusters in use (current/cache/total/max) 65284/0 mbuf+clusters out of packet secondary zone in use (current/cache) 0/0/0/0 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/0 9k jumbo clusters in use (current/cache/total/max) 0/0/0/0 16k jumbo clusters in use (current/cache/total/max) 146939K/852K/147791K bytes allocated to network (current/cache/total) 21823/0/75016 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/3/6656 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile 0 calls to protocol drain routines root@accel1:~# netstat -m 65538/12/65550 mbufs in use (current/cache/total) ^^^^^^^ 65338/364/65702/131072 mbuf clusters in use (current/cache/total/max) 65338/6 mbuf+clusters out of packet secondary zone in use (current/cache) 0/0/0/0 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/0 9k jumbo clusters in use (current/cache/total/max) 0/0/0/0 16k jumbo clusters in use (current/cache/total/max) 147060K/731K/147791K bytes allocated to network (current/cache/total) 23262/0/86054 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/3/6656 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile 0 calls to protocol drain routines My configuration files: root@accel1:~# cat /boot/loader.conf # --- Loader settings --- autoboot_delay="5" # Delay in seconds before autobooting # --- Kernel tunables --- kern.ipc.nmbclusters="131072" # Set the number of mbuf clusters kern.cam.scsi_delay="5000" # Delay (in ms) before probing SCSI kern.maxdsiz="2752M" # Allow more memory allocation for squid kern.dfldsiz="2752M" # Allow more memory allocation for squid #kern.maxssiz="256M" # Allow more memory allocation for squid #kern.maxusers="512" # --- Networking modules --- pf_load="YES" # Packet filter # --- Other modules --- accf_data_load="YES" # Wait for data accept filter accf_http_load="YES" # Wait for full HTTP request accept filter root@accel1:~# cat /etc/sysctl.conf # $FreeBSD: src/etc/sysctl.conf,v 1.8 2003/03/13 18:43:50 mux Exp $ # # This file is read when going to multi-user and its contents piped thru # ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details. # # Uncomment this to prevent users from seeing information about processes that # are being run under another UID. #security.bsd.see_other_uids=0 # --- MAC access for squid --- net.inet.ip.portrange.reservedlow=0 net.inet.ip.portrange.reservedhigh=0 security.mac.portacl.rules=uid:100:tcp:80 # --- Kernel tunning --- kern.ipc.somaxconn=8192 # --- Network tunning and protection --- kern.ipc.maxsockbuf=1048576 net.inet.tcp.sendspace=262144 net.inet.tcp.recvspace=131072 net.inet.tcp.msl=3000 net.inet.icmp.icmplim=50 net.inet.icmp.drop_redirect=1 net.inet.icmp.log_redirect=1 net.inet.ip.redirect=0 net.inet6.ip6.redirect=0 Some kernel options: options VM_KMEM_SIZE_MAX=536870912" options MAC options MAC_PORTACL Ahh... And here is also a panic :) root@accel1:/usr/obj/usr/src/sys/ACCEL# kgdb kernel.debug /var/crash/vmcore.4 kgdb: kvm_nlist(_stopped_cpus): kgdb: kvm_nlist(_stoppcbs): [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd". Unread portion of the kernel message buffer: lock order reversal: (sleepable after non-sleepable) 1st 0xca21567c so_snd (so_snd) @ /usr/src/sys/netinet/tcp_output.c:253 2nd 0xc070bd84 user map (user map) @ /usr/src/sys/vm/vm_map.c:3074 KDB: stack backtrace: kdb_backtrace(ffffffff,c071ccb0,c071c210,c06e5c4c,c0758f18,...) at kdb_backtrace+0x29 witness_checkorder(c070bd84,9,c06be56c,c02,c070d2c4,0,c06aab25,9f) at witness_checkorder+0x4cd _sx_xlock(c070bd84,c06be56c,c02) at _sx_xlock+0x2c _vm_map_lock_read(c070bd40,c06be56c,c02,184637d,c92796b0,...) at _vm_map_lock_read+0x37 vm_map_lookup(f48a29d0,0,1,f48a29d4,f48a29c4,f48a29c8,f48a29ab,f48a29ac) at vm_map_lookup+0x28 vm_fault(c070bd40,0,1,0,c927aa80,...) at vm_fault+0x65 trap_pfault(f48a2a98,0,c) at trap_pfault+0xee trap(8,c06b0028,f48a0028,1,0,...) at trap+0x325 calltrap() at calltrap+0x5 --- trap 0xc, eip = 0xc053ea34, esp = 0xf48a2ad8, ebp = 0xf48a2ae4 --- m_copydata(0,ffffffff,1,d0020d74,c1040468,...) at m_copydata+0x28 tcp_output(d21c5570) at tcp_output+0x9af tcp_input(d0020d00,14,e9,93935ce,0,...) at tcp_input+0x24a2 ip_input(d0020d00) at ip_input+0x561 netisr_processqueue(c075a6d8) at netisr_processqueue+0x6e swi_net(0) at swi_net+0xc2 ithread_execute_handlers(c9279648,c92c3400) at ithread_execute_handlers+0xce ithread_loop(c92436a0,f48a2d38,c070db20,0,c06a818a,...) at ithread_loop+0x4e fork_exit(c04f76d4,c92436a0,f48a2d38) at fork_exit+0x61 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xf48a2d6c, ebp = 0 --- Fatal trap 12: page fault while in kernel mode fault virtual address = 0xc fault code = supervisor read, page not present instruction pointer = 0x20:0xc053ea34 stack pointer = 0x28:0xf48a2ad8 frame pointer = 0x28:0xf48a2ae4 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 13 (swi1: net) trap number = 12 panic: page fault KDB: stack backtrace: kdb_backtrace(100,c927aa80,28,f48a2a98,c,...) at kdb_backtrace+0x29 panic(c069b8a1,c06c5f2c,0,fffff,c927d69b,...) at panic+0xa8 trap_fatal(f48a2a98,c,c927aa80,c070bd40,0,...) at trap_fatal+0x2a6 trap_pfault(f48a2a98,0,c) at trap_pfault+0x187 trap(8,c06b0028,f48a0028,1,0,...) at trap+0x325 calltrap() at calltrap+0x5 --- trap 0xc, eip = 0xc053ea34, esp = 0xf48a2ad8, ebp = 0xf48a2ae4 --- m_copydata(0,ffffffff,1,d0020d74,c1040468,...) at m_copydata+0x28 tcp_output(d21c5570) at tcp_output+0x9af tcp_input(d0020d00,14,e9,93935ce,0,...) at tcp_input+0x24a2 ip_input(d0020d00) at ip_input+0x561 netisr_processqueue(c075a6d8) at netisr_processqueue+0x6e swi_net(0) at swi_net+0xc2 ithread_execute_handlers(c9279648,c92c3400) at ithread_execute_handlers+0xce ithread_loop(c92436a0,f48a2d38,c070db20,0,c06a818a,...) at ithread_loop+0x4e fork_exit(c04f76d4,c92436a0,f48a2d38) at fork_exit+0x61 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xf48a2d6c, ebp = 0 --- Uptime: 25m13s Dumping 3967 MB (3 chunks) chunk 0: 1MB (159 pages) ... ok chunk 1: 3966MB (1015280 pages) 3950 3934 3918 3902 3886 3870 3854 3838 3822 3806 3790 3774 3758 3742 3726 3710 3694 3678 3662 3646 3630 3614 3598 3582 3566 3550 3534 3518 3502 3486 3470 3454 3438 3422 3406 3390 3374 3358 3342 3326 3310 3294 3278 3262 3246 3230 3214 3198 3182 3166 3150 3134 3118 3102 3086 3070 3054 3038 3022 3006 2990 2974 2958 2942 2926 2910 2894 2878 2862 2846 2830 2814 2798 2782 2766 2750 2734 2718 2702 2686 2670 2654 2638 2622 2606 2590 2574 2558 2542 2526 2510 2494 2478 2462 2446 2430 2414 2398 2382 2366 2350 2334 2318 2302 2286 2270 2254 2238 2222 2206 2190 2174 2158 2142 2126 2110 2094 2078 2062 2046 2030 2014 1998 1982 1966 1950 1934 1918 1902 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 1710 1694 1678 1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 1486 1470 1454 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14 ... ok chunk 2: 1MB (128 pages) #0 doadump () at pcpu.h:165 165 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); (kgdb) list *0xc053ea34 0xc053ea34 is in m_copydata (libkern.h:56). 51 static __inline int imax(int a, int b) { return (a > b ? a : b); } 52 static __inline int imin(int a, int b) { return (a < b ? a : b); } 53 static __inline long lmax(long a, long b) { return (a > b ? a : b); } 54 static __inline long lmin(long a, long b) { return (a < b ? a : b); } 55 static __inline u_int max(u_int a, u_int b) { return (a > b ? a : b); } 56 static __inline u_int min(u_int a, u_int b) { return (a < b ? a : b); } 57 static __inline quad_t qmax(quad_t a, quad_t b) { return (a > b ? a : b); } 58 static __inline quad_t qmin(quad_t a, quad_t b) { return (a < b ? a : b); } 59 static __inline u_long ulmax(u_long a, u_long b) { return (a > b ? a : b); } 60 static __inline u_long ulmin(u_long a, u_long b) { return (a < b ? a : b); } (kgdb) bt #0 doadump () at pcpu.h:165 #1 0xc050ae04 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #2 0xc050b05f in panic (fmt=0xc069b8a1 "%s") at /usr/src/sys/kern/kern_shutdown.c:565 #3 0xc0674fa2 in trap_fatal (frame=0xf48a2a98, eva=12) at /usr/src/sys/i386/i386/trap.c:837 #4 0xc0674cd3 in trap_pfault (frame=0xf48a2a98, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:745 #5 0xc0674939 in trap (frame= {tf_fs = 8, tf_es = -1066729432, tf_ds = -192282584, tf_edi = 1, tf_esi = 0, tf_ebp = -192271644, tf_isp = -192271676, tf_ebx = 4380, tf_edx = -1, tf_ecx = 0, tf_eax = -805171852, tf_trapno = 12, tf_err = 0, tf_eip = -1068242380, tf_cs = 32, tf_eflags = 590338, tf_esp = 4380, tf_ss = -769895056}) at /usr/src/sys/i386/i386/trap.c:435 #6 0xc0663bba in calltrap () at /usr/src/sys/i386/i386/exception.s:139 #7 0xc053ea34 in m_copydata (m=0x0, off=-1, len=1, cp=0xd0020d74 "") at /usr/src/sys/kern/uipc_mbuf.c:543 #8 0xc0590aeb in tcp_output (tp=0xd21c5570) at /usr/src/sys/netinet/tcp_output.c:770 #9 0xc058f536 in tcp_input (m=0xd0020d00, off0=20) at /usr/src/sys/netinet/tcp_input.c:2471 #10 0xc058755d in ip_input (m=0xd0020d00) at /usr/src/sys/netinet/ip_input.c:785 #11 0xc0578252 in netisr_processqueue (ni=0xc075a6d8) at /usr/src/sys/net/netisr.c:236 #12 0xc057841a in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:349 #13 0xc04f762e in ithread_execute_handlers (p=0xc9279648, ie=0xc92c3400) at /usr/src/sys/kern/kern_intr.c:682 #14 0xc04f7722 in ithread_loop (arg=0xc92436a0) at /usr/src/sys/kern/kern_intr.c:765 #15 0xc04f697d in fork_exit (callout=0xc04f76d4 , arg=0xc92436a0, frame=0xf48a2d38) at /usr/src/sys/kern/kern_fork.c:821 #16 0xc0663c1c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208 >How-To-Repeat: I can reproduce it easily in my environment and provide root access for developers. >Fix: >Release-Note: >Audit-Trail: >Unformatted: