From owner-freebsd-bugs@freebsd.org Fri Sep 23 10:10:05 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 212920] Li loaded web server cath race condition on _close () from /lib/libc.so.7 with accf_http
Date: Fri, 23 Sep 2016 10:10:04 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212920

            Bug ID: 212920
           Summary: Li loaded web server cath race condition on _close ()
                    from /lib/libc.so.7 with accf_http
           Product: Base System
           Version: 10.3-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: fbsd98816551@avksrv.org
                CC: freebsd-amd64@FreeBSD.org

Hello!

We recently upgraded our heavily loaded web server to FreeBSD 10.3-STABLE
r305091 and ran into a problem with nginx (nginx-1.10.1_2,2, compiled from the
latest ports with mostly default settings). After some time one worker stops
answering requests, and top shows it in state "soclos":

 1072 nobody          1  22    0  1698M 65680K soclos  5   0:13   0.00% nginx

After a short while the next worker stops in the same state, and so on until
all workers are in "soclos" and the web server stops serving requests (it
still accepts connections, but they die on timeout after the client has sent a
request). Increasing the worker count only postpones the problem by another
half an hour or so. Restarting nginx fixes it only for a short time.

The server is fairly heavily loaded, at 1000-2000 requests/sec. It is a
frontend proxy using the proxy_cache functionality. We tried 2 different
physical servers with different NICs and CPUs.

When we rolled the kernel back to r302223 (only the kernel and modules in
/boot/kernel, not world), the problem went away. We also tried upgrading to
yesterday's r306194; the problem is still there.
Something changed in the kernel code between the end of June and the end of
August that causes the problem.

Backtrace from nginx while it is in "soclos":

#0  0x0000000801a17d28 in _close () from /lib/libc.so.7
#1  0x000000080098a925 in pthread_suspend_all_np () from /lib/libthr.so.3
#2  0x00000000004329b9 in ngx_close_connection (c=0x869c1de70) at src/core/ngx_connection.c:1169
#3  0x0000000000486370 in ngx_http_close_connection (c=0x869c1de70) at src/http/ngx_http_request.c:3543
#4  0x0000000000488e86 in ngx_http_close_request (r=0x80244c050, rc=408) at src/http/ngx_http_request.c:3406
#5  0x000000000048d9ed in ngx_http_process_request_headers (rev=0x807810b70) at src/http/ngx_http_request.c:1202
#6  0x000000000044fdbd in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94
#7  0x000000000044e60f in ngx_process_events_and_timers (cycle=0x802488050) at src/event/ngx_event.c:256
#8  0x000000000045f406 in ngx_worker_process_cycle (cycle=0x802488050, data=0xa) at src/os/unix/ngx_process_cycle.c:753
#9  0x000000000045ae7c in ngx_spawn_process (cycle=0x802488050, proc=0x45f2f0, data=0xa, name=0x53ecea "worker process", respawn=-3) at src/os/unix/ngx_process.c:198
#10 0x000000000045cc89 in ngx_start_worker_processes (cycle=0x802488050, n=16, type=-3) at src/os/unix/ngx_process_cycle.c:358
#11 0x000000000045c486 in ngx_master_process_cycle (cycle=0x802488050) at src/os/unix/ngx_process_cycle.c:130
#12 0x0000000000413288 in main (argc=1, argv=0x7fffffffead0) at src/core/nginx.c:367

(gdb) list src/core/ngx_connection.c:1169
1164
1165        if (c->shared) {
1166            return;
1167        }
1168
1169        if (ngx_close_socket(fd) == -1) {    <<<<<<<<
1170
1171            err = ngx_socket_errno;
1172
1173            if (err == NGX_ECONNRESET || err == NGX_ENOTCONN) {

Here ngx_close_socket is simply close(fd):

#define ngx_close_socket    close

All TCP sessions opened by the worker are frozen in their current state.
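
For readers without the nginx source at hand, the pattern at frame #2 can be sketched in standalone C. This is only an illustration under assumptions: close_socket_checked and the close_result values are hypothetical names, not nginx identifiers, and the sketch only mirrors the shape of the check visible in the gdb listing (a plain close(2) followed by an errno test for ECONNRESET or ENOTCONN). It shows that nothing after the close() call runs until the kernel returns, which is consistent with the worker sitting in _close() in the backtrace.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not nginx identifiers). */
enum close_result { CLOSE_OK = 0, CLOSE_PEER_GONE = 1, CLOSE_FAILED = -1 };

/*
 * Hypothetical helper mirroring the shape of the check at
 * src/core/ngx_connection.c:1169, written with plain POSIX names.
 */
enum close_result close_socket_checked(int fd)
{
    /* close(2) may sleep inside the kernel; the errno test below is
     * reached only after it returns. */
    if (close(fd) == -1) {
        int err = errno;

        /* The gdb listing singles out these two errno values. */
        if (err == ECONNRESET || err == ENOTCONN) {
            return CLOSE_PEER_GONE;
        }
        return CLOSE_FAILED;
    }
    return CLOSE_OK;
}
```

A caller would invoke close_socket_checked(fd) once per descriptor; closing the same descriptor a second time yields CLOSE_FAILED, because close(2) then fails with EBADF.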

Also, if we do not load accf_http and do not use it in the nginx config, the
problem does not reproduce on any of the 3 tested kernels.

The kernel is GENERIC, with only the extra accf_http, ipmi, smbus, mfip, ums,
zfs and opensolaris modules loaded.

Since accf_http benefits our server, we cannot simply disable the module in
the production environment.

I'll debug more, but since I'm not a good C programmer, it will take some
time. If someone knows what changed in the related functions, it may be faster
to check from that side.

-- 
You are receiving this mail because:
You are the assignee for the bug.
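
As background on what using accf_http means at the socket level, here is a minimal C sketch of attaching the "httpready" accept filter to a listening socket with setsockopt(SO_ACCEPTFILTER), the interface documented for FreeBSD accept filters. It is not taken from nginx or from the reporter's setup: attach_http_accept_filter and open_filtered_listener are hypothetical helper names, and the loopback address, ephemeral port and backlog of 128 are arbitrary choices for the example. On systems without SO_ACCEPTFILTER the sketch simply reports that no filter was attached.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the filter was attached, -1 otherwise. */
int attach_http_accept_filter(int listen_fd)
{
#ifdef SO_ACCEPTFILTER
    struct accept_filter_arg afa;

    memset(&afa, 0, sizeof(afa));
    strcpy(afa.af_name, "httpready");   /* filter name provided by accf_http */

    /* Must be called on a socket that is already listening. */
    return setsockopt(listen_fd, SOL_SOCKET, SO_ACCEPTFILTER,
                      &afa, sizeof(afa));
#else
    (void)listen_fd;
    errno = ENOPROTOOPT;                /* accept filters not available here */
    return -1;
#endif
}

/*
 * Hypothetical helper: opens a loopback listener on an ephemeral port and
 * tries to attach the filter. A missing filter is treated as non-fatal;
 * *filter_attached reports whether it was applied.
 */
int open_filtered_listener(int *filter_attached)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1
        || listen(fd, 128) == -1) {
        close(fd);
        return -1;
    }

    *filter_attached = (attach_http_accept_filter(fd) == 0);
    return fd;
}
```

With the filter attached, accept() is not expected to return a connection until a complete HTTP request has arrived, so the accept and close paths of a filtered listener differ from those of an unfiltered one. Comparing the two variants on the same kernel may help narrow down where the regression lies.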