From: Ivan Voras
To: freebsd-hackers@freebsd.org
Date: Mon, 24 Jan 2011 14:54:22 +0100
Subject: Re: Tracking down a problem with php on FreeBSD

On 23.1.2011 23:58, Ruslan Mahmatkhanov wrote:
>
> Good day!
>
> We are using custom php application on FreeBSD 8.1R amd64.
> It is started with php-fpm 5.3.3 from ports as backend and nginx 0.8.54
> as frontend. Several times per day this app is making self unavailable.

I think it would be more appropriate to ask this on the stable@ list.

> Simple php-fpm restart solves the problem, but i need to track it down
> to the cause of this situation and ask for your assistance and
> instructions on how to debug it. Some facts about this:

On one hand, FPM is said to be very experimental... Personally, I've
been using apache22-worker or apache22-event + mod_fcgid for years
without trouble.

> - I don't know how to manually reproduce this, but it happens several
> times every day
> - It happens on FreeBSD 7.x too
> - It happens with apache+mod_php instead php-fpm
> - It happens with lighthttpd instead nginx
> - It happens with both SHED_4BSD and SHED_ULE
> - It doesn't happen on php =< 5.2.12
> - It happens with and w/o eaccelerator

It looks very application-specific, possibly not really an OS problem
(or maybe a problem of different expectations from the OS when porting
from Linux).

> - `top -mio` shows very high (80000-90000 for VCSW) VCSW/IVCSW values
> for php-fpm processes and LA is more than 120

How many "real" user requests are in these 120? Do any users receive
valid responses at the time of the problem (this doesn't look like a
"crash")?

> - user seeing http 502 error code in browser
> - php-fpm log has many of this lines in time of crash:
> Jan 23 17:56:58.176425 [WARNING] [pool world] server reached
> max_children setting (100), consider raising it

Did you try raising it? Does the error happen ONLY when this limit is
reached?
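
One rough way to answer the "ONLY when this limit is reached" question is to compare timestamps from the two logs. The following is a minimal sketch, not a tested tool: the function names, the 60-second window and the regular expressions are my own assumptions, based only on the log formats quoted in this thread. Note that the php-fpm lines carry no year, so it has to be supplied by hand.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied (and shortened) from the logs quoted in this thread.
# In practice you would read the real php-fpm and nginx error logs instead.
FPM_LOG = [
    "Jan 23 17:56:58.176425 [WARNING] [pool world] server reached "
    "max_children setting (100), consider raising it",
]
NGINX_LOG = [
    "2011/01/23 17:57:00 [error] 38018#0: *26006023 writev() failed "
    "(54: Connection reset by peer) while sending request to upstream",
    "2011/01/23 17:57:03 [error] 38014#0: *25997903 upstream prematurely "
    "closed connection while reading response header from upstream",
]

# Assumed patterns, derived from the sample lines above.
FPM_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d)\.\d+ \[WARNING\].*max_children")
NGINX_RE = re.compile(r"^(\d{4}/\d\d/\d\d \d\d:\d\d:\d\d) \[(?:error|crit|alert)\]")


def fpm_limit_times(lines, year):
    """Timestamps of php-fpm 'reached max_children' warnings."""
    times = []
    for line in lines:
        m = FPM_RE.match(line)
        if m:
            stamp = f"{year} {m.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def nginx_error_times(lines):
    """Timestamps of nginx error/crit/alert lines."""
    times = []
    for line in lines:
        m = NGINX_RE.match(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S"))
    return times


def unexplained_errors(fpm_times, nginx_times, window_seconds=60):
    """nginx errors with no max_children warning in the preceding window."""
    window = timedelta(seconds=window_seconds)
    return [
        t for t in nginx_times
        if not any(timedelta(0) <= t - w <= window for w in fpm_times)
    ]
```

If `unexplained_errors(fpm_limit_times(FPM_LOG, 2011), nginx_error_times(NGINX_LOG))` comes back empty on the real logs, the nginx errors all follow a max_children warning closely, which would point at worker exhaustion rather than at nginx or the network.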
> 2011/01/23 17:57:00 [error] 38018#0: *26006023 writev() failed (54:
> Connection reset by peer) while sending request to upstream, client:
> xx.xx.xx.xx, server: some.server.org, request: "POST /?ctrl=Chat&
> a=chatList&__path=chat_list&h=8093b9e1cf448762d5677e21bded97ae&
> h1=38ca8b747a46098c3b1a4f39e6658170 HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/"
> 2011/01/23 17:57:00 [error] 38016#0: *26029878 kevent() reported
> about an closed connection (54: Connection reset by peer) while
> reading response header from upstream, client: xx.xx.xx.xx, server:
> some.server.org, request: "POST /?ctrl=Location&a=refresh&
> __path=refresh&h=276f591df26a65d9a1736f6e1006f4ab&
> h1=3c0916c16b1fc2e7015b71e90bbc3d23 HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/"
> 2011/01/23 17:57:02 [crit] 38020#0: *26034390 open() "/tmp/nginx
> /client_temp/1/74/0000000741" failed (13: Permission denied) while
> sending request to upstream, client: xx.xx.xx.xx, server:
> some.server.org, request: "POST /?ctrl=Chat&a=send&__path=chat_send&
> h=4a27d8d382ba9b1059412323a451ef84&
> h1=b0a53c86e3c744a01356a5030559ed1a HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/"
> 2011/01/23 17:57:02 [alert] 38020#0: *26034390 http request count is
> zero while sending to client, client: xx.xx.xx.xx, server:
> some.server.org, request: "POST /?ctrl=Chat&a=send&__path=chat_send&
> h=4a27d8d382ba9b1059412323a451ef84&
> h1=b0a53c86e3c744a01356a5030559ed1a HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/"
> 2011/01/23 17:57:03 [error] 38014#0: *25997903 upstream prematurely
> closed connection while reading response header from upstream,
> client: 109.229.69.186, server: some.server.org, request: "POST
> /?ctrl=Chat&a=chatList&__path=chat_list&
> h=c8723de73c4f8ebb98f9bf746d75e965&
> h1=3ab289760a009b07b73c6d96cc94a509 HTTP/1.1", upstream:
> "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer:
> "http://some.server.org/"

These are some very varied errors, not especially consistent with each
other. Did you try some generic socket & TCP tuning as described in
http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel ?

Other than that, you will probably have to debug the php-fpm processes.
Start by observing which state they are in (top without "-mio"). If the
processes are blocking, try "procstat -k <pid>" on them.
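
To collect the kernel stacks of all workers in one go while the problem is happening, something like the following could help. It is only a sketch under assumptions: that Python 3.7+ is installed on the box, that the workers run under the process name "php-fpm", and that you have permission to inspect them. The helper names are made up for illustration; the only external commands used are pgrep(1) and procstat(1) with "-k".

```python
import subprocess


def php_fpm_pids(run=subprocess.run):
    """Return the PIDs of processes named php-fpm, as reported by pgrep(1).

    The process name "php-fpm" is an assumption; adjust it if your
    workers are named differently.
    """
    result = run(["pgrep", "php-fpm"], capture_output=True, text=True)
    return [int(tok) for tok in result.stdout.split()]


def kernel_stacks(pids, run=subprocess.run):
    """Map each PID to the text printed by `procstat -k <pid>`."""
    stacks = {}
    for pid in pids:
        result = run(["procstat", "-k", str(pid)],
                     capture_output=True, text=True)
        stacks[pid] = result.stdout
    return stacks
```

Calling `kernel_stacks(php_fpm_pids())` on the affected host and printing the result a few times in a row should show whether the workers are all stuck in the same place in the kernel.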