Date: Sat, 05 Feb 2011 21:43:25 +0300 From: Ruslan Mahmatkhanov <cvs-src@yandex.ru> To: Ivan Voras <ivoras@freebsd.org> Cc: freebsd-hackers@freebsd.org Subject: Re: Tracking down a problem with php on FreeBSD Message-ID: <4D4D9A4D.2070508@yandex.ru> In-Reply-To: <ihk0a9$2on$1@dough.gmane.org> References: <4D3CB2AF.9050003@yandex.ru> <ihk0a9$2on$1@dough.gmane.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi, Ivan! Thank you much for response and sorry for late answer. We was able to collect some data about the issue to make discussion more objective. See below. 24.01.2011 16:54, Ivan Voras пишет: > On 23.1.2011 23:58, Ruslan Mahmatkhanov wrote: >> >> Good day! >> >> We are using custom php application on FreeBSD 8.1R amd64. It is started >> with php-fpm 5.3.3 from ports as backend and nginx 0.8.54 as frontend. >> Several times per day this app is making self unavailable. > > I think it would be more appropriate to ask this on the stable@ list. I believe that this issue is not related to stable only and problem resolution will involve some debugging and stuff, so this was the reason why i post this to hackers@. > >> Simple php-fpm restart solves the problem, but i need to track it down >> to the cause of this situation and ask for your assistance and >> instructions on how to debug it. Some facts about this: > > On one hand, FPM is said to be very experimental... > > Personally, I've been using apache22-worker or apache22-event + > mod_fcgid for years without trouble. We prefer to avoid using apache at all, because in this it's just adds yet another unneeded link and complexity. > It looks very application-specific, possibly not really an OS problem > (or maybe a problem of different expectations from the OS when porting > from Linux). > >> - `top -mio` shows very high (80000-90000 for VCSW) VCSW/IVCSW values >> for php-fpm processes and LA is more than 120 > > How many "real" user request are in these 120? Do any users at the time > of problem (this doesn't look like a "crash") receive valid responses? This problem is seen on development server too with < 5 users online. And nobody receives valid response, they all get this message: Maximum execution time of 30 seconds exceeded in Unknown on line 0. > >> - user seeing http 502 error code in browser >> - php-fpm log has many of this lines in time of crash: >> Jan 23 17:56:58.176425 [WARNING] [pool world] server reached >> max_children setting (100), consider raising it > > Did you try raising it? Does the error happen ONLY when this limit is > reached? Yes, we did. Nothing changes. I'm not sure about second question, i believe it's not related. > These are some very varied errors, not especially consistent with each > other. > > Did you try some generic socket & TCP tuning like described in > http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel > ? Yes we did some on initial stage when server was built. And recently tried to raise kern.ipc.somaxconn, but without success. There is full current sysctl.conf: security.bsd.see_other_uids = 0 security.bsd.see_other_gids = 0 security.bsd.unprivileged_proc_debug=0 security.bsd.unprivileged_read_msgbuf=0 security.bsd.conservative_signals=1 security.bsd.hardlink_check_uid=1 security.bsd.hardlink_check_gid=1 vfs.usermount=0 net.inet.tcp.drop_synfin=1 net.inet.icmp.drop_redirect=1 net.inet.icmp.log_redirect=1 net.inet.icmp.bmcastecho=0 net.inet.icmp.maskrepl=0 net.inet.icmp.icmplim=100 net.link.ether.inet.max_age=800 net.inet.ip.redirect=0 net.inet6.ip6.redirect=0 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.tcp.blackhole=2 net.inet.udp.blackhole=1 kern.ps_showallprocs=0 net.inet.ip.random_id=1 kern.corefile=/tmp/%U.%N.%P.core > Other than that, you will probably have to debug the php-fpm processes. > Start by observing in which state they are (top without "-mio"). If the > processes are blocking, try "procstat -k <pid>" on them. Thay are not in blocking state: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 39405 www 1 101 0 108M 14832K RUN 1 0:54 25.39% php-fpm 39544 www 1 101 0 108M 14832K CPU1 1 0:42 25.39% php-fpm 39395 www 1 101 0 108M 14828K CPU1 1 0:57 23.39% php-fpm 39401 www 1 101 0 108M 14832K CPU3 3 0:55 23.19% php-fpm 39481 www 1 101 0 108M 14832K RUN 3 0:55 23.10% php-fpm 13024 www 1 101 0 112M 47156K RUN 0 1:30 23.00% php-fpm 39398 www 1 4 0 108M 14832K accept 0 1:00 13.48% php-fpm 39373 www 1 4 0 108M 14832K accept 0 1:00 10.60% php-fpm 39385 www 1 4 0 108M 14832K accept 1 1:00 9.28% php-fpm 22130 www 1 4 0 111M 40200K accept 1 1:32 7.57% php-fpm There is procstat -k/-t for couple of processes from this list: PID TID COMM TDNAME KSTACK 39405 100110 php-fpm - dmapbase PID TID COMM TDNAME CPU PRI STATE WCHAN 39405 100110 php-fpm - 1 185 run - PID TID COMM TDNAME KSTACK 39398 100154 php-fpm - tdq_cpu PID TID COMM TDNAME CPU PRI STATE WCHAN 39398 100154 php-fpm - 0 88 sleep accept When attaching to any hanging php-fpm proccess with truss, than i see a lot of this calls: sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0) Thank you in advance for any tips. -- Regards, Ruslan
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4D4D9A4D.2070508>