FreeBSD Mail Archives

Date:      Sat, 05 Feb 2011 21:43:25 +0300
From:      Ruslan Mahmatkhanov <cvs-src@yandex.ru>
To:        Ivan Voras <ivoras@freebsd.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Tracking down a problem with php on FreeBSD
Message-ID:  <4D4D9A4D.2070508@yandex.ru>
In-Reply-To: <ihk0a9$2on$1@dough.gmane.org>
References:  <4D3CB2AF.9050003@yandex.ru> <ihk0a9$2on$1@dough.gmane.org>


Hi, Ivan!

Thank you very much for the response, and sorry for the late answer. We 
were able to collect some data about the issue to make the discussion 
more objective. See below.

24.01.2011 16:54, Ivan Voras wrote:
> On 23.1.2011 23:58, Ruslan Mahmatkhanov wrote:
>>
>> Good day!
>>
>> We are using custom php application on FreeBSD 8.1R amd64. It is started
>> with php-fpm 5.3.3 from ports as backend and nginx 0.8.54 as frontend.
>> Several times per day this app is making self unavailable.
>
> I think it would be more appropriate to ask this on the stable@ list.

I believe this issue is not specific to stable, and resolving it will 
involve some debugging, which is why I posted it to hackers@.

>
>> Simple php-fpm restart solves the problem, but i need to track it down
>> to the cause of this situation and ask for your assistance and
>> instructions on how to debug it. Some facts about this:
>
> On one hand, FPM is said to be very experimental...
>
> Personally, I've been using apache22-worker or apache22-event +
> mod_fcgid for years without trouble.

We prefer to avoid Apache altogether, because in this setup it just adds 
another unneeded link and more complexity.

> It looks very application-specific, possibly not really an OS problem
> (or maybe a problem of different expectations from the OS when porting
> from Linux).
>
>> - `top -mio` shows very high (80000-90000 for VCSW) VCSW/IVCSW values
>> for php-fpm processes and LA is more than 120
>
> How many "real" user request are in these 120? Do any users at the time
> of problem (this doesn't look like a "crash") receive valid responses?

This problem is also seen on the development server with fewer than 5 
users online. Nobody receives a valid response; they all get this message:

Maximum execution time of 30 seconds exceeded in Unknown on line 0.

>
>> - user seeing http 502 error code in browser
>> - php-fpm log has many of this lines in time of crash:
>> Jan 23 17:56:58.176425 [WARNING] [pool world] server reached
>> max_children setting (100), consider raising it
>
> Did you try raising it? Does the error happen ONLY when this limit is
> reached?

Yes, we did. Nothing changed.
I'm not sure about the second question; I believe it's not related.
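
One way to answer the second question more objectively would be to count 
the max_children warnings per minute and compare them with the times of 
the 502 responses. A minimal sketch in Python, assuming each warning 
occupies a single line in the php-fpm log in the format quoted above. The 
function name is hypothetical, and the second and third sample timestamps 
are made up for illustration:

```python
import re
from collections import Counter

# Matches lines such as:
# Jan 23 17:56:58.176425 [WARNING] [pool world] server reached max_children setting (100), consider raising it
WARNING_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d{1,2} \d{2}:\d{2}):\d{2}\.\d+ "
    r"\[WARNING\] \[pool (?P<pool>\S+)\] "
    r"server reached max_children setting \((?P<limit>\d+)\)"
)


def warnings_per_minute(log_text):
    """Map (minute, pool) to the number of max_children warnings seen."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            counts[(match.group("minute"), match.group("pool"))] += 1
    return counts


# Only the first line is taken from the real log; the others are illustrative.
sample_log = "\n".join([
    "Jan 23 17:56:58.176425 [WARNING] [pool world] server reached max_children setting (100), consider raising it",
    "Jan 23 17:56:59.000001 [WARNING] [pool world] server reached max_children setting (100), consider raising it",
    "Jan 23 17:57:03.500000 [WARNING] [pool world] server reached max_children setting (100), consider raising it",
])

for (minute, pool), count in sorted(warnings_per_minute(sample_log).items()):
    print(minute, pool, count)
```

If the minutes with warnings do not line up with the minutes in which 
users saw 502 errors, the limit is probably a symptom and not the cause.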

> These are some very varied errors, not especially consistent with each
> other.
>
> Did you try some generic socket & TCP tuning like described in
> http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel
> ?

Yes, we did some at the initial stage, when the server was built. We also 
recently tried raising kern.ipc.somaxconn, but without success.

Here is the full current sysctl.conf:

security.bsd.see_other_uids = 0
security.bsd.see_other_gids = 0
security.bsd.unprivileged_proc_debug=0
security.bsd.unprivileged_read_msgbuf=0
security.bsd.conservative_signals=1
security.bsd.hardlink_check_uid=1
security.bsd.hardlink_check_gid=1
vfs.usermount=0
net.inet.tcp.drop_synfin=1
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=1
net.inet.icmp.bmcastecho=0
net.inet.icmp.maskrepl=0
net.inet.icmp.icmplim=100
net.link.ether.inet.max_age=800
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
kern.ps_showallprocs=0
net.inet.ip.random_id=1
kern.corefile=/tmp/%U.%N.%P.core

> Other than that, you will probably have to debug the php-fpm processes.
> Start by observing in which state they are (top without "-mio"). If the
> processes are blocking, try "procstat -k <pid>" on them.

They are not in a blocking state:
   PID USERNAME    THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
39405 www           1 101    0   108M 14832K RUN    1   0:54 25.39% php-fpm
39544 www           1 101    0   108M 14832K CPU1   1   0:42 25.39% php-fpm
39395 www           1 101    0   108M 14828K CPU1   1   0:57 23.39% php-fpm
39401 www           1 101    0   108M 14832K CPU3   3   0:55 23.19% php-fpm
39481 www           1 101    0   108M 14832K RUN    3   0:55 23.10% php-fpm
13024 www           1 101    0   112M 47156K RUN    0   1:30 23.00% php-fpm
39398 www           1   4    0   108M 14832K accept 0   1:00 13.48% php-fpm
39373 www           1   4    0   108M 14832K accept 0   1:00 10.60% php-fpm
39385 www           1   4    0   108M 14832K accept 1   1:00  9.28% php-fpm
22130 www           1   4    0   111M 40200K accept 1   1:32  7.57% php-fpm
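
To compare snapshots like this one over time, the STATE column can be 
tallied automatically. A minimal sketch in Python, assuming the 12-column 
layout shown above; the function name is hypothetical, and the sample 
holds only a few of the rows from the listing:

```python
from collections import Counter


def tally_states(top_output):
    """Count processes per STATE value in `top` output laid out as above."""
    counts = Counter()
    for line in top_output.splitlines():
        fields = line.split()
        # Columns: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
        # Process rows start with a numeric PID; the header row does not.
        if len(fields) >= 12 and fields[0].isdigit():
            state = fields[7]
            # A state of CPUn means the process is running on CPU n,
            # so fold all of those into one bucket.
            counts["on-cpu" if state.startswith("CPU") else state] += 1
    return counts


sample_top = """\
   PID USERNAME    THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
39405 www           1 101    0   108M 14832K RUN    1   0:54 25.39% php-fpm
39544 www           1 101    0   108M 14832K CPU1   1   0:42 25.39% php-fpm
39398 www           1   4    0   108M 14832K accept 0   1:00 13.48% php-fpm
"""

for state, count in sorted(tally_states(sample_top).items()):
    print(state, count)
```

A snapshot dominated by RUN and on-cpu entries, as here, points at 
processes spinning in userland or in cheap syscalls, not at processes 
blocked in the kernel.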

Here is the procstat -k/-t output for a couple of processes from this list:

  PID    TID COMM             TDNAME           KSTACK
39405 100110 php-fpm          -                dmapbase
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
39405 100110 php-fpm          -                  1  185 run     -

  PID    TID COMM             TDNAME           KSTACK
39398 100154 php-fpm          -                tdq_cpu
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
39398 100154 php-fpm          -                  0   88 sleep   accept

When I attach to any hanging php-fpm process with truss, I see a lot of 
these calls:
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0)
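
To quantify how much of a captured trace consists of sched_yield, the 
calls can be counted by name. A minimal sketch in Python, assuming the 
truss output was saved to a file and that each call starts a line with 
its name followed by an opening parenthesis, as in the lines above; the 
function name is hypothetical:

```python
import re
from collections import Counter

# A syscall line starts with an identifier followed by "(".
# Wrapped continuation lines such as "= 0 (0x0)" do not match.
SYSCALL_RE = re.compile(r"^([A-Za-z_]\w*)\(")


def count_syscalls(truss_output):
    """Count how many times each syscall name appears in truss output."""
    counts = Counter()
    for line in truss_output.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample_truss = """\
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078)
= 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078)
= 0 (0x0)
"""

for name, count in count_syscalls(sample_truss).most_common():
    print(name, count)
```

A trace made up almost entirely of sched_yield would suggest that the 
process is spinning on a userland lock and not doing useful work.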

Thank you in advance for any tips.

-- 
Regards,
Ruslan
