Date:      Fri, 23 Sep 2016 23:01:43 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        Julien Charbon <julien.charbon@gmail.com>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@FreeBSD.org, hiren panchasara <hiren@strugglingcoder.info>
Subject:   Re: 11.0 stuck on high network load
Message-ID:  <20160923200143.GG2840@zxy.spb.ru>
In-Reply-To: <e4e0188c-b22b-29af-ed15-b650c3ec4553@gmail.com>
References:  <20160916181839.GC2960@zxy.spb.ru> <20160916183053.GL9397@strugglingcoder.info> <20160916190330.GG2840@zxy.spb.ru> <78cbcdc9-f565-1046-c157-2ddd8fcccc62@freebsd.org> <20160919204328.GN2840@zxy.spb.ru> <8ba75d6e-4f01-895e-0aed-53c6c6692cb9@freebsd.org> <20160920202633.GQ2840@zxy.spb.ru> <f644cd52-4377-aa90-123a-3a2887972bbc@freebsd.org> <20160921195155.GW2840@zxy.spb.ru> <e4e0188c-b22b-29af-ed15-b650c3ec4553@gmail.com>


On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote:

> 
>  Hi Slawa,
> 
> On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote:
> > On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote:
> >>  You can also use DTrace and lockstat (especially with the lockstat -s
> >> option):
> >>
> >> https://wiki.freebsd.org/DTrace/One-Liners#Kernel_Locks
> >> https://www.freebsd.org/cgi/man.cgi?query=lockstat&manpath=FreeBSD+11.0-RELEASE
> >>
> >>  But I am less familiar with the DTrace/lockstat tools.
> > 
> > I am still using the old kernel and got the lockup again.
> > I tried lockstat (I saved more output); the interesting part may be the following:
> > 
> > R/W writer spin on writer: 190019 events in 1.070 seconds (177571 events/sec)
> > 
> > -------------------------------------------------------------------------------
> > Count indv cuml rcnt     nsec Lock                   Caller                  
> > 140839  74%  74% 0.00    24659 tcpinp                 tcp_tw_2msl_scan+0xc6   
> > 
> >       nsec ------ Time Distribution ------ count     Stack                   
> >       4096 |                               913       tcp_twstart+0xa3        
> >       8192 |@@@@@@@@@@@@                   58191     tcp_do_segment+0x201f   
> >      16384 |@@@@@@                         29594     tcp_input+0xe1c         
> >      32768 |@@@@                           23447     ip_input+0x15f          
> >      65536 |@@@                            16197     
> >     131072 |@                              8674      
> >     262144 |                               3358      
> >     524288 |                               456       
> >    1048576 |                               9         
> > -------------------------------------------------------------------------------
> > Count indv cuml rcnt     nsec Lock                   Caller                  
> > 49180  26% 100% 0.00    15929 tcpinp                 tcp_tw_2msl_scan+0xc6   
> > 
> >       nsec ------ Time Distribution ------ count     Stack                   
> >       4096 |                               157       pfslowtimo+0x54         
> >       8192 |@@@@@@@@@@@@@@@                24796     softclock_call_cc+0x179 
> >      16384 |@@@@@@                         11223     softclock+0x44          
> >      32768 |@@@@                           7426      intr_event_execute_handlers+0x95
> >      65536 |@@                             3918      
> >     131072 |                               1363      
> >     262144 |                               278       
> >     524288 |                               19        
> > -------------------------------------------------------------------------------
> 
>  This is interesting, it seems that you have two call paths competing
> for INP locks here:
> 
>  - pfslowtimo()/tcp_tw_2msl_scan(reuse=0) and
> 
>  - tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1)

My current hypothesis:

nginx does a write() (or maybe a close()?) on a socket, and the kernel locks
the first inp in V_twq_2msl; then the callout for pfslowtimo() fires on the
same CPU core, and tcp_tw_2msl_scan() spins forever on that same inp.

In this case your modification can't help; before the next attempt we need
something like yield().


