Date: Fri, 26 Aug 2016 17:49:26 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Simpson <bms@fastmail.net>
Cc: Ryan Stone <rysto32@gmail.com>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, Ryan Stone <rstone@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, Adrian Chadd <adrian@freebsd.org>
Subject: Re: svn commit: r304436 - in head: . sys/netinet
Message-ID: <20160826144926.GE88122@zxy.spb.ru>
In-Reply-To: <20160821000400.GY8192@zxy.spb.ru>
References: <CAFMmRNx=2v=M8GCBQ_cN4pnuZ4VnyzncwAgsqMUE=ebz7pkp2A@mail.gmail.com>
 <20160820184506.GV8192@zxy.spb.ru>
 <CAFMmRNy-e1uzdtz2cb5DAa9kRd%2BkHg%2BmWbf=HNDWVdGGjOPUWA@mail.gmail.com>
 <eb4c228e-8efe-b519-e85b-87800b3ec7a1@fastmail.net>
 <0f42c5fb-f930-c6e3-75d6-df97f67c201d@fastmail.net>
 <20160820204106.GW8192@zxy.spb.ru>
 <0acba141-4701-d9c2-0ddb-46d1f60ff55b@fastmail.net>
 <20160820220510.GX8192@zxy.spb.ru>
 <8ac23bd1-dcb3-7c64-f195-5039f9af0eaf@fastmail.net>
 <20160821000400.GY8192@zxy.spb.ru>

On Sun, Aug 21, 2016 at 03:04:00AM +0300, Slawa Olhovchenkov wrote:

> On Sun, Aug 21, 2016 at 12:25:46AM +0100, Bruce Simpson wrote:
>
> > On 20/08/16 23:05, Slawa Olhovchenkov wrote:
> > > I am think this substitution is very bad idea (by design).
> > > Also, on transmit side this is must be irrelevant on received L2
> > > header (and this in many cases this is will be L2 unicast packet). For
> > > other cases packet will be created on host and don't have any received
> > > information.
> > >
> >
> > Whilst I agree with your concerns about multipoint, I support the
> > motivation behind Ryan's original change: optimize the common case.
>
> Oh, common case...
> I am have pmc profiling for TCP output and see on this SVG picture and
> don't find any simple way.
> You want to watch too?

At peak network traffic (more than 25K connections, about 20Gbit total
traffic), half of the cores are fully utilised by the network stack.

This is a flamegraph from one core:
http://zxy.spb.ru/cpu10.svg

This is the same, but with the stack cut off at ixgbe_rxeof for a more
unified view of the TCP/IP stack:
http://zxy.spb.ru/cpu10u.svg

The top 3 most-sampled lines are:

7036 0xffffffff804bf02d atomic_cmpset_long /usr/obj/usr/src/sys/VSTREAM/./machine/atomic.h:163

static __inline int
atomic_cmpset_long(volatile u_long *dst, u_long expect, u_long src)
{
	u_char res;

	__asm __volatile(
	" " MPLOCKED " "
	" cmpxchgq %3,%1 ; "
	" sete %0 ; "
	"# atomic_cmpset_long"
	: "=q" (res),		/* 0 */
	  "+m" (*dst),		/* 1 */
	  "+a" (expect)		/* 2 */
	: "r" (src)		/* 3 */
	: "memory", "cc");
	return (res);
}

6099 0xffffffff81171963 ?? ??:0

0xffffffff81171940 <ixgbe_rxeof+1168>: mov 0x10(%r15),%rax
0xffffffff81171944 <ixgbe_rxeof+1172>: add $0x8,%rax
0xffffffff81171948 <ixgbe_rxeof+1176>: mov -0x4c(%rbp),%ecx
0xffffffff8117194b <ixgbe_rxeof+1179>: test %cx,%cx
0xffffffff8117194e <ixgbe_rxeof+1182>: mov %rax,0x10(%r15)
0xffffffff81171952 <ixgbe_rxeof+1186>: je 0xffffffff8117198d <ixgbe_rxeof+1245>
0xffffffff81171954 <ixgbe_rxeof+1188>: mov 0x10(%rdi),%rcx
0xffffffff81171958 <ixgbe_rxeof+1192>: mov -0x4c(%rbp),%edx
0xffffffff8117195b <ixgbe_rxeof+1195>: nopl 0x0(%rax,%rax,1)
0xffffffff81171960 <ixgbe_rxeof+1200>: mov (%rcx),%rsi
0xffffffff81171963 <ixgbe_rxeof+1203>: mov %rsi,(%rax)
0xffffffff81171966 <ixgbe_rxeof+1206>: mov 0x8(%rcx),%rsi
0xffffffff8117196a <ixgbe_rxeof+1210>: mov %rsi,0x8(%rax)
0xffffffff8117196e <ixgbe_rxeof+1214>: mov 0x10(%rcx),%rsi
0xffffffff81171972 <ixgbe_rxeof+1218>: mov %rsi,0x10(%rax)
0xffffffff81171976 <ixgbe_rxeof+1222>: mov 0x18(%rcx),%rsi
0xffffffff8117197a <ixgbe_rxeof+1226>: mov %rsi,0x18(%rax)
0xffffffff8117197e <ixgbe_rxeof+1230>: add $0xffffffffffffffe0,%edx
0xffffffff81171981 <ixgbe_rxeof+1233>: add $0x20,%rcx
0xffffffff81171985 <ixgbe_rxeof+1237>: add $0x20,%rax
0xffffffff81171989 <ixgbe_rxeof+1241>: test %edx,%edx

5594 0xffffffff8053395a mb_free_ext /usr/src/sys/kern/uipc_mbuf.c:301

	if (*(m->m_ext.ref_cnt) == 1 ||

I am able to collect and process more measurements to help improve the
FreeBSD network stack. Do you have any ideas about this?
I don't see any evident and simple points of optimisation :(
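
For readers unfamiliar with the first and third hot spots in the profile, here is a minimal userland sketch of the two patterns involved: a compare-and-set primitive in the style of atomic_cmpset_long(), and a reference-count release whose fast path is the same kind of "== 1" test seen in the mb_free_ext line. It is an illustration only, not the kernel code. It uses C11 <stdatomic.h> rather than FreeBSD's atomic(9), the names shared_buf, cmpset_uint, buf_hold and buf_release are hypothetical, and the fetch-and-subtract slow path is an assumption, since the profile shows only the first half of the kernel's condition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer shared between several owners. */
struct shared_buf {
	atomic_uint ref_cnt;
};

/*
 * Compare-and-set: store src into *dst only if *dst still equals expect.
 * Returns true if the store happened. On x86 this compiles to a locked
 * cmpxchg, which must own the cache line exclusively.
 */
static bool
cmpset_uint(atomic_uint *dst, unsigned expect, unsigned src)
{
	return atomic_compare_exchange_strong(dst, &expect, src);
}

/*
 * Take a new reference unless the count has already reached zero.
 * The loop retries whenever another thread changed the count between
 * the load and the compare-and-set.
 */
static bool
buf_hold(struct shared_buf *b)
{
	unsigned old;

	do {
		old = atomic_load(&b->ref_cnt);
		if (old == 0)
			return false;
	} while (!cmpset_uint(&b->ref_cnt, old, old + 1));
	return true;
}

/*
 * Drop one reference. Returns true when the caller held the last
 * reference and should free the buffer.
 */
static bool
buf_release(struct shared_buf *b)
{
	assert(atomic_load(&b->ref_cnt) > 0);

	/* Fast path: sole owner, a plain load is enough. */
	if (atomic_load(&b->ref_cnt) == 1)
		return true;

	/* Slow path (assumed): locked read-modify-write on a shared line. */
	return atomic_fetch_sub(&b->ref_cnt, 1) == 1;
}
```

Even the plain load in the fast path has to fetch the cache line holding the counter, so if another core wrote that line recently, the sample count can land on the load itself. That may be one reason a line like the mb_free_ext test shows up in a profile, though the flamegraphs would have to confirm it.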