From owner-freebsd-performance@FreeBSD.ORG  Sun Jan 30 06:22:16 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B910E106564A;
	Sun, 30 Jan 2011 06:22:16 +0000 (UTC)
	(envelope-from robbysun@gmail.com)
Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 5B1CD8FC08;
	Sun, 30 Jan 2011 06:22:16 +0000 (UTC)
Received: by gwj21 with SMTP id 21so1718384gwj.13
	for <multiple recipients>; Sat, 29 Jan 2011 22:22:15 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=JzggQa3bPhTjK2HY+/+h8TCBCu+zx7gnDXJL96Jmpkg=;
	b=ada8OPYb9M5PiDG8QVVg1p+vAFzNK+E1kSGYrV+VPmWzLWpK951WC8yv3dy38fldEO
	MZaxMuWvte79gPvLjaIzaWmq6wDRgEbxJwjiMDg0OmUBxFXzDiUegO+PzTuH1jAtSR29
	Z7TLqR6WntM7fRx+SvcAf5t5sNfriQFQWpVLU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=s+SgSZcqoZDynLChrfbwjnbO88qCgiTYeODoEy5yhI2loRwkmOtMO2NRJQiekBlMo3
	iIiGqYrDFeNTyvOLPVvB/D1BRQUxc0C+ZP/Ix+l1w3b73O1xJi0XQv631vVdjiwWupdR
	g7BcMokBTC4mf2vARp9qO02pjUuWhlc6P0AyI=
MIME-Version: 1.0
Received: by 10.150.144.14 with SMTP id r14mr6424782ybd.28.1296366907051; Sat,
	29 Jan 2011 21:55:07 -0800 (PST)
Received: by 10.151.45.7 with HTTP; Sat, 29 Jan 2011 21:55:06 -0800 (PST)
In-Reply-To: <20110128215215.GJ18170@zxy.spb.ru>
References: <20110128143355.GD18170@zxy.spb.ru>
	<22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110128215215.GJ18170@zxy.spb.ru>
Date: Sat, 29 Jan 2011 21:55:06 -0800
Message-ID: <AANLkTimP4RybWKY_Qhuv6mi0+VNVASJUL3rxy-eoy6z_@mail.gmail.com>
From: Robby Sun <robbysun@gmail.com>
To: Slawa Olhovchenkov <slw@zxy.spb.ru>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>,
	Stefan Lambrev <stefan.lambrev@moneybookers.com>
Subject: Re: Interrupt performance
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 30 Jan 2011 06:22:16 -0000

I'd like to suggest that you use the same bit-width for 'Dummy' as that for
'count', and initialize it to 0, so as to ensure that it won't overflow.
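
A minimal sketch of that suggestion, for illustration only. The helper
name time_loop_us() and the volatile qualifier are assumptions added here
and are not part of the program quoted below; volatile is there so an
optimizing compiler cannot fold the loop away.

```c
#include <stdio.h>
#include <sys/time.h>

/* Same width as the loop counter and explicitly zeroed, as suggested.
 * volatile is an added assumption: it keeps the loop from being removed. */
static volatile long Dummy = 0;

/* Hypothetical helper: run `count` increments and return the elapsed
 * wall-clock time in microseconds. */
static long time_loop_us(long count)
{
    struct timeval st, et;
    long i;

    gettimeofday(&st, NULL);
    for (i = count; i > 0; i--)
        Dummy++;
    gettimeofday(&et, NULL);
    return (et.tv_sec - st.tv_sec) * 1000000L + (et.tv_usec - st.tv_usec);
}
```

With a long counter on an LP64 system, Dummy cannot wrap for any count
that atol() can return.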

-Robby

On Fri, Jan 28, 2011 at 1:52 PM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:

> On Sat, Jan 29, 2011 at 07:52:11AM +1100, Bruce Evans wrote:
>
> > >> there are of course several possible answers, including:
> > >>
> > >> 1/ Sometimes BSD and Linux report things differently. Linux may or may
> > >> not account for the lowest level interrupt time the same as BSD
> > >
> > > But I see only 20% idle on FreeBSD and 80% idle on Linux.
> >
> > The time must be counted somewhere, so when it is not properly accounted
> > to packet handling, and nothing much else is running, it is accounted to
> > idle.
> >
> > To see how much CPU is actually available, run something else and see how
> > fast it runs.  A simple counting loop works well on UP systems.
>
> ===
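> #include <stdio.h>
> #include <stdlib.h>     /* atol() */
> #include <sys/time.h>   /* gettimeofday() */
>
> /* volatile so that an optimizing compiler cannot remove the loop */
> volatile int Dummy;
>
> int
> main(int argc, char *argv[])
> {
>  long int count,i,dt;
>  struct timeval st,et;
>
>  if (argc < 2) {
>   fprintf(stderr, "usage: %s count\n", argv[0]);
>   return 1;
>  }
>  count = atol(argv[1]);
>
>  gettimeofday(&st, NULL);
>  for(i=count;i;i--) Dummy++;
>  gettimeofday(&et, NULL);
>  /* elapsed wall-clock time in microseconds */
>  dt = (et.tv_sec-st.tv_sec)*1000000 + et.tv_usec-st.tv_usec;
>  printf("Elapsed %ld us\n",dt);
>  return 0;
> }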
> ===
>
> Is this OK?
>
> ./loop 2000000000
>
> FreeBSD
> 1 process: Elapsed 7554193 us
> 2 process: Elapsed 14493692 us
> netperf + 1 process: Elapsed 21403644 us
>
> Linux
> 1 process: Elapsed 7524843 us
> 2 process: Elapsed 14995866 us
> netperf + 1 process: Elapsed 14107670 us
>
>

From owner-freebsd-performance@FreeBSD.ORG  Sun Jan 30 12:16:07 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 178231065673;
	Sun, 30 Jan 2011 12:16:07 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98])
	by mx1.freebsd.org (Postfix) with ESMTP id 7CB6A8FC08;
	Sun, 30 Jan 2011 12:16:06 +0000 (UTC)
Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD))
	(envelope-from <slw@zxy.spb.ru>)
	id 1PjWCJ-000Pi0-3b; Sun, 30 Jan 2011 15:16:03 +0300
Date: Sun, 30 Jan 2011 15:16:03 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Robby Sun <robbysun@gmail.com>
Message-ID: <20110130121603.GN18170@zxy.spb.ru>
References: <20110128143355.GD18170@zxy.spb.ru>
	<22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110128215215.GJ18170@zxy.spb.ru>
	<AANLkTimP4RybWKY_Qhuv6mi0+VNVASJUL3rxy-eoy6z_@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <AANLkTimP4RybWKY_Qhuv6mi0+VNVASJUL3rxy-eoy6z_@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: slw@zxy.spb.ru
X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>,
	Stefan Lambrev <stefan.lambrev@moneybookers.com>
Subject: Re: Interrupt performance
X-List-Received-Date: Sun, 30 Jan 2011 12:16:07 -0000

On Sat, Jan 29, 2011 at 09:55:06PM -0800, Robby Sun wrote:

> I'd like to suggest that you use the same bit-width for 'Dummy' as that for
> 'count', and initialize it to 0, so as to ensure that it won't overflow.

I don't use the value of Dummy, so overflow doesn't matter.


From owner-freebsd-performance@FreeBSD.ORG  Sun Jan 30 12:55:37 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 021DF106566B;
	Sun, 30 Jan 2011 12:55:37 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98])
	by mx1.freebsd.org (Postfix) with ESMTP id 504808FC0A;
	Sun, 30 Jan 2011 12:55:35 +0000 (UTC)
Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD))
	(envelope-from <slw@zxy.spb.ru>)
	id 1PjWoW-000PwX-OX; Sun, 30 Jan 2011 15:55:32 +0300
Date: Sun, 30 Jan 2011 15:55:32 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20110130125532.GO18170@zxy.spb.ru>
References: <22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110128215215.GJ18170@zxy.spb.ru>
	<20110129133859.O967@besplex.bde.org>
	<20110129102420.GK18170@zxy.spb.ru>
	<20110129233542.O20731@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110129233542.O20731@besplex.bde.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: slw@zxy.spb.ru
X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false
Cc: freebsd-performance@FreeBSD.org, Julian Elischer <julian@FreeBSD.org>,
	Stefan Lambrev <stefan.lambrev@moneybookers.com>
Subject: Re: Interrupt performance
X-List-Received-Date: Sun, 30 Jan 2011 12:55:37 -0000

On Sat, Jan 29, 2011 at 11:54:11PM +1100, Bruce Evans wrote:

> > And I see dramatically fewer context switches in the Linux stats
> > (from dstat).
> 
> FreeBSD uses ithreads for most interrupts, so of course it does many
> more context switches (at least 2 per interrupt).  This doesn't make
> much difference provided there are not too many.  I think the version
> of re that you are using actually uses "fast" interrupts and a task
> queue.  This also seems to be making little difference.  You get a
> relatively lightweight "fast" interrupt followed by a
> context switch to and from the task.  IIRC, your statistics showed 
> about twice as many context switches as interrupts, so the task queue
> isn't doing much to reduce the "interrupt overhead" -- it just gives
> context switches to the task instead of to an ithread.

I have now built a kernel with polling and profiling.
Network performance with profiling (off) doesn't change.

 procs      memory      page                   disk   faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad0   in   sy   cs us sy id
 1 0 0  98824K   431M     0   0   0   0     0   0   0    0  117 2172  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  123 2176  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2175  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2197  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0  115 2175  0  1 99


Network traffic ON:


 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3206  4 96  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107778 3183  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3184  1 99  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107155 3182  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107945 3206  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107613 3182  7 93  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107432 3180  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107523 3181  4 96  0

Report from gprof:

granularity: each sample hit covers 16 byte(s) for 0.00% of 75.16 seconds

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 41.4      31.12    31.12        0  100.00%           __mcount [1]
 36.2      58.30    27.18    54341     0.50     0.50  acpi_cpu_c1 [6]
  8.9      65.01     6.71  2521168     0.00     0.00  copyin [17]
  2.8      67.11     2.10   419006     0.01     0.01  in_cksum_skip [23]
  1.0      67.86     0.75 12236575     0.00     0.00  memcpy [29]
  0.8      68.43     0.58  9309659     0.00     0.00  uma_zalloc_arg [25]
  0.6      68.89     0.45  7293157     0.00     0.00  mb_ctor_mbuf [32]
  0.6      69.32     0.43  1008034     0.00     0.00  uma_find_refcnt [34]
  0.5      69.71     0.39  2933058     0.00     0.00  ether_output [24]
  0.5      70.07     0.36  2933058     0.00     0.00  if_transmit [38]
  0.3      70.31     0.25   504035     0.00     0.01  ip_output [18]
  0.3      70.56     0.24  2933257     0.00     0.00  bcmp [48]
  0.3      70.77     0.21   504032     0.00     0.01  m_uiotombuf [19]
  0.3      70.98     0.21  3352048     0.00     0.00  mb_dupcl [51]
  0.3      71.19     0.21  2514036     0.00     0.00  m_copym [28]
  0.3      71.39     0.20   419006     0.00     0.01  ip_fragment [21]
  0.2      71.56     0.17   504017     0.00     0.02  udp_send [16]
  0.2      71.74     0.17  2520731     0.00     0.00  bzero [53]
  0.2      71.91     0.17   504648     0.00     0.03  Xint0x80_syscall [8]
  0.2      72.07     0.16   504017     0.00     0.00  in_pcbconnect_setup [30]
  0.2      72.22     0.15   504017     0.00     0.03  sosend_dgram [15]
  0.2      72.37     0.15 25113400     0.00     0.00  critical_exit <cycle 1> [57]
  0.2      72.51     0.14 25113400     0.00     0.00  critical_enter [59]
  0.2      72.63     0.13   504104     0.00     0.00  mb_ctor_pack [60]
  0.2      72.75     0.11  1512179     0.00     0.00  _rw_runlock [62]
  0.1      72.85     0.10   504017     0.00     0.03  kern_sendit [13]
  0.1      72.95     0.10  9311895     0.00     0.00  uma_zfree_arg [49]
  0.1      73.05     0.10   504114     0.00     0.00  free [54]
  0.1      73.14     0.10  1512161     0.00     0.00  uiomove [20]

granularity: each sample hit covers 16 byte(s) for 0.00% of 75.16 seconds

                                  called/total       parents
index  %time    self descendents  called+self    name           index   
                                  called/total       children

                                                     <spontaneous>
[1]     41.4   31.12        0.00                 __mcount [1]

-----------------------------------------------

                                                     <spontaneous>
[2]     36.2    0.01       27.18                 sched_idletd [2]
                0.00       27.18   54341/54341       cpu_idle [4]

-----------------------------------------------

                0.00       27.18   54341/54341       cpu_idle_acpi [5]
[3]     36.2    0.00       27.18   54341         acpi_cpu_idle [3]
               27.18        0.00   54341/54341       acpi_cpu_c1 [6]
                0.00        0.00  108682/108682      AcpiHwRead [157]
                0.00        0.00   54341/54341       acpi_TimerDelta [653]

-----------------------------------------------

                0.00       27.18   54341/54341       sched_idletd [2]
[4]     36.2    0.00       27.18   54341         cpu_idle [4]
                0.00       27.18   54341/54341       cpu_idle_acpi [5]   
                0.00        0.00   54341/54341       mp_grab_cpu_hlt [654]

-----------------------------------------------

From owner-freebsd-performance@FreeBSD.ORG  Tue Feb  1 11:37:27 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BB9151065673;
	Tue,  1 Feb 2011 11:37:27 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98])
	by mx1.freebsd.org (Postfix) with ESMTP id 2DE008FC17;
	Tue,  1 Feb 2011 11:37:27 +0000 (UTC)
Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD))
	(envelope-from <slw@zxy.spb.ru>)
	id 1PkEY0-000GWj-WF; Tue, 01 Feb 2011 14:37:25 +0300
Date: Tue, 1 Feb 2011 14:37:24 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20110201113724.GS18170@zxy.spb.ru>
References: <20110128143355.GD18170@zxy.spb.ru>
	<22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110129070205.Q7034@besplex.bde.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: slw@zxy.spb.ru
X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Stefan Lambrev <stefan.lambrev@moneybookers.com>
Subject: Re: Interrupt performance
X-List-Received-Date: Tue, 01 Feb 2011 11:37:27 -0000

On Sat, Jan 29, 2011 at 07:52:11AM +1100, Bruce Evans wrote:

> >> there are profiling tools that you may decide to run.
> >
> > What tools can I use on amd64?
> >
> > I booted a kernel configured with 'config -p'.
> > Most time is in spinlock_exit and acpi_cpu_c1.
> 
> Normal profiling works poorly (I see you found my old mail about high
> resolution profiling).  Linux might be misreporting the overhead for
> exactly the same reasons that normal profiling works poorly:
> - the profiling clock frequency of ~1 KHz was adequate for 5 MHz machines
>    in 1998, but is now too slow.  Statistics clocks are even slower (128
>    Hz in FreeBSD, and possibly 100 Hz (?) jiffies in Linux).
> - the statistics clock might be too synchronized with other interrupts.
>    The above spinlock_exit and acpi_cpu_c1 times indicate that the
>    statistics clock almost always fires on exit from another spinlock
>    and/or inside ACPI, for waking up from idle for the latter.  Seeing
>    lots of exits from spinlocks may indicate that spinlocks are being
>    used too much.
> But FreeBSD will report interrupt times and system for non-fast-interrupts
> to an accuracy of about 1 microsecond, since it doesn't use the
> statistics clock much for this.  OTOH, for fast interrupts it is typical
> behaviour in FreeBSD and Linux to not see them at all from the statistics
> clock interrupt, since they mask all interrupts so they mask the
> statistics clock interrupt in particular.  In FreeBSD, lots of time
> apparently spent in spinlock_exit is a typical result of this, or at
> least similar things, since spinlock_enter masks all interrupts (except
> in my version of course).  Linux doesn't have fast interrupts in the
> same way that FreeBSD does, but at least in old versions almost all of
> its interrupts masked other interrupts a lot.
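
The point about a statistics clock that is "too synchronized" with the
work it samples can be shown with a small simulation. This is only a
sketch: the workload (busy for the first 200 us of every 1000 us) and
all function names are made up for illustration and have nothing to do
with the kernel's real clock code.

```c
#include <stdio.h>

/* Hypothetical workload: busy for the first busy_us of every period_us. */
static int is_busy(long t_us, long period_us, long busy_us)
{
    return (t_us % period_us) < busy_us;
}

/* Percentage of samples that land in busy time when sampling every
 * sample_us microseconds, starting at phase_us, for n samples. */
static double sampled_busy_percent(long sample_us, long phase_us, long n,
                                   long period_us, long busy_us)
{
    long hits = 0;
    long i;

    for (i = 0; i < n; i++)
        hits += is_busy(phase_us + i * sample_us, period_us, busy_us);
    return 100.0 * (double)hits / (double)n;
}
```

With a 1000 us period and 200 us busy, the true load is 20%. Sampling
every 1000 us reports 100% at phase 0 and 0% at phase 500, because every
sample hits the same point of the cycle. Sampling every 997 us for 1000
samples reports 20%, because the samples drift across the whole cycle.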

I did some more tests and built a kernel with KTR.
I no longer think that interrupt overhead on FreeBSD is significant: I tried
polling and saw no difference.

netperf reports many send errors. I found this:
http://docs.freebsd.org/cgi/mid.cgi?E1Aice9-0002by-00.

After inserting usleep(1000) on ENOBUFS into src/nettest_bsd.c, I see 53%
idle, and ./loop 2000000000 reports "Elapsed 15188006 us", which is close to
Linux (Elapsed 14107670 us).
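
A rough sketch of that kind of change, assuming a plain sendto() loop.
This is not the actual netperf code; the function name
send_with_backoff() and the max_retries parameter are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one datagram. When the kernel reports ENOBUFS, sleep 1 ms and try
 * again, at most max_retries times. Returns the last sendto() result.
 */
static ssize_t send_with_backoff(int fd, const void *buf, size_t len,
                                 const struct sockaddr *to, socklen_t tolen,
                                 int max_retries)
{
    ssize_t n;
    int tries = 0;

    for (;;) {
        n = sendto(fd, buf, len, 0, to, tolen);
        if (n >= 0 || errno != ENOBUFS || tries++ >= max_retries)
            return n;
        usleep(1000);   /* the 1 ms delay mentioned above */
    }
}
```

Any error other than ENOBUFS is returned to the caller at once, so the
sleep only throttles the sender when the output queue is full.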

The remaining 10% difference may come from a heavier network stack: of
126136 ticks, only 32104 are in the interrupt handler and task switching,
and 94032 are in UDP processing in the network stack and passing the
datagram to the driver. The cost may be in SOCKBUF_LOCK/SOCKBUF_UNLOCK
and/or INP_INFO_RUNLOCK/INP_RUNLOCK.
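
For reference, the ratios behind these numbers can be checked with two
small helpers. The helper names are made up for illustration; the inputs
are the figures quoted in this message.

```c
#include <stdio.h>

/* Percentage that `part` makes up of `whole`. */
static double percent_of(double part, double whole)
{
    return 100.0 * part / whole;
}

/* How much slower `a` is than `b`, in percent of `b`. */
static double percent_slower(double a, double b)
{
    return 100.0 * (a - b) / b;
}
```

percent_of(32104, 126136) is about 25% (interrupt handler and task
switching) and percent_of(94032, 126136) is about 75% (UDP processing and
driver hand-off). percent_slower(15188006, 14107670) is roughly 7.7%, the
gap described above as about 10%.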


From owner-freebsd-performance@FreeBSD.ORG  Tue Feb  1 12:07:55 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A65BC106564A;
	Tue,  1 Feb 2011 12:07:55 +0000 (UTC)
	(envelope-from stefan.lambrev@moneybookers.com)
Received: from g1.moneybookers.com (g1.moneybookers.com [217.18.249.148])
	by mx1.freebsd.org (Postfix) with ESMTP id D43A48FC15;
	Tue,  1 Feb 2011 12:07:54 +0000 (UTC)
Received: from g1.moneybookers.com (localhost [127.0.0.1])
	by g1.moneybookers.com (Postfix) with ESMTPS id 42F3C272C73;
	Tue,  1 Feb 2011 13:07:53 +0100 (CET)
Received: from jailbay5-inferno.sf.moneybookers.net
	(jailbay5-inferno.sf.moneybookers.net [10.128.2.69])
	by g1.moneybookers.com (Postfix) with ESMTP id 0135E272C68;
	Tue,  1 Feb 2011 13:07:52 +0100 (CET)
Received: from hater.sf.moneybookers.net (hater.sf.moneybookers.net
	[10.129.23.125])
	by jailbay5-inferno.sf.moneybookers.net (Postfix) with ESMTP id
	D802B3612387; Tue,  1 Feb 2011 13:07:51 +0100 (CET)
Mime-Version: 1.0 (Apple Message framework v1082)
From: Stefan Lambrev <stefan.lambrev@moneybookers.com>
In-Reply-To: <20110201113724.GS18170@zxy.spb.ru>
Date: Tue, 1 Feb 2011 14:07:51 +0200
Message-Id: <8979148D-8F2E-49E3-86EE-41CE6F57CDA4@moneybookers.com>
References: <20110128143355.GD18170@zxy.spb.ru>
	<22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110201113724.GS18170@zxy.spb.ru>
To: Slawa Olhovchenkov <slw@zxy.spb.ru>
X-Mailer: Apple Mail (2.1082)
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,HTML_MESSAGE,
	T_FRT_STOCK2,UNPARSEABLE_RELAY autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	g1.sf.moneybookers.net
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>
Subject: Re: Interrupt performance
X-List-Received-Date: Tue, 01 Feb 2011 12:07:55 -0000

Hi,

On Feb 1, 2011, at 1:37 PM, Slawa Olhovchenkov wrote:

> On Sat, Jan 29, 2011 at 07:52:11AM +1100, Bruce Evans wrote:
>
>>>> there are profiling tools that you may decide to run.
>>>
>>> What tools can I use on amd64?
>>>
>>> I booted a kernel configured with 'config -p'.
>>> Most time is in spinlock_exit and acpi_cpu_c1.
>>
>> Normal profiling works poorly (I see you found my old mail about high
>> resolution profiling).  Linux might be misreporting the overhead for
>> exactly the same reasons that normal profiling works poorly:
>> - the profiling clock frequency of ~1 KHz was adequate for 5 MHz machines
>>   in 1998, but is now too slow.  Statistics clocks are even slower (128
>>   Hz in FreeBSD, and possibly 100 Hz (?) jiffies in Linux).
>> - the statistics clock might be too synchronized with other interrupts.
>>   The above spinlock_exit and acpi_cpu_c1 times indicate that the
>>   statistics clock almost always fires on exit from another spinlock
>>   and/or inside ACPI, for waking up from idle for the latter.  Seeing
>>   lots of exits from spinlocks may indicate that spinlocks are being
>>   used too much.
>> But FreeBSD will report interrupt times and system for non-fast-interrupts
>> to an accuracy of about 1 microsecond, since it doesn't use the
>> statistics clock much for this.  OTOH, for fast interrupts it is typical
>> behaviour in FreeBSD and Linux to not see them at all from the statistics
>> clock interrupt, since they mask all interrupts so they mask the
>> statistics clock interrupt in particular.  In FreeBSD, lots of time
>> apparently spent in spinlock_exit is a typical result of this, or at
>> least similar things, since spinlock_enter masks all interrupts (except
>> in my version of course).  Linux doesn't have fast interrupts in the
>> same way that FreeBSD does, but at least in old versions almost all of
>> its interrupts masked other interrupts a lot.
>
> I did some more tests and built a kernel with KTR.
> I no longer think that interrupt overhead on FreeBSD is significant: I tried
> polling and saw no difference.
>
> netperf reports many send errors. I found this:
> http://docs.freebsd.org/cgi/mid.cgi?E1Aice9-0002by-00.
>
> After inserting usleep(1000) on ENOBUFS into src/nettest_bsd.c, I see 53%
> idle, and ./loop 2000000000 reports "Elapsed 15188006 us", which is close to
> Linux (Elapsed 14107670 us).
>
> The remaining 10% difference may come from a heavier network stack: of
> 126136 ticks, only 32104 are in the interrupt handler and task switching,
> and 94032 are in UDP processing in the network stack and passing the
> datagram to the driver. The cost may be in SOCKBUF_LOCK/SOCKBUF_UNLOCK
> and/or INP_INFO_RUNLOCK/INP_RUNLOCK.

Try to run with the same network buffers on FreeBSD and Linux.
I think the default values in FreeBSD are much, much lower.
Also, in the past ENOBUFS was not handled properly in Linux.

http://wiki.freebsd.org/AvoidingLinuxisms - Do not rely on
Linux-specific socket behaviour. In particular, default socket buffer
sizes are different (call setsockopt() with SO_SNDBUF and SO_RCVBUF),
and while Linux's send() blocks when the socket buffer is full,
FreeBSD's will fail and set ENOBUFS in errno.
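
A minimal sketch of setting the buffers explicitly so that both systems
are asked for the same size. The helper names are hypothetical, and the
kernel may adjust or cap the requested value, so reading it back with
getsockopt() is worthwhile.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Request the same send and receive buffer size.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()). */
static int set_socket_buffers(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return 0;
}

/* Read back what the kernel reports for SO_SNDBUF, or -1 on error. */
static int get_send_buffer(int fd)
{
    int bytes = 0;
    socklen_t len = sizeof(bytes);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, &len) < 0)
        return -1;
    return bytes;
}
```

For example, set_socket_buffers(fd, 128 * 1024) on a UDP socket asks for
128K in each direction, and get_send_buffer(fd) shows what was granted.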


--
Best Wishes,
Stefan Lambrev
ICQ# 24134177






From owner-freebsd-performance@FreeBSD.ORG  Tue Feb  1 12:18:05 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C0F39106564A;
	Tue,  1 Feb 2011 12:18:05 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98])
	by mx1.freebsd.org (Postfix) with ESMTP id 313A38FC0C;
	Tue,  1 Feb 2011 12:18:04 +0000 (UTC)
Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD))
	(envelope-from <slw@zxy.spb.ru>)
	id 1PkFBL-000HHu-Kg; Tue, 01 Feb 2011 15:18:03 +0300
Date: Tue, 1 Feb 2011 15:18:03 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Stefan Lambrev <stefan.lambrev@moneybookers.com>
Message-ID: <20110201121803.GT18170@zxy.spb.ru>
References: <20110128143355.GD18170@zxy.spb.ru>
	<22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110201113724.GS18170@zxy.spb.ru>
	<8979148D-8F2E-49E3-86EE-41CE6F57CDA4@moneybookers.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <8979148D-8F2E-49E3-86EE-41CE6F57CDA4@moneybookers.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: slw@zxy.spb.ru
X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>
Subject: Re: Interrupt performance
X-List-Received-Date: Tue, 01 Feb 2011 12:18:05 -0000

On Tue, Feb 01, 2011 at 02:07:51PM +0200, Stefan Lambrev wrote:

> > I did some more tests and built a kernel with KTR.
> > I no longer think that interrupt overhead on FreeBSD is significant: I tried
> > polling and saw no difference.
> >
> > netperf reports many send errors. I found this:
> > http://docs.freebsd.org/cgi/mid.cgi?E1Aice9-0002by-00.
> >
> > After inserting usleep(1000) on ENOBUFS into src/nettest_bsd.c, I see 53%
> > idle, and ./loop 2000000000 reports "Elapsed 15188006 us", which is close to
> > Linux (Elapsed 14107670 us).
> >
> > The remaining 10% difference may come from a heavier network stack: of
> > 126136 ticks, only 32104 are in the interrupt handler and task switching,
> > and 94032 are in UDP processing in the network stack and passing the
> > datagram to the driver. The cost may be in SOCKBUF_LOCK/SOCKBUF_UNLOCK
> > and/or INP_INFO_RUNLOCK/INP_RUNLOCK.
> 
> Try to run with the same network buffers on FreeBSD and Linux.
> I think, the default values in freebsd are much, much lower.

Setting large buffers on FreeBSD was the first thing I tried.
Also, netperf uses setsockopt(), and I ran netperf on Linux with the
same options (including -s 128K -S 128K).
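
For reference, the buffer sizing mentioned above comes down to a pair of
setsockopt() calls. The following is a minimal sketch, not netperf's
actual code: the helper name set_sock_bufs is made up for illustration,
and because the kernel may grant a different size than requested, the
sketch reads the values back with getsockopt().

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: request the same send and receive buffer size on
 * a socket and report what the kernel actually granted.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int
set_sock_bufs(int fd, int want, int *got_snd, int *got_rcv)
{
    socklen_t len;

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) == -1)
        return (-1);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1)
        return (-1);

    /* Read the sizes back; they need not match the request. */
    len = sizeof(*got_snd);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, got_snd, &len) == -1)
        return (-1);
    len = sizeof(*got_rcv);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, got_rcv, &len) == -1)
        return (-1);
    return (0);
}
```

Calling set_sock_bufs(fd, 128 * 1024, &snd, &rcv) on both systems and
comparing snd and rcv is one way to check that the two hosts really run
with comparable buffers.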

> Also in the past ENOBUF was not handled properly in linux.
> 
> http://wiki.freebsd.org/AvoidingLinuxisms - Do not rely on Linux-specific socket behaviour. In particular, default socket buffer sizes are different (call setsockopt() with SO_SNDBUF and SO_RCVBUF), and while Linux's send() blocks when the socket buffer is full, FreeBSD's will fail and set ENOBUFS in errno.

Yes, ENOBUFS on a UDP socket is what I was talking about.
Solaris also blocks on a UDP socket send.
I don't know which behaviour is right.

From owner-freebsd-performance@FreeBSD.ORG  Tue Feb  1 12:23:35 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 571F0106564A;
	Tue,  1 Feb 2011 12:23:35 +0000 (UTC)
	(envelope-from stefan.lambrev@moneybookers.com)
Received: from g1.moneybookers.com (g1.moneybookers.com [217.18.249.148])
	by mx1.freebsd.org (Postfix) with ESMTP id 058C98FC1B;
	Tue,  1 Feb 2011 12:23:34 +0000 (UTC)
Received: from g1.moneybookers.com (localhost [127.0.0.1])
	by g1.moneybookers.com (Postfix) with ESMTPS id C06D3272C51;
	Tue,  1 Feb 2011 13:23:33 +0100 (CET)
Received: from jailbay5-inferno.sf.moneybookers.net
	(jailbay5-inferno.sf.moneybookers.net [10.128.2.69])
	by g1.moneybookers.com (Postfix) with ESMTP id E686A272C8E;
	Tue,  1 Feb 2011 13:23:32 +0100 (CET)
Received: from hater.sf.moneybookers.net (hater.sf.moneybookers.net
	[10.129.23.125])
	by jailbay5-inferno.sf.moneybookers.net (Postfix) with ESMTP id
	C92F83612387; Tue,  1 Feb 2011 13:23:32 +0100 (CET)
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: text/plain; charset=us-ascii
From: Stefan Lambrev <stefan.lambrev@moneybookers.com>
In-Reply-To: <20110201121803.GT18170@zxy.spb.ru>
Date: Tue, 1 Feb 2011 14:23:32 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <CAE4CCBC-F934-45E7-AAE6-BD914C3F5577@moneybookers.com>
References: <20110128143355.GD18170@zxy.spb.ru>
	<22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110201113724.GS18170@zxy.spb.ru>
	<8979148D-8F2E-49E3-86EE-41CE6F57CDA4@moneybookers.com>
	<20110201121803.GT18170@zxy.spb.ru>
To: Slawa Olhovchenkov <slw@zxy.spb.ru>
X-Mailer: Apple Mail (2.1082)
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_FRT_STOCK2,
	UNPARSEABLE_RELAY autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	g1.sf.moneybookers.net
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>
Subject: Re: Interrupt performance
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Feb 2011 12:23:35 -0000


On Feb 1, 2011, at 2:18 PM, Slawa Olhovchenkov wrote:

> On Tue, Feb 01, 2011 at 02:07:51PM +0200, Stefan Lambrev wrote:
>
>>> I do some more test and build kernel with KTR.
>>> Now I don't think that inetrrupt overhead on FreeBSD weight: I try
>>> polling and don't see any difference.
>>>
>>> I see many reported by netperf send errors. I found this
>>> http://docs.freebsd.org/cgi/mid.cgi?E1Aice9-0002by-00.
>>>
>>> After insert into src/nettest_bsd.c usleep(1000) if ENOBUF I see 53%
>>> idle and ./loop 2000000000 "Elapsed 15188006 us" -- this near to linux
>>> (Elapsed 14107670 us).
>>>
>>> 10% of difference may be by more weight network stack (only 32104
>>> ticks from 126136 in interrupt handler and task switching, and 94032
>>> -- UDP processing in network stack and passing datagram to driver).
>>> May be weight SOCKBUF_LOCK/SOCKBUF_UNLOCK and/or
>>> INP_INFO_RUNLOCK/INP_RUNLOCK.
>>
>> Try to run with the same network buffers on FreeBSD and Linux.
>> I think, the default values in freebsd are much, much lower.
>
> Set large buffers on FreeBSD -- the first that I try.
> Also, netperf use setsockopt() and netperf run on linux with same
> options (include -s 128K -S 128K).
>
>> Also in the past ENOBUF was not handled properly in linux.
>>
>> http://wiki.freebsd.org/AvoidingLinuxisms - Do not rely on Linux-specific socket behaviour. In particular, default socket buffer sizes are different (call setsockopt() with SO_SNDBUF and SO_RCVBUF), and while Linux's send() blocks when the socket buffer is full, FreeBSD's will fail and set ENOBUFS in errno.
>
> Yes, about ENOBUFS with udp socket I told.
> And this behaviour (block on udp socket send) in Solaris too.
> I don't know what behaviour is right.

Well, according to the man pages on both Linux and FreeBSD, the BSD
behavior is right. I looked into this a long time ago with some Red Hat
Linux.

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From owner-freebsd-performance@FreeBSD.ORG  Tue Feb  1 13:15:04 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0270A1065673;
	Tue,  1 Feb 2011 13:15:03 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98])
	by mx1.freebsd.org (Postfix) with ESMTP id 4A7BE8FC13;
	Tue,  1 Feb 2011 13:15:03 +0000 (UTC)
Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD))
	(envelope-from <slw@zxy.spb.ru>)
	id 1PkG4T-000IKw-2i; Tue, 01 Feb 2011 16:15:01 +0300
Date: Tue, 1 Feb 2011 16:15:01 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Stefan Lambrev <stefan.lambrev@moneybookers.com>
Message-ID: <20110201131501.GV18170@zxy.spb.ru>
References: <22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110201113724.GS18170@zxy.spb.ru>
	<8979148D-8F2E-49E3-86EE-41CE6F57CDA4@moneybookers.com>
	<20110201121803.GT18170@zxy.spb.ru>
	<CAE4CCBC-F934-45E7-AAE6-BD914C3F5577@moneybookers.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAE4CCBC-F934-45E7-AAE6-BD914C3F5577@moneybookers.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: slw@zxy.spb.ru
X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>
Subject: Re: Interrupt performance
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Feb 2011 13:15:04 -0000

On Tue, Feb 01, 2011 at 02:23:32PM +0200, Stefan Lambrev wrote:

> >> Also in the past ENOBUF was not handled properly in linux.
> >> 
> >> http://wiki.freebsd.org/AvoidingLinuxisms - Do not rely on Linux-specific socket behaviour. In particular, default socket buffer sizes are different (call setsockopt() with SO_SNDBUF and SO_RCVBUF), and while Linux's send() blocks when the socket buffer is full, FreeBSD's will fail and set ENOBUFS in errno.
> > 
> > Yes, about ENOBUFS with udp socket I told.
> > And this behaviour (block on udp socket send) in Solaris too.
> > I don't know what behaviour is right.
> 
> Well, according to the man pages in linux and fbsd the bsd behavior is right. I was looking into this long time ago with some red hat linux.

I have no idea what a blocking UDP socket is good for, other than
benchmarks.

Now I have tested TCP and see a strange result.

# netperf -H 10.200.0.1 -t TCP_STREAM -C -c -l 60 -- -s 128K -S 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.200.0.1 (10.200.0.1) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % U      % U      us/KB   us/KB

131072 131072 131072    60.00       522.08   -1.00    -1.00    0.000   -0.314

Now I run ./loop 2000000000 and see the transmission stop.

 procs      memory      page                   disk   faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad0   in   sy   cs us sy id
 2 0 0    107M   435M     0   0   0   0     0   0   0 15939  618 39502  0 77 23
 1 0 0    107M   435M     0   0   0   0     0   0   0 15904  619 39355  0 75 25
 1 0 0    107M   435M     0   0   0   0     0   0   0 16193  615 40085  0 79 21
 1 0 0    107M   435M     0   0   0   0     0   0   0 16028  623 39708  1 74 26
 1 0 0    107M   435M     0   0   0   0     0   0   0 15965  615 39475  0 77 23
 1 0 0    107M   435M     0   0   0   0     0   0   0 16012  636 39666  0 84 16 <-- run ./loop 2000000000
 2 0 0    109M   435M    46   0   0   0     9   0   0 9632  507 24041 48 51  1
 2 0 0    109M   435M     0   0   0   0     0   0   0 6592  319 16419 73 27  0
 2 0 0    109M   435M     0   0   0   0     0   0   0  455  136 1250 100 0  0
 2 0 0    109M   435M     0   0   0   0     0   0   0  420  127 1170 99  1  0
 2 0 0    109M   435M     0   0   0   0     0   0   0  395  127 1127 100 0  0
 2 0 0    109M   435M     0   0   0   0     0   0   0  428  127 1209 100 0  0
 2 0 0    109M   435M     0   0   0   0     0   0   0  537  130 1434 99  1  0
 2 0 0    109M   435M     0   0   0   0     0   0   0  449  136 1255 100 0  0
 1 0 0    107M   435M    14   0   0   0    37   0   0 7634  400 19044 56 30 14  <- end ./loop (Elapsed 8470990 us)
 1 0 0    107M   435M     0   0   0   0     0   0   0 14893  579 37088  0 75 25
 1 0 0    107M   435M     0   0   0   0     0   0   0 16123  615 40163  0 78 22
 1 0 0    107M   435M     0   0   0   0     0   0   0 15220  582 37939  0 72 28

Wtf?

From owner-freebsd-performance@FreeBSD.ORG  Tue Feb  1 14:46:33 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5CA02106566C;
	Tue,  1 Feb 2011 14:46:33 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98])
	by mx1.freebsd.org (Postfix) with ESMTP id AA38B8FC1A;
	Tue,  1 Feb 2011 14:46:32 +0000 (UTC)
Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD))
	(envelope-from <slw@zxy.spb.ru>)
	id 1PkHUz-000Jpp-Sg; Tue, 01 Feb 2011 17:46:29 +0300
Date: Tue, 1 Feb 2011 17:46:29 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Stefan Lambrev <stefan.lambrev@moneybookers.com>
Message-ID: <20110201144629.GW18170@zxy.spb.ru>
References: <20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110201113724.GS18170@zxy.spb.ru>
	<8979148D-8F2E-49E3-86EE-41CE6F57CDA4@moneybookers.com>
	<20110201121803.GT18170@zxy.spb.ru>
	<CAE4CCBC-F934-45E7-AAE6-BD914C3F5577@moneybookers.com>
	<20110201131501.GV18170@zxy.spb.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110201131501.GV18170@zxy.spb.ru>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: slw@zxy.spb.ru
X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>
Subject: Re: Interrupt performance
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Feb 2011 14:46:33 -0000

On Tue, Feb 01, 2011 at 04:15:01PM +0300, Slawa Olhovchenkov wrote:

> On Tue, Feb 01, 2011 at 02:23:32PM +0200, Stefan Lambrev wrote:
> 
> > >> Also in the past ENOBUF was not handled properly in linux.
> > >> 
> > >> http://wiki.freebsd.org/AvoidingLinuxisms - Do not rely on Linux-specific socket behaviour. In particular, default socket buffer sizes are different (call setsockopt() with SO_SNDBUF and SO_RCVBUF), and while Linux's send() blocks when the socket buffer is full, FreeBSD's will fail and set ENOBUFS in errno.
> > > 
> > > Yes, about ENOBUFS with udp socket I told.
> > > And this behaviour (block on udp socket send) in Solaris too.
> > > I don't know what behaviour is right.
> > 
> > Well, according to the man pages in linux and fbsd the bsd behavior is right. I was looking into this long time ago with some red hat linux.
> 
> I have't any idea for blocking UDP socket, other then benchmarks.
> 
> Now I test TCP and see strange result.
> 
> # netperf -H 10.200.0.1 -t TCP_STREAM -C -c -l 60 -- -s 128K -S 128K
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.200.0.1 (10.200.0.1) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % U      % U      us/KB   us/KB
> 
> 131072 131072 131072    60.00       522.08   -1.00    -1.00    0.000   -0.314
> 
> Now I run ./loop 2000000000 and see stoping transmit.
> 
>  procs      memory      page                   disk   faults         cpu
>  r b w     avm    fre   flt  re  pi  po    fr  sr ad0   in   sy   cs us sy id
>  2 0 0    107M   435M     0   0   0   0     0   0   0 15939  618 39502  0 77 23
>  1 0 0    107M   435M     0   0   0   0     0   0   0 15904  619 39355  0 75 25
>  1 0 0    107M   435M     0   0   0   0     0   0   0 16193  615 40085  0 79 21
>  1 0 0    107M   435M     0   0   0   0     0   0   0 16028  623 39708  1 74 26
>  1 0 0    107M   435M     0   0   0   0     0   0   0 15965  615 39475  0 77 23
>  1 0 0    107M   435M     0   0   0   0     0   0   0 16012  636 39666  0 84 16 <-- run ./loop 2000000000
>  2 0 0    109M   435M    46   0   0   0     9   0   0 9632  507 24041 48 51  1
>  2 0 0    109M   435M     0   0   0   0     0   0   0 6592  319 16419 73 27  0
>  2 0 0    109M   435M     0   0   0   0     0   0   0  455  136 1250 100 0  0
>  2 0 0    109M   435M     0   0   0   0     0   0   0  420  127 1170 99  1  0
>  2 0 0    109M   435M     0   0   0   0     0   0   0  395  127 1127 100 0  0
>  2 0 0    109M   435M     0   0   0   0     0   0   0  428  127 1209 100 0  0
>  2 0 0    109M   435M     0   0   0   0     0   0   0  537  130 1434 99  1  0
>  2 0 0    109M   435M     0   0   0   0     0   0   0  449  136 1255 100 0  0
>  1 0 0    107M   435M    14   0   0   0    37   0   0 7634  400 19044 56 30 14  <- end ./loop (Elapsed 8470990 us)
>  1 0 0    107M   435M     0   0   0   0     0   0   0 14893  579 37088  0 75 25
>  1 0 0    107M   435M     0   0   0   0     0   0   0 16123  615 40163  0 78 22
>  1 0 0    107M   435M     0   0   0   0     0   0   0 15220  582 37939  0 72 28
> 
> Wtf?

This happens only with the ULE scheduler.
There is no such effect with the 4BSD scheduler (./loop: Elapsed 30611224 us).
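
The source of ./loop is not shown in this thread. Judging only from its
argument and its "Elapsed ... us" output, it is presumably a CPU-bound
loop that reports wall-clock time, so a longer elapsed time means the
loop got a smaller share of the CPU. A hypothetical sketch of such a
measurement (the function name and all details are assumptions, not the
original program):

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical stand-in for ./loop: spin for `iters` iterations and
 * return the elapsed wall-clock time in microseconds.
 * `volatile` keeps the compiler from optimizing the empty loop away.
 */
long long
spin_elapsed_us(unsigned long iters)
{
    struct timeval t0, t1;
    volatile unsigned long i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < iters; i++)
        ;
    gettimeofday(&t1, NULL);
    return ((long long)(t1.tv_sec - t0.tv_sec) * 1000000LL +
        (long long)(t1.tv_usec - t0.tv_usec));
}
```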

w/o CPU load:
x# netperf -H 10.200.0.1 -t TCP_STREAM -C -c -l 60 -- -s 128K -S 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.200.0.1 (10.200.0.1) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % U      % U      us/KB   us/KB

131072 131072 131072    60.00       520.15   -1.00    -1.00    0.000   -0.315 

with CPU load:
x# netperf -H 10.200.0.1 -t TCP_STREAM -C -c -l 60 -- -s 128K -S 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.200.0.1 (10.200.0.1) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % U      % U      us/KB   us/KB

131072 131072 131072    60.00       519.58   -1.00    -1.00    0.000   -0.315 

w/o CPU load and with TOE enabled on re0:
x# netperf -H 10.200.0.1 -t TCP_STREAM -C -c -l 60 -- -s 128K -S 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.200.0.1 (10.200.0.1) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % U      % U      us/KB   us/KB

131072 131072 131072    60.00       634.03   -1.00    -1.00    0.000   -0.258 

(The maximum on Linux was 576.27.)

From owner-freebsd-performance@FreeBSD.ORG  Tue Feb  1 17:04:42 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BF7F2106566B;
	Tue,  1 Feb 2011 17:04:42 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au
	[211.29.132.187])
	by mx1.freebsd.org (Postfix) with ESMTP id 26FC88FC16;
	Tue,  1 Feb 2011 17:04:41 +0000 (UTC)
Received: from c122-106-165-206.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-206.carlnfd1.nsw.optusnet.com.au [122.106.165.206])
	by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p11H4aAR017659
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 2 Feb 2011 04:04:38 +1100
Date: Wed, 2 Feb 2011 04:04:36 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Slawa Olhovchenkov <slw@zxy.spb.ru>
In-Reply-To: <20110201113724.GS18170@zxy.spb.ru>
Message-ID: <20110202034337.E1550@besplex.bde.org>
References: <20110128143355.GD18170@zxy.spb.ru>
	<22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110201113724.GS18170@zxy.spb.ru>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>,
	Stefan Lambrev <stefan.lambrev@moneybookers.com>
Subject: Re: Interrupt performance
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Feb 2011 17:04:42 -0000

On Tue, 1 Feb 2011, Slawa Olhovchenkov wrote:

> I do some more test and build kernel with KTR.
> Now I don't think that inetrrupt overhead on FreeBSD weight: I try
> polling and don't see any difference.
>
> I see many reported by netperf send errors. I found this
> http://docs.freebsd.org/cgi/mid.cgi?E1Aice9-0002by-00.
>
> After insert into src/nettest_bsd.c usleep(1000) if ENOBUF I see 53%
> idle and ./loop 2000000000 "Elapsed 15188006 us" -- this near to linux
> (Elapsed 14107670 us).

This partly works around the problem that it is impossible to select()
on the ENOBUFS condition in FreeBSD at least, and thus impossible to
write ttcp or nettest correctly.  The userland sender either has to
sleep for a while when it gets an ENOBUFS error, and thus let the hardware
sender go idle in the interval between the condition becoming clear
and the sleep finishing, or it has to retry immediately and thus consume
100% CPU getting ENOBUFS errors until the condition clears.  I use
HZ = 100.  Thus usleep(1000) would actually sleep for an average of 15000
us, and the system would be idle (doing nothing) for about 10 times
as long as with HZ = 1000.  I use an old version of ttcp which tries
to sleep for 18000 us.  This ensures that the sleep is too long
even with HZ = 1000 (except I changed 1 line in the old ttcp to either
not sleep at all or to try to sleep for only 1000 us).  Not sleeping
at all uses 100% CPU, but since I mostly use this for testing the
maximum packet rate I don't care much about that unless the CPU being
used by ttcp interferes with kernel and/or hardware activity.
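
The two choices described above (retry immediately, or sleep and retry)
can be sketched as a single send wrapper. This is an illustration only,
not the code from ttcp or from the netperf patch; the function name
send_retry_enobufs and its parameters are invented for the example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one datagram on a connected datagram socket,
 * retrying for as long as the kernel reports ENOBUFS.
 * backoff_us == 0 retries immediately (busy loop, burns CPU);
 * otherwise the sender asks to sleep that long between attempts.
 * If retries is non-NULL it receives the number of ENOBUFS retries.
 * Returns the result of the last send() call.
 */
ssize_t
send_retry_enobufs(int fd, const void *buf, size_t len,
    unsigned int backoff_us, unsigned long *retries)
{
    unsigned long n = 0;
    ssize_t r;

    for (;;) {
        r = send(fd, buf, len, 0);
        if (r >= 0 || errno != ENOBUFS)
            break;              /* success, or an unrelated error */
        n++;
        if (backoff_us != 0)
            usleep(backoff_us); /* actual sleep depends on HZ */
    }
    if (retries != NULL)
        *retries = n;
    return (r);
}
```

The backoff value is only a request. As noted above, with HZ = 100 a
usleep(1000) sleeps for about 15000 us on average, so the transmitter
can sit idle long after buffers have become available again.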

Another reply said that Linux blocks on ENOBUFS instead of returning
it.  That seems better, provided it doesn't block in the O_NONBLOCK case.
This should involve select() working so that you can avoid the block
even in the !O_NONBLOCK case.  Correct versions of ttcp and maybe
nettest can then be written very easily -- at least ttcp would prefer
to just block in sendto().

Bruce

From owner-freebsd-performance@FreeBSD.ORG  Tue Feb  1 17:13:49 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0722C106566C;
	Tue,  1 Feb 2011 17:13:49 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98])
	by mx1.freebsd.org (Postfix) with ESMTP id 6C30F8FC14;
	Tue,  1 Feb 2011 17:13:48 +0000 (UTC)
Received: from slw by zxy.spb.ru with local (Exim 4.69 (FreeBSD))
	(envelope-from <slw@zxy.spb.ru>)
	id 1PkJnW-000M46-Mk; Tue, 01 Feb 2011 20:13:46 +0300
Date: Tue, 1 Feb 2011 20:13:46 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20110201171346.GX18170@zxy.spb.ru>
References: <20110128143355.GD18170@zxy.spb.ru>
	<22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com>
	<20110128161035.GF18170@zxy.spb.ru>
	<CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com>
	<4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru>
	<20110129070205.Q7034@besplex.bde.org>
	<20110201113724.GS18170@zxy.spb.ru>
	<20110202034337.E1550@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110202034337.E1550@besplex.bde.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: slw@zxy.spb.ru
X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>,
	Stefan Lambrev <stefan.lambrev@moneybookers.com>
Subject: Re: Interrupt performance
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Feb 2011 17:13:49 -0000

On Wed, Feb 02, 2011 at 04:04:36AM +1100, Bruce Evans wrote:

> On Tue, 1 Feb 2011, Slawa Olhovchenkov wrote:
> 
> > I do some more test and build kernel with KTR.
> > Now I don't think that inetrrupt overhead on FreeBSD weight: I try
> > polling and don't see any difference.
> >
> > I see many reported by netperf send errors. I found this
> > http://docs.freebsd.org/cgi/mid.cgi?E1Aice9-0002by-00.
> >
> > After insert into src/nettest_bsd.c usleep(1000) if ENOBUF I see 53%
> > idle and ./loop 2000000000 "Elapsed 15188006 us" -- this near to linux
> > (Elapsed 14107670 us).
> 
> This partly works around the problem that it is impossible to select()
> on the ENOBUFS condition in FreeBSD at least, and thus impossible to
> write ttcp or nettest correctly.  The userland sender either has to
> sleep for a while it gets an ENOBUFS error, and thus let the hardware
> sender go idle in the interval between the condition becoming clear
> and the sleep finishing, or it has to retry immediately and thus consume
> 100% CPU getting ENOBUFS errors until the condition clears.  I use
> HZ = 100.  Thus usleep(1000) would actually sleep for an average of 15000
> us, and the system would be idle (doing nothing) for about 10 times
> as long as with HZ = 1000.  I uses an old version of ttcp which tries
> to sleep for 18000 us.  This ensures that the the sleep is too long
> even with HZ = 1000 (except I changed 1 line in the old ttcp to either
> not sleep at all or to try to sleep for only 1000 us).  Not sleeping
> at all uses 100% CPU, but since I mostly use this for testing the
> maximum packet rate I don't care much about that unless the CPU being
> used by ttcp interferes with kernel and/or hardware activity.
> 
> Another reply said that Linux blocks on ENOBUFS instead of returning
> it.  That seems better, provided it doesn't block in the O_NOBLOCK case.
> This should involve select() working so that you can avoid the block
> even in the !O_NONBLOCK case.  Correct versions of ttcp and maybe
> nettest can then be written very easily -- at least ttcp would prefer
> to just block in sendto().

For me it is not simple to modify the kernel code to make select() work.

Now I see another use for blocking behavior: some applications
running on the same box don't handle ENOBUFS. If a badly behaved
program exhausts the buffers, such an application can fail.