From owner-freebsd-bugs@freebsd.org  Mon Jan 29 14:36:09 2018
Return-Path: <owner-freebsd-bugs@freebsd.org>
Delivered-To: freebsd-bugs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 87E51ECA760
 for <freebsd-bugs@mailman.ysv.freebsd.org>;
 Mon, 29 Jan 2018 14:36:09 +0000 (UTC)
 (envelope-from bugzilla-noreply@freebsd.org)
Received: from mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::19:3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "mxrelay.ysv.freebsd.org",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 20E4F7648E
 for <freebsd-bugs@FreeBSD.org>; Mon, 29 Jan 2018 14:36:09 +0000 (UTC)
 (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org
 [IPv6:2001:1900:2254:206a::16:76])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id 4BACF1115B
 for <freebsd-bugs@FreeBSD.org>; Mon, 29 Jan 2018 14:36:08 +0000 (UTC)
 (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org ([127.0.1.118])
 by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id w0TEa8Ic077981
 for <freebsd-bugs@FreeBSD.org>; Mon, 29 Jan 2018 14:36:08 GMT
 (envelope-from bugzilla-noreply@freebsd.org)
Received: (from www@localhost)
 by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id w0TEa8Gp077980
 for freebsd-bugs@FreeBSD.org; Mon, 29 Jan 2018 14:36:08 GMT
 (envelope-from bugzilla-noreply@freebsd.org)
X-Authentication-Warning: kenobi.freebsd.org: www set sender to
 bugzilla-noreply@freebsd.org using -f
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 225535] Delays in TCP connection over Gigabit Ethernet
 connections; Regression from 6.3-RELEASE
Date: Mon, 29 Jan 2018 14:36:08 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-RELEASE
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: aeder@list.ru
X-Bugzilla-Status: New
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
 op_sys bug_status bug_severity priority component assigned_to reporter
 attachments.created
Message-ID: <bug-225535-8@https.bugs.freebsd.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-bugs>,
 <mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs/>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
 <mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Jan 2018 14:36:09 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225535

            Bug ID: 225535
           Summary: Delays in TCP connection over Gigabit Ethernet
                    connections; Regression from 6.3-RELEASE
           Product: Base System
           Version: 10.3-RELEASE
          Hardware: i386
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: aeder@list.ru

Created attachment 190162
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=190162&action=edit
cycle_clock.c

Delays in TCP connection over Gigabit Ethernet connections; Regression from
6.3-RELEASE.

Long description of usage case:

The company I work for develops soft-realtime applications with an internal
cycle of about 600 ms (0.6 seconds). For functional safety, the core of the
application runs simultaneously on two computers with different
architectures (A - FreeBSD/Intel, B - Linux/PowerPC).
Computers A and B are connected through a copper Gigabit switch, with no
other devices connected to the same network.

Normal cycle for application looks like this:

1. Process input from object controllers;
2. Cross-compare input between A and B computers
3. Make evaluations (simulate complex automata).
4. Produce output to send to objects controllers.
5. Cross-compare output between A and B.
6. Send output to object controllers.
7. Sleep until the end-of-cycle.
8. Go to step 1

The cross-compare part is done over a TCP connection on Gigabit Ethernet
between computers A and B.
If A or B is not able to complete all operations in the allotted time
(600 ms +/- 150 ms), the whole system halts due to internal time checks.
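
A minimal sketch of such a time check, assuming the 600 ms +/- 150 ms window
applies to the duration of one whole cycle. The names CYCLE_MS, TOLERANCE_MS,
elapsed_ms() and cycle_within_limits() are hypothetical and are not taken
from the real application, whose check is not shown in this report:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Nominal cycle length and tolerance, taken from the description above. */
#define CYCLE_MS     600
#define TOLERANCE_MS 150

/* Elapsed time between two timestamps, in whole milliseconds.
 * Assumes end is not earlier than start (e.g. both from a monotonic clock). */
int64_t elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    int64_t total_ns =
        ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * 1000000000LL +
        ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);

    assert(total_ns >= 0);
    return total_ns / 1000000;
}

/* True when a cycle of the given length is inside the allowed window
 * (assumption: the window is inclusive at both ends). */
bool cycle_within_limits(int64_t cycle_ms)
{
    return cycle_ms >= CYCLE_MS - TOLERANCE_MS &&
           cycle_ms <= CYCLE_MS + TOLERANCE_MS;
}
```

With a check like this, a single 229 ms stall on top of a nominal 600 ms
cycle would be enough to push the cycle outside the window.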

Yes, using UDP packets might be better, but even in this configuration the
last released version of the hardware works just fine: uptime is about 1 year
per installation, with approximately 100 installations worldwide. Moreover,
no halts caused by the internal time checks were found; the rare halts that
did occur were caused by other defects, software or hardware.

The old release uses the following industrial computer as A:

CPU: Intel(R) Core(TM)2 Duo CPU     L7400  @ 1.50GHz (1500.12-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6fb  Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
....
em0: <Intel(R) PRO/1000 Network Connection Version - 6.7.2> port 0xdf00-0xdf1f mem 0xff880000-0xff89ffff,0xff860000-0xff87ffff irq 16 at device 0.0 on pci4
em0: Ethernet address: 00:30:64:09:6f:ee
....

FreeBSD fspa2 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 04:45:45 UTC 2008     root@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP  i386

This configuration works just fine.
I have written a test application that simulates the main cycle of the
complex system, and the maximum delay is 24 ms over about 1 million cycles.

The test application (running on two different computers) works like this:

1. Establish a TCP connection (one instance listens, the other connects).

2. send() a small fixed amount of data from both sides.
3. recv() the small fixed amount of data on both sides.
4. send() 40 K of data from both sides.
5. recv() 40 K of data on both sides.
6. Perform a complex but useless computation (to simulate the actual
application load).
7. Calculate how long to nanosleep() until the end of the cycle, and call
nanosleep().

Then go to step 2.

------------------
Every operation is guarded with clock_gettime(), and the durations of send(),
recv(), evaluate() and nanosleep() are printed out.
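
A rough sketch of that measurement pattern, not the attached cycle_clock.c
itself. It assumes CLOCK_MONOTONIC as the clock (the report does not say
which clock ID is used), and the helper names ns_between(), timed_step_ms(),
remaining_sleep() and busy_work() are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Nanoseconds from start to end. */
int64_t ns_between(const struct timespec *start, const struct timespec *end)
{
    return ((int64_t)end->tv_sec - (int64_t)start->tv_sec) * NSEC_PER_SEC +
           ((int64_t)end->tv_nsec - (int64_t)start->tv_nsec);
}

/* Run one step and return its duration in whole milliseconds.
 * Assumption: CLOCK_MONOTONIC is the clock of choice. */
int64_t timed_step_ms(void (*step)(void *), void *arg)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    step(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ns_between(&t0, &t1) / 1000000;
}

/* Time left until the end of a cycle of cycle_ns nanoseconds that began at
 * cycle_start; returns {0, 0} when the cycle has already overrun. */
struct timespec remaining_sleep(const struct timespec *cycle_start,
                                const struct timespec *now,
                                int64_t cycle_ns)
{
    struct timespec left = {0, 0};
    int64_t left_ns;

    assert(cycle_ns > 0);
    left_ns = cycle_ns - ns_between(cycle_start, now);
    if (left_ns > 0) {
        left.tv_sec = (time_t)(left_ns / NSEC_PER_SEC);
        left.tv_nsec = (long)(left_ns % NSEC_PER_SEC);
    }
    return left;
}

/* Hypothetical stand-in for the evaluate() step: burns some CPU time. */
void busy_work(void *arg)
{
    volatile uint64_t acc = 0;
    uint64_t n = *(const uint64_t *)arg;

    for (uint64_t i = 0; i < n; i++)
        acc += i * i;
}
```

In a loop like the one described above, each send(), recv() and evaluate()
call would be wrapped the way timed_step_ms() wraps busy_work(), and the
value returned by remaining_sleep() would be passed to nanosleep() in step 7.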

I will attach test application sources to this ticket.

For FreeBSD 6.3-RELEASE, the output looks like this when running between two
identical computers with FreeBSD 6.3 installed (only columns 2-4 are shown):

fspa2# grep times fspa2_fspa2_vpu.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10}' | sort | uniq -c
1115322 send_sync 0  0 0 0
7425    send_sync 0  0 0 1
73629   send_sync 0  1 0 0
   1    send_sync 0 13 0 0
  66    send_sync 0  2 0 0
   1    send_sync 0 24 0 0
  27    send_sync 0  3 0 0
  17    send_sync 0  4 0 0

As you can see, the maximum delay is 24 milliseconds (0.024 s), and it
happens only once.

Now the real problem: using FreeBSD 10.3-RELEASE and much more powerful
hardware (Moxa DA-820 industrial computer):

CPU: Intel(R) Core(TM) i7-3555LE CPU @ 2.50GHz (2494.39-MHz 686-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x28100000<NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics

em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0x6600-0x661f mem 0xc0a00000-0xc0a1ffff,0xc0a27000-0xc0a27fff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:90:e8:69:ea:3c


root@fspa2:~/clock/new_res # uname -a
FreeBSD fspa2 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 03:51:29 UTC 2016     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  i386

With the same test application we get the following (running between two
identical devices, both Moxa DA-820 with FreeBSD 10.3):

root@fspa2:~/clock/new_res # grep times  direct_1.txt  | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
155042 send_sync 0 0 0 0
122890 send_sync 0 0 0 1
   1 send_sync 0 0 0 100
   1 send_sync 0 0 0 102
   1 send_sync 0 0 0 111
   1 send_sync 0 0 0 12
   1 send_sync 0 0 0 125
   2 send_sync 0 0 0 13
   1 send_sync 0 0 0 130
   1 send_sync 0 0 0 131
   1 send_sync 0 0 0 133
   2 send_sync 0 0 0 136
   1 send_sync 0 0 0 148
   2 send_sync 0 0 0 149
   3 send_sync 0 0 0 150
   1 send_sync 0 0 0 156
   1 send_sync 0 0 0 159
   1 send_sync 0 0 0 16
   1 send_sync 0 0 0 161
   1 send_sync 0 0 0 17
   1 send_sync 0 0 0 176
   1 send_sync 0 0 0 19
  18 send_sync 0 0 0 2
   1 send_sync 0 0 0 229
   1 send_sync 0 0 0 23
   1 send_sync 0 0 0 24
   2 send_sync 0 0 0 25
   1 send_sync 0 0 0 26
   1 send_sync 0 0 0 28
   1 send_sync 0 0 0 282
   1 send_sync 0 0 0 29
  14 send_sync 0 0 0 3
   1 send_sync 0 0 0 30
   1 send_sync 0 0 0 31
   1 send_sync 0 0 0 32
   1 send_sync 0 0 0 37
   1 send_sync 0 0 0 38
  14 send_sync 0 0 0 4
   1 send_sync 0 0 0 40
   1 send_sync 0 0 0 41
   1 send_sync 0 0 0 43
   1 send_sync 0 0 0 45
   2 send_sync 0 0 0 46
   4 send_sync 0 0 0 49
  16 send_sync 0 0 0 5
   2 send_sync 0 0 0 53
   2 send_sync 0 0 0 57
   4 send_sync 0 0 0 58
  14 send_sync 0 0 0 59
  17 send_sync 0 0 0 6
  20 send_sync 0 0 0 60
  16 send_sync 0 0 0 61
   8 send_sync 0 0 0 62
   4 send_sync 0 0 0 63
   1 send_sync 0 0 0 64
   1 send_sync 0 0 0 67
   1 send_sync 0 0 0 68
   9 send_sync 0 0 0 7
   1 send_sync 0 0 0 70
   1 send_sync 0 0 0 72
   1 send_sync 0 0 0 79
   5 send_sync 0 0 0 8
   3 send_sync 0 0 0 80
   1 send_sync 0 0 0 81
   1 send_sync 0 0 0 82
   1 send_sync 0 0 0 84
   1 send_sync 0 0 0 89
   3 send_sync 0 0 0 9
   1 send_sync 0 0 0 90
   1 send_sync 0 0 0 93
   1 send_sync 0 0 0 95
   1 send_sync 0 0 0 97
 147 send_sync 0 1 0 0
   1 send_sync 0 33 0 0

As you can see, with only about 300,000 cycles (compared with 1,100,000
cycles in the old configuration) we have multiple cases of long and very
long delays, including delays of 229 ms and 282 ms.

The only difference from the standard OS configuration is

sysctl kern.timecounter.alloweddeviation=0

because without it, nanosleep() returns with a random error of up to 4%.
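
One way such an error could be quantified is to compare the requested sleep
time with the time that actually elapsed. The sketch below is illustrative
only: nanosleep_overshoot_ns() and overshoot_percent() are hypothetical
helpers, CLOCK_MONOTONIC is an assumed choice of clock, and the code is not
taken from the attached test program:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sleep for requested_ns using nanosleep() and return how many nanoseconds
 * longer than requested the call actually took. A sleep interrupted by a
 * signal returns early and would give a negative result; that case is not
 * handled in this sketch. */
int64_t nanosleep_overshoot_ns(int64_t requested_ns)
{
    struct timespec req, t0, t1;
    int64_t elapsed_ns;

    assert(requested_ns >= 0);
    req.tv_sec = (time_t)(requested_ns / 1000000000LL);
    req.tv_nsec = (long)(requested_ns % 1000000000LL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    nanosleep(&req, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    elapsed_ns = ((int64_t)t1.tv_sec - (int64_t)t0.tv_sec) * 1000000000LL +
                 ((int64_t)t1.tv_nsec - (int64_t)t0.tv_nsec);
    return elapsed_ns - requested_ns;
}

/* Overshoot expressed as a percentage of the requested sleep time. */
double overshoot_percent(int64_t overshoot_ns, int64_t requested_ns)
{
    return 100.0 * (double)overshoot_ns / (double)requested_ns;
}
```

For example, a 100 ms sleep that returns 4 ms late corresponds to an error
of 4%.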

============================================
I have tried everything that I can think of:

1. Tweaking the em0 sysctls - does not help. In polling mode it works much
worse.
2. Replacing the Intel em0 devices with a Realtek re0 device - slightly
better, but still produces significant delays.
3. Tweaking kernel sysctls (disabling the newer RFC compatibility options) -
does not help.
4. Replacing the cables and replacing the switch with a different model
straight out of the box - does not help.


If anybody has any ideas on how to fix this, please comment here.

-- 
You are receiving this mail because:
You are the assignee for the bug.