From: "Petr Holub"
Date: Tue, 9 Oct 2007 19:38:24 +0200
Cc: rdivacky@freebsd.org
Subject: Myrinet 10GE performance on 7.0-CURRENT
Message-ID: <036c01c80a9b$3145b640$5317fb93@KLOBOUCEK>

Dear all,

I've performed an initial set of experiments with FreeBSD 7.0-CURRENT
(built on Oct 8th) with Myrinet 10GE cards.
Kernel is based on GENERIC with the following options disabled:

#options        INVARIANTS
#options        INVARIANT_SUPPORT
#options        WITNESS
#options        WITNESS_SKIPSPIN

and SCHED_ULE instead of SCHED_4BSD. Userland is built using production
malloc.c (MALLOC_PRODUCTION defined in lib/libc/stdlib/malloc.c).

dmesg output is available at the end of the email (basically, 2x dual-core
Intel Xeon 5160 @ 3.00 GHz, identical machines for both sending and
receiving, running identical systems). The two machines are connected
point to point using LR XFPs and about 4 m of fiber.

The following tunables have been set on both sender and receiver:

net.inet.tcp.sendspace: 8388608
net.inet.tcp.recvspace: 8388608
net.inet.udp.recvspace: 8388608
net.inet.raw.recvspace: 8388608
kern.ipc.maxsockbuf: 10000000

sender:

[root@synchro-brno ~]# iperf -c 192.168.1.1 -u -l 8500 -i 1 -t 15 -b 9G -w 2M
------------------------------------------------------------
Client connecting to 192.168.1.1, UDP port 5001
Sending 8500 byte datagrams
UDP buffer size: 2.00 MByte
------------------------------------------------------------
[  3] local 192.168.1.2 port 55844 connected with 192.168.1.1 port 5001
[  3]  0.0- 1.0 sec  1.07 GBytes  9.21 Gbits/sec
[  3]  1.0- 2.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  2.0- 3.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  3.0- 4.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  4.0- 5.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  5.0- 6.0 sec  1.07 GBytes  9.21 Gbits/sec
[  3]  6.0- 7.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  7.0- 8.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  8.0- 9.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3]  9.0-10.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3] 10.0-11.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3] 11.0-12.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3] 12.0-13.0 sec  1.07 GBytes  9.20 Gbits/sec
[  3] 13.0-14.0 sec  1.07 GBytes  9.21 Gbits/sec
[  3]  0.0-15.0 sec  16.1 GBytes  9.20 Gbits/sec
[  3] Sent 2030369 datagrams
[  3] Server Report:
[  3]  0.0-15.0 sec  16.1 GBytes  9.20 Gbits/sec  0.002 ms 1655/2030369 (0.082%)

receiver:

[root@synchro-plzen ~]# iperf -s -u -l 8500 -i 1
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 8500 byte datagrams
UDP buffer size: 8.00 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 5001 connected with 192.168.1.2 port 55844
[  3]  0.0- 1.0 sec  1.07 GBytes  9.21 Gbits/sec  0.004 ms    0/135463 (0%)
[  3]  1.0- 2.0 sec  1.07 GBytes  9.20 Gbits/sec  0.003 ms    0/135343 (0%)
[  3]  2.0- 3.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135363 (0%)
[  3]  3.0- 4.0 sec  1.07 GBytes  9.21 Gbits/sec  0.002 ms    0/135368 (0%)
[  3]  4.0- 5.0 sec  1.07 GBytes  9.20 Gbits/sec  0.003 ms    0/135337 (0%)
[  3]  5.0- 6.0 sec  1.07 GBytes  9.21 Gbits/sec  0.002 ms    0/135374 (0%)
[  3]  6.0- 7.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135336 (0%)
[  3]  7.0- 8.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135355 (0%)
[  3]  8.0- 9.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135306 (0%)
[  3]  9.0-10.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135355 (0%)
[  3] 10.0-11.0 sec  1.07 GBytes  9.20 Gbits/sec  0.003 ms    0/135329 (0%)
[  3] 11.0-12.0 sec  1.06 GBytes  9.09 Gbits/sec  0.003 ms 1655/135337 (1.2%)
[  3] 12.0-13.0 sec  1.07 GBytes  9.20 Gbits/sec  0.002 ms    0/135344 (0%)
[  3] 13.0-14.0 sec  1.07 GBytes  9.21 Gbits/sec  0.004 ms    0/135397 (0%)
[  3]  0.0-15.0 sec  16.1 GBytes  9.20 Gbits/sec  0.002 ms 1655/2030369 (0.082%)

CPU-wise, iperf takes 200% WCPU; about 36% is system time, 14% user time,
1.5% interrupt, and 48.6% idle.
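
As a cross-check of the figures above, the following minimal Python sketch
recomputes the per-interval throughput and the loss percentages from the
datagram counts iperf printed, and estimates the payload ceiling of a
10 Gbit/s link for 8500-byte datagrams. The function names are hypothetical,
and the per-datagram framing overhead (IPv4 without options, untagged
Ethernet, preamble and inter-frame gap counted) is an assumption that is not
stated in this message.

```python
# Cross-check of the iperf figures quoted in this message.
PAYLOAD = 8500  # bytes per UDP datagram (iperf -l 8500)

# ASSUMPTION (not from the message): per-datagram overhead on the wire.
# UDP 8 + IPv4 20 + Ethernet header 14 + FCS 4 + preamble/SFD 8 + gap 12
WIRE_OVERHEAD = 8 + 20 + 14 + 4 + 8 + 12


def gbits_per_sec(datagrams, seconds, payload=PAYLOAD):
    """Payload throughput in decimal Gbit/s (10**9 bits per second)."""
    return datagrams * payload * 8 / seconds / 1e9


def loss_percent(lost, sent):
    """Share of datagrams that never reached the receiver, in percent."""
    return 100.0 * lost / sent


def payload_ceiling_gbits(line_rate_gbits=10.0, payload=PAYLOAD):
    """Highest payload rate the link can carry under the assumed overhead."""
    return line_rate_gbits * payload / (payload + WIRE_OVERHEAD)


if __name__ == "__main__":
    # Receiver interval 2.0-3.0 sec: 135363 datagrams in one second.
    print("interval rate: %.2f Gbit/s" % gbits_per_sec(135363, 1.0))
    # Whole run: 1655 of 2030369 datagrams lost.
    print("total loss:    %.3f %%" % loss_percent(1655, 2030369))
    # Interval 11.0-12.0 sec: 1655 of 135337 datagrams lost.
    print("burst loss:    %.1f %%" % loss_percent(1655, 135337))
    print("ceiling:       %.2f Gbit/s" % payload_ceiling_gbits())
```

Under these assumptions the recomputed interval rate and loss percentages
agree with what iperf printed, and the ceiling comes out a little above
9.9 Gbit/s, so the 9.20 Gbit/s run is limited by the -b 9G target rather
than by the link.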

Sometimes I observe that, after some time, performance drops from >9 Gbps
to about 8.7 Gbps, as shown below:

[root@synchro-brno ~]# iperf -c 192.168.1.1 -u -l 8500 -i 1 -t 60 -b 9900M -w 2M
------------------------------------------------------------
Client connecting to 192.168.1.1, UDP port 5001
Sending 8500 byte datagrams
UDP buffer size: 2.00 MByte
------------------------------------------------------------
[  3] local 192.168.1.2 port 60761 connected with 192.168.1.1 port 5001
[  3]  0.0- 1.0 sec  1.12 GBytes  9.64 Gbits/sec
[  3]  1.0- 2.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  2.0- 3.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  3.0- 4.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  4.0- 5.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  5.0- 6.0 sec  1.12 GBytes  9.64 Gbits/sec
[  3]  6.0- 7.0 sec  1.12 GBytes  9.64 Gbits/sec
[  3]  7.0- 8.0 sec  1.12 GBytes  9.64 Gbits/sec
[  3]  8.0- 9.0 sec  1.12 GBytes  9.63 Gbits/sec
[  3]  9.0-10.0 sec  1.12 GBytes  9.60 Gbits/sec
[  3] 10.0-11.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 11.0-12.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 12.0-13.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 13.0-14.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 14.0-15.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 15.0-16.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 16.0-17.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 17.0-18.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 18.0-19.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 19.0-20.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 20.0-21.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 21.0-22.0 sec  1.01 GBytes  8.71 Gbits/sec
[  3]  0.0-22.9 sec  24.3 GBytes  9.11 Gbits/sec
[  3] Sent 3063929 datagrams
[  3] Server Report:
[  3]  0.0-22.9 sec  24.3 GBytes  9.11 Gbits/sec  0.003 ms    0/3063928 (0%)
[  3]  0.0-22.9 sec  1 datagrams received out-of-order

Sometimes I can also get very close to wirespeed, 9.90 Gbps, with
systat -ifstat 1 showing about 9.97 Gbps on both sender and receiver
(without any packet loss!). However, as shown above, this is not stable,
and over longer periods it seems to fluctuate between 8.7, 9.6, and
9.9 Gbps (e.g.
it can run at each speed for a couple of tens of seconds, and then the
performance changes either upwards or downwards). As shown above, there
are also sometimes some random packet losses.

BTW, with WITNESS and INVARIANTS enabled, I can do about 2.8 Gbps, and
iperf eats about 200% WCPU while about 50% of the time is spent in system.

I will do more testing tomorrow. If you have any ideas for further tuning
and experiments, let me know.

Petr

==================== Machine info follows ===================================

[root@synchro-brno ~]# dmesg
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-CURRENT #1: Tue Oct 9 16:59:21 CEST 2007
    root@:/usr/obj/usr/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU 5160 @ 3.00GHz (3000.12-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
  Features=0xbfebfbff
  Features2=0x4e3bd
  AMD Features=0x20100800
  AMD Features2=0x1
  Cores per package: 2
usable memory = 4281200640 (4082 MB)
avail memory  = 4119842816 (3928 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 6
 cpu3 (AP): APIC ID: 7
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: on acpi0
est0: on cpu0
p4tcc0: on cpu0
cpu1: on acpi0
est1: on cpu1
p4tcc1: on cpu1
cpu2: on acpi0
est2: on cpu2
p4tcc2: on cpu2
cpu3: on acpi0
est3: on cpu3
p4tcc3: on cpu3
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 2.0 on pci0
pci1: on pcib1
pcib2: irq 16 at device 0.0 on pci1
pci2: on pcib2
pcib3: irq 16 at device 0.0 on pci2
pci3: on pcib3
pcib4: irq 18 at device 2.0 on pci2
pci4: on pcib4
em0: port 0x2000-0x201f mem 0xd9200000-0xd921ffff irq 18 at device 0.0 on pci4
em0: Ethernet address: 00:30:48:33:86:5e
em0: [FILTER]
em1: port 0x2020-0x203f mem 0xd9220000-0xd923ffff irq 19 at device 0.1 on pci4
em1: Ethernet address: 00:30:48:33:86:5f
em1: [FILTER]
pcib5: at device 0.3 on pci1
pci5: on pcib5
pcib6: at device 4.0 on pci0
pci6: on pcib6
pci6: at device 0.0 (no driver attached)
pcib7: at device 6.0 on pci0
pci7: on pcib7
pci0: at device 8.0 (no driver attached)
uhci0: port 0x1800-0x181f irq 17 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0x1820-0x183f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: on usb1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0x1840-0x185f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: on usb2
uhub2: 2 ports with 2 removable, self powered
uhci3: port 0x1860-0x187f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
uhci3: [ITHREAD]
usb3: on uhci3
usb3: USB revision 1.0
uhub3: on usb3
uhub3: 2 ports with 2 removable, self powered
ehci0: mem 0xd9600400-0xd96007ff irq 17 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: on ehci0
usb4: USB revision 2.0
uhub4: on usb4
uhub4: 8 ports with 8 removable, self powered
pcib8: at device 30.0 on pci0
pci8: on pcib8
vgapci0: port 0x3000-0x30ff mem 0xd0000000-0xd7ffffff,0xd9300000-0xd930ffff irq 18 at device 1.0 on pci8
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1890-0x189f at device 31.2 on pci0
ata0: on atapci0
ata0: [ITHREAD]
ata1: on atapci0
ata1: [ITHREAD]
pci0: at device 31.3 (no driver attached)
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
orm0: at iomem 0xc0000-0xcafff,0xcb000-0xd2fff on isa0
ppc0: cannot reserve I/O port range
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ad0: 239372MB at ata0-master SATA150
ad1: 239372MB at ata0-slave SATA150
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from ufs:/dev/ad1s1a
mxge0: mem 0xd8000000-0xd8ffffff,0xd9000000-0xd90fffff irq 16 at device 0.0 on pci6
mxge0: [ITHREAD]
mxge0: Ethernet address: 00:60:dd:47:6b:f3
mxge0: link state changed to UP

[root@synchro-brno ~]# kldstat
Id Refs Address            Size     Name
 1    8 0xffffffff80100000 b1af40   kernel
 2    1 0xffffffffb09d3000 88aa     if_mxge.ko
 3    1 0xffffffffb09dc000 a472     zlib.ko
 4    1 0xffffffffb19eb000 ca52     mxge_ethp_z8e.ko
 5    1 0xffffffffb19f9000 c8fd     mxge_eth_z8e.ko

[root@synchro-brno ~]# ifconfig mxge0
mxge0: flags=8843 metric 0 mtu 9000
        options=1bb
        ether 00:60:dd:47:6b:f3
        inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet 10Gbase-LR (autoselect )
        status: active