From owner-freebsd-performance@FreeBSD.ORG Sun Feb 12 15:18:34 2012
From: Julian Wissmann <julianwissmann@gmail.com>
Date: Sun, 12 Feb 2012 15:58:54 +0100
To: freebsd-performance@freebsd.org
Cc: rwatson@FreeBSD.org
Subject: Re: Tor on FreeBSD Performance issues
In-Reply-To: <060E8621-8114-4DF3-8D8E-6F897DB3AFA4@FreeBSD.org>
References: <4F314FC6.3060801@freebsd.org> <4085662F-2AFC-4724-A9CD-538935FBA51A@freebsd.org> <52BDCB1E-AECB-4196-9334-9177D5C0C5AF@cl.cam.ac.uk> <060E8621-8114-4DF3-8D8E-6F897DB3AFA4@FreeBSD.org>

Hi,

> On 11 Feb 2012, at 00:06, Steven Murdoch wrote:
>
>> On 10 Feb 2012, at 22:22, Robert N. M. Watson wrote:
>>> I wonder if we're looking at some sort of difference in socket buffer tuning between Linux and FreeBSD that is leading to better link utilisation under this workload. Both FreeBSD and Linux auto-tune socket buffer sizes, but I'm not sure if their policies for enabling auto-tuning differ. Do we know if Tor fixes socket buffer sizes in such a way that it might lead to FreeBSD disabling auto-tuning?
>>
>> If ConstrainedSockets is set to 1 (it defaults to 0), then Tor will "setsockopt(sock, SOL_SOCKET, SO_SNDBUF" and "setsockopt(sock, SOL_SOCKET, SO_RCVBUF" to ConstrainedSockSize (default 8192). Otherwise I don't see any fiddling with buffer sizes. So I'd first confirm that ConstrainedSockets is set to zero, and perhaps try experimenting with it enabled for different values of ConstrainedSockSize.
>
> In FreeBSD, I believe the current policy is that any TCP socket that doesn't have a socket option specifically set will be auto-tuning. So it's likely that, as long as ConstrainedSockSize isn't set, auto-tuning is enabled.

This is set to zero in Tor.

>
>>> I'm a bit surprised by the out-of-order packet count -- is that typical of a Tor workload, and can we compare similar statistics on other nodes there? This could also be a symptom of TCP reassembly queue issues. Lawrence: did we get the fixes in place there to do with the bounded reassembly queue length, and/or are there any workarounds for that issue? Is it easy to tell if we're hitting it in practice?
>>
>> I can't think of any inherent reason for excessive out-of-order packets, as the host TCP stack is used by all Tor nodes currently. It could be that some network connections from users are bad (we have plenty of dial-up users).
>
> I guess what I'm wondering about is relative percentages. Out-of-order packets can also arise as a result of network stack bugs, and might explain a lower aggregate bandwidth.
> The netstat -Q options I saw in the forwarded e-mail suggest that the scenarios that could lead to this aren't present, but since it stands out, it would be worth trying to explain it, just to convince ourselves it's not a stack bug.

As we have two boxes with identical configuration in the same datacenter here, I can give some Linux output, too:

# netstat -s
Ip:
    1099780169 total packets received
    0 forwarded
    0 incoming packets discarded
    2062308427 incoming packets delivered
    2800933295 requests sent out
    694 outgoing packets dropped
    798042 fragments dropped after timeout
    143378847 reassemblies required
    45697700 packets reassembled ok
    18522117 packet reassembles failed
    1070 fragments received ok
    761 fragments failed
    28174 fragments created
Icmp:
    92792968 ICMP messages received
    18458681 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 73204262
        timeout in transit: 6996342
        source quenches: 813143
        redirects: 9100882
        echo requests: 1646656
        echo replies: 5
    2005869 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 359208
        echo request: 5
        echo replies: 1646656
IcmpMsg:
    InType0: 5
    InType3: 73204262
    InType4: 813143
    InType5: 9100882
    InType8: 1646656
    InType11: 6996342
    OutType0: 1646656
    OutType3: 359208
    OutType8: 5
Tcp:
    4134119965 active connections openings
    275823710 passive connection openings
    2002550589 failed connection attempts
    199749970 connection resets received
    31931 connections established
    1839369825 segments received
    3631158795 segments send out
    3353305069 segments retransmited
    2152248 bad segments received.
    237858281 resets sent
Udp:
    129942286 packets received
    203329 packets to unknown port received.
    0 packet receive errors
    109523321 packets sent
UdpLite:
TcpExt:
    7088 SYN cookies sent
    15275 SYN cookies received
    3196797 invalid SYN cookies received
    1093456 resets received for embryonic SYN_RECV sockets
    36073572 packets pruned from receive queue because of socket buffer overrun
    77060 packets pruned from receive queue
    232 packets dropped from out-of-order queue because of socket buffer overrun
    362884 ICMP packets dropped because they were out-of-window
    85 ICMP packets dropped because socket was locked
    673831896 TCP sockets finished time wait in fast timer
    48600 time wait sockets recycled by time stamp
    2013223394 delayed acks sent
    3477567 delayed acks further delayed because of locked socket
    Quick ack mode was activated 440274027 times
    35711291 times the listen queue of a socket overflowed
    35711291 SYNs to LISTEN sockets dropped
    457 packets directly queued to recvmsg prequeue.
    1460 bytes directly in process context from backlog
    48211 bytes directly received in process context from prequeue
    1494466591 packet headers predicted
    33 packets header predicted and directly queued to user
    4257229715 acknowledgments not containing data payload received
    740819251 predicted acknowledgments
    442309 times recovered from packet loss due to fast retransmit
    197193098 times recovered from packet loss by selective acknowledgements
    494378 bad SACK blocks received
    Detected reordering 221053 times using FACK
    Detected reordering 1053064 times using SACK
    Detected reordering 72059 times using reno fast retransmit
    Detected reordering 4265 times using time stamp
    336672 congestion windows fully recovered without slow start
    356482 congestion windows partially recovered using Hoe heuristic
    41059770 congestion windows recovered without slow start by DSACK
    54306977 congestion windows recovered without slow start after partial ack
    245685510 TCP data loss events
    TCPLostRetransmit: 7881258
    421631 timeouts after reno fast retransmit
    70726251 timeouts after SACK recovery
    26797894 timeouts in loss state
    349218987 fast retransmits
    19632788 forward retransmits
    224201891 retransmits in slow start
    2441482671 other TCP timeouts
    220051 classic Reno fast retransmits failed
    22663942 SACK retransmits failed
    160105897 packets collapsed in receive queue due to low socket buffer
    568326755 DSACKs sent for old packets
    12316261 DSACKs sent for out of order packets
    157800118 DSACKs received
    1008695 DSACKs for out of order packets received
    2043 connections reset due to unexpected SYN
    48512275 connections reset due to unexpected data
    15085625 connections reset due to early user close
    1702109944 connections aborted due to timeout
    TCPSACKDiscard: 231850
    TCPDSACKIgnoredOld: 99417376
    TCPDSACKIgnoredNoUndo: 33053947
    TCPSpuriousRTOs: 5163955
    TCPMD5Unexpected: 8
    TCPSackShifted: 290984575
    TCPSackMerged: 613203726
    TCPSackShiftFallback: 747049207
IpExt:
    InBcastPkts: 12617896
    OutBcastPkts: 1456356
    InOctets: -1096131435
    OutOctets: -1263483369
    InBcastOctets: -2144923256
    OutBcastOctets: 187483424

>
>>> On the other hand, I think Steven had mentioned that Tor has changed how it does exit node load distribution to better take into account realised rather than advertised bandwidth. If that's the case, you might get larger systemic effects causing feedback: if you offer slightly less throughput then you get proportionally less traffic. This is something I can ask Steven about on Monday.
>>
>> There is active probing of capacity, which then is used to adjust the weighting factors that clients use.
>
> So there is a chance that the effect we're seeing has to do with clients not being directed to the host, perhaps due to larger systemic issues, or the FreeBSD box responding less well to probing and therefore being assigned less work by Tor as a whole. Are there any tools for diagnosing these sorts of interactions in Tor, or fixing elements of the algorithm to allow experiments with capacity to be done more easily?
> We can treat this as a FreeBSD stack problem in isolation, but inasmuch as we can control for effects like that, it would be useful.
>
> There's a non-trivial possibility that we're simply missing a workaround for known-bad Broadcom hardware, as well, so it would be worth our taking a glance at the pciconf -lv output describing the card so we can compare Linux driver workarounds with FreeBSD driver workarounds, and make sure we have them all. If I recall correctly, that silicon is not known for its correctness, so failing to disable some hardware feature could have a significant effect.

# pciconf -lv
bge0@pci0:32:0:0:   class=0x020000 card=0x705d103c chip=0x165b14e4 rev=0x10 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'NetXtreme BCM5723 Gigabit Ethernet PCIe'
    class    = network
    subclass = ethernet
bge1@pci0:34:0:0:   class=0x020000 card=0x705d103c chip=0x165b14e4 rev=0x10 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'NetXtreme BCM5723 Gigabit Ethernet PCIe'
    class    = network
    subclass = ethernet

>
>>> Could someone remind me if Tor is multi-threaded these days, and if so, how socket I/O is distributed over threads?
>>
>> I believe that Tor is single-threaded for the purposes of I/O. Some server operators with fat pipes have had good experiences of running several Tor instances in parallel on different ports to increase bandwidth utilisation.
>
> It would be good to confirm the configuration in this particular case to make sure we understand it. It would also be good to know if the main I/O thread in Tor is saturating the core it's running on -- if so, we might be looking at some poor behaviour relating to, for example, frequent timestamp checking, which is currently more expensive on FreeBSD than Linux.

We have two Tor processes running. Tor still only uses multi-threading for crypto work, and not even for all of that (only onionskins).
On polling I actually got both Tor processes to nearly saturate the cores they were on, but now that I have disabled polling and gone back to 1000 Hz I don't get there. Currently one process is at 60% WCPU, the other at about 50%.

As has been asked: yes, it is a FreeBSD 9 box, and no, there is no net.inet.tcp.inflight.enable. Also, libevent is using kqueue, and I've tried patching both Tor and libevent to use CLOCK_MONOTONIC_FAST and CLOCK_REALTIME_FAST, as pointed out by Alexander.

If by flow cache you mean net.inet.flowtable, then I believe that the sysctl won't show up unless I activate IP forwarding, which I have not (and I don't have net.inet.flowtable available).

Also, some sysctls as requested:

kern.ipc.somaxconn=16384
kern.ipc.maxsockets=204800
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.maxvnodes=200000
net.inet.tcp.recvbuf_max=10485760
net.inet.tcp.recvbuf_inc=65535
net.inet.tcp.sendbuf_max=10485760
net.inet.tcp.sendbuf_inc=65535
net.inet.tcp.sendspace=10485760
net.inet.tcp.recvspace=10485760
net.inet.tcp.delayed_ack=0
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
net.inet.ip.rtexpire=2
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=1024
net.inet.tcp.rfc1323=0
net.inet.tcp.maxtcptw=200000
net.inet.ip.intr_queue_maxlen=4096
net.inet.tcp.ecn.enable=1
(net.inet.ip.intr_queue_drops is zero)
net.inet.ip.portrange.reservedlow=0
net.inet.ip.portrange.reservedhigh=0
net.inet.ip.portrange.hifirst=1024
security.mac.portacl.enabled=1
security.mac.portacl.suser_exempt=1
security.mac.portacl.port_high=1023
security.mac.portacl.rules=uid:80:tcp:80
security.mac.portacl.rules=uid:256:tcp:443

Thanks for the replies and all of this information.

Julian

From owner-freebsd-performance@FreeBSD.ORG Sat Feb 18 18:00:41 2012
From: Florian Smeets <flo@FreeBSD.org>
Date: Sat, 18 Feb 2012 19:00:39 +0100
To: O. Hartmann
Cc: performance@FreeBSD.org
Subject: Re: ULE vs. 4BSD scheduler benchmarks
Message-ID: <4F3FE747.20300@FreeBSD.org>
In-Reply-To: <4F25165B.6080805@zedat.fu-berlin.de>
References: <4F247975.9050208@FreeBSD.org> <4F25165B.6080805@zedat.fu-berlin.de>

On 29.01.12 10:50, O. Hartmann wrote:
>
> We got a new workstation, a two-socket 6-core Westmere Xeon box; I forgot the exact specifications, but the cores run at 2.66 GHz and have access to 96 GB RAM. Maybe I can also set up some benchmarks, but I need advice since I'm not a kernel guru. The box is primarily running Linux due to the TESLA/GPGPU stuff we run on it. A colleague of mine developed software for the correction of huge satellite imagery needed in planetary science; the software is highly scalable (OpenMP) and makes massive use of OpenCL, though OpenCL can be switched off. We are not interested in database performance, but more in HPC stuff and scientific calculations. I guess we could also provide some benchmark results after a proper setup for the workload. Since this box is also running a Linux Ubuntu 11.04 server, it would be interesting to have a comparison against that.

What you could do to help is give mav's latest ULE patches a try with your workload and measure stock ULE vs. the patched one:

http://people.freebsd.org/~mav/sched.htt40.patch

I have tested it on head. It does apply to 9-STABLE, but I haven't tried to compile or run with it there; I think it should work, though.

Florian