From: "David Kwan"
Date: Tue, 1 Jul 2008 17:30:35 -0700
In-Reply-To: <486A91B0.6040505@gtcomm.net>
References: <486A91B0.6040505@gtcomm.net>
Subject: RE: Poor network performance for clients in 100MB to Gigabit environment
List-Id: Networking and TCP/IP with FreeBSD

I've attempted many standard and non-standard permutations of the TCP
tuning parameters via sysctl without much success. It feels like FreeBSD
is not handling the congestion very well, and that this is beyond what
sysctl tuning can fix. Only clients on the 100 Mbit networks have
slow/erratic reads; clients on the Gigabit network are fine and scream,
so the original TCP parameters are just fine for them.

For the record, these are the sysctl options for Linux and FreeBSD.

Linux:
net.ipv4.conf.eth0.force_igmp_version = 0
net.ipv4.conf.eth0.disable_policy = 0
net.ipv4.conf.eth0.disable_xfrm = 0
net.ipv4.conf.eth0.arp_ignore = 0
net.ipv4.conf.eth0.arp_announce = 0
net.ipv4.conf.eth0.arp_filter = 0
net.ipv4.conf.eth0.tag = 0
net.ipv4.conf.eth0.log_martians = 0
net.ipv4.conf.eth0.bootp_relay = 0
net.ipv4.conf.eth0.medium_id = 0
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.eth0.accept_source_route = 0
net.ipv4.conf.eth0.send_redirects = 1
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.eth0.shared_media = 1
net.ipv4.conf.eth0.secure_redirects = 1
net.ipv4.conf.eth0.accept_redirects = 1
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.eth0.forwarding = 0
net.ipv4.conf.lo.force_igmp_version = 0
net.ipv4.conf.lo.disable_policy = 1
net.ipv4.conf.lo.disable_xfrm = 1
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.lo.arp_filter = 0
net.ipv4.conf.lo.tag = 0
net.ipv4.conf.lo.log_martians = 0
net.ipv4.conf.lo.bootp_relay = 0
net.ipv4.conf.lo.medium_id = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.lo.accept_source_route = 1
net.ipv4.conf.lo.send_redirects = 1
net.ipv4.conf.lo.rp_filter = 0
net.ipv4.conf.lo.shared_media = 1
net.ipv4.conf.lo.secure_redirects = 1
net.ipv4.conf.lo.accept_redirects = 1
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.lo.forwarding = 0
net.ipv4.conf.default.force_igmp_version = 0
net.ipv4.conf.default.disable_policy = 0
net.ipv4.conf.default.disable_xfrm = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.tag = 0
net.ipv4.conf.default.log_martians = 0
net.ipv4.conf.default.bootp_relay = 0
net.ipv4.conf.default.medium_id = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.send_redirects = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.shared_media = 1
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.default.accept_redirects = 1
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.default.forwarding = 0
net.ipv4.conf.all.force_igmp_version = 0
net.ipv4.conf.all.disable_policy = 0
net.ipv4.conf.all.disable_xfrm = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_announce = 0
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.tag = 0
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.all.bootp_relay = 0
net.ipv4.conf.all.medium_id = 0
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.send_redirects = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.all.shared_media = 1
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.all.accept_redirects = 1
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.all.forwarding = 0
net.ipv4.neigh.eth0.locktime = 99
net.ipv4.neigh.eth0.proxy_delay = 79
net.ipv4.neigh.eth0.anycast_delay = 99
net.ipv4.neigh.eth0.proxy_qlen = 64
net.ipv4.neigh.eth0.unres_qlen = 3
net.ipv4.neigh.eth0.gc_stale_time = 60
net.ipv4.neigh.eth0.delay_first_probe_time = 5
net.ipv4.neigh.eth0.base_reachable_time = 30
net.ipv4.neigh.eth0.retrans_time = 99
net.ipv4.neigh.eth0.app_solicit = 0
net.ipv4.neigh.eth0.ucast_solicit = 3
net.ipv4.neigh.eth0.mcast_solicit = 3
net.ipv4.neigh.lo.locktime = 99
net.ipv4.neigh.lo.proxy_delay = 79
net.ipv4.neigh.lo.anycast_delay = 99
net.ipv4.neigh.lo.proxy_qlen = 64
net.ipv4.neigh.lo.unres_qlen = 3
net.ipv4.neigh.lo.gc_stale_time = 60
net.ipv4.neigh.lo.delay_first_probe_time = 5
net.ipv4.neigh.lo.base_reachable_time = 30
net.ipv4.neigh.lo.retrans_time = 99
net.ipv4.neigh.lo.app_solicit = 0
net.ipv4.neigh.lo.ucast_solicit = 3
net.ipv4.neigh.lo.mcast_solicit = 3
net.ipv4.neigh.default.gc_thresh3 = 1024
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_interval = 30
net.ipv4.neigh.default.locktime = 99
net.ipv4.neigh.default.proxy_delay = 79
net.ipv4.neigh.default.anycast_delay = 99
net.ipv4.neigh.default.proxy_qlen = 64
net.ipv4.neigh.default.unres_qlen = 3
net.ipv4.neigh.default.gc_stale_time = 60
net.ipv4.neigh.default.delay_first_probe_time = 5
net.ipv4.neigh.default.base_reachable_time = 30
net.ipv4.neigh.default.retrans_time = 99
net.ipv4.neigh.default.app_solicit = 0
net.ipv4.neigh.default.ucast_solicit = 3
net.ipv4.neigh.default.mcast_solicit = 3
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_workaround_signed_windows = 1
net.ipv4.tcp_bic_beta = 819
net.ipv4.tcp_tso_win_divisor = 8
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_bic_low_window = 14
net.ipv4.tcp_bic_fast_convergence = 1
net.ipv4.tcp_bic = 1
net.ipv4.tcp_vegas_gamma = 2
net.ipv4.tcp_vegas_beta = 6
net.ipv4.tcp_vegas_alpha = 2
net.ipv4.tcp_vegas_cong_avoid = 0
net.ipv4.tcp_westwood = 0
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.ipfrag_secret_interval = 600
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_frto = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.icmp_ratemask = 6168
net.ipv4.icmp_ratelimit = 1000
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 16384 131072
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.inet_peer_gc_maxtime = 120
net.ipv4.inet_peer_gc_mintime = 10
net.ipv4.inet_peer_maxttl = 600
net.ipv4.inet_peer_minttl = 120
net.ipv4.inet_peer_threshold = 65664
net.ipv4.igmp_max_msf = 10
net.ipv4.igmp_max_memberships = 20
net.ipv4.route.secret_interval = 600
net.ipv4.route.min_adv_mss = 256
net.ipv4.route.min_pmtu = 552
net.ipv4.route.mtu_expires = 600
net.ipv4.route.gc_elasticity = 8
net.ipv4.route.error_burst = 5000
net.ipv4.route.error_cost = 1000
net.ipv4.route.redirect_silence = 20480
net.ipv4.route.redirect_number = 9
net.ipv4.route.redirect_load = 20
net.ipv4.route.gc_interval = 60
net.ipv4.route.gc_timeout = 300
net.ipv4.route.gc_min_interval = 0
net.ipv4.route.max_size = 1048576
net.ipv4.route.gc_thresh = 65536
net.ipv4.route.max_delay = 10
net.ipv4.route.min_delay = 2
net.ipv4.icmp_errors_use_inbound_ifaddr = 0
net.ipv4.icmp_ignore_bogus_error_responses = 0
net.ipv4.icmp_echo_ignore_broadcasts = 0
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.ip_local_port_range = 32768 61000
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.ipfrag_time = 30
net.ipv4.ip_dynaddr = 0
net.ipv4.ipfrag_low_thresh = 196608
net.ipv4.ipfrag_high_thresh = 262144
net.ipv4.tcp_max_tw_buckets = 180000
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.ip_nonlocal_bind = 0
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.ip_autoconfig = 0
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_forward = 0
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1

FreeBSD:
net.inet.ip.portrange.lowfirst: 1023
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.first: 49152
net.inet.ip.portrange.last: 65535
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.hilast: 65535
net.inet.ip.portrange.reservedhigh: 1023
net.inet.ip.portrange.reservedlow: 0
net.inet.ip.portrange.randomized: 1
net.inet.ip.portrange.randomcps: 10
net.inet.ip.portrange.randomtime: 45
net.inet.ip.forwarding: 1
net.inet.ip.redirect: 1
net.inet.ip.ttl: 64
net.inet.ip.rtexpire: 3600
net.inet.ip.rtminexpire: 10
net.inet.ip.rtmaxcache: 128
net.inet.ip.sourceroute: 0
net.inet.ip.intr_queue_maxlen: 5000
net.inet.ip.intr_queue_drops: 0
net.inet.ip.accept_sourceroute: 0
net.inet.ip.keepfaith: 0
net.inet.ip.subnets_are_local: 0
net.inet.ip.same_prefix_carp_only: 0
net.inet.ip.fastforwarding: 0
net.inet.ip.process_options: 1
net.inet.ip.sendsourcequench: 0
net.inet.ip.random_id: 0
net.inet.ip.check_interface: 0
net.inet.ip.fragpackets: 0
net.inet.ip.maxfragsperpacket: 32
net.inet.ip.maxfragpackets: 1024
net.inet.icmp.maskrepl: 0
net.inet.icmp.icmplim: 1000
net.inet.icmp.maskfake: 0
net.inet.icmp.drop_redirect: 0
net.inet.icmp.log_redirect: 0
net.inet.icmp.icmplim_output: 1
net.inet.icmp.reply_src:
net.inet.icmp.reply_from_interface: 0
net.inet.icmp.quotelen: 8
net.inet.icmp.bmcastecho: 0
net.inet.tcp.rfc1323: 1
net.inet.tcp.mssdflt: 512
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.sendspace: 131072
net.inet.tcp.recvspace: 131072
net.inet.tcp.keepinit: 75000
net.inet.tcp.delacktime: 100
net.inet.tcp.hostcache.cachelimit: 15360
net.inet.tcp.hostcache.hashsize: 512
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.count: 4
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.log_in_vain: 0
net.inet.tcp.blackhole: 0
net.inet.tcp.delayed_ack: 1
net.inet.tcp.rfc3042: 1
net.inet.tcp.rfc3390: 1
net.inet.tcp.insecure_rst: 0
net.inet.tcp.reass.maxsegments: 8256
net.inet.tcp.reass.cursegments: 0
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.overflows: 0
net.inet.tcp.path_mtu_discovery: 1
net.inet.tcp.slowstart_flightsize: 1
net.inet.tcp.local_slowstart_flightsize: 4
net.inet.tcp.newreno: 1
net.inet.tcp.sndrexmitpack: 0
net.inet.tcp.sndrexmitbyte: 0
net.inet.tcp.do_tso: 1
net.inet.tcp.effective_maxseg_limit: 65535
net.inet.tcp.min_tso_factor: 2
net.inet.tcp.sack.enable: 1
net.inet.tcp.sack.maxholes: 128
net.inet.tcp.sack.globalmaxholes: 65536
net.inet.tcp.sack.globalholes: 0
net.inet.tcp.minmss: 216
net.inet.tcp.minmssoverload: 0
net.inet.tcp.tcbhashsize: 512
net.inet.tcp.do_tcpdrain: 1
net.inet.tcp.pcbcount: 199
net.inet.tcp.icmp_may_rst: 1
net.inet.tcp.isn_reseed_interval: 0
net.inet.tcp.inflight.enable: 1
net.inet.tcp.inflight.debug: 0
net.inet.tcp.inflight.rttthresh: 10
net.inet.tcp.inflight.min: 6144
net.inet.tcp.inflight.max: 1073725440
net.inet.tcp.inflight.stab: 20
net.inet.tcp.min_rtt: 3
net.inet.tcp.max_rexmt_time: 6400
net.inet.tcp.rexmt_dupacks: 3
net.inet.tcp.syncookies: 1
net.inet.tcp.syncache.bucketlimit: 30
net.inet.tcp.syncache.cachelimit: 15359
net.inet.tcp.syncache.count: 0
net.inet.tcp.syncache.hashsize: 512
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.msl: 30000
net.inet.tcp.rexmit_min: 30
net.inet.tcp.rexmit_slop: 200
net.inet.tcp.always_keepalive: 1
net.inet.udp.checksum: 1
net.inet.udp.maxdgram: 9216
net.inet.udp.recvspace: 512000
net.inet.udp.log_in_vain: 0
net.inet.udp.blackhole: 0
net.inet.udp.strict_mcast_mship: 0
net.inet.raw.maxdgram: 8192
net.inet.raw.recvspace: 411648
net.inet.accf.unloadable: 0

David K.

-----Original Message-----
From: owner-freebsd-net@freebsd.org [mailto:owner-freebsd-net@freebsd.org] On Behalf Of Paul
Sent: Tuesday, July 01, 2008 1:21 PM
To: David Kwan
Cc: freebsd-net@freebsd.org
Subject: Re: Poor network performance for clients in 100MB to Gigabit environment

What options do you have enabled on the Linux server?
sysctl -a | grep net.ipv4.tcp
and on the BSD
sysctl -a net.inet.tcp

It sounds like a problem with BSD not handling the dropped data or ACK
packets: it pushes a burst of data out at more than 100 Mbit, the switch
drops the packets, and then BSD waits too long to recover and doesn't
scale the transmission back.
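
The scale-back behaviour Paul refers to can be illustrated with a toy additive-increase/multiplicative-decrease loop. This is only a sketch under stated assumptions: the function name aimd, the per-round-trip "capacity" of the bottleneck, and the halving factor are all hypothetical illustration values, and the loop is a textbook simplification, not the FreeBSD or Linux implementation.

```python
def aimd(capacity, rounds, cwnd=1.0, increase=1.0, decrease=0.5):
    """Return the congestion window (in segments) after each round trip.

    capacity is a hypothetical number of segments the bottleneck can
    carry per round trip; sending more than that is treated as a drop.
    """
    history = []
    for _ in range(rounds):
        if cwnd > capacity:
            # Loss detected: multiplicative decrease, never below 1 segment.
            cwnd = max(1.0, cwnd * decrease)
        else:
            # No loss: additive increase of one segment per round trip.
            cwnd += increase
        history.append(cwnd)
    return history


if __name__ == "__main__":
    # The window climbs past the bottleneck, is cut back, then climbs again.
    print(aimd(capacity=10, rounds=20))
```

A sender that behaves like this settles into a sawtooth just around the bottleneck rate, which is roughly what a steady utilization chart would reflect; a sender that instead stalls on retransmission timeouts would show long idle gaps.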
TCP is supposed to scale down the transmission speed to the point where
packets are no longer dropped, even without ECN. Options such as 'reno'
select congestion control algorithms that use congestion windows, and
'sack' (selective acknowledgment) helps the sender recover from the
losses.

David Kwan wrote:
> I have a couple of questions regarding the TCP stack:
>
> I have a situation with clients on a 100 Mbit network connecting to
> servers on a Gigabit network, where the client read speeds are very
> slow from the FreeBSD server and fast from the Linux server. Write
> speeds from the clients to both servers are fast. (Clients on the
> Gigabit network work fine, with blazing read and write speeds.) The
> network traces show congestion packets for both servers when doing
> reads from the clients (dup ACKs and retransmissions), but the Linux
> server seems to handle the congestion better. ECN is not enabled on
> the network, and I don't see any congestion windowing or client
> window changes. The 100 Mbit/1G switch is dropping packets. I
> double-checked the network configuration and also swapped the
> servers' switch ports to make sure the switch configuration is the
> same, and Linux always does better than FreeBSD. Assuming that the
> network configuration is constant for all clients and servers (speed,
> duplex, etc.), the only variable is the servers themselves (Linux and
> FreeBSD). I have tried a couple of FreeBSD machines with 6.1 and 7.0
> and they exhibit the same problem, with no luck matching the speed
> and network utilization of Linux (2 years old). The read speed test
> I'm referring to is a transfer of a 100MB file (cifs, nfs, and ftp):
> the Linux server does it consistently in around 10 sec (line speed)
> with a constant network utilization chart, while the FreeBSD servers
> are orders of magnitude slower with an erratic network utilization
> chart.
> I've attempted to tweak some network sysctl options on the FreeBSD
> servers, and the only ones that helped were disabling TSO and
> inflight, which leads me to think that the inter-packet gap was
> slightly increased, partially relieving congestion on the switch;
> not a long-term solution.
>
> My questions are:
>
> 1. Have you heard of this problem before with 100 Mbit clients to
> Gigabit servers?
>
> 2. Are you aware of any Linux fix/patch in the TCP stack that handles
> congestion better than FreeBSD? I'm looking to address this issue in
> FreeBSD, but wondering if the Linux stack did something special that
> can help with the FreeBSD performance.
>
> David K.
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
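
A rough back-of-the-envelope check of two figures from the thread: the "around 10 sec (line speed)" transfer, and why a large TSO burst from a Gigabit server might overrun a switch port feeding a 100 Mbit client. The sketch assumes the 100MB file is 100 x 10^6 bytes, standard 1500-byte Ethernet frames (1460 payload bytes per 1538 bytes on the wire), and a single burst as large as the 65535-byte net.inet.tcp.effective_maxseg_limit shown in the FreeBSD listing. The function names are hypothetical, and the switch is modelled as a simple queue with no other traffic.

```python
def transfer_seconds(size_bytes, link_bits_per_s, efficiency=1.0):
    """Time to move size_bytes over a link at the given payload efficiency."""
    return size_bytes * 8 / (link_bits_per_s * efficiency)


def burst_backlog_bytes(burst_bytes, in_bits_per_s, out_bits_per_s):
    """Bytes still queued in the switch when the burst has fully arrived."""
    arrival_s = burst_bytes * 8 / in_bits_per_s
    drained_bytes = out_bits_per_s * arrival_s / 8
    return max(0.0, burst_bytes - drained_bytes)


if __name__ == "__main__":
    FILE_BYTES = 100 * 10**6      # assumed size of the "100MB" test file
    FAST_ETHERNET = 100 * 10**6   # client side, bits per second
    GIGABIT = 10**9               # server side, bits per second

    # Raw line rate, ignoring all framing overhead.
    print(transfer_seconds(FILE_BYTES, FAST_ETHERNET))

    # Assumed TCP/IPv4 over Ethernet without TCP options: 1460 payload
    # bytes for every 1538 bytes on the wire.
    print(transfer_seconds(FILE_BYTES, FAST_ETHERNET, efficiency=1460 / 1538))

    # One assumed 65535-byte burst arriving at 1 Gbit/s, draining at 100 Mbit/s.
    print(burst_backlog_bytes(65535, GIGABIT, FAST_ETHERNET))
```

Under these assumptions the ideal transfer takes a little over 8 seconds, so the Linux result of around 10 sec is close to line rate. Roughly nine tenths of each full-size burst has to sit in the switch buffer while the slower port drains, which is consistent with (though it does not prove) the observation that disabling TSO relieved some of the drops.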