From: Emanuel Strobl
To: freebsd-current@freebsd.org
Cc: freebsd-stable@freebsd.org
Cc: Robert Watson
Date: Fri, 19 Nov 2004 15:10:17 +0100
Message-Id: <200411191510.23551.Emanuel.Strobl@gmx.net>
Subject: Re: serious networking (em) performance (ggate and NFS) problem

On Friday, 19 November 2004 13:56, Robert Watson wrote:
> On Fri, 19 Nov 2004, Emanuel Strobl wrote:
> > On Thursday, 18 November 2004 13:27, Robert Watson wrote:
> > > On Wed, 17 Nov 2004, Emanuel Strobl wrote:
> > > > I really love 5.3 in many ways but here're some unbelievable transfer
[...]
> Well, the claim that if_em doesn't benefit from polling is inaccurate in
> the general case, but quite accurate in the specific case.  In a box with
> multiple NIC's, using polling can make quite a big difference, not just by
> mitigating interrupt load, but also by helping to prioritize and manage
> the load, preventing live lock.  As I indicated in my earlier e-mail,

I understand, thanks for the explanation.

> It looks like the netperf TCP test is getting just under 27MB/s, or
> 214Mb/s.  That does seem on the low side for the PCI bus, but it's also

I'm not sure I understand that sentence correctly. Does it mean the "slow"
400MHz PII is causing this limit? (Low side for the PCI bus?)

> instructive to look at the netperf UDP_STREAM results, which indicate that
> the box believes it is transmitting 417Mb/s but only 67Mb/s are being
> received or processed fast enough by netserver on the remote box.  This
> means you've achieved a send rate to the card of about 54Mb/s.  Note that
> you can actually do the math on cycles/packet or cycles/byte here -- with
> TCP_STREAM, it looks like some combination of recipient CPU and latency
> overhead is the limiting factor, with netserver running at 94% busy.

Hmm, I can't piece together a picture from this.

>
> Could you try using geom gate to export a malloc-backed md device, and see
> what performance you see there?
> This would eliminate the storage round

It's a pleasure:

test2:~#15: dd if=/dev/zero of=/mdgate/testfile bs=16k count=6000
6000+0 records in
6000+0 records out
98304000 bytes transferred in 5.944915 secs (16535812 bytes/sec)
test2:~#17: dd if=/mdgate/testfile of=/dev/null bs=16k
6000+0 records in
6000+0 records out
98304000 bytes transferred in 5.664384 secs (17354755 bytes/sec)

This time there's no difference between the disk and the memory filesystem,
but on another machine with an ICH2 chipset and a 3ware controller (my current
production system, which I'm trying to replace with this project) there was a
big difference. Attached is the corresponding message.

Thanks,

-Harry

> trip and guarantee the source is in memory, eliminating some possible
> sources of synchronous operation (which would increase latency, reducing
> throughput).  Looking at CPU consumption here would also be helpful, as it
> would allow us to reason about where the CPU is going.
>
> > I was aware of that and because of lacking a GbE switch anyway I decided
> > to use a simple cable ;)
>
> Yes, this is my favorite configuration :-).
>
> > > (5) Next, I'd measure CPU consumption on the end box -- in particular,
> > > use top -S and systat -vmstat 1 to compare the idle condition of the
> > > system and the system under load.
> >
> > I additionally added these values to the netperf results.
>
> Thanks for your very complete and careful testing and reporting :-).
>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
> robert@fledge.watson.org      Principal Research Scientist, McAfee Research

----------  Forwarded Message  ----------

From: Emanuel Strobl
To: freebsd-current@freebsd.org
Cc: Doug White, Robert Watson
Subject: Re: asymmetric NFS transfer rates
Date: Mon, 8 Nov 2004 04:29:11 +0100
References: <20041102105534.K63929@carver.gumbysoft.com>
In-Reply-To: <20041102105534.K63929@carver.gumbysoft.com>
Message-Id: <200411080429.12846.Emanuel.Strobl@gmx.net>

On Tuesday, 2 November 2004 19:56, Doug White wrote:
> On Tue, 2 Nov 2004, Robert Watson wrote:
> > On Tue, 2 Nov 2004, Emanuel Strobl wrote:
> > > It's an IDE RAID controller (3ware 7506-4, a real one) and the file is
> > > indeed huge, but not abnormally. I have a harddisk video recorder, so I
> > > have lots of 700MB files. Also if I copy my photo collection from the
> > > server it takes 5 minutes but copying _to_ the server it takes almost
> > > 15 minutes and the average file size is 5 MB.
> > > Fast Ethernet isn't really suitable for my needs, but at least the
> > > 10MB/s should be reached. I can't imagine I get better speeds when I
> > > upgrade to GbE, (which the important boxes are already, just not the
> > > switch) because NFS in its current state isn't able to saturate a
> > > 100baseTX line, at least in one direction. That's the real astonishing
> > > thing for me. Why does reading saturate 100BaseTX but writes only a
> > > third?
> >
> > Have you tried using tcpdump/ethereal to see if there's any significant
> > packet loss (for good reasons or not) going on?  Lots of RPC retransmits
> > would certainly explain the lower performance, and if that's not it, it
> > would be good to rule out.  The traces might also provide some insight
> > into the specific I/O operations, letting you see what block sizes are in
> > use, etc.  I've found that dumping to a file with tcpdump and reading
> > with ethereal is a really good way to get a picture of what's going on
> > with NFS: ethereal does a very nice job decoding the RPCs, as well as
> > figuring out what packets are related to each other, etc.
>
> It'd also be nice to know the mount options (nfs blocksizes in
> particular).

I haven't done intensive wire-dumps yet, but I figured out some oddities.

My main problem seems to be the 3ware controller in combination with NFS. If I
create a malloc-backed md0 I can push more than 9MB/s to it with UDP and more
than 10MB/s with TCP (both without modifying r/w-size).
I can also copy a 100M file from twed0s1d to twed0s1e (so from and to the same
RAID5 array, which is the worst case) with 15MB/s, so the array can't be the
bottleneck.
Only when I push to the RAID5 array via NFS do I get just 4MB/s, no matter if
I use UDP, TCP or nonstandard r/w-sizes.

The next thing I found is that if I tune -w to anything higher than the
standard 8192, the average transfer rate of one big file degrades with UDP but
increases with TCP (like I would expect).
UDP transfer seems to hiccup with -w tuned: transfer rates peak at 8MB/s but
the next second they stay at 0-2MB/s (watched with systat -vm 1), but with TCP
everything runs smoothly, regardless of the -w value.

Now back to my real problem: Can you imagine that NFS and twe are blocking
each other or something like that? Why do I get such really bad transfer
rates when both parts are in use, but every single part on its own seems to
work fine?

Thanks for any help,

-Harry
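
The throughput figures in this thread mix bytes/sec, MB/s and Mb/s, so a quick conversion helps when comparing them. Below is a minimal sketch in Python that redoes the arithmetic for the two dd runs from the ggate test. The helper name `rate` and the decimal definition of MB (10^6 bytes) are assumptions made for illustration. The cycles-per-byte line further assumes that the 400 MHz PII is the machine limiting the roughly 27 MB/s netperf TCP run, which the thread does not confirm.

```python
def rate(bytes_transferred: int, seconds: float) -> tuple[float, float]:
    """Return (MB/s, Mb/s) using decimal units (1 MB = 10**6 bytes)."""
    bytes_per_sec = bytes_transferred / seconds
    return bytes_per_sec / 1e6, bytes_per_sec * 8 / 1e6


# dd figures reported for the malloc-backed md device exported via ggate
write_mb, write_mbit = rate(98304000, 5.944915)  # dd to /mdgate/testfile
read_mb, read_mbit = rate(98304000, 5.664384)    # dd from /mdgate/testfile

print(f"ggate write: {write_mb:.1f} MB/s = {write_mbit:.0f} Mb/s")
print(f"ggate read:  {read_mb:.1f} MB/s = {read_mbit:.0f} Mb/s")

# Rough cycles-per-byte estimate for the netperf TCP_STREAM result.
# Assumption: a 400 MHz CPU is fully occupied moving about 27 MB/s.
cpu_hz = 400e6
tcp_bytes_per_sec = 27e6
cycles_per_byte = cpu_hz / tcp_bytes_per_sec
print(f"approx. {cycles_per_byte:.1f} CPU cycles per byte at 27 MB/s")
```

Seen this way, both ggate runs sit well below the 27 MB/s that netperf measured over the same link, which matches the observation that the malloc-backed device did not go faster than the disk.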