Date:      Sun, 21 Jan 2007 17:25:14 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        net@freebsd.org
Subject:   slow writes on nfs with bge devices
Message-ID:  <20070121155510.C23922@delplex.bde.org>

nfs writes are much slower with bge NICs than with other NICs (sk, fxp,
xl, even rl).  Sometimes writing a 20K source file from vi takes about
2 seconds instead of appearing instantaneous (this gets faster as the
system warms up); the sketch below shows a minimal way to time this.
Iozone shows the problem more reproducibly, as in the output after the
sketch:
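A sketch (mine, not from any run here) that times a single synchronous
write like vi's, against a hypothetical nfs mount point:
%%%
# Python; time one write+fsync on an nfs mount, mimicking vi's :w.
import os, time

path = "/mnt/nfs/scratch"    # hypothetical path on the nfs mount
data = b"x" * (20 * 1024)    # ~20K, like the source file mentioned above

t0 = time.time()
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, data)
os.fsync(fd)                 # force the data to the server, as iozone does
os.close(fd)
print(f"write+fsync took {time.time() - t0:.3f} seconds")
%%%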

100Mbps fxp server -> 1Gbps bge 5701 client, udp:
%%%
 	IOZONE: Performance Test of Sequential File I/O  --  V1.16 (10/28/92)
 		By Bill Norcott

 	Operating System: FreeBSD -- using fsync()

IOZONE: auto-test mode

 	MB      reclen  bytes/sec written   bytes/sec read
 	1       512     1516885             291918639
 	1       1024    1158783             491354263
 	1       2048    1573651             715694105
 	1       4096    1223692             917431957
 	1       8192    729513              1097929467
 	2       512     1694809             281196631
 	2       1024    1379228             507917189
 	2       2048    1659521             789608264
 	2       4096    4606056             1064567574
 	2       8192    1142288             1318131028
 	4       512     1242214             298269971
 	4       1024    1853545             492110628
 	4       2048    2120136             742888430
 	4       4096    1896792             1121799065
 	4       8192    850210              1441812403
 	8       512     1563847             281422325
 	8       1024    1480844             492749552
 	8       2048    1658649             850165954
 	8       4096    2105283             1211348180
 	8       8192    2098425             1554875506
 	16      512     1508821             296842294
 	16      1024    1966239             527850530
 	16      2048    2036609             842656736
 	16      4096    1666138             1200594889
 	16      8192    2293378             1620824908 
Completed series of tests
%%%

Here bge barely reaches 10Mbps speeds (~1.2 MB/S) for writing.  Reading
is cached well and fast.  100Mbps xl on the same client with the same
server goes at full 100Mbps speed (11.77 MB/S for all file sizes,
including larger ones, since the disk is not the limit at 100Mbps).
1Gbps sk on a different client with the same server also goes at full
100Mbps speed.
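11.77 MB/S is essentially wire speed for udp at mtu 1500; a
back-of-the-envelope check, with header overheads that are my
assumptions rather than anything measured here:
%%%
# Python; rough udp payload ceiling for 100Mbps ethernet at mtu 1500.
LINK_BPS = 100_000_000             # 100Mbps
payload = 1500 - 20 - 8            # mtu minus IPv4 and UDP headers = 1472
on_wire = 1500 + 14 + 4 + 8 + 12   # frame + ether hdr + FCS + preamble + gap

print(f"{LINK_BPS / 8 * payload / on_wire / 1e6:.2f} MB/S")  # ~11.96
%%%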

Switching to tcp gives full 100 Mbps speed.  However, when the bge link
speed is reduced to 100Mbps, udp becomes about 10 times slower than the
above and tcp becomes about as slow as the above (maybe a bit faster, but
far below 11.77 MB/S).

bge is also slow at nfs serving:

1Gbps bge 5701 server -> 1Gbps sk client:
%%%

 	IOZONE: Performance Test of Sequential File I/O  --  V1.16 (10/28/92)
 		By Bill Norcott

 	Operating System: FreeBSD -- using fsync()

IOZONE: auto-test mode

 	MB      reclen  bytes/sec written   bytes/sec read
 	1       512     36255350            242114472
 	1       1024    3051699             413319147
 	1       2048    22406458            632021710
 	1       4096    22447700            851162198
 	1       8192    3522493             1047562648
 	2       512     3270779             48125247
 	2       1024    28992179            46693718
 	2       2048    5956380             753318255
 	2       4096    27616650            1053311658
 	2       8192    5573338             48290208
 	4       512     9004770             47435659
 	4       1024    9576276             45601645
 	4       2048    30348874            85116667
 	4       4096    8635673             86150049
 	4       8192    9356773             47100031
 	8       512     9762446             46424146
 	8       1024    10054027            58344604
 	8       2048    9197430             60253061
 	8       4096    15934077            59476759
 	8       8192    8765470             47647937
 	16      512     5670225             46239891
 	16      1024    9425169             45950990
 	16      2048    9833515             46242945
 	16      4096    14812057            51313693
 	16      8192    9203742             47648722 
Completed series of tests
%%%

Now the available bandwidth is 10 times larger and about 9/10 of it is
still unused, with high variance.  For larger files, the variance is
lower and the average speed is about 10MB/S.  The disk can only do about
40MB/S, and the slowest of the 1Gbps NICs (sk) can only sustain 80MB/S
through udp and about 50MB/S through tcp (it is limited by the 33 MHz
32-bit PCI bus and by being less smart than the bge interface).  When
the bge NIC was in the system that is now the fxp server, bge and nfs
worked unsurprisingly, just slower than I would have liked: the write
speed was 20-30MB/S for large files and 30-40MB/S for medium-sized
files, with low variance.  This is the only configuration in which
nfs/bge worked as expected.
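The 9/10 figure follows from scaling the 100Mbps payload rate by 10 (my
arithmetic):
%%%
# Python; fraction of the 1Gbps payload rate left unused by the server.
available = 11.77 * 10    # ~117.7 MB/S, the 100Mbps figure scaled by 10
average = 10.0            # MB/S, the large-file average quoted above

print(f"{1 - average / available:.0%} unused")  # ~92%, about 9/10
%%%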

The problem is very old and not very hardware dependent.  Similar behaviour
happens when some of the following are changed:

OS -> FreeBSD-~5.2 or FreeBSD-6
hardware -> newer amd64 CPU (Turion X2) with 5705 (iozone output for this
            below) instead of old amd64 CPU with 5701.  The newer amd64
            normally runs an i386-SMP current kernel, while the old amd64
            was running an amd64-UP current kernel in the above tests but
            normally runs ~5.2 amd64-UP and behaves similarly with that.
            The combination that seemed to work right was an AthlonXP
            for the server with the same 5701 and any kernel.  The only
            strangeness with that was that current kernels gave a 5-10%
            slower nfs server despite giving a 30-90% larger packet rate
            for small packets.

100Mbps fxp server -> 1Gbps bge 5705 client:
%%%
	IOZONE: Performance Test of Sequential File I/O  --  V1.16 (10/28/92)
		By Bill Norcott

	Operating System: FreeBSD -- using fsync()

IOZONE: auto-test mode

 	MB      reclen  bytes/sec written   bytes/sec read
 	1       512     2994400             185462027
 	1       1024    3074084             337817536
 	1       2048    2991691             576792985
 	1       4096    3074759             884740798
 	1       8192    3078019             1176892296
 	2       512     4262096             186709962
 	2       1024    2994468             339893080
 	2       2048    5112176             584846610
 	2       4096    4754187             909815165
 	2       8192    5100574             1212919611
 	4       512     5298715             187129017
 	4       1024    5302620             344445041
 	4       2048    4985597             590579630
 	4       4096    3703618             927711124
 	4       8192    5236177             1240896243
 	8       512     5142274             186899396
 	8       1024    6207933             345564808
 	8       2048    6162773             593088329
 	8       4096    6031445             936751120
 	8       8192    6072523             1224102288
 	16      512     5427113             186797193
 	16      1024    5065901             345544445
 	16      2048    5462338             595487384
 	16      4096    5256552             937013065
 	16      8192    5097101             1226320870 
Completed series of tests
%%%

rl on a system with 1/20 as much CPU is faster than this.

The problem doesn't seem to affect much besides writes on nfs.  The
bge 5701 works very well for most things.  It has a much better bus
interface than the 5705 and works even better after moving it to the
old amd64 system (it can now saturate 1Gbps where on the AthlonXP it
only got 3/4 of the way, while the 5705 only gets 1/4 of the way).
I've been working on minimising network latency and maximising packet
rate, and normally have very low network latency (60-80 uS for ping)
and fairly high packet rates.  The changes for this are not the cause
of the bug :-), since the behaviour is not affected by running kernels
without these changes or by sysctl'ing the changes to be null.  However,
the problem looks like one caused by large latencies combined with
non-streaming protocols.  To write at just 11.77 MB/S, at least 8000
packets/second must be sent from the client to the server.  Working
clients sustain this rate, but on broken clients the rate is much lower
and not sustained, as the netstat output below shows.
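The 8000 figure is simple arithmetic (my numbers, assuming ~1472 bytes
of udp payload per 1500-byte packet):
%%%
# Python; packets/second needed to write at full 100Mbps nfs speed.
write_rate = 11.77e6                 # bytes/second
payload_per_packet = 1500 - 20 - 8   # mtu minus IPv4 and UDP headers

print(write_rate / payload_per_packet)   # ~7996, i.e. about 8000
%%%
The working sk client further below sustains ~8637 input packets/second
on the server, consistent with this estimate.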

Output from netstat -s 1 on server while writing a ~1GB file via 5701/udp:
%%%
             input        (Total)           output
    packets  errs      bytes    packets  errs      bytes colls
        900     0    1513334        142     0      33532     0
       1509     0    2564836        236     0      57368     0
       1647     0    2295802        259     0      51106     0
       1603     0    1502736        252     0      32926     0
       1055     0     637014        163     0      13938     0
        558     0    1542510         86     0      34340     0
        984     0     989854        155     0      21816     0
        864     0    1320786        135     0      38152     0
        883     0    1558060        165     0      34340     0
       1177     0    3780102        203     0      85850     0
       2087     0     954212        331     0      21210     0
       1187     0    1413568        190     0      31310     0
        650     0    3320604        101     0      75346     0
       1565     0    1706542        246     0      37976     0
       2055     0    2360620        329     0      52318     0
       1554     0    2416996        244     0      54226     0
       1402     0    2579894        220     0      58176     0
       1690     0     774488        267     0      16968     0
       1323     0    3690650        209     0      83830     0
        591     0    4519858         92     0     103110     0
%%%

There is no sign of any packet loss or switch problems.  Forcing
1000baseTX full-duplex has no effect.  Forcing 100baseTX full-duplex
makes the problem more obvious.  The mtu is 1500 throughout since
only bge-5701 and sk support jumbo frames and I want to use udp for
nfs.
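One relevant detail: an 8K nfs write over udp at mtu 1500 must be
fragmented by IP, so each RPC stands or falls as a group of fragments.
A sketch of the arithmetic (mine; the RPC header overhead is a guess):
%%%
# Python; IP fragments per 8K nfs write over udp at mtu 1500.
import math

per_frag = 1500 - 20           # IP payload per fragment after the IPv4 header
datagram = 8 + 8192 + 128      # UDP header + write data + rough RPC overhead

print(math.ceil(datagram / per_frag))  # 6; losing any fragment loses the RPC
%%%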

5705/udp is better:
%%%
             input        (Total)           output
    packets  errs      bytes    packets  errs      bytes colls
       5209     0    6607758        846     0     151702     0
       4763     0    6684546        773     0     153520     0
       4758     0    6618498        769     0     151298     0
       3582     0    7057568        576     0     162498     0
       4935     0    5115068        800     0     116756     0
       4924     0    6622026        798     0     152802     0
       4095     0    6018462        657     0     137450     0
       4647     0    5270442        751     0     120594     0
       4673     0    5451948        758     0     123624     0
       2340     0    6001986        372     0     138168     0
       3750     0    6150610        604     0     140996     0
%%%

sk/udp works right:
%%%
             input        (Total)           output
    packets  errs      bytes    packets  errs      bytes colls
       8638     0   12384676       1440     0     293062     0
       8636     0   12415646       1439     0     293708     0
       8637     0   12415646       1441     0     293708     0
       8637     0   12415646       1439     0     293708     0
       8637     0   12417160       1440     0     293708     0
       8636     0   12413162       1439     0     293506     0
       8637     0   12414132       1439     0     293708     0
       8636     0   12417160       1440     0     293708     0
       8637     0   12415646       1439     0     293708     0
       8636     0   12417160       1440     0     293708     0
       8637     0   12414676       1439     0     293506     0
%%%

The sk client runs ~5.2 with latency/throughput/efficiency optimizations
that don't have much effect here.

Bruce


