Date:      Mon, 8 Nov 2010 00:13:53 -0800 (PST)
From:      DJ <fusionfoto@yahoo.com>
To:        freebsd-questions@freebsd.org
Subject:   zfs performance issues with iscsi (istgt)
Message-ID:  <340170.48770.qm@web113309.mail.gq1.yahoo.com>


After scratching my head for a few weeks, I've decided to ask for some help.

First, I've got two machines connected by gigabit ethernet. Network performance is not a problem: I am able to substantially saturate the wire when not using iSCSI (with iperf, say, or ftp). Both systems are 8.1-RELENG, and both are multi-core with 8GB of RAM.
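
For reference, the raw-network check was along these lines (iperf on both boxes; 10.0.0.2 is just a stand-in for the target's address):

    # on the target
    iperf -s
    # on the initiator
    iperf -c 10.0.0.2 -t 30

That reliably reports near wire speed, so the network itself checks out.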

Symptoms: When doing writes (more or less independent of size) from a client to a server via iSCSI, I seem to be hitting a wall at between 18-26MB/s of write throughput. This can be repeated continuously, whether doing a newfs on a 2TB iSCSI volume or a dd from /dev/zero to the iSCSI target. I haven't compared read performance. What originally put me on to this was watching the newfs output *fly* across the screen, then hang for several seconds, then *fly* again, and then pause.
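
The dd variant of the test is nothing fancy; something like this, with da2 standing in for whatever device the iSCSI LUN attaches as on the client:

    dd if=/dev/zero of=/dev/da2 bs=1m count=4096

and it shows the same burst-stall-burst pattern.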

This looked like a write-delay problem, so I tweaked the txg write-limit and/or synctime values. This showed some improvement: iostat showed something closer to continuous write performance on the server, but there was still a delay whether the write limit was 384MB or all the way up to 4GB, which tells me the spindles weren't holding the throughput back. The iostat average was never much beyond 20-26MB/s; peaks were frequently two to three times that, but then it would drop to 1MB/s for a few seconds, which brought it back down to that average. CPU and network load were never the limiting factor, nor did the spindles ever get above 20-30% busy.
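
For the curious, the knobs in question are the 8.1-era ZFS sysctls (names as I recall them on this box; the write limit is in bytes):

    # shorten the txg sync interval (default is 5 seconds)
    sysctl vfs.zfs.txg.synctime=1
    # cap per-txg write buffering; tried 384MB up through 4GB
    sysctl vfs.zfs.txg.write_limit_override=402653184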

So I added two USB keys that write at around 30-40MB/s and mirrored them as a ZIL log device. iostat verifies they are being used, but not continuously; it seems the txg write limit applies to writes to the ZIL as well. I also tried turning off the ZIL and saw no particular performance increase (or decrease). Running newfs (which jumps around a lot more than dd), the throughput does not change much at all. Even at 26K-40K pps, interrupt load and the like are not problematic, and turning on polling does not change performance appreciably.
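
The log mirror was added the usual way (da8 and da9 stand in for however the USB keys probed):

    zpool add tank log mirror da8 da9
    # and to take the ZIL out of the picture entirely (8.x-era tunable):
    sysctl vfs.zfs.zil_disable=1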

The "server" is a RAIDZ2 of 15 drives @ 2TB each. So *write* throughput sho=
uld be pretty fast sequentially (i.e. the dd case), but it is returning ide=
ntically. This server does nothing much but istgt -- tried NCQ values from =
255 down to 32 to no improvement.
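
The queue-depth changes were made per-device with camcontrol, along these lines (da0 standing in for each pool member in turn):

    camcontrol tags da0 -N 32 -v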

Even though network performance was not showing a particular limit, I *did* get from 18MB/s to 26MB/s by tweaking the tcp sendbuf* and send* values way beyond reason, even though TCP throughput hadn't been a problem in non-iSCSI operation.
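
The TCP tweaks were along these lines; the exact values are from memory, and they are well beyond what the bandwidth-delay product of a gigabit LAN should need:

    sysctl net.inet.tcp.sendspace=1048576
    sysctl net.inet.tcp.sendbuf_auto=1
    sysctl net.inet.tcp.sendbuf_inc=65536
    sysctl net.inet.tcp.sendbuf_max=16777216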

So whatever I'm doing is not addressing the actual problem. The drives have plenty of available I/O, but instead of using it, or the RAM in the system, or the ZIL, the system sits largely idle, then pegs itself with continuous (but not max-speed) writes while the network transfers halt, and then continues on its way.

Even if it's a threading issue (i.e. we are single-threading somewhere), there should be some way to make this behave like a normal system, considering how much RAM, SSD, and other resources I'm trying to throw at this thing. For example, after the buffer starts to empty, additional writes from the client should be accepted, and NCQ should help reorder them so they are processed efficiently, etc.

istgt settings:
istgt version 0.3
istgt extra version 20100707

    MaxSessions              32
    MaxConnections           32
    FirstBurstLength         65536
    MaxBurstLength           262144
    MaxRecvDataSegmentLength 262144
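
(Those all sit in the [Global] section of istgt.conf; a minimal fragment, in case the placement matters:)

    [Global]
      MaxSessions              32
      MaxConnections           32
      # burst/segment lengths as listed above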

Local benchmarks like dd if=/dev/zero of=/tank/dump bs=1M count=12000 return around 200MB/s (12582912000 bytes transferred in 61.140903 secs, 205801867 bytes/sec) and show continuous (as expected) writes to the spindles. 200MB/s is pretty close to the max I/O speed we can expect given the port the controller is in, RAID overhead, etc., with 7200 RPM drives; at 5900 RPM the number is about 80MB/s.

If this is an istgt problem, is there a way to get reasonable performance out of it?

I know I'm not losing my mind here, so if someone has tackled this particular problem (or its sort), please chime in and let me know what tunable I'm missing. :)

Thanks very much, in advance,

DJ




