From: DJ
To: freebsd-questions@freebsd.org
Date: Mon, 8 Nov 2010 00:13:53 -0800 (PST)
Subject: zfs performance issues with iscsi (istgt)

After scratching my head for a few weeks, I've decided to ask for some help.

First, I've got two machines connected by gigabit ethernet. Network performance is not a problem, as I am able to substantially saturate the wire when not using iSCSI [say iperf] or ftp. Both systems are 8.1-RELENG. They are both multi-core, with 8G of RAM.

Symptoms: When doing writes (size relatively independent) from a client to a server via iSCSI, I seem to be hitting a wall between 18-26MB/s of write. This can be repeated continuously, whether doing a newfs on a 2TB iSCSI volume or doing a dd from /dev/zero to the iSCSI target. I haven't compared read performance. What originally put me on to this was watching the newfs *fly* across the screen, and then hang for several seconds, and then *fly* again, and then pause.

This looked like a write-delay problem, so I tweaked txgwrite values and/or the synctime values. This showed some improvements (iostat showed something closer to continuous write performance to the server, but there was still a delay whether the write_limit was 384MB or anything up to 4GB.
This tells me the spindles weren't holding the throughput back. The throughput iostat reported was never much beyond 20-26MB/s; peaks were frequently two to three times that, but then it would drop to 1MB/s for a few seconds, which would bring us back to this average). CPU and network load were never the limiting factor, nor did the spindles ever get above 20-30% busy.

So I added two USB keys that write at around 30-40MB/s, and mirrored them as a ZIL log. iostat verifies they are being used, but not continuously; it seems that the txgwrite value applies to writing to the ZIL. I also tried turning off the ZIL log and saw no particular performance increase (or decrease). When running newfs (which jumps around a lot more than dd), the throughput does not change much at all. Even at 26K-40K pps, interrupt loads and such are not problematic, and turning on polling does not change the performance appreciably.

The "server" is a RAIDZ2 of 15 drives @ 2TB each. So *write* throughput should be pretty fast sequentially (i.e. the dd case), but it gives identical results. This server does nothing much but run istgt. I tried NCQ values from 255 down to 32, to no improvement.

Even though network performance was not showing a particular limit, I *did* get from 18MB/s to 26MB/s by tweaking tcp sendbuf* and tcp send* values way beyond reason, even though the TCP throughput hadn't been a problem in non-iSCSI operations.

So whatever I'm doing is not addressing the particular problem. The drives have plenty of available I/O, but instead of using it, or the RAM in the system, or the ZIL in the system, the server seems largely idle, then pegs the system with continuous (but not max speed) writes and halts the network transfers, and then continues on its way.

Even if it's a threading issue (i.e. we are single threading), there should be some way to make this behave like a normal system considering how much RAM, SSD, and other resources I'm trying to throw at this thing.
For example, after the buffer starts to empty, additional writes from the client should be accepted and NCQ should help reorder them so they are processed efficiently, etc, etc.

istgt settings:
istgt version 0.3
istgt extra version 20100707
    MaxSessions              32
    MaxConnections           32
    FirstBurstLength         65536
    MaxBurstLength           262144
    MaxRecvDataSegmentLength 262144

Local benchmarks like

    dd if=/dev/zero of=/tank/dump bs=1M count=12000

return about 200MB/s: 12582912000 bytes transferred in 61.140903 secs (205801867 bytes/sec), and show continuous (as expected) writes to the spindles. (200MB/s is pretty close to the max I/O speed we can expect given the port the controller is in, RAID overhead, etc. with 7200 RPM drives; at 5900 RPM the number is about 80MB/s.)

If this is an istgt problem, is there a way to get reasonable performance out of it?

I know I'm not losing my mind here, so if someone has tackled this particular problem (or one of its sort), please chime in and let me know what tunable I'm missing. :)

Thanks very much, in advance,

DJ
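
The figures quoted above can be sanity-checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes decimal megabytes (1 MB = 10^6 bytes) for both the dd result and the 18-26MB/s iSCSI range, which the post does not state explicitly.

```python
# Figures quoted in the post; the names below are illustrative only.
DD_BYTES = 12_582_912_000      # bytes reported by the local dd run
DD_SECONDS = 61.140903         # elapsed time reported by dd
ISCSI_MB_S = (18.0, 26.0)      # observed iSCSI write range, MB/s


def mb_per_s(nbytes: float, seconds: float) -> float:
    """Throughput in decimal megabytes per second (assumed unit)."""
    return nbytes / seconds / 1e6


# Local sequential write rate derived from the dd output.
local = mb_per_s(DD_BYTES, DD_SECONDS)

# How many times faster the local write is than each end of the iSCSI range.
slowdown = tuple(local / rate for rate in ISCSI_MB_S)

print(f"local dd: {local:.1f} MB/s")
print(f"iSCSI writes are {slowdown[1]:.1f}x to {slowdown[0]:.1f}x slower")
```

Under those assumptions, the gap between the local pool and the iSCSI path is roughly an order of magnitude, which is consistent with the pool itself not being the limit.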