Date: Mon, 2 May 2011 16:36:01 -0700
From: Jeremy Chadwick
To: Jan Koum
Cc: freebsd-fs@freebsd.org, Chris Peiffer
Subject: Re: very strange IO issue with FreeBSD 8 and SSD
Message-ID: <20110502233601.GA29710@icarus.home.lan>

On Mon, May 02, 2011 at 03:28:23PM -0700, Jan Koum wrote:
> hello,
>
> we are seeing some strange activity on our FreeBSD systems running
> 8.2-PRERELEASE snapshot from early december
>
> our system has 4 Intel SSD drives (64GB each) connected directly into
> motherboard through AHCI:
>
> ad4: setting UDMA100
> ad4: 61057MB at ata2-master UDMA100 SATA 3Gb/s
> ad4: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
> [...]
> ad7: setting UDMA100
> ad7: 61057MB at ata3-slave UDMA100 SATA 3Gb/s
> ad7: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
>
> $ df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/ad4s1a     57G     24G     29G    45%    /
> /dev/ad5a       58G     17G     36G    32%    /d2
> /dev/ad7a       58G     17G     36G    32%    /d4
> /dev/ad6a       58G     17G     36G    32%    /d3
>
> so far - so good, right? this is where things get very bizarre: our
> application receives data from network and writes to disk. on average the
> file size grows to about 7Kbytes while an average file append is 300-400
> bytes.
>
> netstat shows about 700-800Kbytes of input and our application log shows we
> write about 500Kbytes each second. however, when i run iostat we see
> upwards of 10MB a second written to disk (if not more). for example:
>
> $ iostat -KC -x 1
>                   extended device statistics             cpu
> device   r/s     w/s   kr/s     kw/s wait svc_t  %b  us ni sy in id
> ad4      9.0   423.3   45.2   4410.1    0  84.3  11   5  0  5  1 89
> ad5      9.0   420.7   44.9   4237.4    0  82.3  11
> ad6      9.0   420.6   45.1   4254.4    0  81.1  11
> ad7      9.0   420.3   44.9   4225.7    0  83.8  11
>                   extended device statistics             cpu
> device   r/s     w/s   kr/s     kw/s wait svc_t  %b  us ni sy in id
> ad4     14.9   157.9   79.5   1108.4    0  31.7  18   8  0  5  1 86
> ad5     15.9  1480.8   63.6  18886.1    0  36.4  19
> ad6     20.9   154.9   93.4   1032.9    0   7.4   4
> ad7     19.9   216.5   63.6   1450.0    0   9.2   4
>                   extended device statistics             cpu
> device   r/s     w/s   kr/s     kw/s wait svc_t  %b  us ni sy in id
> ad4     20.9   169.2  115.4   1271.7    0  39.3  13   9  0  4  1 85
> ad5     21.9  1179.1  129.4  11598.1    0  34.6  14
> ad6     14.9   140.3   39.8    925.4    0   9.4   3
> ad7     15.9   213.9   33.8   1610.0    0   7.9   3
>                   extended device statistics             cpu
> device   r/s     w/s   kr/s     kw/s wait svc_t  %b  us ni sy in id
> ad4     15.9   403.6   53.7   3208.6    0  30.0  10   8  0  6  1 85
> ad5     16.9   709.7   47.7   4691.6    0  20.2   9
> ad6     23.9   321.1   97.4   2262.3    0  12.9   7
> ad7     14.9   421.4   51.7   3437.2    0  13.3   7
>
> (apologies in advance for bad formatting)
>
> so, here we are, looking at iostat output and trying to figure out how
> it can be this bad and where the discrepancy is coming from. a few things
> to get out of the way: no, we do not have TRIM enabled yet, we would need to
> upgrade OS for that, but we don't think TRIM would make such a big
> difference. also we know that we can newfs with -b 512 -f 4096 but again, we
> also don't think that it would account for such a large IO discrepancy.
>
> any thoughts as to what this could be? has anybody seen anything similar
> before? 10MB of metadata for 500K worth of disk writes? that can't be....
> right?

I would recommend trying ahci.ko instead of ataahci.ko.  Your device
names will change (ad4 --> ada0, ad5 --> ada1, etc.).  Just add
ahci_load="yes" to /boot/loader.conf, reboot into single-user, and fix
/etc/fstab and related configuration files; that's all you should have
to do.

We use Intel SSDs (X25-M 80GB) in our servers, also backed by UFS2 with
softupdates.  Controllers are Intel ICH7R (in AHCI mode) and Intel ICH9R
(also in AHCI mode).  We *did not* apply any 4K alignment when making
the partitions.  We use ahci.ko.  I haven't tested write speeds and all
that, but the disks work fine.

You might also try comparing iostat output to gstat output, though gstat
refreshes the screen continually, which makes this a little difficult.
I would recommend "gstat -I500ms -f '^ad[0-9]$'" and watching closely.
Change the regex, of course, if you switch to ahci.ko.

If you want to compare benchmarks, I need to know exactly what to do to
reproduce the issue you're describing.  I would prefer the traffic not
come off the network (e.g. use dd or bonnie++ or something) to rule out
problems there.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
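
To put a number on the discrepancy in the quoted message, the kw/s column of one iostat sample can be summed across devices and compared with what the application says it writes. The sketch below does that for the first sample. It assumes the ~500 Kbytes/s figure from the application log is the total across all four disks, which the original message does not state; the function and constant names are made up for illustration.

```python
# Sum per-device write throughput from one `iostat -x` sample and compare
# it with the write rate reported by the application.

# First sample from the quoted message (cpu columns kept as posted).
SAMPLE = """\
device r/s w/s kr/s kw/s wait svc_t %b us ni sy in id
ad4 9.0 423.3 45.2 4410.1 0 84.3 11 5 0 5 1 89
ad5 9.0 420.7 44.9 4237.4 0 82.3 11
ad6 9.0 420.6 45.1 4254.4 0 81.1 11
ad7 9.0 420.3 44.9 4225.7 0 83.8 11
"""

# Assumption: the application log figure covers all disks combined.
APP_WRITE_KB_PER_SEC = 500.0


def total_kw_per_sec(sample: str) -> float:
    """Return the sum of the kw/s column over all device rows."""
    lines = sample.strip().splitlines()
    kw_col = lines[0].split().index("kw/s")
    return sum(float(line.split()[kw_col]) for line in lines[1:])


def amplification(sample: str, app_kb_per_sec: float) -> float:
    """Ratio of kilobytes hitting the disks to kilobytes the app wrote."""
    return total_kw_per_sec(sample) / app_kb_per_sec


if __name__ == "__main__":
    total = total_kw_per_sec(SAMPLE)
    ratio = amplification(SAMPLE, APP_WRITE_KB_PER_SEC)
    print(f"disk writes: {total:.1f} KB/s")
    print(f"ratio to application writes: {ratio:.1f}x")
```

For this sample the four disks together take roughly 17 MB/s, about 34 times the logged application rate under the assumption above. Running the same calculation over gstat figures would show whether the two tools agree.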
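
For a reproduction that does not involve the network, one option is a small local generator that mimics the write pattern from the quoted message: 300-400 byte appends to files that grow to about 7 Kbytes. The following is only a sketch. The file naming, the number of files growing at once, and the open/append/close cycle on every write are assumptions; the original message does not say how the application issues its writes or whether it calls fsync.

```python
import os
import random
import tempfile
import time


def run_workload(directory, total_bytes, max_file_size=7 * 1024,
                 append_range=(300, 400), active_files=8,
                 rate_bytes_per_sec=None, seed=0):
    """Append small random chunks to a rotating pool of files.

    A file is retired once it reaches `max_file_size` and a new one takes
    its place. Returns (bytes_written, files_created).
    """
    rng = random.Random(seed)
    sizes = {}          # file index -> bytes written so far (still growing)
    next_index = 0
    written = 0
    start = time.monotonic()
    while written < total_bytes:
        # Keep the pool of growing files topped up (hypothetical pool size).
        while len(sizes) < active_files:
            sizes[next_index] = 0
            next_index += 1
        idx = rng.choice(list(sizes))
        chunk = os.urandom(rng.randint(*append_range))
        path = os.path.join(directory, f"file{idx:06d}.dat")
        # Assumption: the application reopens the file for every append.
        with open(path, "ab") as f:
            f.write(chunk)
        written += len(chunk)
        sizes[idx] += len(chunk)
        if sizes[idx] >= max_file_size:
            del sizes[idx]
        if rate_bytes_per_sec:
            # Sleep just enough to stay at or below the target rate.
            ahead = written / rate_bytes_per_sec - (time.monotonic() - start)
            if ahead > 0:
                time.sleep(ahead)
    # Pool entries that never received a write were never created on disk.
    files_created = next_index - sum(1 for s in sizes.values() if s == 0)
    return written, files_created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # About 500 KB/s for roughly ten seconds.
        print(run_workload(d, 5 * 1024 * 1024, rate_bytes_per_sec=500 * 1024))
```

Pointing `directory` at one of the SSD-backed filesystems and watching iostat or gstat while it runs would show whether the gap appears without any network traffic involved.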