Date: Mon, 2 May 2011 16:36:01 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Jan Koum <jan@whatsapp.com>
Cc: freebsd-fs@freebsd.org, Chris Peiffer <chris@whatsapp.com>
Subject: Re: very strange IO issue with FreeBSD 8 and SSD
Message-ID: <20110502233601.GA29710@icarus.home.lan>
In-Reply-To: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
References: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>

On Mon, May 02, 2011 at 03:28:23PM -0700, Jan Koum wrote:
> hello,
>
> we are seeing some strange activity on our FreeBSD systems running
> 8.2-PRERELEASE snapshot from early december
>
> our system has 4 Intel SSD drives (64GB each) connected directly into
> motherboard through AHCI:
>
> ad4: setting UDMA100
> ad4: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata2-master UDMA100 SATA 3Gb/s
> ad4: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
> [...]
> ad7: setting UDMA100
> ad7: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata3-slave UDMA100 SATA 3Gb/s
> ad7: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
>
> $ df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/ad4s1a     57G     24G     29G    45%    /
> /dev/ad5a       58G     17G     36G    32%    /d2
> /dev/ad7a       58G     17G     36G    32%    /d4
> /dev/ad6a       58G     17G     36G    32%    /d3
>
> so far - so good, right? this is where things get very bizarre: our
> application receives data from network and writes to disk. on average the
> file size grows to about 7Kbytes while an average file append is 300-400
> bytes.
>
> netstat shows about 700-800Kbytes of input and our application log shows we
> write about 500Kbytes each second. however, when i run iostat i we see
> upwards of 10MB a second written to disk (if not more). for example:
>
> $ iostat -KC -x 1
>                extended device statistics                      cpu
> device    r/s     w/s    kr/s      kw/s  wait  svc_t   %b   us  ni  sy  in  id
> ad4       9.0   423.3    45.2    4410.1     0   84.3   11    5   0   5   1  89
> ad5       9.0   420.7    44.9    4237.4     0   82.3   11
> ad6       9.0   420.6    45.1    4254.4     0   81.1   11
> ad7       9.0   420.3    44.9    4225.7     0   83.8   11
>                extended device statistics                      cpu
> device    r/s     w/s    kr/s      kw/s  wait  svc_t   %b   us  ni  sy  in  id
> ad4      14.9   157.9    79.5    1108.4     0   31.7   18    8   0   5   1  86
> ad5      15.9  1480.8    63.6   18886.1     0   36.4   19
> ad6      20.9   154.9    93.4    1032.9     0    7.4    4
> ad7      19.9   216.5    63.6    1450.0     0    9.2    4
>                extended device statistics                      cpu
> device    r/s     w/s    kr/s      kw/s  wait  svc_t   %b   us  ni  sy  in  id
> ad4      20.9   169.2   115.4    1271.7     0   39.3   13    9   0   4   1  85
> ad5      21.9  1179.1   129.4   11598.1     0   34.6   14
> ad6      14.9   140.3    39.8     925.4     0    9.4    3
> ad7      15.9   213.9    33.8    1610.0     0    7.9    3
>                extended device statistics                      cpu
> device    r/s     w/s    kr/s      kw/s  wait  svc_t   %b   us  ni  sy  in  id
> ad4      15.9   403.6    53.7    3208.6     0   30.0   10    8   0   6   1  85
> ad5      16.9   709.7    47.7    4691.6     0   20.2    9
> ad6      23.9   321.1    97.4    2262.3     0   12.9    7
> ad7      14.9   421.4    51.7    3437.2     0   13.3    7
>
> (apologies in advance for bad formatting)
>
> so, here are we are, looking at iostat output and trying to figure out how
> it can be this bad and where the discrepancy is coming from. a few things
> to get out of the way: no, we do not have TRIM enabled yet, we would need to
> upgrade OS for that, but we don't think TRIM would make such a big
> different. also we know that we can newfs with -b 512 -f 4096 but again, we
> also dont think that it would account for such a large IO discrepancy.
>
> any thoughts to what this could be? has anybody seen anything similar
> before? 10MB of metadata for 500K worth of disk writes? that can't be....
> right?

I would recommend trying ahci.ko instead of ataahci.ko. Your device
names will change (ad4 --> ada0, ad5 --> ada1, etc.). Just add
ahci_load="yes" to /boot/loader.conf and reboot into single-user, fix
/etc/fstab and related configuration files, and that's all you should
have to do.
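
The /etc/fstab fix-up mentioned above could be sketched roughly as follows. This is only an illustration, not a tested migration procedure: the ad4 --> ada0 and ad5 --> ada1 pairs come from the message, while ad6 --> ada2 and ad7 --> ada3 are an assumption read into the "etc." and should be checked against dmesg after the first boot with ahci.ko. The function name rename_devices and the sample fstab contents (modelled on the df output in the quoted mail) are hypothetical.

```shell
#!/bin/sh
# Sketch only: rewrite old ad* device names to the ada* names expected
# under ahci.ko. Mappings for ad6 and ad7 are assumed, not confirmed.
rename_devices() {
    sed -e 's|/dev/ad4|/dev/ada0|g' \
        -e 's|/dev/ad5|/dev/ada1|g' \
        -e 's|/dev/ad6|/dev/ada2|g' \
        -e 's|/dev/ad7|/dev/ada3|g'
}

# Hypothetical fstab contents, used here instead of the real /etc/fstab
# so the sketch never touches a live system file.
sample_fstab='/dev/ad4s1a / ufs rw 1 1
/dev/ad5a /d2 ufs rw 2 2
/dev/ad6a /d3 ufs rw 2 2
/dev/ad7a /d4 ufs rw 2 2'

# Print the rewritten table for review before applying anything by hand.
new_fstab=$(printf '%s\n' "$sample_fstab" | rename_devices)
printf '%s\n' "$new_fstab"
```

On a real system the same filter would be run over a copy of /etc/fstab from single-user mode, and the result reviewed before it replaces the original.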

We use Intel SSDs (X25-M 80GB) in our servers, also backed by UFS2 with
softupdates. Controllers are Intel ICH7R (in AHCI mode) and Intel ICH9R
(also in AHCI mode). We *did not* apply any 4K alignment when making
the partitions. We use ahci.ko. I haven't tested write speeds and all
that, but the disks work fine.

You might also try comparing iostat output to gstat output, though gstat
refreshes the screen continually, making this a little difficult. I
would recommend "gstat -I500ms -f '^ad[0-9]$'" and watch closely.
Change the regex, of course, if you switch to ahci.ko.

If you want to compare benchmarks, I need to know exactly what to do to
reproduce the issue you're stating. I would prefer the traffic not come
off the network (e.g. use dd or bonnie++ or something) to rule out
problems there.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
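
One way to take the network out of the picture, as requested above, might be a small dd loop that imitates the reported workload (files of about 7 KB built from appends of a few hundred bytes). This is a rough sketch under stated assumptions, not the original application's I/O pattern: the function name append_workload, the file names, and the counts are all hypothetical, and no fsync is issued, so the writes pass through the buffer cache.

```shell
#!/bin/sh
# Sketch of a local small-append workload with no network involved.
# All sizes and counts are illustrative assumptions.
append_workload() {
    dir=$1        # target directory on the disk under test
    files=$2      # number of files to create
    appends=$3    # appends per file
    bytes=$4      # bytes per append
    i=0
    while [ "$i" -lt "$files" ]; do
        j=0
        while [ "$j" -lt "$appends" ]; do
            # One small append, similar in size to the 300-400 byte
            # appends described in the quoted message.
            dd if=/dev/zero bs="$bytes" count=1 2>/dev/null >> "$dir/f$i"
            j=$((j + 1))
        done
        i=$((i + 1))
    done
}

# Demo run in a scratch directory: 5 files, 20 appends of 350 bytes each,
# which gives 7000 bytes per file.
workdir=$(mktemp -d)
append_workload "$workdir" 5 20 350
ls -l "$workdir"
```

Pointing the first argument at a directory on one of the SSD filesystems and raising the counts, while iostat or gstat runs in another terminal, would show whether the write volume at the device tracks the bytes actually appended.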