Date:      Thu, 10 Jul 2014 15:31:00 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Kashyap Desai <kashyap.desai@avagotech.com>
Cc:        FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject:   Re: SSDs peformance on head/freebsd-10 stable using FIO
Message-ID:  <53BE8784.8060503@FreeBSD.org>
In-Reply-To: <8fbe38cdad1e66717a9de7fdf63812c2@mail.gmail.com>
References:  <8fbe38cdad1e66717a9de7fdf63812c2@mail.gmail.com>

Hi, Kashyap.

On 10.07.2014 15:00, Kashyap Desai wrote:
> I am trying to measure IOPS and throughput using FIO on FreeBSD-10-stable,
> as the post below mentions that CAM can reach up to 1,000,000 IOPS using
> fine-grained CAM locking.
> 
> http://www.freebsd.org/news/status/report-2013-07-2013-09.html#GEOM-Direct-Dispatch-and-Fine-Grained-CAM-Locking
> 
> I am using the FIO parameters below.
> 
> [global]
> ioengine=posixaio
> buffered=0
> rw=randread
> bs=4K
> iodepth=32
> numjobs=2
> direct=1
> runtime=60s
> thread
> group_reporting=1
> [job1]
> filename=/dev/da0
> [job2]
> filename=/dev/da1
> [job3]
> filename=/dev/da2
> [job4]
> filename=/dev/da3
> [job5]
> filename=/dev/da4
> ..
> 
> I have 8 SSDs in my setup, and all 8 are behind LSI’s 12Gb/s MegaRAID
> controller as JBOD. I also found that FIO can be used in async mode after
> loading the “aio” kernel module.
> 
> Using a single SSD, I am able to see 110K-130K IOPS. These IOPS counts
> match what I see on a Linux machine.
> 
> Now, I am not able to scale IOPS on my machine beyond 200K. I see that the
> CPU is almost fully occupied, with no idle time, once IOPS reach 200K.
> 
> If you have any pointers to try, I can do some experiments on my setup.

With such results I would immediately start profiling with
pmcstat. Quite likely you are hitting some new lock congestion. Start
with a simple `pmcstat -n 100000000 -TS unhalted-cycles`. It is hard to
say for sure what went wrong there without more data, so just a couple of
thoughts:

First of all, I've never tried aio in my benchmarks, only synchronous
ones. Try running 8 instances of `dd if=/dev/daX of=/dev/null bs=512` for
each SSD at the same time, just as I did. You may vary the number of dd's,
but keep the total below 256, or you may have to increase the nswbuf limit
in kern_vfs_bio_buffer_alloc().
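
A rough sketch of such a dd run, assuming eight disks named da0 through
da7 (adjust to your device names). The DEVICES, PER_DEV and RUN variables
are made up for this sketch; by default it only prints the commands, and
it starts the dd's only when RUN=1 is set.

```sh
#!/bin/sh
# Sketch: PER_DEV parallel sequential-read dd processes per SSD.
# DEVICES, PER_DEV and RUN are illustrative names, not part of any tool.
DEVICES="${DEVICES:-da0 da1 da2 da3 da4 da5 da6 da7}"  # assumed device names
PER_DEV="${PER_DEV:-8}"                                # dd's per device
LIMIT=256                                              # keep the total below this

# Print how many dd processes the current settings would start.
total_dd() {
    set -- $DEVICES
    echo $(( $# * $PER_DEV ))
}

# Start (RUN=1) or just print (default) one dd per device and instance.
start_dds() {
    for dev in $DEVICES; do
        i=0
        while [ "$i" -lt "$PER_DEV" ]; do
            if [ "${RUN:-0}" = 1 ]; then
                dd if="/dev/$dev" of=/dev/null bs=512 &
            else
                echo "dd if=/dev/$dev of=/dev/null bs=512"
            fi
            i=$((i + 1))
        done
    done
    wait   # returns once all background dd's (if any) have finished
}

if [ "$(total_dd)" -ge "$LIMIT" ]; then
    echo "not starting: $(total_dd) dd's is not below $LIMIT" >&2
else
    start_dds
fi
```

Run it with RUN=1 while pmcstat is running in another terminal; a smaller
PER_DEV gives a quick check of how IOPS scale with the number of dd's.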

Second, you are using a single HBA, which should create significant
congestion around its CAM SIM lock. The proper solution would be to add
multi-queue support to the driver, and we have discussed it with Scott
Long for quite some time, but that requires more work (I hope you may be
interested in it ;) ). Or you may just insert 3-4 HBAs. I reached my
million IOPS with four 2008/2308 6Gbps HBAs and 16 SATA SSDs.

Please tell me what you get, so we can continue the investigation.

-- 
Alexander Motin


