From owner-freebsd-scsi@FreeBSD.ORG Mon Jul 14 08:36:26 2014 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0A50C9A3 for ; Mon, 14 Jul 2014 08:36:26 +0000 (UTC) Received: from exprod7og103.obsmtp.com (exprod7og103.obsmtp.com [64.18.2.159]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 651A72DB0 for ; Mon, 14 Jul 2014 08:36:24 +0000 (UTC) Received: from mail-la0-f44.google.com ([209.85.215.44]) (using TLSv1) by exprod7ob103.postini.com ([64.18.6.12]) with SMTP ID DSNKU8OWgpC1P3c6qAUkqahrz1R/W/t6fEJ6@postini.com; Mon, 14 Jul 2014 01:36:25 PDT Received: by mail-la0-f44.google.com with SMTP id e16so1096720lan.31 for ; Mon, 14 Jul 2014 01:36:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:cc:content-type :content-transfer-encoding; bh=9VJ6w0w1EJYHrISfERsrht5D+mdZslXolChnYESI7BE=; b=cXYkFaKfajzoi+1Mr1TQ3CGHwcQNhA9iP4qMwfpxbPsgV7aA/lP5AkM3vzitvRURYr rVTxKBtJVnca1JFX1Kh8/xGHm3aGz2ZgX36SZCzmEQdElQOkSIBkUWmgYMmMfdng5jFk ofHGZLJx4PaAGgxtEKdsYCaM0w7VA9SrVWNjmrdjZ1ZjEB6kMF99EHUeAMT8QpIk63yy O22B866ypec2AY0IeNuoCTPK2d3Ea/KV1NuLe7yYe7N/rTNCjaIj/VWhIbj568V7cTzm hqurdhqeS/67U5KRYVwfPs5X/hNQAxAXsWecTWgcB2VSMzZza9d6sK5CpdCH1yw9Ocrz Xo/g== X-Gm-Message-State: ALoCoQlcfrWNBCJGdjzA3SVB0wYhuXIlqJ9VdkIk87yyaF2qurUt3A32t1twNDlETJK0alykjZxtPSeDDiJ9tDJGOcU8ebFcN5DhI4mBlAhbFtfBEQl7wE+8Ev9H25DEy6B56OmIzU/ncgjaLMJ2StJWrhBwJ4/FJw== X-Received: by 10.152.36.169 with SMTP id r9mr13421735laj.14.1405326976753; Mon, 14 Jul 2014 01:36:16 -0700 (PDT) X-Received: by 10.152.36.169 with SMTP id r9mr13421727laj.14.1405326976654; Mon, 14 Jul 2014 01:36:16 -0700 (PDT) From: Kashyap Desai References: <8fbe38cdad1e66717a9de7fdf63812c2@mail.gmail.com> <53BE8784.8060503@FreeBSD.org> <9f138f242e278476e5c542d695e58bc8@mail.gmail.com> <53BF1E6C.5030806@FreeBSD.org> In-Reply-To: <53BF1E6C.5030806@FreeBSD.org> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQG8AScvFWC5/cNVmZuoGGC8NZGkUAIYsfkbAp/trHgBd5fqmJuU+KQA Date: Mon, 14 Jul 2014 14:06:15 +0530 Message-ID: Subject: RE: SSDs peformance on head/freebsd-10 stable using FIO To: Alexander Motin Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD-scsi X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jul 2014 08:36:26 -0000 > -----Original Message----- > From: Alexander Motin [mailto:mavbsd@gmail.com] On Behalf Of Alexander > Motin > Sent: Friday, July 11, 2014 4:45 AM > To: Kashyap Desai > Cc: FreeBSD-scsi > Subject: Re: SSDs peformance on head/freebsd-10 stable using FIO > > On 10.07.2014 16:28, Kashyap Desai wrote: > > From: Alexander Motin [mailto:mavbsd@gmail.com] On Behalf Of > Alexander > >> On 10.07.2014 15:00, Kashyap Desai wrote: > >>> I have 8 SSDs in my setup and all 8 SSDs are behind LSI=E2=80=99s 12G= p/s > >>> MegaRaid Controller as JBOD. I also found FIO can be used in Async > >>> mode after loading =E2=80=9Caio=E2=80=9D kernel module. > >>> > >>> Using single SSD, I am able to see 110K-130K IOPs. This IOPs > >>> counts are matching with what I see on Linux machine. > >>> > >>> Now, I am not able to scale IOPs on my machine after 200K. I see > >>> CPU is almost occupied and no idle time after IOPs reach to 200K. > >>> > >>> If you have any pointers to try with, I can do some experiment on > >>> my > >> setup. > >> > >> Getting such results I would immediately start doing profiling with > >> pmcstat. > >> Quite likely you are hitting some new lock congestion. Start with > >> simple `pmcstat -n 100000000 -TS unhalted-cycles`. It it hard to say > >> for sure what went wrong there without more data, so just couple > > I have attached profile output for the command mentioned above. I will > > dig further and see if this is what we have theoretical limit for CAM > > attached HBA. > > First thing I noticed in this profile output is bunch of TLB shutdowns. > You can not reach reasonable performance from user-level without having > HBA support unmapped I/O. Both mps and mpr drivers support it, but for > some reason still not mrsas. Even at non-peak I/O rates on multi-core > system > TLB shutdowns in such case can eat additional 30% of CPU time. Thanks.! For this part, I can try In mrsas. Can you help me to understand what you mean by unmapped I/O ? > > Another thing I see is mentioned congestion on driver's CAM SIM lock. > You need either multiple cards or multiqueue. > > >> thoughts: > >> > >> First of all, I've never tried aio in my benchmarks, only synchronous > >> ones. Try to run 8 instances of `dd if=3D/dev/daX of=3D/dev/null bs=3D= 512` > >> per each SSD same time, just as I did. You may vary number of dd's, > >> but keep total below 256, or you mad to increase nswbuf limit in > >> kern_vfs_bio_buffer_alloc(). > > > > I ran multiple dd instance also and seeing IOPs throttle somewhere ~200= K > > . > > > > Do we have any mechanism to check CAM layer's max IOPs support > without > > involving actual Device ? Something like _null_ device driver which > > just send the command back to CAM layer ? > > There is not such one now. Such test would radically change timings of > operation, and I am not sure how useful would results be. > > >> For second, you are using single HBA, that should create significant > >> congestion around its CAM SIM lock. Proper solution would be to add > >> multiple queues support to the driver, and we discussed it with Scott > >> Long for quite some time, but that requires more work (I hope you may > >> be interested in it ;) ). Or you may just insert 3-4 HBAs. My million > >> IOPS I was reaching with four 2008/2308 6Gbps HBAs and 16 SATA SSDs. > > > > I remember this part and really good to contribute for this work. As > > part of this we have initiated multiple MSIx implementation in > > , which will have multiple reply queue per MSI-x. > > Cool! > > > Do we really require to have multiple Submission queue at low level > > driver > ? > > I thought it will be a CAM interface for multi queue which _all_ low > > level drivers need to hook into . > > Now CAM is still oriented on single submission queue, but allows driver t= o > have multiple completion queues. So I would start from implementing last > ones, each bound to own MSI-X interrupt and calling completion without > using the SIM lock or holding any other locks during the upcall. > CAM provides way to avoid extra context switch in that case, that could b= e > very useful. > > -- > Alexander Motin