Date:      Tue, 20 Jun 2017 18:50:27 +0000
From:      "Caza, Aaron" <Aaron.Caza@ca.weatherford.com>
To:        Karl Denninger <karl@denninger.net>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   RE: FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs
Message-ID:  <b7350cca59624e91abee6697aaf9e1b6@DM2PR58MB013.032d.mgd.msft.net>

> -----Original Message-----
> From: Karl Denninger [mailto:karl@denninger.net]
> Sent: Tuesday, June 20, 2017 11:58 AM
> To: freebsd-fs@freebsd.org
> Subject: Re: FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs
>
> On 6/20/2017 12:29, Caza, Aaron wrote:
> >> -----Original Message-----
> >> From: Karl Denninger [mailto:karl@denninger.net]
> >> Sent: Monday, June 19, 2017 7:28 PM
> >> To: freebsd-fs@freebsd.org
> >> Subject: Re: FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs
> >>
> >> Just one note below...
> >>
> >> On 6/19/2017 19:57, Caza, Aaron wrote:
> >>> Note that file /testdb/test is 16GB, twice the size of RAM available
> >>> on this system.  The /testdb directory is a ZFS file system with
> >>> recordsize=8k, chosen as ultimately it's intended to host a
> >>> PostgreSQL database which uses an 8k page size.
> >> Do not make this assumption blindly.  Yes, I know the docs say to set
> >> recordsize=8k but this is something you need to benchmark against
> >> your actual working data set.
> >>
> >> MANY Postgres workloads are MUCH faster (2x or more!) if you use a
> >> default page size and lz4 compression -- including one I have in
> >> production and have extensively benchmarked.  The difference is NOT
> >> small.
> >> ....
> >>
> >> zroot/ticker  compressratio  1.53x                          -
> >> zroot/ticker  mounted        yes                            -
> >> zroot/ticker  quota          none                           default
> >> zroot/ticker  reservation    none                           default
> >> zroot/ticker  recordsize     128K                           default
> >> zroot/ticker  mountpoint     /usr/local/pgsql/data-ticker   local
> >> zroot/ticker  sharenfs       off                            default
> >> zroot/ticker  checksum       fletcher4                      inherited from zroot
> >> zroot/ticker  compression    lz4                            inherited from zroot
> >> zroot/ticker  atime          off                            inherited from zroot
> >>
> >> You may also want to consider setting logbias=throughput.  In some
> >> cases the improvement there can be quite material as well --
> >> depending on the insert/update traffic to the database in question.
> >>
> >> --
> >> Karl Denninger
> >> karl@denninger.net <mailto:karl@denninger.net> /The Market Ticker/
> >> /[S/MIME encrypted email preferred]/
> > Thanks for the suggestions Karl.  I'll investigate further after I
> > resolve this performance degradation issue I'm experiencing.  I
> > recently read another FreeBSD+ZFS+PostgreSQL user's Scale15x
> > presentation, PostgreZFS, Sean Chittenden if I recall correctly, who
> > also advised lz4 compression and 16K page size rather than 8K with
> > PostgreZFS.
> >
> > With regards to my performance woes, I was originally using PostgreSQL
> > in my posts to freebsd-hackers@freebsd.org but started using 'dd' to
> > remove it as a point of contention.  In attempting to resolve this
> > issue, I tried using your patch to PR 187594
> > (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594).  Took a
> > bit of effort to find a revision of FreeBSD 10 Stable to which your
> > FreeBSD 10 patch would both apply and compile cleanly; however, it
> > didn't resolve the issue I'm experiencing.
> I would not have expected my PR to impact this issue.
>
> I'm suspicious of a drive firmware interaction with your I/O pattern;
> SSDs are somewhat notorious for having that come up under certain
> workloads that involve a lot of writes.
>

I've observed this performance degradation on 6 different hardware systems
using 4 different SSD models (2x Intel 510 120GB, 2x Intel 520 120GB,
2x Intel 540 120GB, 2x Samsung 850 Pro) on FreeBSD 10.3-RELEASE,
FreeBSD 10.3-RELEASE-p6, FreeBSD 10.3-RELEASE-p19, FreeBSD 10-STABLE,
FreeBSD 11.0-RELEASE, FreeBSD 11-STABLE, and now FreeBSD 11.1 Beta 2.  In
this latest testing I'm not doing much in the way of writing -- only
logging the output of the 'dd' command, along with 'zfs-stats -a' and
'uptime', once an hour.  It ran for ~20 hours before the performance drop
kicked in, though why it happens is inexplicable as this server isn't
doing anything other than running this test hourly.
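
For concreteness, the hourly test amounts to roughly the script below (a
minimal sketch, not the exact script I run: the log path and bs=1m are
illustrative choices; /testdb/test is the 16GB test file mentioned
earlier).

    #!/bin/sh
    # Time a sequential read of a file ~2x the size of RAM (so the ARC
    # can't satisfy it), and capture ZFS stats and uptime alongside it.
    LOG=/var/log/zfs-perf-test.log
    {
        date
        uptime
        # dd reports its throughput summary on stderr, hence 2>&1 below.
        dd if=/testdb/test of=/dev/null bs=1m
        zfs-stats -a
    } >> "$LOG" 2>&1

It's simply run from cron once an hour, e.g. a crontab entry like
"0 * * * * /root/zfs-perf-test.sh".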

I have a FreeBSD 9.0 system using 2x Intel 520 120GB SSDs that doesn't
exhibit this performance degradation, maintaining ~400MB/s speeds even
after many days of uptime.  This is using the GEOM ELI layer to provide
4k sector emulation for the mirrored zpool, as I previously described.
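
(For anyone wanting to reproduce that layering: the ELI-based 4k
emulation is roughly the following sketch; the device and pool names are
illustrative and the geli keying is left at its interactive defaults.)

    # Label each SSD partition with a 4k sector size, attach the
    # providers, and build the mirror on the resulting .eli devices so
    # ZFS sees 4k sectors.
    geli init -s 4096 /dev/ada0p2
    geli init -s 4096 /dev/ada1p2
    geli attach /dev/ada0p2
    geli attach /dev/ada1p2
    zpool create tank mirror ada0p2.eli ada1p2.eli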

Interestingly, using the GEOM ELI layering, I was seeing the following:
- FreeBSD 10.3-RELEASE  :  performance ~750MB/s when dd'ing 16GB file
- FreeBSD 10-STABLE     :  performance ~850MB/s when dd'ing 16GB file
- FreeBSD 11-STABLE     :  performance ~950MB/s when dd'ing 16GB file

During the above testing, which was all done after reboot, gstat would
show %busy of 90-95%.  When performance degradation hits, %busy drops
to ~15%.
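
(The %busy figures come from watching gstat while the dd runs, e.g.
something along the lines of:)

    # Refresh once a second, physical providers only.
    gstat -p -I 1s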

Switching to FreeBSD 11.1 Beta 2 with Auto (ZFS) ashift-based 4k emulation
of the ZFS mirrored pool:
- FreeBSD 11.1 Beta 2   :  performance ~450MB/s when dd'ing 16GB file,
  with gstat %busy of ~60%.  When performance degradation hits, %busy
  drops to ~15%.

Now, I expected that removing the GEOM ELI layer and just using
vfs.zfs.min_auto_ashift=12 to do the 4k sector emulation would provide
even better performance.  It seems strange to me that it doesn't.
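
For comparison, the ashift-based setup is essentially the following
sketch (same illustrative device/pool names as above):

    # Must be set before the pool/vdevs are created; the sysctl is only
    # consulted at vdev creation time.
    sysctl vfs.zfs.min_auto_ashift=12
    zpool create tank mirror ada0p2 ada1p2
    zdb -C tank | grep ashift    # should report ashift: 12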

> --
> Karl Denninger
> karl@denninger.net <mailto:karl@denninger.net>
> /The Market Ticker

--
Aaron