Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 10 Jun 2017 23:39:34 -0400
From:      Allan Jude <allanjude@freebsd.org>
To:        freebsd-hackers@freebsd.org
Subject:   Re: FreeBSD10 Stable + ZFS + PostgreSQL + SSD performance drop < 24 hours
Message-ID:  <dc09bd99-f49f-daeb-b85d-62933891fde2@freebsd.org>
In-Reply-To: <20170610163642.GA18123@zxy.spb.ru>
References:  <79528bf7a85a47079756dc508130360b@DM2PR58MB013.032d.mgd.msft.net> <20170610163642.GA18123@zxy.spb.ru>

next in thread | previous in thread | raw e-mail | index | archive | help
On 06/10/2017 12:36, Slawa Olhovchenkov wrote:
> On Sat, Jun 10, 2017 at 04:25:59PM +0000, Caza, Aaron wrote:
> 
>> Gents,
>>
>> I'm experiencing an issue where iterating over a PostgreSQL table of ~21.5 million rows (select count(*)) goes from ~35 seconds to ~635 seconds on Intel 540 SSDs.  This is using a FreeBSD 10 amd64 stable kernel back from Jan 2017.  SSDs are basically 2 drives in a ZFS mirrored zpool.  I'm using PostgreSQL 9.5.7.
>>
>> I've tried:
>>
>> *       Using the FreeBSD10 amd64 stable kernel snapshot of May 25, 2017.
>>
>> *       Tested on half a dozen machines with different models of SSDs:
>>
>> o   Intel 510s (120GB) in ZFS mirrored pair
>>
>> o   Intel 520s (120GB) in ZFS mirrored pair
>>
>> o   Intel 540s (120GB) in ZFS mirrored pair
>>
>> o   Samsung 850 Pros (256GB) in ZFS mirrored pair
>>
>> *       Using bonnie++ to remove Postgres from the equation and performance does indeed drop.
>>
>> *       Rebooting server and immediately re-running test and performance is back to original.
>>
>> *       Tried using Karl Denninger's patch from PR187594 (which took some work to find a kernel that the FreeBSD10 patch would both apply and compile cleanly against).
>>
>> *       Tried disabling ZFS lz4 compression.
>>
>> *       Ran the same test on a FreeBSD9.0 amd64 system using PostgreSQL 9.1.3 with 2 Intel 520s in ZFS mirrored pair.  System had 165 days uptime and test took ~80 seconds after which I rebooted and re-ran test and was still at ~80 seconds (older processor and memory in this system).
>>
>> I realize that there's a whole lot of info I'm not including (dmesg, zfs-stats -a, gstat, et cetera): I'm hoping some enlightened individual will be able to point me to a solution with only the above to go on.
> 
> Just a random guess: can you try r307264 (I am mean regression in
> r307266)?
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
> 

This sounds a bit like an issue I investigated for a customer a few 
months ago.

Look at gstat -d (includes DELETE operations like TRIM)

If you see a lot of that happening, but try: vfs.zfs.trim.enabled=0
in /boot/loader.conf and see if your issues go away.

the FreeBSD TRIM code for ZFS basicallys waits until the sector has been 
free for a while (to avoid doing a TRIM on a block we'll immediately 
reuse), so your benchmark will run file for a little while, then 
suddenly the TRIM will kick in.

For postgres, fio, bonnie++ etc, make sure the ZFS dataset you are 
storing the data on / benchmarking has a recordsize that matches the 
workload.

If you are doing a write-only benchmark, and you see lots of reads in 
gstat, you know you are having to do read/modify/write's, and that is 
why your performance is so bad.


-- 
Allan Jude



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?dc09bd99-f49f-daeb-b85d-62933891fde2>