Date: Tue, 20 Jun 2017 17:16:29 +0000
From: "Caza, Aaron" <Aaron.Caza@ca.weatherford.com>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: RE: FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs
Message-ID: <36ec1fb476e647a78c19463eea493859@DM2PR58MB013.032d.mgd.msft.net>
> -----Original Message-----
> From: Steven Hartland [mailto:killing@multiplay.co.uk]
> Sent: Monday, June 19, 2017 7:32 PM
> To: freebsd-fs@freebsd.org
> Subject: Re: FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs
>
> On 20/06/2017 01:57, Caza, Aaron wrote:
> >> vfs.zfs.min_auto_ashift is a sysctl only, it's not a tuneable, so setting it in /boot/loader.conf won't have any effect.
> >>
> >> There's no need for it to be a tuneable as it only affects vdevs when they are created, which can only be done once the system is running.
> >>
> > The bsdinstall install script itself sets vfs.zfs.min_auto_ashift=12 in /boot/loader.conf yet, as you say, this doesn't do anything. As a user, this is a bit confusing: it's in /boot/loader.conf, but 'sysctl -a | grep min_auto_ashift' shows 'vfs.zfs.min_auto_ashift: 9', so I felt it was worth mentioning.
> Absolutely, patch is in review here:
> https://reviews.freebsd.org/D11278

Thanks for taking care of this, Steve - appreciated.

> >
> >> You don't explain why you believe there is degrading performance?
> > As I related in my post, with my previous FreeBSD 11-Stable setup on this same hardware I was seeing 950MB/s after bootup. I've been posting to the freebsd-hackers list, but have moved to the freebsd-fs list as this seemingly has something to do with FreeBSD+ZFS behavior, and user Jov had previously cross-posted to this list for me:
> > https://docs.freebsd.org/cgi/getmsg.cgi?fetch=2905+0+archive/2017/freebsd-fs/20170618.freebsd-fs
> >
> > I've been using FreeBSD+ZFS ever since FreeBSD 9.0, admittedly with a different zpool layout, which is essentially as follows:
> >
> > adaXp1 - gptboot loader
> > adaXp2 - 1GB UFS partition
> > adaXp3 - UFS with UUID labeled partition hosting a GEOM ELI layer using NULL encryption to emulate 4k sectors (done before ashift was an option)
> >
> > So, adaXp3 would show up as something like the following:
> >
> > /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8
> > /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8.eli
> >
> > Then, the zpool mirrored pair would be something like the following:
> >
> >   pool: wwbase
> >  state: ONLINE
> >   scan: none requested
> > config:
> >
> >         NAME                                              STATE     READ WRITE CKSUM
> >         wwbase                                            ONLINE       0     0     0
> >           mirror-0                                        ONLINE       0     0     0
> >             gpt/b62feb20-554b-11e7-989b-000bab332ee8.eli  ONLINE       0     0     0
> >             gpt/4c596d40-554c-11e7-beb1-002590766b41.eli  ONLINE       0     0     0
> >
> > Using the above zpool configuration on this same hardware on FreeBSD 11-Stable, I was seeing read speeds of 950MB/s using dd (dd if=/testdb/test of=/dev/null bs=1m). However, after anywhere from 5 to 24 hours, performance would degrade to less than 100MB/s for unknown reasons - the server was essentially idle, so it's a mystery to me why this occurs. I'm seeing this behavior on FreeBSD 10.3R amd64 up through FreeBSD 11.0-Stable. As I wasn't making any headway in resolving this, I opted today to use the FreeBSD 11.1 Beta 2 memstick image to create a basic FreeBSD 11.1 Beta 2 amd64 Auto(ZFS) installation to see if this would resolve the original issue, since I would be using ZFS-on-root and vfs.zfs.min_auto_ashift=12 instead of my own emulation as described above. However, instead of the 950MB/s that I expected - which is what I see with my alternative emulation - I'm seeing 450MB/s. I've yet to determine if this zpool setup, as done by the bsdinstall script, will suffer from the original performance degradation I observed.
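For reference, in case it helps anyone reproduce the ashift route by hand, it boils down to roughly the following (a sketch only - the pool name and GPT labels here are just placeholders, and zdb is merely one way to double-check the result):

    # vfs.zfs.min_auto_ashift is a sysctl, so set it on the running system
    # before creating the pool, and persist it for future boots:
    sysctl vfs.zfs.min_auto_ashift=12
    echo 'vfs.zfs.min_auto_ashift=12' >> /etc/sysctl.conf

    # Create the mirror on two (hypothetical) GPT-labeled partitions:
    zpool create tank mirror gpt/ssd0 gpt/ssd1

    # Sanity-check that the vdev really got ashift=12:
    zdb -C tank | grep ashift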
> >> What is the exact dd command you're running, as that can have a huge impact on performance.
> > dd if=/testdb/test of=/dev/null bs=1m
> >
> > Note that the file /testdb/test is 16GB, twice the size of the RAM available in this system. The /testdb directory is a ZFS file system with recordsize=8k, chosen as it's ultimately intended to host a PostgreSQL database, which uses an 8k page size.
> >
> > My understanding is that a ZFS mirrored pool with two drives can read from both drives at the same time, hence double the speed. This is what I've actually observed ever since I first started using this in FreeBSD 9.0 with the GEOM ELI 4k sector emulation. This is actually my first time using FreeBSD's native installer's Auto(ZFS) setup with 4k sectors emulated using vfs.zfs.min_auto_ashift=12. As it's a ZFS mirrored pool, I still expected it to be able to read at double speed as it does with the GEOM ELI 4k sector emulation; however, it does not.
> >
>>> On 19/06/2017 23:14, Caza, Aaron wrote:
>>> I've been having a problem with FreeBSD ZFS SSD performance inexplicably degrading after < 24 hours uptime, as described in a separate e-mail thread. In an effort to get down to basics, I've now performed a ZFS-on-root install of FreeBSD 11.1 Beta 2 amd64 using the default Auto(ZFS) install with the default 4k sector emulation (vfs.zfs.min_auto_ashift=12) setting (no swap, not encrypted).
>>>
>>> Firstly, vfs.zfs.min_auto_ashift=12 is set correctly in the /boot/loader.conf file, but doesn't appear to work: when I log in and do "sysctl -a | grep min_auto_ashift" it's set to 9 and not 12 as expected. I tried setting it to vfs.zfs.min_auto_ashift="12" in /boot/loader.conf but that didn't make any difference, so I finally just added it to /etc/sysctl.conf where it seems to work. So, something needs to be changed to make this work correctly.
>>>
>>> Next, after reboot I was expecting somewhere in the neighborhood of 950MB/s from the ZFS mirrored zpool of 2 Samsung 850 Pro 256GB SSDs that I'm using, as I was previously seeing this with my previous FreeBSD 11-Stable setup which, admittedly, is set up differently from the way the bsdinstall script does it. However, I'm seeing half that on bootup.
>>>
>>> Performance result:
>>> Starting 'dd' test of large file...please wait
>>> 16000+0 records in
>>> 16000+0 records out
>>> 16777216000 bytes transferred in 37.407043 secs (448504207 bytes/sec)

> Can you show the output from gstat -pd during this DD please.

dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0   4318   4318  34865    0.0      0      0    0.0      0      0    0.0   14.2| ada0
    0   4402   4402  35213    0.0      0      0    0.0      0      0    0.0   14.4| ada1

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    1   4249   4249  34136    0.0      0      0    0.0      0      0    0.0   14.1| ada0
    0   4393   4393  35287    0.0      0      0    0.0      0      0    0.0   14.5| ada1

Every now and again, I was seeing d/s hit, which I understand to be TRIM operations - it would briefly show 16 and then go back to 0.
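In case it helps reproduce the numbers, the test setup boils down to roughly the following (the dataset name and mountpoint are illustrative - only the final dd read is exactly the command I run hourly):

    # 8k-recordsize dataset to match the eventual PostgreSQL page size
    # (assumes a bsdinstall-style pool named zroot):
    zfs create -o recordsize=8k -o mountpoint=/testdb zroot/testdb

    # 16GB test file - twice the RAM in this box - so reads can't be
    # satisfied from the ARC:
    dd if=/dev/random of=/testdb/test bs=1m count=16000

    # Watch the disks in one terminal while the read test runs in another:
    gstat -pd -I 1s
    dd if=/testdb/test of=/dev/null bs=1m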
Here are two runs from the current install, before and after the degradation hit:

test@f111beta2:~ # dd if=/testdb/test of=/dev/null bs=1m
16000+0 records in
16000+0 records out
16777216000 bytes transferred in 43.447343 secs (386150561 bytes/sec)
test@f111beta2:~ # uptime
 9:54AM  up 19:38, 2 users, load averages: 2.92, 1.01, 0.44

root@f111beta2:~ # dd if=/testdb/test of=/dev/null bs=1m
16000+0 records in
16000+0 records out
16777216000 bytes transferred in 236.097011 secs (71060688 bytes/sec)
test@f111beta2:~ # uptime
10:36AM  up 20:20, 2 users, load averages: 0.90, 0.62, 0.36

As can be seen in the above 'dd' test results, I'm back to seeing the original issue I reported on freebsd-hackers - performance inexplicably degrading after < 24 hours of uptime, going from ~386MB/sec to ~71MB/sec - and this server isn't doing anything other than running this test hourly.

Please note that the gstat -pd output above was captured after the performance degradation hit. Prior to this, I was seeing %busy of ~60%. In this particular instance the degradation hit ~20hrs into the test, but I've seen it hit as soon as ~5hrs in.

Previously, Allan Jude had advised setting vfs.zfs.trim.enabled=0 to see if this changed the behavior. I did this; however, it had no impact - but that was when I was using the GEOM ELI 4k sector emulation and not the ashift 4k sector emulation. The GEOM ELI 4k sector emulation does not appear to work with TRIM operations, as gstat -d in that case always stayed at 0 ops/s. I can try disabling TRIM, but I did not want to reboot the server and restart the test in case there was some additional info worth capturing.

I have captured an hourly log that can be provided, containing the initial dmesg, zpool status, zfs list and zfs get all, along with an hourly capture of the results of running the above 'dd' test with the associated zfs-stats -a and sysctl -a output, though it's currently 2.8MB and hence too large to post to this list.

Also, there seems to be a problem with my freebsd-fs subscription, as I'm not getting e-mail notifications despite having submitted a subscription request, so apologies for my slow responses.

--
Aaron

Aaron Caza
Senior Server Developer
Weatherford SLS Canada R&D Group
Weatherford | 1620 27 Ave NE | #124B | Calgary | AB | T2E 8W4
Direct +1 (403) 693-7773
Aaron.Caza@ca.weatherford.com | www.weatherford.com