Date: Fri, 7 Sep 2018 16:34:23 +0300 From: Subbsd <subbsd@gmail.com> To: markj@freebsd.org Cc: allanjude@freebsd.org, freebsd-current Current <freebsd-current@freebsd.org> Subject: Re: ZFS perfomance regression in FreeBSD 12 APLHA3->ALPHA4 Message-ID: <CAFt_eMpD0UZ5kMi70RvzGso%2Bqhf26xsi5kbAJW-_pz0HrP4Z9w@mail.gmail.com> In-Reply-To: <20180906002825.GB77324@raichu> References: <CAFt_eMrhFiV0OJQ2w05-pFj4EkYr9dbYf4APLuJdBykwkOjRsQ@mail.gmail.com> <e26c530e-88e4-0481-9012-f814dd8ed569@freebsd.org> <CAFt_eMozS6_9VWBu%2B3LZ9n12bFdykZkz9HTi-zb0kgSKS2Riag@mail.gmail.com> <20180906002825.GB77324@raichu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Sep 6, 2018 at 3:28 AM Mark Johnston <markj@freebsd.org> wrote: > > On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote: > > On Wed, Sep 5, 2018 at 5:58 PM Allan Jude <allanjude@freebsd.org> wrote: > > > > > > On 2018-09-05 10:04, Subbsd wrote: > > > > Hi, > > > > > > > > I'm seeing a huge loss in performance ZFS after upgrading FreeBSD 12 > > > > to latest revision (r338466 the moment) and related to ARC. > > > > > > > > I can not say which revision was before except that the newver.sh > > > > pointed to ALPHA3. > > > > > > > > Problems are observed if you try to limit ARC. In my case: > > > > > > > > vfs.zfs.arc_max="128M" > > > > > > > > I know that this is very small. However, for two years with this there > > > > were no problems. > > > > > > > > When i send SIGINFO to process which is currently working with ZFS, i > > > > see "arc_reclaim_waiters_cv": > > > > > > > > e.g when i type: > > > > > > > > /bin/csh > > > > > > > > I have time (~5 seconds) to press several times 'ctrl+t' before csh is executed: > > > > > > > > load: 0.70 cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k > > > > load: 0.70 cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k > > > > load: 0.70 cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k > > > > load: 0.73 cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k > > > > > > > > same story with find or any other commans: > > > > > > > > load: 0.34 cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k > > > > load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k > > > > load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k > > > > load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k > > > > load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k > > > > load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k > > > > load: 0.34 cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k > > > > > > > > this problem goes away after increasing vfs.zfs.arc_max > > > > _______________________________________________ > > > > freebsd-current@freebsd.org mailing list > > > > https://lists.freebsd.org/mailman/listinfo/freebsd-current > > > > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > > > > > > > > > > Previously, ZFS was not actually able to evict enough dnodes to keep > > > your arc_max under 128MB, it would have been much higher based on the > > > number of open files you had. A recent improvement from upstream ZFS > > > (r337653 and r337660) was pulled in that fixed this, so setting an > > > arc_max of 128MB is much more effective now, and that is causing the > > > side effect of "actually doing what you asked it to do", in this case, > > > what you are asking is a bit silly. If you have a working set that is > > > greater than 128MB, and you ask ZFS to use less than that, it'll have to > > > constantly try to reclaim memory to keep under that very low bar. > > > > > > > Thanks for comments. Mark was right when he pointed to r338416 ( > > https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416&r2=338415&pathrev=338416 > > ). Commenting aggsum_value returns normal speed regardless of the rest > > of the new code from upstream. > > I would like to repeat that the speed with these two lines is not just > > slow, but _INCREDIBLY_ slow! Probably, this should be written in the > > relevant documentation for FreeBSD 12+ > > Could you please retest with the patch below applied, instead of > reverting r338416? > > diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > index 2bc065e12509..9b039b7d4a96 100644 > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > @@ -538,7 +538,7 @@ typedef struct arc_state { > */ > int zfs_arc_meta_prune = 10000; > unsigned long zfs_arc_dnode_limit_percent = 10; > -int zfs_arc_meta_strategy = ARC_STRATEGY_META_BALANCED; > +int zfs_arc_meta_strategy = ARC_STRATEGY_META_ONLY; > int zfs_arc_meta_adjust_restarts = 4096; > > /* The 6 states: */ Unfortunately, I can not conduct sufficiently serious regression tests right now (e.g through benchmark tools, with statistics via PMC(3), etc.). However, I can do a very surface comparison - for example with find. tested revision: 338520 loader.conf settings: vfs.zfs.arc_max="128M" meta strategy ARC_STRATEGY_META_BALANCED result: % time find /usr/ports -type f > /dev/null 0.495u 6.912s 7:20.88 1.6% 34+172k 289594+0io 0pf+0w meta strategy ARC_STRATEGY_META_ONLY: % time find /usr/ports -type f > /dev/null 0.721u 7.184s 4:57.18 2.6% 34+171k 300910+0io 0pf+0w So the difference in ~183 seconds. Both test started after a full boot OS cycle. Interestingly, despite the limit, ARC size by top(1) was constantly increasing and at the time of the end of the test it looked like this: ARC: 1161M Total, 392M MFU, 118M MRU, 32K Anon, 7959K Header, 643M Other 101M Compressed, 411M Uncompressed, 4.07:1 Ratio Swap: 16G Total, 16G Free PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 8413 root 1 20 0 11M 3024K arc_re 4 0:07 1.49% find
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFt_eMpD0UZ5kMi70RvzGso%2Bqhf26xsi5kbAJW-_pz0HrP4Z9w>