Date:      Tue, 31 Mar 2015 09:54:58 -0500
From:      Mark Felder <feld@FreeBSD.org>
To:        freebsd-fs@freebsd.org
Cc:        mav@FreeBSD.org
Subject:   Re: Little research how rm -rf and tar kill server
Message-ID:  <1427813698.641733.247585797.28816738@webmail.messagingengine.com>
In-Reply-To: <5519F74C.1040308@artem.ru>
References:  <55170D9C.1070107@artem.ru> <1427727936.293597.247070269.5CE0D411@webmail.messagingengine.com> <55196FC7.8090107@artem.ru> <1427730597.303984.247097389.165D5AAB@webmail.messagingengine.com> <5519716F.6060007@artem.ru> <1427731061.306961.247099633.0A421E90@webmail.messagingengine.com> <5519740A.1070902@artem.ru> <1427731759.309823.247107417.308CD298@webmail.messagingengine.com> <5519F74C.1040308@artem.ru>



On Mon, Mar 30, 2015, at 20:24, Artem Kuchin wrote:
> 30.03.2015 19:09, Mark Felder wrote:
> >
> > On Mon, Mar 30, 2015, at 11:04, Artem Kuchin wrote:
> >> 30.03.2015 18:57, Mark Felder wrote:
> >>> On Mon, Mar 30, 2015, at 10:53, Artem Kuchin wrote:
> >>>> This is the normal state, not under rm -rf.
> >>>> Do you need it during rm -rf?
> >>>>
> >>> No, but I wonder if changing the timer from LAPIC to HPET or possibly
> >>> one of the other timers makes the system more responsive under that
> >>> load. Would you mind testing that?
> >>>
> >>> You can switch the timer like this:
> >>>
> >>> sysctl kern.eventtimer.timer=HPET
> >>>
> >>> And then run some of your I/O tests
> >>>
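For reference, a rough sketch of that timer experiment. The timer names available differ per machine, so it is worth checking kern.eventtimer.choice first; HPET below is just the example from above:

# list the event timers this machine offers, and the one currently in use
sysctl kern.eventtimer.choice
sysctl kern.eventtimer.timer
# switch at runtime
sysctl kern.eventtimer.timer=HPET
# if a different timer helps, the same line can go into /etc/sysctl.conf to persist it
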
> >> I see. I will test at night, when load goes down.
> >> I cannot say for sure that's the right way to dig, but I will test anything :)
> >>
> >> Just a reminder: untar overloads the system, but untar + sync every 120s
> >> does not.
> >> That seems very strange to me.  I think the problem might be somewhere
> >> here.
> >>
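Roughly what that "sync every 120s" variant amounts to, as a sketch (the archive name and the interval are only illustrative, taken from the description above):

# untar in the background and force a sync(8) every 120 seconds until it finishes
tar -xf test.tar &
pid=$!
while kill -0 "$pid" 2>/dev/null; do
    sleep 120
    sync
done
wait "$pid"
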
> > I just heard from mav that there was a bottleneck in gmirror/graid with
> > regards to BIO_DELETE requests
> >
> > https://svnweb.freebsd.org/base?view=revision&revision=280757
> >
>=20
> I applied this patch manually and rebuilt the kernel.
> Hit this bug
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458
> on reboot, wasted 1 hour fsck-ing twice (the fs was still dirty after the first fsck)
> and after boot tried doing
> rm -rf test1
> I could not test anything, because it completed after 1 minute instead of
> the 15 minutes it took before.
> I copied the dir 4 times into subdirs and rm -rf'ed the full tree (4x larger) -
> fast and smooth,
> mariadb did not notice it, the server kept working fine.
>
> However, I also noticed another thing:
> cp -Rp test test1
> also works a lot faster now, probably 3-5 times faster.
> Maybe it is because the fs is free of tons of BIO_DELETE requests from other processes.
>
>
> Then I did the untar test at maximum speed (no pv to limit bandwidth):
> I see that mysql requests became slower, but the mysql sql request queue
> built up more slowly now.
> However, when it reached 70 I stopped the untar and mariadb could not
> recover from that condition
> until I executed sync. However, this time sync took only a second.
> I see a big improvement, but I still don't understand why I need to issue
> sync manually to push
> everything through and recover from the overload.
>
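For context, the earlier rate-limited runs presumably looked something like the following; the archive name and the 20 MB/s cap are only placeholders:

# pv -L limits the throughput fed into tar, keeping the write rate bounded
pv -L 20m test.tar | tar -xf -
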
> # man 2 sync
> a sync() system call is issued frequently by the user process syncer(4)
> (about every 30 seconds).
>
> it does not seem to be true
>
> I checked the syncer sysctls
>
> # sysctl kern.filedelay
> kern.filedelay: 30
> # sysctl kern.dirdelay
> kern.dirdelay: 29
> # sysctl kern.metadelay
> kern.metadelay: 28
>
> # ps ax | grep sync
> 23  -  DL     0:03.82 [syncer]
>
> no clue why a manual sync is needed
>
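If the syncer really is falling behind, one experiment - only a suggestion, I have not verified it on this workload - would be to shorten its delays and see whether the manual sync stops being necessary:

# defaults are 30/29/28 seconds; lower values make the syncer flush dirty
# buffers sooner, at the cost of more frequent writeback
sysctl kern.filedelay=10
sysctl kern.dirdelay=9
sysctl kern.metadelay=8
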
> By the way: is there a way to make sure that SU+J is really working?
> Maybe it is disabled for some reason
> and I don't know it. tunefs just shows the stored setting, but, for example,
> with a dirty fs, journaling is not
> actually working. Any way to get the current status of SU journaling?
>
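On the SU+J question: both the stored flags and the live mount state can be checked; the device name below is just a placeholder for whatever the filesystem actually lives on:

# stored filesystem flags, including the "soft updates journaling: (-j)" line
tunefs -p /dev/ada0p2
# the active mount options list "journaled soft-updates" when SU+J is in effect
mount | grep ada0p2
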
> Off topic: the suggestion to move to ZFS was not so good; I see an "All
> available memory used when deleting files from ZFS"
> topic. I'd rather have a slow server that I can log into and fix than one
> halted on a panic. Just to point out that ZFS still has plenty
> of unpredictable issues.
>

This information is very good. Perhaps there is some additional tweaking
that could be done. I will cc mav@ on this.


