Date: Tue, 31 Mar 2015 04:24:28 +0300
From: Artem Kuchin <artem@artem.ru>
To: freebsd-fs@freebsd.org
Subject: Re: Little research how rm -rf and tar kill server
Message-ID: <5519F74C.1040308@artem.ru>
In-Reply-To: <1427731759.309823.247107417.308CD298@webmail.messagingengine.com>
References: <55170D9C.1070107@artem.ru> <1427727936.293597.247070269.5CE0D411@webmail.messagingengine.com> <55196FC7.8090107@artem.ru> <1427730597.303984.247097389.165D5AAB@webmail.messagingengine.com> <5519716F.6060007@artem.ru> <1427731061.306961.247099633.0A421E90@webmail.messagingengine.com> <5519740A.1070902@artem.ru> <1427731759.309823.247107417.308CD298@webmail.messagingengine.com>

30.03.2015 19:09, Mark Felder wrote:
>
> On Mon, Mar 30, 2015, at 11:04, Artem Kuchin wrote:
>> 30.03.2015 18:57, Mark Felder wrote:
>>> On Mon, Mar 30, 2015, at 10:53, Artem Kuchin wrote:
>>>> This is normal state, not under rm -rf
>>>> Do you need it during rm -rf ?
>>>>
>>> No, but I wonder if changing the timer from LAPIC to HPET or possibly
>>> one of the other timers makes the system more responsive under that
>>> load. Would you mind testing that?
>>>
>>> You can switch the timer like this:
>>>
>>> sysctl kern.eventtimer.timer=HPET
>>>
>>> And then run some of your I/O tests
>>>
>> I see. I will test at night, when load goes down.
>> I cannot say sure that's a right way to dig, but i will test anything :)
>>
>> Just to remind: untar overloads the system, but untar + sync every 120s
>> does not.
>> That seems very strange to me. I think the problem might be somewhere
>> here.
>>
> I just heard from mav that there was a bottleneck in gmirror/graid with
> regards to BIO_DELETE requests
>
> https://svnweb.freebsd.org/base?view=revision&revision=280757
>

I applied this patch manually and rebuilt the kernel. Hit this bug
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458
on reboot, wasted 1 hour fsck-ing 2 times (the fs was still dirty after the
first fsck), and after boot tried doing

rm -rf test1

I could not measure anything, because it completed in 1 minute instead of
the 15 minutes it took before. I copied the dir 4 times into subdirs and ran
rm -rf on the full tree (4x larger): fast and smooth, mariadb did not notice
it, and the server was working fine.

However, I also noticed another thing:

cp -Rp test test1

also works a lot faster now, probably 3-5 times faster.
Maybe it is because the fs is free of tons of BIO_DELETE requests from
other processes.

Then I did the untar test at maximum speed (no pv to limit bandwidth):
I see that mysql requests became slower, but the mysql SQL request queue
built up more slowly now. However, when it reached 70 I stopped untar, and
mariadb could not recover from that condition until I executed sync.
However, this time sync took only a second.

I see a big improvement, but I still don't understand why I need to issue
sync manually to push everything out and recover from the overload.

# man 2 sync
a sync() system call is issued frequently by the user process syncer(4)
(about every 30 seconds).

This does not seem to be true. I checked the syncer sysctls:

# sysctl kern.filedelay
kern.filedelay: 30
# sysctl kern.dirdelay
kern.dirdelay: 29
# sysctl kern.metadelay
kern.metadelay: 28
# ps ax | grep sync
   23  -  DL     0:03.82 [syncer]

No clue why a manual sync is needed.

By the way: is there a way to make sure that SU+J is really working? Maybe
it is disabled for some reason and I don't know it. tunefs just shows the
stored setting, but, for example, with a dirty fs, journaling is not
actually working. Is there any way to get the current status of SU
journaling?

Off topic: the suggestion to move to ZFS was not so good; I see an
"All available memory used when deleting files from ZFS" topic.
I'd rather have a slow server that I can log in to and fix than one halted
on a panic. Just to point out that ZFS still has plenty of unpredictable
issues.

Artem
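
For reference, the "untar + sync every 120s" workaround quoted above can be sketched as a small sh wrapper. This is only a minimal sketch under stated assumptions, not something posted in the thread: run_with_periodic_sync and the SYNC_CMD override are hypothetical names, and archive.tar in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not a FreeBSD tool): run a command in the background
# and call sync(8) every INTERVAL seconds until the command exits.
# SYNC_CMD is an assumed override so the flush command can be swapped out.
run_with_periodic_sync() {
    interval=$1
    shift

    "$@" &                      # start the I/O-heavy command
    cmd_pid=$!
    elapsed=0

    # Poll once per second; flush dirty buffers each time INTERVAL elapses.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        sleep 1
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$interval" ]; then
            ${SYNC_CMD:-sync}
            elapsed=0
        fi
    done

    # Collect and return the exit status of the wrapped command.
    wait "$cmd_pid"
    rc=$?
    return "$rc"
}
```

Usage would look like `run_with_periodic_sync 120 tar -xf archive.tar`. It only automates the manual workaround; it does not explain why the syncer itself fails to keep up.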