From: Artem Kuchin
Date: Sat, 04 Apr 2015 05:20:13 +0300
To: Konstantin Belousov
Cc: freebsd-fs@freebsd.org
Subject: Re: Little research how rm -rf and tar kill server
Message-ID: <551F4A5D.1080008@artem.ru>
In-Reply-To: <20150403232904.GI2379@kib.kiev.ua>

On 04.04.2015 2:29, Konstantin Belousov wrote:
> On Sat, Apr 04, 2015 at 02:23:12AM +0300, Artem Kuchin wrote:
>> On 04.04.2015 2:15, Konstantin Belousov wrote:
>>> On Sat, Apr 04, 2015 at 12:59:38AM +0300, Artem Kuchin wrote:
>>>> On 03.04.2015 0:02, Konstantin Belousov wrote:
>>>>> On Thu, Apr 02, 2015 at 01:13:51AM +0300, Artem Kuchin wrote:
>>>>>> On 31.03.2015 19:42, Konstantin Belousov wrote:
>>>>>>> The syncer and sync(2) perform different kinds of syncs. Take a snapshot
>>>>>>> of sysctl debug.softdep before and after the situation occurs to get
>>>>>>> some hints about what is going on.
>>>>>>>
>>>>>>>
>>>>>> Okay. Here is the sysctl data.
>>>>> Try this. It may not be enough; I will provide an update in that case.
>>>>> No need to resend the sysctl data. Just test whether an explicit sync(2)
>>>>> is needed in your situation after the patch.
>>>>>
>>>>>
>>>> Okay, patched, recompiled and installed the new kernel.
>>>>
>>>> The behaviour changed a bit.
>>>>
>>>> Now when I start the untar, mysql quickly rises to 40 queries in the
>>>> queue in the "opening table" state (before, the rise was slower).
>>>> BUT after a while (20-30 seconds) all queries are executed.
>>>> This cycle repeated 4 times and then the situation quickly got worse.
>>>> It happened when untar reached a big subtree with tons of small files.
>>>> The queue grew to 70 queries, and processes went to 600 (from 450).
>>>> I stopped untar. Waited 3 minutes. Everything was getting even worse
>>>> (700 processes, over 100 queries). Issued sync. It executed for 3
>>>> seconds and voila! 20 idle connections, 450 processes.
>>>> So, a manual sync is still needed.
>>>>
>>>> Also, it seems the shell was less responsive during the untar than
>>>> before.
>>>>
>>>> Also, when the system managed to flush the query queue, systat -io
>>>> showed over 1000 tps, but when they got stuck it showed only about
>>>> 200 tps.
>>> So there were I/O ops during the stall period? I.e., a situation
>>> where there is a clogged queue and hung processes, but no disk activity,
>>> does not occur, even temporarily?
>> No, that does not happen. untar keeps untarring and file-based sites
>> continue to work, just slower, but mysql queries build up, although
>> some are executed.
>>> In what state are the hung processes blocked? Look at the wchan name
>>> in either top or ps output. Are there processes in the "suspfs" state?
>> No, after the patch all are in a normal state; only mysql is in the
>> "ufs" state, and some perl and http processes (maybe 3 or 5) are in the
>> "ufs" state too.
> What about the unpatched kernel? Are processes blocked in "suspfs"
> reported by either tool?

No, top says "ufs" state.

After I applied the patch I get many messages like these:

Apr 4 02:44:39 omni kernel: fsync: giving up on dirty
Apr 4 02:44:39 omni kernel: 0xfffff80013181b10: tag devfs, type VCHR
Apr 4 02:44:39 omni kernel: usecount 1, writecount 0, refcount 571 mountedhere 0xfffff80013030a00
Apr 4 02:44:39 omni kernel: flags (VI_ACTIVE)
Apr 4 02:44:39 omni kernel: v_object 0xfffff80013193200 ref 0 pages 4539 cleanbuf 26 dirtybuf 543
Apr 4 02:44:39 omni kernel: lock type devfs: EXCL by thread 0xfffff80010fbd000 (pid 23, syncer, tid 100087)
Apr 4 02:44:39 omni kernel: dev mirror/root

Is the filesystem still okay after this? "giving up on dirty" does not
sound good.

Artem