From: Steve Wills
To: FreeBSD Current
Subject: Re: zfskern{txg_thread_enter} thread using 100% or more CPU
Date: Thu, 3 May 2018 16:27:04 -0400
Message-ID: <34da53a8-628f-ac5e-36e4-ecc61b45f405@FreeBSD.org>

I finally caught this happening while I had "lockstat sleep 1" running in a loop. The output looks like this:

https://gist.github.com/swills/a2a20c2a4296a4c596ec7f329fb945ab

And top looks like this:

https://gist.github.com/swills/6e749313e52679224adec91d4841ad83

I also noticed that there are actually 2 threads of pid 17, [zfskern{txg_thread_enter}], which are reporting 57% and 42% of disk IO. Everything else is idle as far as IO goes. The system is not totally unresponsive: processes that don't need IO are working, but anything that needs IO hangs. Perhaps it's a hardware issue, but I can't find any other evidence of it. Any ideas?

Steve

On 04/24/2018 19:30, Steve Wills wrote:
> Hi,
>
> Recently on multiple systems running CURRENT, I've been seeing the
> system become unresponsive. Leaving top(1) running has led me to notice
> that when this happens, the system is still responding to ping and top
> over ssh is still working, but no new processes can start and switching
> to other tasks doesn't work. In top, I do see pid 17,
> [zfskern{txg_thread_enter}] monopolizing both CPU usage and disk IO. Any
> ideas how to troubleshoot this? It doesn't appear to be a hardware issue.
>
> Steve
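
For anyone wanting to reproduce the capture, the "lockstat sleep 1" loop mentioned above could be sketched roughly as follows. This is only an illustration, not the exact loop that was used: the helper name sample_loop, the iteration count, and the log path are all assumptions, and lockstat itself needs root and DTrace support on the affected machine.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): run a sampling
# command a fixed number of times and print a UTC timestamp before each
# sample, so the samples can later be lined up with top(1) output.
sample_loop() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '=== %s ===\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        # Merge stderr into stdout so everything the command prints
        # ends up in the same stream.
        "$@" 2>&1
        i=$((i + 1))
    done
}

# Assumed usage on the affected system (count and log path are made up):
#   sample_loop 600 lockstat sleep 1 >> /var/tmp/lockstat.log
```

Keeping a timestamp in front of each sample makes it easier to pick out the one taken while the txg_thread_enter thread was busy.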